Modeling Atmospheric Ammonia using Agricultural Emissions with Improved Spatial Variability and Temporal Dynamics

Ammonia emissions to the atmosphere have increased substantially in Europe since 1960, primarily due to the intensification of agriculture as illustrated by enhanced livestock and the use of fertilizers. These associated emissions of 10 reactive nitrogen, particulate matter and acid deposition have contributed to negative societal impacts on human health and terrestrial ecosystems. Due to the limited availability of reliable measurements, emission inventories are used to assess largescale ammonia emissions from agriculture, creating gridded annual emission maps and emission time profiles both globally and regionally. The modeled emissions are subsequently used in chemistry transport models to obtain ammonia concentrations and depositions. However, current emission inventories usually have relatively low spatial resolutions and coarse 15 categorizations that do not distinguish between fertilization on various crops, grazing, animal housing, and manure storage in its spatial allocation. Furthermore, in assessing the seasonal variation of ammonia emissions, they do not take into account local climatology and agricultural management, which limits the capability to reproduce observed spatial and seasonal variations in the ammonia concentrations. 20 This paper describes a novel ammonia emission model that quantifies agricultural emissions with improved spatial details and temporal dynamics in 2010, in Germany and Benelux. The spatial allocation was achieved by embedding the agricultural emission model Integrated Nitrogen Tool across Europe for Greenhouse gases and Ammonia Targeted to Operational Responses (INTEGRATOR) into the air pollution inventory Monitoring Atmospheric Composition and Climate -III (MACCIII), thus accounting for differentiation in ammonia emissions from manure and fertilizer application, grazing, animal houses 25 and manure storage systems. The more detailed temporal distribution came from the integration of TIMELINES which provided predictions of the timing of key agricultural operations, including the day of fertilization across Europe. The emission maps and time profiles were imported into LOTOS-EUROS to obtain surface concentrations and total columns for validation. The comparison of surface concentration between modeled output and in situ measurements illustrated that the updated model had been improved significantly with respect to the temporal variation of ammonia emission, and its performance was more 30 stable and robust. The comparison of total columns between remote sensing observations and model simulations showed that some spatial characteristics were smoothened, and there was an overestimation in Southern Germany and underestimation in

its spatial allocation. Furthermore, in assessing the seasonal variation of ammonia emissions, they do not take into account local climatology and agricultural management, which limits the capability to reproduce observed spatial and seasonal variations in the ammonia concentrations. 20 This paper describes a novel ammonia emission model that quantifies agricultural emissions with improved spatial details and temporal dynamics in 2010, in Germany and Benelux. The spatial allocation was achieved by embedding the agricultural emission model Integrated Nitrogen Tool across Europe for Greenhouse gases and Ammonia Targeted to Operational Responses (INTEGRATOR) into the air pollution inventory Monitoring Atmospheric Composition and Climate -III (MACC-III), thus accounting for differentiation in ammonia emissions from manure and fertilizer application, grazing, animal houses 25 and manure storage systems. The more detailed temporal distribution came from the integration of TIMELINES which provided predictions of the timing of key agricultural operations, including the day of fertilization across Europe. The emission maps and time profiles were imported into LOTOS-EUROS to obtain surface concentrations and total columns for validation.
The comparison of surface concentration between modeled output and in situ measurements illustrated that the updated model had been improved significantly with respect to the temporal variation of ammonia emission, and its performance was more 30 stable and robust. The comparison of total columns between remote sensing observations and model simulations showed that some spatial characteristics were smoothened, and there was an overestimation in Southern Germany and underestimation in Northern Germany. The results suggested that updating ammonia emission fractions and accounting for manure transport are the direction for further improvement, and detailed land use is needed to increase the spatial resolution of spatial allocation in ammonia emission modeling.

Introduction
Ammonia ( 3 ) emission to the atmosphere has risen substantially on a global scale during the twentieth century following 5 the demand for food of a rapidly growing population (Erisman et al., 2008). Increases are especially large in areas with intense agricultural activities, such as Europe, the US and China. The annual European Union emission inventory report  shows that even though 3 emission of EU-28 countries fell by 23% between 1990 and 2015, Germany, Spain, Sweden and the EU as a whole exceeded their 3 emission ceilings in 2015 (EEA, 2017). The main source of 3 emission is agriculture, contributing to more than 90% of the total emissions in EU-28 (Monteny and Hartung, 2007).
3 from agriculture is emitted 10 to the atmosphere during the application of manure and inorganic mineral fertilizers, as well as from animal houses and manure storage systems (Velthof et al., 2012). Meanwhile, emission from traffic and road transport occupies less than 2% (EEA, 2017).
Additional minor sources include food processing, biomass burning and fossil fuel combustion, making up about 4% of the 3 emissions (Erisman et al., 2008;Galloway et al., 2003;Krupa, 2003). 3 concentrations are highly variable in space and time because of its short atmospheric residence time as it is effectively 15 removed by dry and wet deposition several hours after emission (Fangmeier et al., 1994). In addition, 3 reacts with sulfuric ( 2 4 ) and nitric acid ( 3 ) in the atmosphere, leading to the transformation from 3 to fine ammonium salts (( 4 ) 2 4 , • Local agricultural practices o Type and amount of manure and inorganic fertilizer applied to land o Method of manure and fertilizer application o Animal type, housing type, manure storage type • Meteorological conditions (air temperature, wind speed, humidity, precipitation) 5 • Soil conditions (soil temperature, texture) • Regulation of agricultural practice Several emission inventories have been developed to improve the spatial details of 3 emission in different countries. Hutchings et al. (2001) introduced a nitrogen flow approach to model annually averaged 3 emission for Denmark, taking into account animal types or different amount of fertilizers applied on various regions. In their study, 3 emissions are 10 calculated as a percentage of the total N in manure, which means that the model will be valid as long as the chemical and physical characteristics of the manure remain the same. It also indicates that the model can be easily adapted as long as the only parameters that change are the number of animals or their distribution between the manure handling systems. Similar methodology has been adopted by Gac et al. (2007) in France, Webb and Misselbrook (2004) in the UK and Hyde et al. (2003) in Ireland. In the air pollution model Monitoring Atmospheric Composition and Climate -III (MACC-III), emission factors 15 and proxy maps are utilized to obtain the spatial distribution of annual emissions from emission totals officially reported by countries (Kuenen et al., 2014;Velthof et al., 2012).
Subsequently, temporal distribution profiles are used to obtain temporally resolved emissions. Skjøth et al. (2004) demonstrated an implementation of a simplified version of the dynamic parameterization for Denmark in the model ACDEP, by correlating temperature with emission functions for 15 agricultural subsectors. The method takes into account physical 20 processes like volatilization and agricultural production activities such as the timing of fertilization. Significant improvements have been achieved compared to the results obtained with simplified time profiles of agricultural emissions. Based on the work of Skjøth et al. (2004), Gyldenkaerne et al. (2005) improved the parameterization by including the effect of ventilation rates inside buildings, ambient wind speeds and a more realistic description of temperatures inside animal houses.
Current emission inventories used in European chemistry transport models (CTMs) usually distinguish sectors defined by 25 EMEP SNAP Level 1 Category which has a single sector for agriculture. They do not indicate crop types and fertilizer types that are important for the interpretation of the results and future applications such as policymaking. Furthermore, in most European regional scale CTMs, such as LOTOS-EUROS (Hendriks et al., 2016;Schaap et al., 2008), the accompanying time profiles that allocate gridded emission in time are mostly generated by simplified and static seasonal functions, without taking into account local climatology and agricultural practices. However, it is a challenge to improve this situation for European 30 scale applications as 3 emission modeling requires detailed information on land use, number of different livestock, and the spatial distribution of farmhouses and storages (Gyldenkaerne et al., 2005;Skjøth et al., 2004).
In view of the above shortcomings, we developed a novel 3 emission model that quantifies agricultural emissions with better spatial details and gives insight into the temporal dynamics. Integrated Nitrogen Tool across Europe for Greenhouse gases and Ammonia Targeted to Operational Responses (INTEGRATOR) assesses greenhouse gases and nitrogen fluxes from agricultural sectors at high spatial resolution and accounts for differences in crop types, fertilizer types, animal housing and manure storage (Kros et al., 2018;De Vries et al., 2011). The improvement of the spatial emission allocation was realized by 5 embedding the results of the INTEGRATOR model into the MACC-III emission inventory. The more detailed temporal distribution came from the emission functions in the work of Gyldenkaerne et al. (2005) and Skjøth et al. (2004) with the integration of the TIMELINES model. TIMELINES provides predictions of the timing of key agricultural operations across Europe (Hutchings et al., 2012). These new emission products were then used in LOTOS-EUROS for validation by comparing modeled outputs with measurements. In this work, the improvements in 3 emission estimates were made for Germany and 10 Benelux in the year of 2010 as a first test case.
In this paper, we first describe the methodology of 1) the new emission model which generates spatially and temporally resolved emission products; 2) the chemistry transport model LOTOS-EUROS that translates emission into concentrations and total columns; 3) data processing of the available measurements. Then we assess the model by comparing the simulated total columns and surface concentrations with remote sensing and ground-based observations, respectively. Finally, we evaluate the 15 model performance in terms of improvements and shortcomings of the modeled results for future perspectives for this work.

Methodology and Data
A schematic overview of the methodology and workflow is presented in Figure 1. The new emission model is composed of two parts, a spatial allocator which produces gridded maps of 3 annual emissions from various categories and a temporal allocator that disaggregates the annual emission within a grid cell over a year, creating emission distributions in space and 20 time. The spatial allocator integrates the detailed agricultural emission information from INTEGRATOR into MACC-III. The temporal allocator, with the help of the agricultural management model TIMELINES, characterizes the temporal variation as hourly time series according to land use, agricultural practice and climate. The emission estimates were then imported into the CTM LOTOS-EUROS to derive 3 concentrations which were subsequently compared with Infrared Atmospheric Sounding Interferometer (IASI) observations on 3 total columns and in situ measurements of surface concentrations for verification. 25 Normalized root mean square error (NRMSE), normalized mean absolute error (NMAE), model efficiency (EF) and index of agreement between modeled output and measurements were calculated to determine the performance of the models (Appendix A).

Model Parameters
In this study, the spatial domain of the area of interest was set to be 2°-16° in longitude with a step of 0.125° and 47°-5 55° in latitude with a step of 0.0625°, which corresponds to a spatial resolution of approximately 7km × 7km. Two model runs were conducted to identify the influence brought by the new method. In the first simulation, the original MACC-III annual emission distribution and LOTOS-EUROS time profiles were used. The second model run utilized the improved spatial distribution and the dynamic time profiles obtained with the updated model. It has to be noted that a European scale run was conducted priorly to ensure the same boundary conditions for both model runs. 10

The MACC-III inventory
MACC-III is a spatially explicit emission inventory with a resolution of 0.125° × 0.0625° longitude-latitude (approximately 7km × 7km), providing Europe-wide annual emission inputs for , 2 , , 4 , 3 , , 10 and 2.5 for air quality models (Kuenen et al., 2014). The inventory is based on national emission total per sector officially reported by the 15 countries themselves. In case emission data for a sector/country are unavailable for a particular year, estimates from GAINS are used to make sure that the emission inventory is complete and applicable for every country in Europe (Kuenen et al., 2011).
Emission totals are spatially disaggregated across the countries in the form of point or area sources, using point source locations and proxy maps (e.g., population density, traffic intensity), respectively (Kuenen et al., 2014). MACC-III provides the spatial distribution of annual 3 emissions from agriculture and non-agricultural sectors including traffic and industry. However, 20 due to the top-down nature of the inventory, it does not distinguish agricultural 3 emission sources between animal housing, manure storage, and fertilization on crop lands. Instead, it differentiates emissions by animal types, which includes the application and storage of certain animal manure, housing of this animal.
The aim is to improve the inventory towards a more detailed categorization to provide more in-depth information on the impact of various agricultural activities on emission. Besides, the available information in the inventory does not fulfill the requirements of the TIMELINES model for temporal allocation. The disadvantages are the reason why we introduced the INTEGRATOR model in this study.

The INTEGRATOR Model 5
The INTEGRATOR model is a static N cycling model, and an adapted, more detailed version of the former MITERRA-Europe model (Velthof et al., 2009). It calculates land system budgets at EU-27 level, including N uptake, N emissions (in the forms of 3 , 2 , and 2 ) from housing and manure storage systems, N accumulation in or release from the soil (due to manure and mineral fertilizer application) and N losses by leaching and runoff (De Vries et al., 2011). The emissions of 3 and other gases ( 2 , and 2 ) to the atmosphere are estimated by multiplying N inputs with emission factors (De Vries 10 et al., 2011). In this study, we focus on the modules of the model that estimate 3 emissions from animal housing and manure storage systems as well as the application of manure and mineral fertilizer to arable land and grassland.
Unlike the MACC-III inventory which provides emission distributions on longitude-latitude grids in the reference system World Geodetic System 1984 (WGS84), INTEGRATOR estimates emissions in NitroEurope Classification Units (NCUs).
These NCUs are multi-part polygons composed of several 1 km × 1 km grid cells in ETRS89/LAEA Europe coordinate 15 system. The polygons sharing one NCU number have the same administrative unit (NUTS2 or NUTS3), soil type (SGDB classification), similar slopes (CCM DEM 250 in five classes) and altitude (with differences less than 200m) (De Vries et al., 2011). Therefore, the area of one NCU varies from several square kilometers (mostly in Western and Southern Europe) to hundreds of square kilometers (in Northern Europe).

and
, represent emission fractions of grazing, animal housing, manure storage, manure application, and fertilizer application, respectively.

5
A schematic overview of the 3 emission module of the INTEGRATOR model is presented in Figure 2. The emission model starts with the calculation of N excretion by multiplying the number of animals at NCU level with N excretion rate per animal per country for eight animal categories (dairy cows, other cows, pigs, laying hens, other poultry, horses, sheep and goats, and fur animals) (Kros et al., 2012). The livestock data were obtained from the FAO database at country level, using CAPRI data for distribution at NUTS 2 level. The data on livestock numbers of various animal categories at NUTS2 level were downscaled 10 to a 1km × 1km resolution using expert-based judgment with spatial data sources on land use, slope, altitude and soil characteristics influencing the livestock carrying (Neumann et al., 2009). A major distinction was made between grazing animals and other animals. Dairy cows, other cattle, and sheep and goats were assumed to be highly dependent on local land resources for grazing or feed production. Pigs and poultry were assumed to be held in more land independent systems. For more detailed information on the downscaling of livestock, we refer to Neumann et al. (2009). The N excreted in housing 15 systems is the multiplication of N manure excretion and the housing fraction ( ℎ in Figure 2), while the N excreted from grazing on land is obtained by subtracting N excreted in housing systems from total N manure excretion. The total manure production is derived by subtracting gaseous emissions and leaching in housing and manure storage systems from the N excretion, while the gaseous emission from housing is calculated by multiplying N excretion with the emission fraction per housing system ( 3 ,ℎ ). Ammonia emission fractions for housing and manure storage are distinguished per animal type 20 and manure type. The emissions of ammonia from agricultural land are calculated by multiplying the N input by grazing, manure application and fertilizer application with ammonia emission fractions for grazing 3 , ), manure application ( 3 , ) and fertilizer application ( 3 , ), respectively (Kros et al., 2012;De Vries et al., 2020 Finally, 3 emissions in each NCU are available for fertilization on 32 croplands (31 CAPRI arable crop types and grassland) with 5 types of manure (poultry, cattle liquid/solid, pig liquid/solid) and mineral fertilizer, as well as for grazing, housing of three animal types and manure storage of 5 manure types, in total 201 categories. 10

The MACC-INTEGRATOR Combined Inventory
We replaced the agricultural emissions in the original MACC-III inventory with the INTEGRATOR emissions, which significantly increases the level of details. For simplification, the 31 CAPRI crop types in INTEGRATOR were aggregated using the Indicative Crop Classification (ICC), into cereals, root crops, industrial crops, vegetables, grass and fodder.
Consequently, there were 36 categories regarding emissions from fertilization on croplands. Grazing, animal housing, and 15 manure storage were kept as they were, resulting in 45 categories in total in the combined emission inventory (see Fig. C1 in Appendix C).
Since the two inventories use different coordinate systems, coordinate transformation was performed to resample INTEGRATOR emissions onto the grid utilized in MACC-III. The resampling was conducted by 1) averaging the emission in one NCU evenly over the whole polygon; 2) dividing each square kilometer grid cell into 25 subpixels and calculating the 20 coordinate of the center of each subpixel in latitude/longitude; 3) locating the calculated coordinate of each subpixel of NCU in MACC-III grid and assigning emission to the corresponding MACC-III grid.
It has to be pointed out that the 3 emission estimates from INTEGRATOR differ from the officially reported national emission totals which are used in the MACC-III inventory. This is because each country uses its own emission inventory methodology, whereas INTEGRATOR uses a uniform methodology for all countries. To assess the impact of the different 25 spatial (and temporal) allocation and be in line with officially reported emissions, we scaled the 3 emissions from INTEGRATOR with the country totals of 2010 officially reported in 2018. The scalar is computed per country per animal type, namely the division of INTEGRATOR emission and officially reported emission to EMEP.

Temporal Allocator
The usual approach to characterizing the temporal variability in 3 emissions is to use time profiles that distribute the annual 30 emission total in a grid cell over a year. Fixed and oversimplified temporal profiles (monthly, daily, or hourly resolved) are often used (Van Pul et al., 2009). In this section, we explicitly described the temporal allocation of 3 emissions from manure and fertilizer application based on the concepts of Skjøth et al. (2004), Gyldenkaerne et al. (2005) and Hutchings et al. (2012).
The temporal distribution functions of ammonia emission from grazing, animal housing and manure storage were taken from Gyldenkaerne et al. (2005), which are presented in Appendix D.
The temporal distribution of 3 emission from fertilization is dependent on the timing of manure and fertilizer application 5 on arable lands and grassland, weather conditions, as well as legislative constraints. We first followed the methodology as outlined by Gyldenkaerne et al. (2005) to characterize the temporal variation of the emission strength as a function of time, temperature, and wind speed. The emission function used may be described as Eq. (1) where , , is the emission strength after application of fertilizer on crop in NCU , , , is the annual total emission 10 (kg/ha), T(t) and W(t) are the air temperature (Celsius) and wind speed (m/s) for the applied time step (t), is Julian day with peak emissions, and is the standard deviation to represent spread and uncertainty in the application activities and emission timing.

The improvement of fertilization day
The first challenge was to update the estimated central day (the day with peak emissions) for manure and fertilizer 15 applications. The timing of these field operations was obtained by the methodology of the TIMELINES model that was developed to assess the timing of field operations, including the Julian day of fertilization on a wide range of crops (Hutchings et al., 2012). It was calculated at the 50 km × 50 km MARS meteorological grid level in Europe (Goot, 1998). Hutchings et al. (2012) took the weather conditions over a year into account when simulating crop calendars by introducing a thermal time approach. Thermal time is the sum of the positive differences between daily mean air temperature and a base temperature and 20 is written as Eq. (2): where is the thermal time (in Celsius) over time t (day), is the daily mean air temperature at 2 meters, is the base temperature (0 degree Celsius), 0 is the starting time of calculation 1 January. As soon as thermal time on Julian day t reaches the reference thermal time for sowing (or harvesting), it is considered that sowing (or harvesting) occurs on this day. All other 25 field operations including plowing, and manure and mineral fertilizer operations are related to it.
We back-calculated the reference thermal times , (ℎ ) for various crops based on the sowing and harvesting dates provided by Hutchings et al. (2012). ECMWF meteorological data for the years between 1985 and 1995 and the respective days (ℎ ) were inserted into Eq. (3): The period between 1985 and 1995 was selected as Hutchings et al. (2012) followed a similar proceeding based on the CGMS dataset and used obtained reference thermal times to calculate sowing and harvesting days for 1995 onwards. The sowing and harvesting dates derived in this paper are in good alignment with the work of Hutchings et al. (2012), as shown in Fig. E1 in Appendix E. The sowing day estimates of winter wheat and spring wheat in 2010 are shown in Appendix F Fig. F1. The timing of manure application is based on sowing dates and varies from one manure type to another. Mineral fertilizer is applied in 5 two applications, with the first application (20% of the annual amount) conducted five days prior to sowing for spring crops and at the start of the growing season for winter crops. The second application is made after 20% of the growing season has elapsed (Hutchings et al., 2012).
We assumed that the peak of emission after application occurs at noon on the second day after the estimated central fertilization day. This is based on field experiments that show the emission from mineral fertilizers has its maximum in the first days after 10 application (Loubet et al., 2009;Schjoerring and Mattsson, 2001;Whitehead and Raistrick, 1993). Søgaard et al. (2002) observed that half of the 3 emission takes place within the first 30 hours. Plöchl (2001) looked into 227 experimental trials and found that 80% of the emission was reached within two days. However, in some cases (e.g., urea applied in dry conditions resulting in slow hydrolysis), fertilizer emission may proceed for over a month after application, which is unlikely in our study area (Sutton et al., 1995). We assumed that the peak of emission after application occurs at noon on the second day after the 15 estimated central fertilization day.
Even though the TIMELINES model indicates a single day of fertilization in an NCU, in practice, farmers certainly would not operate precisely at the same time. The central estimate of fertilization day is uncertain due to other influencing parameters such as soil conditions as well as the availability of machinery and labor. Also, Gyldenkaerne et al. (2005) argued that there would still be variation in the timing of fertilization because it would take time for farmers to complete these operations. As a 20 consequence, normal distribution around the central estimate was used here to characterize it.
The standard deviation around the central value is given in a fixed number of days since it is determined by the agricultural practice of farmers (independent of the thermal sum approach) and includes a random uncertainty. Gyldenkaerne et al. (2005) assumed there are four times of manure application in a year: early spring, late spring, spring-summer, summer-autumn. The standard deviation of the spring-summer application is 16 days, while that of the remaining applications was nine days. The 25 standard deviation of the timing of the mineral fertilization applications in early spring and summer were 9 and 16 days, respectively. We made a similar assumption in this paper: for fertilizations that lie between mid-May and mid-August, the standard deviation of the corresponding emission function is 16 days, while for the remainder, the standard deviation is considered to be nine days.

The inclusion of legislative conditions 30
The next step is to implement legislative constraints on manure and fertilizer application. In Germany, manure application is not allowed from 1 November to 31 January on arable land, and from 15 November to 31 January on grassland (Kuhn, 2017).
In Flanders of Belgium, manure spreading is not allowed in the winter period from 15 October till 15 February (Vlaamse Landmaatschappij, 2016b). We expanded this period to Belgium and Luxemburg in this study due to a lack of knowledge in these regions. As for the Netherlands, solid manure is prohibited from 1 September to 31 January, while other manures are banned between 16 September and 15 February on arable land and between 1 September and 15 February on grassland. Mineral fertilizer is prohibited from 16 September to 31 January on both grassland and arable land (Rijksdienst voor Ondernemend Nederland, 2019). Furthermore, Vlaamse Landmaatschappij (2016a) pointed out that in Flanders, it is not allowed to fertilize 5 on Sundays, nor when the soil is frozen or covered by snow. Normally, frozen soil and snow cover appear outside permitted dates. The ban on fertilization outside permitted dates and on Sundays is the most significant constraint and was applied to all regions in the area of interest by setting the emission strength to zero in Eq. (1).

The impact of excessive precipitation
The occurrence of excessive precipitation was also accounted for, since the soil can become water-saturated, with a negative 10 impact on the infiltration rate of liquid manures and the risk of strongly enhanced surface runoff. Furthermore, trafficking the wet soil surface with heavy machinery is likely impossible. We used the weekly De Martonne-Index to capture the characteristics related to precipitation or soil water content. The index describes the ratio between precipitation sums and average 2-meter temperature (Croitoru et al., 2012). Here, the index is computed on a weekly basis to represent more real-time humidity. For weekly values, it is written as Eq. (4) where is weekly total precipitation in millimeter, is weekly mean temperature in Celsius, and C is a constant (10) that assures that negative mean temperatures do not result in negative indices. The introduction of temperature parameterizes the impact that higher temperature will lead to faster evaporation and more effective infiltration. Baltas (2007) defined that when the annual De Martonne-Index exceeds 55 (namely 55/52.143 ≈ 1.055 in the case of the weekly index), the air is considered 20 extremely humid. One example of the weekly De Martonne-Index time series is given in Fig. G1 in Appendix G. Kranenburg et al. (2013) used visual inspection to set up a threshold of 1.7, above which precipitation and soil water content are not suitable for fertilization, and farmers will have to postpone application. Therefore, for each day on which the threshold is violated, ammonia emission is set to zero, and the remaining part of the function is moved forward by a day.

The finalization of the emission time profile 25
Moreover, a baseline in the time profile was introduced. Due to some application techniques, especially injection, manure and fertilizer stay underneath the soil for a much more extended period before ventilation. Thus, 5% of annual emission is allocated throughout the year as a baseline to represent background emission. Since the emission time profile needed by LOTOS-EUROS has an hourly temporal resolution and a mean of 1, the temporal distribution of emission strength for fertilization was normalized to derive the final emission time profiles. In comparison with the original time profiles used in LOTOS-EUROS, 30 the newly developed ones are spatially and dynamically explicit based on land type, amounts of emission and local climatology.

Examples of
3 emission time profiles during construction at location (47.41°, 10.98°) in latitude/longitude in 2010 are presented in Fig. H1 in Appendix H.

The LOTOS-EUROS Model
The annual emission distribution and gridded hourly time profile were then imported into LOTOS-EUROS to obtain modeled surface concentrations and total columns. They were compared with satellite observations and in situ measurements for model 5 evaluation. LOTOS-EUROS is a 3-dimensional regional CTM that uses a description of the bidirectional surface-atmosphere exchange of 3 (Manders et al., 2017;Wichink Kruit et al., 2010). In the previous studies, the model showed a good agreement with yearly averaged 3 measured concentrations, except that there is slight underestimation in agricultural source areas and slight overestimation in nature areas (Wichink Kruit et al., 2012). The version of LOTOS-EUROS in this study includes the labeling module by Kranenburg et al. (2013), which tracks the contribution of emission sources from specific 10 categories to the final simulated products. The categories that we wanted to label, namely all agricultural sectors, were defined accordingly before the model runs. As a result, besides the regular outputs, the fractional contribution of each labeled category was also calculated.

Available Measurements
Among the outputs of LOTOS-EUROS, surface concentration and 3-d concentration were compared with in situ measurement 15 and satellite observations for verification. Both in situ and satellite observations have their advantages and disadvantages.
Since the transport of 3 in the atmosphere and the reaction with other atmospheric components are rapid, its emission and deposition dynamics affect concentrations on the scale of hours to days. Ground-based stations measure 3 surface concentration consistently at fixed locations and some of them have relatively high temporal resolutions (hourly or daily), which offers the possibility to study the behavior of 3 emission. However, the measurements lack vertical information as 20 most instruments only measure surface concentrations (Van Damme et al., 2015;Erisman et al., 2007). Horizontally, the setup of station networks is rather coarse, and the representativeness is an issue since measurements of all monitoring sites will be influenced by local and regional agricultural activities and other local sources. Consequently, we need to carefully take into account the locations of the stations when comparing in situ measurements with simulated results. Airborne measurements have been carried out, but only occasionally with limited spatial coverage during campaigns (Dammers et al., 2016;Leen et 25 al., 2013;Nowak et al., 2010). Satellite observations have the advantage of global coverage and the possibility to calculate area-averaged observations which are in much better correspondence with the size of the grid cells in regional/global models (Flechard et al., 2013). Recently, remote sensing products with a higher spatial and temporal resolution have become available for better 3 concentration monitoring in the lower troposphere (Clarisse et al., 2009;Van Damme et al., 2015).

In situ measurements
The Umweltbundesamt (UBA) research foundation sets up monitoring stations, providing governments and the public with information on the concentration of air pollutants (Schleyer et al., 2013). It measures species, including 3 , that are essential for the improvement of knowledge about air quality and climate change. The UBA also collects the data from the network of the German federal states. In addition to the German networks, the Measuring Ammonia in Nature (MAN) network monitors 5 monthly mean values of 3 concentrations in Natura2000 areas in the Netherlands to detect the spatial pattern in concentration or to assess the influence of local sources (agriculture activities but also traffic) (https://man.rivm.nl/) (Lolkema et al., 2015;Noordijk et al., 2020). The network aims to be representative of different habitat types, 3 concentration levels, area size and shape, as well as the geographical distribution (Lolkema et al., 2015). When illustrating the comparison of concentrations time series, we selected several stations that are not close to local agricultural sources (as shown in Table I1 in 10 Appendix I) so that the local influences on measurements could be minimized. Besides, by comparing all individual measurements at all available stations, the overall performance of the updated model can be determined.

Satellite Observations
Infrared Atmospheric Sounding Interferometer (IASI) is a Fourier transform infrared (FTIR) spectrometer that measures the thermal infrared (TIR) radiation emitted by the Earth's surface and the atmosphere. It circles in a polar Sun-synchronous orbit 15 and operates in nadir mode. It has a wide swath width of 2 x 1100 km, which corresponds to 2x15 mirror positions, while the spatial resolution is 50 km x 50 km, composed of 2 x 2 circular pixels. Each circular pixel is a 12 km diameter footprint on the ground at nadir (Clerbaux et al., 2009). Regardless of the improvement of 3 column retrieval from satellite observations, there is still substantial variability in measurement uncertainty, varying from 5% to over 1000 % . Measurements with small magnitude tend to have larger relative uncertainties. Due to considerable uncertainties and the requirement of clear-sky conditions, IASI data is insufficient for real-time monitoring but sufficient if used to calculate monthly or yearly average distributions. In this study, the annual mean was compared with LOTOS-EUROS output for verification. The monthly mean was calculated to 30 investigate the feasibility of being used for validation of temporal variability. For each IASI observations, the modeled results that are closest in space and time were selected.
In this paper, we used ANNI-NH3-v2.2R-I IASI dataset which was obtained with ECMWF ERA-Interim meteorological data and surface temperature data retrieved from a dedicated network. After the dataset was downloaded from the AERIS portal (https://iasi.aeris-data.fr/NH3R-I_IASI_A_data/), we only selected satellite observations with daytime overpass because daytime is the better time to measure 3 (Van Damme et al., 2017). Area-weighted annual mean was derived by resampling the circular footprints of IASI onto the grid used in LOTOS-EUROS. Area averaging was also applied to the calculation of 5 mean relative error of each grid cell. Finally, post-filtering was carried out to obtain more reliable distributions: all grid cells with less than ten measurements were rejected.

Comparison between MACC-III and MACC-INTEGRATOR annual emission
Because of the less detailed EMEP SNAP Level 1 categorization in the MACC-III inventory, comparisons were made at 10 country level for cattle, pig, poultry related emissions (the sum of housing, manure storage and application), as well as mineral fertilizer emissions. Table 1 shows that country emission totals from the updated inventory MACC-INTEGRATOR are all larger than those from the original MACC-III inventory because it uses a different version of reported emission totals. Germany witnesses the largest positive difference in absolute value, while Luxemburg shows the most significant relative change.
Compared to MACC-III, MACC-INTEGRATOR estimates more emissions from cattle and mineral fertilizer in all countries 15 except for the Netherlands. Pig emissions in Germany and the Netherlands rise by 24.7% and 36.4%, respectively, while that in Belgium slightly decreases. Poultry emission drops by more than 20% in Germany, whereas the amount increases in other countries. It implies that the scaling we applied per country based on animal types and mineral fertilizer plays an essential role.
For example, INTEGRATOR estimates less agricultural emission in Germany than MACC-III, but after scaling the combined inventory reveals the opposite trend which indicates 14% more emission in Germany than MACC-III. 20 The spatial distributions of 3 emissions from the two inventories are presented in Figure 3. Figure 3(a) and Figure 3(b) are the maps of annual total agricultural emissions. In Germany, the new spatial allocator assigns more emissions in the southeast near the border with Austria. The two hot spots in Bremen and Ruhr in the original inventory merge into one located in the Ostwestfalen-Lippe region. In Schleswig-Holstein in Northern Germany, the original model indicates that most of the emissions are concentrated in the center of the state. At the same time, the updated one tells emissions are situated along the 5 eastern coastline in the state. In the southeast of the Netherlands, the updated inventory allocates more emissions and smoothens the spatial details into larger blocks. After looking into NCU polygons, we found out the sizes of polygons at this location are much larger than the others. Because we evenly allocated emission within an NCU polygon over the polygon, it is possible to lose spatial characteristics, especially when a polygon has a larger size. Figure 3

Observed and modeled total columns
After filtering IASI measurements, the number of valid daytime overpass measurements in each month is illustrated in Fig.   J1(a) in Appendix J. The month in which the most valid observations (more than 7500) occurred is April, followed by July and June in which there were nearly 6100 and 5600 measurements, respectively. The measurements in these three months occupy more than half of the daytime measurements in the whole year. Figure J1(b) in Appendix J shows the spatial distribution 5 of measurement counts over the area of interest.
3 is measured most frequently in Western Germany, Southern Germany bordering Austria, the Netherlands, Belgium and Northern France. The influence of satellite footprint on the availability of data leads to the strips which are more visible in Germany and France.
The spatial characteristics of area-averaged relative error are shown in Figure 4(a). The regions with fewer measurements tend to have a higher relative error, while low errors (less than 80%) appear in the Netherlands, Belgium and Western Germany 10 where lots of observations are available. Figure 4(b) represents annual area-averaged 3 total columns after post-filtering which excludes gird cells that have less than ten measurements. One can see that 3 level is considerably high in the Netherlands, Belgium and Western Germany.

Figure 4 The map of area-averaged relative error of IASI daytime measurements in 2010 (a). The map of area-averaged total columns after filtering out grid cells with less than ten valid measurements and an averaged relative error larger than 75% (b).
The modeled annual averaged total columns from LOTOS-EUROS simulations are shown in Figure 5. Overall, the updated result ( Figure 5  at higher latitudes and tell that both models underestimate ammonia at these locations. Weighted linear regression was performed, with weight being inversely proportional to the square of the averaged relative error. The outcomes obtained by 15 the new model have been improved, but both performed rather poorly.

Figure 6 Scatter plots comparing annual averaged total column from IASI measurements and LOTOS-EUROS. The color of the points indicates latitude. The left panels and right panels use original and updated modeled results, respectively. (a) a nd (b) include all valid grid cells. (c) and (d) show grid cell with lower latitude (< 49°N) while (e) and (f) focus on points with latitudes
larger than 49°N.

5
The performances of the original and updated model comparing with IASI observations were investigated for all grid cells within Germany and Benelux, as well as separately in each country (Table 2). Every indicator has improved for the new modeled results. Both NRMSE and NMAE have dropped, with the largest deductions from Luxemburg. Regarding model efficiency, even though the new modeled output gives values closer to one, they are still negative. In addition, the index of 5 agreement witnessed the largest increase in Germany and the Netherlands. The feasibility of verifying emission estimates by comparing weekly or monthly time series derived from IASI measurements and simulations was also investigated. However, the majority of valid data are in April, June and July (see Fig. J1(a) in Appendix J), the number of valid measurements per month is not sufficient for most grid cells to obtain reliable continuous time series. Consequently, two alternatives could be considered to resolve this issue. First, several years averaging is required for a better trend analysis within a year. It is also possible to look at a longer time frame with coarser temporal resolution. 15  Once again, the four indicators and correlation coefficient were calculated to determine the performance of the original and updated model (Table 3). All indices illustrate that the updated model has improved surface concentration estimates. The improvement in the Netherlands is much larger than that in Germany. The reason might be that the setup of ground stations is 10 more consistent in the Netherlands. The locations of the Dutch stations are in the nature areas, making them more representative of the overall emission temporal variation of a grid cell.  Mecklenburg-Vorpommern, Germany. The station is located in an agriculturally active region with cereals, industrial crops and animal housing. As can be seen in Figure 8(a), the original model does not correspond with the measurements well. There is almost no 3 measured before Julian Day 64, but the original model estimates that there are two peaks on Day 38 and 59. Besides, the first two peaks in the measurement on Julian Day 80 and 110 are not captured by the original model. On the contrary, the updated model manages to simulate these two peaks, even though they are slightly delayed by ten days. The first 10 and larger peak of the two in spring is mainly explained by cattle manure application, followed by pig and poultry manure application, and mineral fertilizer contributes to a lesser extent. In the summer between Day 150 and 275, the new modeled result also does a good job distributing 3 emission temporally, with animal houses, cattle storage and mineral fertilizer application dominating 3 emission. A similar situation applies to station DEUB005 in Lüder-Langenbrügge, as shown in Figure 8 improves the estimates a lot, even merged peaks from spring mineral fertilizer and manure application are detected. However, there still exist two issues. One is that the peaks in spring between Day 64 and 140 are overestimated. The other one is that the whole time series is delayed by five days. A possible reason for the delay of emission from fertilization is that the reference temperature sum in TIMELINES to estimate fertilization day is too large at this location. Another cause could be that the 20 threshold of De Martonne-Index (1.7) is too low at this location. Some days in February are considered to have excessive rain, so the whole curve is shifted to the right direction of the x-axis.

5
Another station in the region of Hanover, Lower Saxony, is demonstrated in Figure 9. The measurements at this station only have a monthly temporal resolution. The updated model has shown much better correspondence with measurements than the original one, except that the average surface concentration in May is almost 50 percent higher. Figure 9(b) is able to point out that most of the agricultural activity at this location is related to fertilization, among which cattle and pig manure applications 5 have dominance. Thus, the overestimation in spring is probably linked to cattle or pig manure application. There are two possible contributions to this behavior. One is the emission fractions used in INTEGRATOR. The INTEGRATOR model uses country dependent emission fractions, which have been updated and detailed through others' studies. However, they do not account for differences in manure characteristics, climatology and soil properties. Another reason is the way of resampling emission from NCU polygons to the grids in LOTOS-EUROS, which leads to misallocation of emission to places without any 10 sources. Last but not least, Lower Saxony is one of the states in Germany which has the highest density of livestock in the country. INTEGRATOR model calculates 3 emission based on proxy maps of animal number and excretion input, without considering the fact that manure from this high production region could be transported to other regions where manure is in demand. This will also lead to an overestimation in areas with excessive livestock excretion.

Discussion and Conclusions
The comparison with in situ surface concentration measurements The time series of surface concentrations from the updated model show better alignment with in situ measurements than those from the original model, making it possible to detect the 3 temporal variability brought by various agricultural activities. This is achieved by making adjustments to the method in Gyldenkaerne et al. (2005) using TIMELINES, implementing 5 legislative constraints, and including the impact of excessive precipitation. Nevertheless, there are occurrences of inconsistency.
First, the modeled time series could be delayed with respect to in situ measurements. A possible reason is that the reference temperature sum in TIMELINES to estimate fertilization day needs correction. Agricultural models, including TIMELINES, usually work from the perspective of maximizing the efficiency of nitrogen use. However, farmers are likely to choose to apply 10 manure and mineral fertilizer when labor and machinery are both available and are unlikely to finish manure application in one day on the farmlands. This leads to the inaccuracy between the fertilization day estimate and reality, and an extended manure application period. Moreover, the TIMELINES model heavily depends on the empirical data on sowing and harvesting dates currently used within CGMS to calculate the thermal time thresholds. The data need updates and are limited regarding the variety of crops, making it capable of simulating the timing of field operations for some but not all arable crops at different 15 locations across Europe. Consequently, a more thorough analysis is needed to refine the relationships between various field operations (Hutchings et al., 2012). There are other factors related to the timing of fertilization. For example, soil moisture, workability and trafficability were neglected in TIMELINES, but they might affect the prediction of plowing and sowing. In addition, solid manure applications for spring crops could be made in autumn of the previous year. Another reason for the delay could be the threshold of De Martonne-Index (1.7) which was decided with a visual inspection for Flanders by 20 Kranenburg et al. (2013) and expanded to the whole area of interest. When the threshold is too small, the time profile will be delayed because precipitation is too often considered to be excessive for fertilization operations. Further improvement of the De Martonne-Index algorithm is in need to account for regional differences. More studies about De Martonne-Index should be done to correlate excessive precipitation and its impact on agricultural practices. Furthermore, sometimes the magnitude of surface concentration is not in accordance with measurement, or the time series 25 completely mismatches measurements. This could be caused by emission reallocation from NCU to the LOTOS-EUROS grid as well as the restricted spatial representativity of measurement locations. During the resampling of emissions from NCU level to the LOTOS-EUROS grid, emission estimates within an NCU from INTEGRATOR are evenly distributed all over the polygon, regardless of the actual locations of crops, animal houses, and manure storage facilities. In addition, some NCUs are composed of multiple disconnected polygons, within only some of which a certain crop, animal house or manure storage is 30 present. Hence, emissions are wrongly allocated to areas without any sources. Besides, spatial characteristics such as hot spots will be smoothened out for NCU polygons of larger sizes. Therefore, high-resolution crop maps can help allocate emission from fertilization inside polygons onto where arable land and grassland appear, and detailed information on animal housing locations can transform housing emissions into point emissions. What's more, in situ measurements represent the 3 emission characteristics of a point source, but the spatial resolution of the updated model is around 7km × 7km which is relatively coarse. A station next to animal houses or manure storage facilities will result in a constant high level of 3 over the year, while a station next to farmlands will be highly affected by agricultural operations on the farmlands. Therefore, stations in remote areas are more representative of a wider region. This is the reason why the updated model performs better 5 at Dutch stations than at German stations (Table 3). MAN stations are set up to measure nature emission of ammonia, so their measurements represent better the emission variability in the grid cells. However, there are always stations next to sources given the size of the country. Ideally, in order to accurately verify the temporal allocation of emission from fertilization and housing, the spatial resolution should be increased with the help of a detailed crop map and animal housing information so that grid cell can represent local agricultural activity more. 10 As a result, a detailed crop map is key to the improvement of ammonia emission estimates. Inglada et al. (2015) assessed the state-of-the-art supervised classification methods and produced more accurate crop type maps with high resolution multitemporal optical imagery from SPOT4 (Take5) and Landsat 8. Surface reflectance, NDVI, NDWI and brightness were chosen as features, random forest and support vector machine (SVM) were selected as classifiers. Belgiu and Csillik (2018) proposed a time-weighted dynamic time warping (TWDTW) method that uses NDVI time series obtained by Sentinel-2 data for 15 classification. It was proved to be more efficient in terms of computational time and less sensitive concerning the training samples, which is essential for regions where inputs for training samples are limited. Besides Sentinel-2 optical images, Giordano et al. (2018) also included Sentinel-1 radar measurements for crop classification using the complementarity between the multi-modal images, because Sentinel-1 radar images allow getting more information where Sentinel-2 suffers from cloud cover. We will make use of the above methods to obtain crop maps with high spatial resolution. The maps will be used to 20 update manure distribution according to N demand of different crops and subsequent ammonia emissions. They are helpful in allocating emission from manure and fertilizer application in a more precise way.

The comparison with IASI total column data
The quality of the modeled annual averaged total column relies on the assumption that the spatial distribution of the 3 emission in LOTOS-EUROS closely represents reality. The temporal distribution is also of great importance because only 25 modeled columns at overpass time were selected for averaging.
There are large inconsistencies in the comparison between IASI observations and the modeled results from both the original and updated model. One reason for the inaccurate emission allocation could be that land use data and local agricultural activity inputs such as animal numbers and N excretion in INTEGRATOR are inaccurate. Local agricultural activity data are more accessible in countries like the Netherlands, Denmark, and Portugal. And land use data can be updated with a detailed crop 30 map as discussed previously to achieve more accurate estimates of N demand, manure and fertilizer distribution, and subsequent ammonia emission. Another factor that could cause spatial inconsistencies is the emission fractions used in INTEGRATOR. Emission fractions are nation-wide averages that describe the linear relation between emission and N input (excretion in animal housing and manure storage, applied manure and fertilizer). But in reality, they could vary from region to region due to application methods, manure properties, soil properties, and weather conditions. Those impacts have been studied by Huijsmans for both arable land and grassland (Huijsmans, 2003;Huijsmans et al., 2001). He defined the formula to describe the relationship between 3 volatilization rate and the method of application and incorporation, total ammoniacal nitrogen (TAN) content of the manure, manure application rate, wind speed and ambient temperature. Additionally, the empirical modeling of the emission process is carried out by RIVM and WUR using Volt'air approach (Huijsmans et al., 2014). 5 Preliminary results show that the variations of weather conditions over the past 20 years lead to different emission fractions per month, and soil and manure characteristics also influence emission fraction. As a result, emission fraction differs at farm scale, contributing to inhomogeneous emission fraction on a regional or national scale. Therefore, there are two steps for improvement in terms of land use, local agricultural activity data and emission fraction. In the short term, we will implement detailed land use and local activity data for the Netherlands, Denmark, and Portugal, and investigate the difference brought by 10 the refinement of the input. Next, a meta-analysis will be performed for the parameterization of spatially and temporally explicit emission fractions, taking into account local climatology, soil properties, fertilizer characteristics and application method.
A possible source of overestimation in lower latitudes and underestimation in higher latitudes in Germany is the neglect of possible manure transport. INTEGRATOR assumes that the emissions from a certain animal, including housing, storage and 15 manure application, occur where the animal is located, ignoring manure transport from regions with excessive manure to those with shortages. The role of manure transport is more significant when there is a lot of animal livestock. Hendriks et al. (2016) looked into manure transport data in Flanders and found that the manure transport data account for roughly one third of the amount of manure used in Flanders each year, while the remaining two thirds consists of manure that farmers apply on their own land. Hansen-Kuhn et al. (2014) showed that southern Germany is one of the areas in the country which has the highest 20 density of cattle and pig livestock. It is likely that the neglect of manure transport contributes to the overestimation in the lower latitudes. Therefore, manure transport data can be used as a proxy to improve the spatial distribution, and the pattern of manure transport can additionally help construct the temporal pattern of 3 emissions from manure application, under the assumption that manure is applied to the fields on the day of transport.
Moreover, the uncertainty in IASI measurements also has an impact on the comparison. Dammers et al. (2016) found that the 25 validity of the IASI product is quite limited because the satellite retrievals are biased. The retrieval of 3 columns from IASI is still an on-going process, with a few studies having examined the quality of the products. Further development and validation of the IASI retrieval are very much in need for the understanding of the satellite's product. It remains poorly validated with only a few dedicated campaigns performed with limited spatial, vertical or temporal coverage. The key finding of the previous studies on the retrieval is that vertical profiles of 3 distribution has lots of uncertainties and need to be improved. Dammers 30 et al. (2016) suggested that tower measurement campaigns are crucial and helpful towards a better understanding of the vertical profile. Li et al. (2017) showed that there is an apparent seasonal variation in the vertical distribution of 3 and that the slope of the 3 concentration gradient varies throughout the year, with relatively high 3 ground concentrations during winter. His reasoning was that boundary layer is shallower in winter, which will potentially trap 3 emissions and reduce concentrations higher up the column. As a result, IASI could miss high 3 ground concentrations in winter because of the lack of sensitivity to the lower parts of the boundary layer. On the contrary, most of the measurements used in this paper to calculate annual average are in April, June and July in which weather is relatively warmer and the boundary layer is thicker, especially during clear-sky daytime condition. Recently, new products become available, making it possible to cross-check results among satellites. Cross-track Infrared Sounder (CrIS) is one of the new products that deserve attention, having the 5 advantage of acquiring more explicit information on the sensitivity of the satellite (averaging kernel).

Conclusions
In summary, this paper is an attempt to build a new 3 emission model which is composed of a spatial allocator and a temporal allocator. The spatial allocator provides more spatial details and can distinguish various agricultural sectors, including crop types, fertilizer types, animal houses and manure storages. The distribution of annual emission obtained from MACC-10 INTEGRATOR demonstrates more emissions overall, with country totals 14% higher in Germany, and 6.6% higher in Benelux. Extra new hot spots appear in southeastern Germany, while the spatial characteristics in the east of the Netherlands are smoothened due to the allocation algorithm. The temporal allocator is spatially explicit and dynamic based on land use, local climatology and legislative constraints. The labeling module of LOTOS-EUROS helps to trackback the emission sector of the modeled 3 surface concentration and total columns for better interpretation and future improvement. Despite the 15 limitations in modeling and data for validation, LOTOS-EUROS performed better with the updated emission products, especially in the representation of the temporal behavior of 3 concentrations. Comparison between updated modeled results and observed 3 levels show much better correspondence and more robust performance, especially the temporal variability is captured better as the new methodology successfully differentiates regional variability in seasonality in 3 emissions. When reliable and detailed input datasets are available, and the methodology is further improved as described, we can expect 20 to extend this approach to Europe.

Statistical indices used to assess the performance of the models
To evaluate the performance of the updated model and compare it with that of the original model, we calculated the normalized root mean square error (NRMSE), the normalized mean absolute error (NMAE), the model efficiency (EF) and the index of agreement between the modeled results (predictions) and measurements. 5 The root mean square error of n predicted values of a regression's dependent variable, with ̂ being the i-th prediction and being the i-th estimate, is computed as the square root of the mean of the squares of the deviations: The NRMSE indicates RMSE in a relative sense, by dividing RMSE by the difference between the maximum and minimum observed values: 10 The normalized mean Absolute Error (MAE) is interpreted as the average absolute difference between and ̂, with reference to the mean of observations: The model efficiency coefficient is used to illustrate predictive power. It can range from −∞ to 1. An efficiency of 1 indicates 15 a perfect match of simulations to observations (Ritter and Muñoz-Carpena, 2013). The closer the model efficiency is to 1, the more accurate the model is.
Last but not least, the index of agreement (d) statistic was also employed, which represents the ratio of the mean square error and the potential error (Willmott, 1981). The agreement value of 1 indicates a perfect match, and 0 indicates no agreement at 20 all. However, it is overly sensitive to extreme values due to the squared differences (Willmott, 1981).
Appendix B

Methodology to allocate manure application over grassland and arable crop groups
In the INTEGRATOR model, manure is distributed over grassland and different crop groups using various allocation rules.
Manure produced by grazing animals and in housing systems by sheep and goats all enters grassland. For other manure, a fraction is applied to arable land and the remaining fraction is applied to grassland/fodder crops, distinguishing (i) liquid 5 manure of dairy cattle, other cattle and pigs, (ii) solid manure of dairy cattle, other cattle and pigs and (iii) poultry manure. For the distribution of manure application on arable land, we distinguish three arable crop groups with (i) a relatively high use of manure (sugar beet, barley, rape, and soft wheat), (ii) an intermediate use of manure (potatoes, durum wheat, rye, oats, grain maize, other cereals including triticale, and sunflower), and (iii) low use of manure (fruits, citrus, olives, oil crops, citrus, grapes and other crops) using weighing, based on Velthof et al. (2009). Finally, no manure is allocated to dry pulses and rice, 10 fiber crops, other root crops and vegetables.
As the last step, mineral fertilizer is distributed over crops on country level using a balanced N fertilization approach: 1. The total N demand in a NUTS 2 region is calculated as the sum of N in harvested products and in crop residues. The N in harvested crops is calculated from the crop yield and the N content in crop yield. The yields of arable crops for each country were derived from FAOSTAT on a country basis, and the N contents of harvested crop products were 15 based on literature. The N in crop residues is calculated by dividing the N removed in harvest with an N index.
2. The fertilizer N demand of each crop was calculated by subtracting the non-fertilizer N input from the total N demand and then divided by the N use efficiency (NUE).
3. The N fertilizer estimates for each NUTS 2 region were aggregated at country level and compared with reported country-level N fertilizer consumption. Scaling factors (the ratio of the known and calculated country-level N 20 fertilizer consumption) were then applied to ensure consistency. where is the actual time of the year, ( , ) is the total emission from fertilization on grassland within a grid cell, is the mean value for the Gaussian distribution, T(t) is the air temperature in Celsius, W(t) is the wind speed (m/s) for the applied time step (t). is the Julian day on which the thermal sum reaches 1400, except that the starting day of thermal time calculation is 1 st March, instead of 1 st January. depends on local climatology, so it differs from grid cell to grid cell. is the spread of the gauss function and is equated to 60 days, which means that grazing occurs in a relatively long period of time. 10 Regarding emissions from grazing on grassland, it is generally dependent on the release time of the cattle, the availability of grass, and the length of the growing season (Gyldenkaerne et al., 2005). The availability of grass is then primarily a function of precipitation, soil humidity, soil fertility, and fertilization. For a region that has a relatively even distribution of the precipitation during summer, such as the study area in this paper, Gyldenkaerne et al. (2005) suggested that a model following grass growth could be used to represent the characteristics of grazing emissions. Therefore, as the work of Skjøth et al. (2004), 15 here emission from grazing is assumed to follow the same pattern as grown grass in Eq. (D1). is a constant emission potential scaling factor for a given grid cell and can be neglected for simplicity (Elzing and Monteny, 1997 higher than the outside temperature used for manure storage (Gyldenkaerne et al., 2005).
represents lower boundary condition for temperature in animal housing and manure storage, below which emission is set to a constant level, and they are 18, 4, and 1 degree, respectively.
Pigs and poultry have a high lower critical temperature (LCT) between 6 to 20 degrees, below which an animal must expend additional energy to maintain normal body temperature and essential body functions. So in colder climates, they are usually kept in insulated buildings with forced ventilation to maintain a fixed temperature throughout the year (Seedorf et al., 1998b).
On the contrary, cattle have a very low LCT and are therefore often kept in open barns (Seedorf et al., 1998a). However, there still might be some insulated cattle barns with forced ventilation in colder climates (Gyldenkaerne et al., 2005). Consequently, 5 the function for forced ventilation is used to represent the temporal variation of pig and poultry housing emission, while the mean of functions of insulated houses with forced ventilation and open houses is calculated to characterize cattle housing emission. In terms of manure storage, it is assumed that the emissions from manure storage of all animal types have the same pattern.
10 Appendix E

Comparison of sowing day estimates
Comparisons between sowing days calculated in this study and by Hutchings et al. (2012) were made for verification. Figure   E1 depicts an example of the calculated sowing days of potatoes. Only the dates for years between 1985 and 2000 are selected for comparison because Hutchings et al. (2012) used predicted temperature data for years after 2000. The sowing days are in 5 good alignment with only a few outliers away from line y=x. 10 Appendix F Spatial variation in sowing day estimates for winter wheat and spring wheat Figure F1 shows that the sowing days of winter wheat and spring wheat generally have the opposite trends. For winter wheat, even though the differences between daily mean temperature and the base temperature are larger in the south, the greater reference thermal sum makes it take a longer time to reach this thermal sum. Whereas for spring wheat, the reference thermal 5 sum in the south is less than that in the north, resulting in earlier sowing day than in the north.

10
Appendix G Time series of the weekly De Martonne-Index Figure G1 shows that the weekly De Martonne-Index at location coordinate (48.98, 8.14) approximately ranges between 0 and 6.5 in 2010. High indices are observed around Day 30 before the first spring application period as well as at the end of the year. On these occasions, the index reaches values well above 3. 5 Figure G1 An example of the time series of the weekly De Martonne-Index at (48.98, 8.14). A threshold of 1.7 is determined, above which precipitation is considered to be excessive.

Examples of ammonia emission time profiles
Examples of 3 emission time profiles during development at the location (47.41°, 10.98°) in latitude/longitude in 2010 are presented in Fig. H1. The left panel represents time profiles of the application of cattle liquid manure on cereals, while the right panel demonstrates that of pig liquid manure application on grass and fodder. Four rows indicate the four phases during 5 the development of the time profiles. First and foremost, the initial emission time profiles (first row) in both panels were obtained using fertilization day estimation from TIMELINES and the emission function in Eq. (1), taking into account local climatology including temperature and wind speed. Subsequently, the emission strengths of Sundays were set to baseline since manure and fertilizer application were prohibited, as is shown in the second row. Furthermore, in the third row, prohibition on fertilization after late fall and before early spring (exact dates vary from country to country) did not affect the time profile on 10 the left panel since the emission function lies within the period where fertilization is allowed. However, for the right panel, part of the third peak exceeded the last allowed date for application. Thus, the part outside the application ban was cut out, and the rest of the peak was scaled accordingly. Finally, the impact of excessive rain on emission was accounted for in the last row. On each day where the De Martonne-Index exceeded the threshold 1.7, the emission curve before this day remained as it is, while the rest was shifted to the next possible day. It is possible that in the final time profile, emission lies slightly outside 15 the permitted period for fertilization. However, it is allowed under the assumption that the government allows a delay in manure and fertilizer application due to weather.

5
Appendix I Land use information of the selected in situ measurement sites Data availability. The updated annual 3 emission distribution and corresponding time profiles are available by request.
Competing interests. The authors declare that they have no conflict of interest.
Author contribution. Xinrui Ge designed and programmed the processing chain, performed the simulations and analyzed the 5 results for discussion and conclusion. Martijn Schaap is the daily supervisor of the project and provided with his expertise in atmospheric modeling and sciences. Martijn Schaap, Richard Kranenburg and Arjo Segers designed the model code of LOTOS-EUROS. Gert Jan Reinds helped with the technical issues of TIMELINES and INTEGRATOR. Wim de Vries is the promotor of the project. He and Hans Kros offered their knowledge regarding nitrogen use and 3 emission from agriculture. 10