Identifying forecast uncertainties for biogenic gases in the Po valley related to model configuration in EURAD-IM during PEGASOS 2012

Forecasts of biogenic trace gases in the planetary boundary layer (PBL) are highly affected by simulated emission and transport processes. The Po region during the PEGASOS campaign in summer 2012 provides challenging, yet common conditions for simulating biogenic gases in the PBL. As a precursory step to a comprehensive model evaluation and uncertainty estimation, this study identifies and quantifies principal sources of forecast uncertainties induced by various model configurations. The investigation is based on the EURAD-IM chemistry transport model employing the MEGAN 2.1 biogenic emission module and RACM-MIM as gas phase chemistry mechanism. Isoprene and a composite of higher aldehydes are selected to demonstrate similarities and differences between these compounds. Two major sources of forecast uncertainties are identified in this study. Firstly, biogenic emissions appear to be exceptionally sensitive to land surface properties, inducing total variations of local concentrations of up to one order of magnitude. Moreover, these sensitivities are found to be highly similar for different gases and almost constant during the campaign, varying only diurnally. Secondly, the model configuration also strongly influences regional flow patterns with significant effects on pollutant transport and mixing. As a result, surface concentrations of biogenic trace gases show large sensitivities to model configurations in this study. While isoprene concentrations are mainly sensitive to model configurations affecting the emission process, aldehydes show varying sensitivities related to both biogenic emissions and transport processes. Especially in areas with small scale emission patterns, changes in the model configuration are able to induce significantly different local concentrations. This effect was corroborated by diverging source regions of an exemplary airmass and thus also applies to non-biogenic gases.
The amount and complexity of sensitivities found in this study demonstrate the need to consider forecast uncertainties of chemical transport models with special focus on biogenic emissions and pollutant transport.

Geopotential height at 500 hPa (color coded + isolines) and horizontal winds at 2 km height (arrows) during the PEGASOS campaign in the Po valley as simulated by WRF (initialized with IFS reanalysis, setup denoted as "reference" later on).
ing dominant uncertainties to the model, generalized sensitivities of biogenic emissions during the campaign are presented in Sect. 6. Finally, Sect. 7 evaluates the effects of different model configurations and draws conclusions for model evaluation and uncertainty estimation. Specific cases which are identified as representative for this study are described in Sect. 2.2.

EURAD-IM chemistry transport model
The atmospheric chemical data assimilation system EURAD-IM (EURopean Air pollution Dispersion - Inverse Model) combines a state-of-the-art chemistry transport model with spatio-temporal data assimilation and inversion methods (Elbern et al., 2007). The chemistry transport model within EURAD-IM provides forecasts of a large set of gas phase and aerosol compounds up to lower stratospheric levels (e.g., Hass et al., 1995). Considered transformation processes include dynamical transformations due to advective and diffusive processes as well as reactive chemistry with other compounds and photolysis. For this study, the RACM-MIM (Regional Atmospheric Chemistry Mechanism, Geiger et al., 2003) mechanism is selected for reactive chemistry. It considers 221 chemical and 23 photolysis reactions of 84 gases, including condensed isoprene degradation (Mainz Isoprene Mechanism MIM, Pöschl et al., 2000).

Emissions from anthropogenic and biogenic sources are treated separately, where anthropogenic emissions are provided by the TNO-MACC-II inventory (Kuenen et al., 2014). The MEGAN 2.1 (Model for Emissions of Gases and Aerosols from Nature version 2.1, Guenther et al., 2012) module is used for biogenic emissions from urban, natural, and agricultural sources.
In total, 147 chemical compounds are considered, which are grouped into 19 classes according to their emission properties.
For each of these component classes, vegetation dependent emissions are calculated from standard emissions, multiplied by the local vegetation fraction. These vegetation dependent emissions are scaled by an activity factor to account for variations in the environmental conditions. According to Guenther et al. (2012), effects of radiation, temperature, leaf age, soil moisture, and CO2 are included in a multiplicative manner. Therefore, required input parameters include fields of plant functional types (PFT), leaf area index (LAI), solar radiation, air temperature, and soil moisture.
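The multiplicative structure described above can be illustrated with a minimal Python sketch. It is a simplified illustration, not the MEGAN 2.1 or EURAD-IM implementation: the class `ActivityFactors`, the function `biogenic_emission`, and all numerical values are hypothetical, and the individual factors are passed in as plain numbers rather than computed from meteorological fields.

```python
from dataclasses import dataclass
from math import prod


@dataclass
class ActivityFactors:
    """Dimensionless scaling factors; 1.0 means standard conditions (hypothetical container)."""
    radiation: float = 1.0
    temperature: float = 1.0
    leaf_age: float = 1.0
    soil_moisture: float = 1.0
    co2: float = 1.0

    def combined(self) -> float:
        # The individual effects enter multiplicatively.
        return prod((self.radiation, self.temperature, self.leaf_age,
                     self.soil_moisture, self.co2))


def biogenic_emission(standard_emission: float,
                      vegetation_fraction: float,
                      factors: ActivityFactors) -> float:
    """Emission = standard emission x local vegetation fraction x activity factor."""
    if not 0.0 <= vegetation_fraction <= 1.0:
        raise ValueError("vegetation_fraction must lie in [0, 1]")
    return standard_emission * vegetation_fraction * factors.combined()


if __name__ == "__main__":
    # Illustrative numbers only: half-vegetated grid cell, warm and partly shaded.
    factors = ActivityFactors(radiation=0.8, temperature=1.5)
    print(biogenic_emission(10.0, 0.5, factors))
```

Because the factors multiply, a single factor of zero (for example a fully drought-stressed soil moisture term) suppresses the emission entirely, regardless of the other conditions.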
Dry and wet deposition are implemented in the model, where wet deposition is included in the treatment of clouds. The dry deposition velocity is modeled by a multiple path resistance scheme according to Zhang et al. (2003), where aerodynamic resistance and quasi laminar sublayer resistance are functions of friction velocity (Wesely et al., 2002). In addition, different contributions to the overall canopy resistance depend on photosynthetically active radiation, air temperature, water-vapor deficit, LAI, and friction velocity.
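The dependence of dry deposition on friction velocity can be sketched with the common resistance-in-series form. This is a hedged illustration, not the Zhang et al. (2003) scheme as implemented in EURAD-IM: it assumes neutral stability for the aerodynamic resistance, a widely used textbook form for the quasi laminar sublayer resistance, and a canopy resistance supplied as a given number. All function names and default values are hypothetical.

```python
import math

VON_KARMAN = 0.4  # von Karman constant (dimensionless)


def aerodynamic_resistance(z_ref: float, z0: float, u_star: float) -> float:
    """Neutral-stability aerodynamic resistance in s/m (assumption: no stability correction)."""
    if u_star <= 0.0 or z0 <= 0.0 or z_ref <= z0:
        raise ValueError("require u_star > 0 and z_ref > z0 > 0")
    return math.log(z_ref / z0) / (VON_KARMAN * u_star)


def sublayer_resistance(u_star: float, schmidt: float, prandtl: float = 0.72) -> float:
    """Quasi laminar sublayer resistance in s/m, assumed form 2/(k u*) (Sc/Pr)^(2/3)."""
    if u_star <= 0.0:
        raise ValueError("require u_star > 0")
    return 2.0 / (VON_KARMAN * u_star) * (schmidt / prandtl) ** (2.0 / 3.0)


def deposition_velocity(r_a: float, r_b: float, r_c: float) -> float:
    """Dry deposition velocity in m/s as the inverse of the summed resistances."""
    return 1.0 / (r_a + r_b + r_c)


if __name__ == "__main__":
    # Illustrative values only: 10 m reference height, 0.1 m roughness length.
    for u_star in (0.2, 0.4):
        r_a = aerodynamic_resistance(10.0, 0.1, u_star)
        r_b = sublayer_resistance(u_star, schmidt=1.0)
        print(u_star, deposition_velocity(r_a, r_b, r_c=100.0))
```

In this form, larger friction velocities reduce both the aerodynamic and the sublayer resistance, so the deposition velocity increases for a fixed canopy resistance.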
The EURAD-IM system includes four-dimensional variational data assimilation (4Dvar) for initial state and emission rate optimization (Elbern et al., 2007). This assimilation algorithm comprises an adjoint model which integrates a signal backward in time. Being implemented into the modeling system, the adjoint model can be modified to quantify the history of selected airmasses according to Vogel et al. (2020). By switching off chemical conversions, the retroplume operator allows the identification and investigation of source regions of air parcels including the convolution and mixing of different airmasses.
In the following, important model configurations and their realizations selected in this study are briefly described. As summarized in Table 1, two representative realizations, a reference and an alternative option, are selected for each model configuration. Most model configurations apply to the meteorological forecasts of WRF and transfer to the atmospheric chemical forecasts via their dependencies on meteorological conditions. Firstly, initial and boundary conditions for WRF are provided by global meteorological analyses. Here the IFS reanalysis (Hortal, 1998) provided by ECMWF is used as reference and [...] the database. Based on AVHRR (Advanced Very High Resolution Radiometer) observations between April 1992 and March 1993, the surface at each location is classified as a single USGS land use category (Loveland et al., 2000). The USGS database includes 24 different categories including water, urban, snow and ice as well as various vegetated surface categories (Anderson et al., 1976). Although the database provides unsupervised surface classification, the occurrence of different surface types is treated by mixed categories (e.g. "Cropland/Woodland Mosaic").

Model Inputs
Land use information based on MODIS (MODerate-resolution Imaging Spectroradiometer) observations is selected as alternative input. Vegetation products from MODIS and Sentinel-2 satellites can give more recent information on spatial distributions and also the temporal evolution of vegetation types. Currently, Sentinel-2 provides vegetation products at the highest resolution (up to 10 m horizontal resolution, e.g., Immitzer et al., 2016; Drusch et al., 2012). However, these data are not available for 2012 as the satellite was launched in 2015. Thus, MODIS fractional vegetation data with 1 km spatial resolution (Friedl et al., 2002) are transferred to land use information for this study. Multiple studies indicate a more detailed and reliable characterization compared to AVHRR based products (e.g., Smirnova et al., 2016; Hansen et al., 2002). However, transferring MODIS data to land use categories requires additional information on urban areas, water, snow and ice. Assuming an appropriate representation of these basic surface types, the related USGS information was also used in the MODIS based classification. If the MODIS categories do not sum up to 100 %, the missing fraction is defined according to the USGS land use categories, if they are non-zero.
[...] and differences in the underlying approach. For example, the two layer Pleim-Xiu (Pleim and Xiu, 1995; Xiu and Pleim, 2001) [...] dependency of emitted gases is still under discussion (e.g., Pegoraro et al., 2004; Lavoir et al., 2009; Wu et al., 2015). In this study, the linear decrease of emissions proposed by Guenther et al. (2012) is implemented as reference for all biogenic gases while no dependency is assumed as alternative option ("no SMOIS").
Regarding transport, Berndt (2018) [...] existing meteorological ensemble appears to be insufficient for this application. An investigation of sensitivities from the global GFS (Global Forecast System) ensemble from NOAA did not induce significant differences in the simulated boundary layer in this case (not shown).
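The reference and "no SMOIS" treatments of the soil moisture dependency can be contrasted in a short Python sketch. It assumes a piecewise linear factor that is one above a threshold, zero below the wilting point, and linear in between. The function name, the `enabled` switch, and the default threshold width of 0.04 m3/m3 are assumptions made for illustration and are not taken from the EURAD-IM code.

```python
def soil_moisture_factor(theta: float,
                         theta_wilt: float,
                         delta_theta: float = 0.04,
                         enabled: bool = True) -> float:
    """Dimensionless emission scaling due to soil moisture (hypothetical helper).

    theta        volumetric soil moisture (m3/m3)
    theta_wilt   wilting point (m3/m3)
    delta_theta  width of the linear transition above the wilting point (assumed value)
    enabled      False reproduces the "no SMOIS" option (no dependency)
    """
    if not enabled:
        return 1.0
    if delta_theta <= 0.0:
        raise ValueError("delta_theta must be positive")
    if theta >= theta_wilt + delta_theta:
        return 1.0  # sufficiently moist soil: no reduction
    if theta <= theta_wilt:
        return 0.0  # at or below wilting point: emissions suppressed
    return (theta - theta_wilt) / delta_theta  # linear decrease in between


if __name__ == "__main__":
    # Illustrative values only: wilting point at 0.10 m3/m3.
    for theta in (0.20, 0.12, 0.08):
        ref = soil_moisture_factor(theta, theta_wilt=0.10)
        alt = soil_moisture_factor(theta, theta_wilt=0.10, enabled=False)
        print(theta, ref, alt)
```

With such a narrow transition range, small differences in predicted soil moisture near the wilting point translate into large differences in emissions, which is consistent with the strong sensitivity to the land surface model reported in this study.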

Effects on Biogenic Emissions
The effects of different model configurations on biogenic emissions of isoprene and aldehyde are given in Fig. 2. Note that biogenic aldehyde emissions from MEGAN 2.1 include total emissions from acetaldehyde and a set of higher aldehydes which are not treated individually (compare Guenther et al., 2012).
In general, biogenic emissions increase significantly after sunrise due to increasing solar radiation. Differences between nighttime (03 UTC) and daytime (09 UTC) emissions are more pronounced for isoprene than for aldehyde. This is because isoprene is a direct product of photosynthesis, which is mainly limited to daytime conditions. For the reference setup [...] and "Crop/Woodland Mosaic", respectively. In contrast to "Dryland Cropland and Pasture" in the rest of the valley, broadleaf trees emit high levels of isoprene. Thus, even small numbers of trees result in significantly increased local isoprene emissions.
In these regions, increased biogenic emissions are also found for aldehyde. However, the differences between land use types remain small compared to isoprene.
The high dependency on tree coverage is emphasized by comparing reference biogenic emissions to emissions based on MODIS land use (Fig. 2, "land [...] to soil dryness significantly influences biogenic emissions (Fig. 2, "no SMOIS"). By neglecting this response, emissions are considerably larger than for the reference case, especially in the southern part of the domain. As soil moisture decreases after sunrise, the largest sensitivities are found at 09 UTC for both gases.
In contrast, emissions are reduced to almost zero in the south-eastern parts of the Po valley. This reduction is caused by low soil moisture predicted by RUC LSM in the morning hours, which results in drought induced plant stress. The combined effect of boundary layer (PBL) and surface layer (SL) schemes is found to be small for both gases (Fig. 2, "PBL + SL"). Reduced biogenic emissions are predicted by ACM2 PBL + Pleim-Xiu SL compared to the reference using MYJ PBL + Eta SL schemes.

While these differences in isoprene are mainly restricted to areas of high emissions in the central Po valley, the reduction is more extended for aldehydes. Only minor changes in biogenic emissions due to microphysics and radiation schemes are visible (Fig. 2, "microph.", "rad."). Using TGS microphysics instead of the reference WSM6 induces only small local effects during nighttime (03 UTC). Although small, the effects of using different radiation schemes after sunrise can be attributed to different formulations of shortwave radiation by the Dudhia and RRTMG schemes.
[...] The selection of the microphysics schemes induces only minor local changes in this case (Fig. 3, "microph."). The largest effect of Dudhia shortwave and RRTM longwave radiation schemes appears at 06 UTC, where dry deposition velocities are decreased in the northern part of the domain (Fig. 3, "rad.").

Effects on Pollutant Transport
This section investigates effects on the transport of any atmospheric pollutant. Section 4.3.1 evaluates effects on fields of friction velocity forecast by WRF-ARW. As friction velocities only provide information on the horizontal wind speeds deter-

Effects on Friction Velocities
In general, friction velocities do not change substantially between night and daytime but increase in most areas with increasing local instability. For the reference setup in Fig. 4, high values predicted at 03 UTC over the peaks of the Apennines are [...] north-eastern part of the domain. Friction velocities also appear to be only weakly sensitive to increased roughness length, which induces slightly increased values after sunrise (Fig. 4, "Z0").
The selection of the LSM as well as boundary layer and surface layer schemes induces significant effects on friction velocities

Effects on Source Regions
Some studies are available that investigate the history of airmasses in the Po valley by means of long term backward trajectories (Sogacheva et al., 2007; Pernigotti et al., 2012; Sullivan et al., 2016). However, this approach does not account for local effects and related uncertainties in transport and mixing. Here, these effects are analyzed using retroplume calculations for an exemplary air parcel. The selected airmass is located at the position of the Zeppelin observations (44.7° N, 11.6° E, "target [...] to north-western directions due to slow westerly winds. During the last 3 hours before the target time, the airmass converged horizontally by meridional mixing processes. Thus, contributions of this airmass converge from south-western to north-western directions at this time. Five hours before, the major source of the airmass is found north-west of the target location. During the entire time interval, the vertical extension of source areas remains below 1 km altitude (Fig. 5(c)). This is caused by low vertical mixing, which is typical for the early morning hours over flat terrain.
The effects of different model configurations on horizontal source regions are shown in Fig. 5(a).
Horizontal source areas start to diverge already during the first hours before the target time. Three hours earlier, significant contributions span from Bologna in the south-west for RUC LSM to the western central Po valley in the north-west for GFS global meteorology. Five hours before, additional differences in transport distance and vertical mixing become visible. Transport distances from the selected target location vary by more than a factor of two at this time. For example, source regions for the ACM2 boundary layer and Pleim-Xiu surface layer schemes almost extend to the western boundary of the domain within 5 hours. This is caused by increased turbulent transport compared to the reference MYJ boundary layer and Eta surface layer schemes. In contrast, Dudhia and RRTM radiation schemes as well as MODIS land use information indicate slow horizontal [...] airmass and is thus likely to apply also to non-biogenic gases.

Although the presented results are specific to the conditions in the Po valley, forecasts of biogenic gases can be expected to benefit substantially from an improved representation of the land surface in general. While improved land use information could be retrieved from more recent satellite products like Sentinel-2, the considerably different effects of land surface models on both biogenic emissions and pollutant transport underline the significance of soil moisture estimates for air quality modeling.
Furthermore, the large amount and complexity of sensitivities found in this study demonstrate the need to account for forecast uncertainties with special focus on biogenic emissions and pollutant transport. This is especially important for model evaluation with observations as well as chemical data assimilation in the PBL. A follow-up study will evaluate probabilistic model forecasts of biogenic gases using airborne PEGASOS observations. In this context, the sensitivities derived in this study provide a basis for estimating forecast uncertainties with respect to different model processes.
Data availability. The model data produced for this study is stored locally at the Rhenish Institute for Environmental Research as well as at