Multi-model ensemble simulations of olive pollen distribution in Europe in 2014: current status and outlook

. The paper presents the ﬁrst modelling experi-ment of the European-scale olive pollen dispersion, analy-ses the quality of the predictions, and outlines the research needs. A 6-model strong ensemble of Copernicus Atmospheric Monitoring Service (CAMS) was run throughout the olive season of 2014, computing the olive pollen distribution. The simulations have been compared with observations in eight countries, which are members of the European Aeroallergen Network (EAN). Analysis was performed for individual models, the ensemble mean and median, and for a dy-namically optimised combination of the ensemble members obtained via fusion of the model predictions with observations. The models, generally reproducing the olive season of 2014, showed noticeable deviations from both observations and each other. In particular, the season was reported to start too early by 8 days, but for some models the error mounted to almost 2 weeks. For the end of the season, the disagree-ment between the models and the observations varied from a nearly perfect match up to 2 weeks too late. A series of sensitivity studies carried out to understand the origin of the dis-agreements revealed the crucial role of ambient temperature and consistency of its representation by the meteorological models and heat-sum-based phenological model. In particu-lar, a simple correction to the heat-sum threshold eliminated the shift of the start of the season but its validity in other years remains to be checked. The short-term features of the concentration time series were reproduced better, suggesting that the precipitation events and cold/warm spells, as well as the large-scale transport, were represented rather well. Ensemble averaging led to more robust results. The best skill scores were obtained with data fusion, which used the previous days’ observations to identify the optimal weighting coefﬁcients of the individual model forecasts. Such combi-nations were tested for the forecasting period up to 4 days and shown to remain nearly optimal throughout the whole period.

Abstract. The paper presents the first modelling experiment of the European-scale olive pollen dispersion, analyses the quality of the predictions, and outlines the research needs. A 6-model strong ensemble of Copernicus Atmospheric Monitoring Service (CAMS) was run throughout the olive season of 2014, computing the olive pollen distribution. The simulations have been compared with observations in eight countries, which are members of the European Aeroallergen Network (EAN). Analysis was performed for individual models, the ensemble mean and median, and for a dynamically optimised combination of the ensemble members obtained via fusion of the model predictions with observations. The models, generally reproducing the olive season of 2014, showed noticeable deviations from both observations and each other. In particular, the season was reported to start too early by 8 days, but for some models the error mounted to almost 2 weeks. For the end of the season, the disagreement between the models and the observations varied from a nearly perfect match up to 2 weeks too late. A series of sensitivity studies carried out to understand the origin of the disagreements revealed the crucial role of ambient temperature and consistency of its representation by the meteorological models and heat-sum-based phenological model. In particular, a simple correction to the heat-sum threshold eliminated the shift of the start of the season but its validity in other years remains to be checked. The short-term features of the concentration time series were reproduced better, suggesting that the precipitation events and cold/warm spells, as well as the large-scale transport, were represented rather well. Ensemble averaging led to more robust results. The best skill scores were obtained with data fusion, which used the previous days' observations to identify the optimal weighting coefficients of the individual model forecasts. Such combinations were tested for the forecasting period up to 4 days and shown to remain nearly optimal throughout the whole period.

Introduction
Biogenic aerosols, such as pollen and spores, constitute a substantial fraction of particulate matter mass in the air during the vegetation flowering season and can have strong health effects, causing allergenic rhinitis and asthma (D'Amato et al., 2007).
Olive is one of the most extensive crops and its oil is one of the major economic resources in southern Europe. The bulk of olive habitation (95 % of the total area worldwide) is concentrated in the Mediterranean basin (Barranco et al., 2008). Andalusia has by far the world's largest area of olive plantations: 62 % of the total olive land of Spain and 15 % of the world's plantations (Gómez et al., 2014).
Olive pollen is also one of the greatest causes of respiratory allergies in the Mediterranean basin (D'Amato et al., 2007), and in Andalusia it is considered the main cause of allergy. In Córdoba (southern Spain), 71-73 % of pollenallergy sufferers are sensitive to olive pollen (Sánchez-Mesa et al., 2005;Cebrino et al., 2017). High rates of sensitization to olive pollen have been documented in Mediterranean countries: 44 % in Spain and 20 % in Portugal (Pereira et al., 2006), 31.8 % in Greece (Gioulekas et al., 2004), 27.5 % in Portugal (Loureiro et al., 2005), 24 % in Italy (Negrini et al., 1992), 21.6 % in Turkey (Kalyoncu et al., 1995), and 15 % in France (Spieksma, 1990). At the same time, relations between allergy and pollen concentrations are person-and casespecific: allergen content of the pollen grains varies from year to year and day to day, as well as the individual sensitivity of allergy sufferers (de Weger et al., 2013;Galan et al., 2013).
Olive is an entomophilous species that presents a secondary anemophily, favoured by the agricultural management during the last few centuries. This tree is very well adapted to the Mediterranean climate and tolerates the high summer and the low winter temperatures, as well as the summer drought, which is characteristic for this climate.
Olive floral phenology is characterised by bud formation during summer, dormancy during autumn, budburst in late winter, and flowering in late spring (Fernandez-Escobar et al., 1992;Galán et al., 2005;García-mozo et al., 2006). Similarly to some other trees, olive flowering intensity shows alternating years with high and low or even no pollen production. The characteristic quasi-biannual cycles are easily visible in the observations (Ben Dhiab et al., 2017;Garcia-Mozo et al., 2014). This cycle, similarly to that of other trees, e.g. birch, is not strict and is frequently interrupted, showing several years with similar flowering intensity (Garcia- Mozo et al., 2014). Such cyclic behaviour is related to the reproductive development, which is completed in two consecutive years. In the first year, the bud vegetative or reproductive character is determined by the current harvest level, since this is the main factor responsible for the interannual variation of flowering. In the second year, after the winter rest, the potentially reproductive buds that have fulfilled their chilling requirements develop into inflorescences (Barranco et al., 2008).
After budbreak, certain biothermic units are required for the development of the inflorescences. Both the onset of the heat accumulation period and the temperature threshold for the number of positive heat units might vary according to the climate of a determined geographical area. The threshold level was also reported to decrease towards the north (Aguilera et al., 2013). Altitude is the topographical factor that most influences olive local phenology and the major weather factors are temperature, rainfall, and solar radiation, which control plant evapotranspiration .
There is substantial variability of the biological characteristics of olive and its responses to environmental stresses. In particular, the allergen content was shown to be strongly different in pollen from different parts of the Iberian Peninsula .
Forecasting efforts of the olive pollen season were mainly concentrated on statistical models predicting the start of the season and peak using various meteorological predictors. The bulk of studies is based on information from one or a few stations within a limited region (e.g. Orlandi et al., 2006;Moriondo et al., 2001;Alba and Diaz De La Guardia, 1998;Frenguelli et al., 1989;Galán et al., 2005;Fornaciari et al., 1998). Several wider-area studies were also undertaken to aim at more general statistical characteristics of the season, e.g. Aguilera et al. (2014Aguilera et al. ( , 2013, Galan et al. (2016).
Numerical modelling of olive pollen transport is very limited. In fact, the only regular regional-scale computations since 2008 were made by the SILAM model (http://silam. fmi.fi), but the methodology was only scarcely outlined in Galan et al. (2013).
The Copernicus Atmospheric Monitoring Service (CAMS; http://atmosphere.copernicus.eu) is one of the services of the EU Copernicus programme, and it addresses various global and regional aspects of atmospheric state and composition. The CAMS European air quality ensemble (Marécal et al., 2015) provides high-resolution forecasts and reanalysis of the atmospheric composition over Europe. Olive pollen is one of the components which is being introduced to the CAMS European ensemble in co-operation with the European Aeroallergen Network (EAN; https://www.polleninfo.org/country-choose.html).
One of the possible ways of improving the quality of model predictions without direct application of data assimilation is to combine them with observations via ensemblebased data fusion methods (Potempski and Galmarini, 2009). Their efficiency has been demonstrated for air quality problems (Johansson et al., 2015 and references therein) and climatological models (Genikhovich et al., 2010), but the technology has never been applied to pollen.
The aim of the current publication is to present the first Europe-wide ensemble-based evaluation of the olive pollen dispersion during the season of 2014. The study followed the approach of the multi-model simulations for birch (Sofiev et al., 2015a) with several amendments reflecting the peculiarity of olive pollen distribution in Europe. We also made further steps towards fusion model predictions and observations and demonstrate their value in the forecasting regime.
The next section will present the participating models and set-up of the simulations, the observation data used for evaluation of the model predictions, an approach for constructing an optimised multi-model ensemble, and a list of sensitivity computations. The Results section will present the outcome of the simulations and the quality scores of the individ-ual models and the ensemble. Section 4 will be dedicated to analysis of the results, considerations of the efficiency of the multi-model ensemble for olive pollen, and identification of the development needs.

Materials and methods
This section presents the regional models used in the study, outlines the olive pollen source term implemented in all of them, and describes pollen observations used for evaluation of the model predictions.

Dispersion models
The dispersion models used in the study comprise the CAMS European ensemble, which is described in detail by Marécal et al. (2015) and Sofiev et al. (2015a). Below, only the model features relevant for the olive pollen atmospheric transport calculations are described.
The ensemble consisted of six models. The EMEP model of EMEP MSC-West (European Monitoring and Evaluation Programme Meteorological Synthesizing Centre -West) is a chemical transport model developed at the Norwegian Meteorological Institute and described in Simpson et al. (2012). It is flexible with respect to the choice of projection and grid resolution. Dry deposition is handled in the lowest model layer. A resistance analogy formulation is used to describe dry deposition of gases, whereas for aerosols the mass-conservative equation is adopted from Venkatram (1978) with the dry deposition velocities dependent on the land-use type. Wet scavenging is dependent on precipitation intensity and is treated differently within and below cloud. The below-cloud scavenging rates for particles are based on Scott (1979). The rates are size-dependent, growing for larger particles.
EURAD-IM (http://www.eurad.uni-koeln.de) is an Eulerian mesoscale chemistry transport model involving advection, diffusion, chemical transformation, wet and dry deposition, and sedimentation of tropospheric trace gases and aerosols (Hass et al., 1995;Memmesheimer et al., 2004). It includes 3D-VAR and 4D-VAR chemical data assimilation (Elbern et al., 2007) and is able to run in nesting mode. The positive definite advection scheme of Bott (1989) is used to solve the advective transport and the aerosol sedimentation. An eddy diffusion approach is applied to parameterise the vertical subgrid-scale turbulent transport (Holtslag and Nieuwstadt, 1986). Dry deposition of aerosol species is treated size-dependent using the resistance model of Petroff and Zhang (2010). Wet deposition of pollen is parameterised according to Baklanov and Sorensen (2001).
LOTOS-EUROS (http://www.lotos-euros.nl/) is an Eulerian chemical transport model (Schaap et al., 2008). The advection scheme follows Walcek and Aleksic (1998). The dry deposition scheme of Zhang et al. (2001) is used to describe the surface uptake of aerosols. Below-cloud scavenging is described using simple scavenging coefficients for particles (Simpson et al., 2003). MATCH (http://www.smhi.se/en/research/research-de partments/air-quality/match-transport-and-chemistry-model -1.6831) is an Eulerian multiscale chemical transport model with mass-conservative transport and diffusion based on a Bott-type advection scheme (Langner et al., 1998;Robertson and Langner, 1999). For olive pollen, dry deposition is mainly treated by sedimentation and a simplified wet scavenging scheme is applied. The temperature sum, which drives pollen emission, is computed offline from January onwards and is fed into the emission module. MOCAGE (http://www.cnrm.meteo.fr/gmgec-old/site_ engl/mocage/mocage_en.html) is a multiscale dispersion model with grid-nesting capability (Josse et al., 2004;Martet et al., 2009). The semi-Lagrangian advection scheme of Williamson and Rasch (1989) is used for the grid-scale transport. The convective transport is based on the parameterisation proposed by Bechtold et al. (2001), whereas the turbulent diffusion follows the parameterisation of Louis (1979). Dry deposition including the sedimentation scheme follows Seinfeld and Pandis (1998). The wet deposition caused by convective and stratiform precipitation is based on Giorgi and Chameides (1986). SILAM (http://silam.fmi.fi) is a meso-to global-scale dispersion model (Sofiev et al., 2015b), also described in the review of Kukkonen et al. (2012). Its dry deposition scheme (Kouznetsov and Sofiev, 2012) is applicable for a wide range of particle sizes including coarse aerosols, which are primarily removed by sedimentation. The wet deposition parameterisation distinguishes between sub-and in-cloud scavenging by both rain and snow (Sofiev et al., 2006). For coarse particles, impaction scavenging, parameterised following Kouznetsov and Sofiev (2012), is dominant below the cloud. The model includes emission modules for six pollen types: birch, olive, grass, ragweed, mugwort, and alder, albeit only birch, ragweed, and grass sources are so far described in the literature Sofiev, 2017;.
Three ENSEMBLE models were generated by (i) the arithmetic average, (ii) the median and (iii) an optimal combination of the six model fields. The average and median were taken on hourly basis, whereas optimisation was applied at daily level following the temporal resolution of the observational data. For the current work, we used a simple linear combination c opt of the models c m , m = 1. . .M, minimising the regularised RMSE J of the optimal field: Here, i, j , k, t are indices along the x, y, z, and time axes, M is the number of models in the ensemble, O is the number of observation stations, τ = {d −k : d 0 } is the time period of k + 1 days covered by the analysis window, start- (2), the first term represents the RMSE of the assimilated period τ , the second term limits the departure of the coefficients from the homogeneous weight distribution, the third one limits the speed of evolution of the a m coefficients in time. The scaling values α and β decide on the strength of regularisation imposed by these two terms.
The ensemble was constructed to mimic the forecasting mode. Firstly, the analysis is made using data from the analysis period τ . The obtained weighting coefficients a i are used over several days from day d 0 : from d 1 to d nf , which constitute the forecasting steps. The performance of the ensemble is evaluated for each length of the forecast, from 1 to n f days.

Olive pollen source term
All models of this study are equipped with the same olive pollen source term, which has not yet been described in the scientific literature. However, it follows the same concept as the birch source  that was used for the birch ensemble simulations (Sofiev et al., 2015a). The formulations and input data are available at http://silam.fmi.fi/ MACC. The main input data set is the annual olive pollen production map based on the ECOCLIMAP data set (Champeaux et al., 2005;Masson et al., 2003), Fig. 1.
ECOCLIMAP incorporates the CORINE land-cover data for most western European countries with olive plantations as an explicit land-use type (CEC, 1993). For Africa and countries missing from CORINE, the empty areas were filled manually, assuming that 10 % of all tree-like land-use types are olives. This way, Tunisian, Egyptian, and Algerian olive plantations were recovered and included in the inventory. In some areas, such as France (Fig. 1), the olive habitat looks unrealistically low, probably because the large olive plantations are rare but the trees are planted in private gardens, city park areas, streets, etc. Since these distributed sources are not reflected in the existing land-use inventories, they are not included in the current pollen production map. Similarly to birch, the flowering description follows the concept of thermal time phenological models and, in particular, the double-threshold air temperature-sum approach of Linkosalo et al. (2010) modified by Sofiev et al. (2012). Within that approach, the heat accumulation starts on a prescribed day in spring (1 January in the current set-upafter Spano et al., 1999;Moriondo et al., 2001;Orlandi et al. 2005a, b) and continues throughout spring. The cutoff daily temperature below which no summation occurs is 0 • C, in contrast to 3.5 • C for birch. It was obtained from the multi-annual fitting of the start of the season. Flowering starts when the accumulated heat reaches the start threshold ( Fig. 2) and continues until the heat reaches the end threshold (in the current set-up, which is equal to the starting threshold +275 degree day). The rate of heat accumulation is the main controlling parameter for pollen emission: the model assumes direct proportionality between the flowering stage and fraction of the heat sum accumulated to date.
Similarly to the birch parameterisation in Sofiev et al. (2012), the model distinguishes between pollen maturation, which is solely controlled by the heat accumulation described above, and pollen release, which depends on other parameters. Higher relative humidity (RH) and rain reduce the release, completely stopping it for RH > 80 % and/or rain > 0.1 mm h −1 . Strong wind promotes it by up to 50 %. Atmospheric turbulence is taken into account via the turbulent velocity scale and thus becomes important only in cases close to free convection. In stable or neutral stratification and calm conditions the release is suppressed by 50 %. The interplay between pollen maturation and release is controlled by an intermediate ready pollen buffer, which is filled in by maturation and emptied by the release flows.
Local-scale variability of flowering requires a probabilistic description of its propagation (Siljamo et al., 2008). In the simplest form, the probability of an individual tree entering the flowering stage can be considered via the uncertainty of the temperature-sum threshold determining the start of flowering for the grid cell -10 % in the current simulations. The end of the season is described via the open-pocket principle: the flowering continues until the initial available amount of pollen is completely released. The uncertainty of this number is taken to be 10 % as well.

Pollen observations
The observations for the model evaluation in 2014 have been provided by the following eight national networks, members of the EAN: Croatia, Greece, France, Hungary, Israel, Italy, Spain, and Turkey. The data were screened for completeness Pollen monitoring was performed with Burkard 7 day and Lanzoni 2000 pollen traps based on the Hirst design (Hirst, 1952). The pollen grains were collected at an airflow rate of 10 L min −1 . The observations covered the period from March to September, with some variations between the countries. Daily pollen concentrations were used. Following the EAS-EAN requirements (Galán et al., 2014;Jäger et al., 1995), most samplers were located at heights between 10 and 30 m on the roofs of suitable buildings. The places were frequently downtown of the cities, i.e. largely representing the urban background conditions (but not always). With regard to microscopic analysis, the EAS-EAN requirement is to count at least 10 % of the sample using horizontal or vertical strips (Galán et al., 2014). The actual procedures vary between the countries but generally comply. The counting in 2014 was mainly along four horizontal traverses as suggested by Mandrioli et al. (1998). In all cases, the data were expressed as mean daily concentrations (pollen m −3 ).

Set-up of the simulations
Simulations followed the standards of the CAMS European ensemble (Marécal et al., 2015). The domain spanned from 25 • W to 45 • E and from 30 to 70 • N. Each of the six mod-els was run with its own horizontal and vertical resolutions, which varied from 0.1 to 0.25 • of the horizontal grid cell size and had from 3 up to 52 vertical layers within the troposphere (Table 1). This range of resolutions is not designed to reproduce local aspects of pollen distribution; instead it covers the whole continent and describes large-scale transport. The 10 km grid cells reach the sub-city scale but are still insufficient to resolve the valleys and individual mountain ridges. The limited number of vertical dispersion layers used by some models is a compromise, allowing for high horizontal resolution. Thick layers are not a major limitation as long as the full vertical resolution of the input meteorological data is used for evaluation of dispersion parameters (Sofiev, 2002).
The simulations were made retrospectively for the season of 2014, from 1 January (the beginning of the heat-sum accumulation) to 30 June. All models produced hourly output maps with concentrations at eight vertical levels (near the surface, 50, 250, 500, 1000, 2000, 3000 and 5000 m above the surface), as well as dry and wet deposition maps.
All models considered pollen as an inert water-insoluble particle, 28 µm in diameter, and with a density of 800 kg m −3 .
3 Results of the pollen season of 2014

Observed peculiarities of the season
At the French Mediterranean stations (Aix-en-Provence, Avignon, Montpellier, Nice, Nîmes, and Toulon), the mean value of the SPI in 2014 was quite similar to that of 2012 but lower than that in 2013 (see de Weger et al., 2013 for the SPI relevance to allergy). The start of the pollen season was earlier than in the previous 5 years. The duration of the season had been the longest one in Aix-en-Provence, Nice, and Nîmes since 2010. At the Ajaccio (Corsica) station, the SPI was higher in 2014 than at other stations, similarly to the situation in 2012.
In Andalusia, 2014 was the second warmest year of the last few decades but was more humid than usual, at 5 % above the typical relative humidity level (https://www.ncdc.noaa.gov/ sotc/global/201413). However, after an intense olive flowering period in 2013, in 2014 the flowering intensity was lower and similar to 2012, in agreement with the biannual alterations of the season severity.
In northern Italy, the 2014 olive pollen season was less intense than the average of the previous 10 years (2004)(2005)(2006)(2007)(2008)(2009)(2010)(2011)(2012)(2013). In contrast, in southern Italy, the 2014 season was more intense in the first part and less intense in the second part (after the beginning of June) than during previous seasons. No differences were noted with respect to the start and the end of the season in both cases.
In Thessaloniki, Greece, in 2014, the pollen season started at the same time as in the last few decades (first half of April) but ended about 1.5 months later (last half of October). The pollen season peak has been steady in May. The SPI was considerably higher in 2014 (418 pollen day m −3 ) compared to the previous 2 years (approximately 300 pollen day m −3 ). The overall shape of the pollen season in 2014 resembled that of the previous decade, but with a multimodal and smoother pattern.

Model results
The total seasonal olive pollen load (Figs. 3 and 4) expectedly correlates with the map of olive plantations (Fig. 1), which is also confirmed by the observations (Fig. 3). The highest load is predicted over Spain and Portugal, whereas the level in the eastern Mediterranean is not so high reflecting the smaller size of the areas covered by the olive trees and limited long-range transport over the Mediterranean. The model predictions differ up to a factor of 2-4 ( Fig. 4), reflecting the diversity of modelling approaches, especially the deposition and vertical diffusion parameterisations (see Table 1 and Sect. 3.1).
Since the olive plantations are located within a comparatively narrow climatic range, flowering propagates through the whole region within a few weeks, starting from the coastal bands and progressing inland (not shown).
Hot weather during the flowering season leads to strong vertical mixing and a deep atmospheric boundary layer (ABL), which in turn promotes the pollen dispersion. As seen from Fig. 5, the pollen plumes can extend over the whole Mediterranean and episodically affect central Europe. Figs. 4 and 5 illustrate the differences between the models, e.g. substantially higher concentrations reported by EURAD-IM and MOCAGE compared to other models. The shortest transport with the fastest deposition is manifested by LOTOS-EUROS (also showing the lowest concentrations), while the longest one is suggested by MOCAGE.
The most important general parameters describing the season timing are its start and end (Fig. 6). Following Andersen (1991), these dates are computed as dates on which 5 and 95 % of the SPI are reached.
Computations of the model-measurement comparison statistics face the problem of non-stationarity and nonnormal distribution of the daily pollen concentrations (Ritenberga et al., 2016). For such processes, the usual nonparametric statistics have to be treated with a great amount  of care, since their basic assumptions are violated. Nevertheless, they can be formally calculated for both individual models and the ensemble (Figs. 7 and 8). The main characteristic of the ensemble, the discrete rank histogram, and the distribution of the modelled values for the below-detectionlimit observations (Fig. 9) show that the spread of the obtained ensemble is somewhat too narrow in comparison with Atmos. Chem. Phys., 17, 12341-12360, 2017 www.atmos-chem-phys.net/17/12341/2017/ the dynamic range of the observations. The same limitation was noticed for the birch ensemble. The patterns in Figs. 6 and 7 reveal a systematic early bias of the predicted start and end of the season, which is easily seen from normalised cumulative concentration time series (Fig. 10). This bias is nearly identical for all models, except for EURAD-IM, which also shows a higher correlation coefficient than other models. The reasons for the problem and for the diversity of the model response are discussed in the next section.

Discussion
In this section, we consider the key season parameters and the ability of the presented ensemble to reproduce them (Sect. 4.1), the added value of the multi-model ensembles, including the optimised ensemble (Sect. 4.2), the main uncertainties that limit the model scores (Sect. 4.3), and the key challenges for future studies (Sect. 4.4).

Forecast quality: model predictions for the key season parameters
The key date of the pollen season is its start: this very date refers to adaptation measures that need to be taken by al-lergy sufferers. Predicting this date for olives is a significantly greater challenge than, for example, for birches: the heat sum has to be accumulated from 1 January with the season onset being in mid-April, whereas for birches the dates are 1 March and mid-March, respectively. As a result, the prediction of the start of the olive season strongly depends on the temperature predictions by the weather prediction model and the way this temperature is integrated into the heat sum. Inconsistency between these factors over the period of almost 4 months, even if small, can easily lead to a week of error. As one can see from Figs. 7 and 8, there is a systematic albeit spatially inhomogeneous bias of all models by up to 10 days (too early season). The exception is the SILAMos150 sensitivity run, which used the higher heat-sum threshold, by 150 degree days (∼ 10 %), than the standard level (Fig. 2). No other sensitivity runs, including the simulations driven by ERA-Interim fields, showed any significant improvement of this parameter. Importantly, EURAD-IM, which is driven by WRF meteorology fields, also showed a similar bias. Finally, the shift varies among the stations from near-zero (France, some sites in Italy, Croatia, Greece, and Israel) up to almost The end of the season showed an intriguing picture: EURAD-IM, despite starting the season as early as all other models, ends it 2 days too late instead of 5 days too early as all other models (see examples for two stations in Fig. 10). This indicates that WRF, in late spring, predicts a lower temperature than IFS, which leads to a longer-than-observed season in the EURAD-IM predictions. Other models showed the correct season length and, due to initial early bias, end it a few days too early. The de-biased run SILAMos150 shows an almost perfect shape and has within a 1 day accuracy for the start and end, which supports a 250 degree day as a season length parameter. The most divergent model predictions are shown for the absolute concentrations (Fig. 8). With the mean observed April-June concentration of 35 pollen m −3 the range of predictions spans over a factor of four: EURAD-IM and MOCAGE being two times as high and EMEP and LOTOS-EUROS two time as low. Shifting the season by 5 days in the SILAMos150 run also changes the model bias, reflecting differences in the transport patterns and the impact of stronger vertical mixing in later spring. Spatially, the bias is quite homogeneous, except for southern Spain, where a heterogeneous pattern is controlled by local conditions at each specific site (Fig. 7).
Temporal correlation is generally high in coastal areas ( Fig. 7) but at or below 0.5 at terrestrial stations on the Iberian Peninsula (the main olive plantations). This is primarily caused by the shifted season: the simulations with more accurate seasons showed the highest correlations among all models with ∼ 60 % of sites having a significant correlation (p < 0.01, Fig. 8).
Comparison with local statistical models made for single or a few closely located stations expectedly shows that local models are usually comparable to but somewhat more accurate (at their locations) than the European-scale dispersion models; also see discussion in Ritenberga et al. (2016). Thus, (Galan et al., 2001) analysed the performance of three popular local models for Córdoba, with the best one showing the mean error of 4.7 days at the start of the season but reaching up to 14 days in some years. A similar error was found for Andalusia (Galán et al., 2005) and two sites (Perugia and Ascoli Piceno) in Italy (Frenguelli et al., 1989) -4.8 and 4.33 days of the standard error, respectively. A recent study (Aguilera et al., 2014) constructed three independent statistical models for Spain, Italy, and Tunisia and ended up with over 5 days of standard error for the Mediterranean. In another study, the authors admitted the scale of the challenges: "The specific moment for the onset of the olive heat accumulation period is difficult to determine and has essentially remained unknown" (Aguilera et al., 2013).
One of the strengths of continental-scale dispersion models is their ability to predict long-range transport events. However, a direct evaluation of this feature for olive pollen is difficult, since countries without olive plantations usually do not count its pollen. One can, however, refer to Fig. 3 (zoomed map of Spain), which shows that the ensemble successfully reproduces the drastic change of the SPI from nearly 10 5 pollen day m −3 in the south of Spain down to less than 100 pollen day m −3 in the north. Episode-wise, an example of a well-articulated case of olive pollen transport from Italy to Hungary in 2016 was brought up by Udvardy et al. (2017), who analysed it with adjoining SILAM simulations. The episode was also predicted well by the forward computations.

Ensemble added value
Arguably the main uncertainty of the model predictions was caused by the shift of the season start and end -the parameters were heavily controlled by temperature, i.e. least affected by transport features of the models. As a result, application of the "simple" ensemble technologies does not lead to a strong improvement. Some effect was still noticed, but it was less significant than in case of birch or traditional AQ forecasting. Therefore, in this section we also consider the possibility of ensemble-based fusion of the observational data with the model predictions. All ensembles were based on operational models; i.e. the SILAMos150 run was not included in either of them.

Mean ensembles: arithmetic average and median
Considering the mean-ensemble statistics, one should keep in mind that both the meteorological driver and the source term parameterisation were the same for all models (except for EURAD driven by WRF). This resulted in the underrepresentative ensemble (Fig. 9), for which several good and bad   features visible in all models propagate to the mean ensembles.
Among the simple means, the arithmetic average performed better than the median, largely owing to strong EURAD-IM impact. That model overestimated the concentrations and introduced a powerful push towards an extended season, thus offsetting the early bias of the other models. Since the median largely ignored this push, its performance was closer to that of other models. Nevertheless, both the mean and median demonstrated low RMSE, the median being marginally better.

Fusing the model predictions and observations
into an optimised ensemble: gain in the analysis and predictive capacity Developing the ensemble technology further, here we present the first attempt of fusion of the observational data with the multi-model ensemble for olive pollen. In Sect. 3.1, Eq.
(2) requires three parameters: the regularisation scaling parameters α and β, and length of the assimilation window T . For the purposes of the current feasibility study, several values for each of the parameters were tested and the robust performance of the ensemble was confirmed with very modest regularisation strength and for all considered lengths of the analysis window from 1 to 15 days. Finally, α = 0.1, β = 0.1, T = 5 days were selected for the example below as a compromise between the smoothness of the coefficients, regularisation strength, and the optimisation efficiency over the assimilation window.
The optimised ensemble showed (Fig. 11a) that each of the six models had a substantial contribution over certain parts of the period. Over some times, e.g. during the first half of May, only one or two models were used, other coefficients being put to zero, whereas closer to the end of the month, all models were involved. Finally, prior to and after the main season, concentrations were very low and noisy, so the regularisation terms of Eq. (2) took over and pushed the weights to an a priori value of 1/6. The bulk of the improvements came in the first half of the season (Fig. 11b). After the third peak in the middle of May, the effect of assimilation becomes small and the optimisation tends to use the intercept to meet the mean value, whereas the model predictions become small and essentially uncorrelated with the observations. This corroborates with the observed 8day shift of the season, which fades out faster in the models than in the observed time series (Fig. 10).
There was little reduction in the predictive capacity of the optimised ensemble when going out of assimilation window and towards the forecasts. In essence, only the first peak of the concentrations (and RMSE) is better off with shorter forecasts. For the rest of the season (before and after the peak) the 5-day assimilation window led to a robust combination of the models that stayed nearly optimal over the next 5 days.
Comparison with other forecasts expectedly shows that the optimised ensemble not only has significantly better skills than any of the individual models, but is up to 25-30 % better than the mean and median of the ensemble (Fig. 11b). A stronger competitor was the "persistence forecast" when the next-day concentrations are predicted to be equal to the last observed daily value. The 1-day persistence appeared to be the best possible "forecast", which shows an RMSE at the beginning of May that is two times lower than in the 1-day forecast of the optimal ensemble (Fig. 11c). However, 2-day persistence forecast had about the same RMSE as the ensemble, and 3-and 4-day predictions were poor.
The strong performance of the 1-day persistence forecast is not surprising and, with the current standards of the pollen observations, has no practical value: the data are always late by more than 1 day (counting can start only next morning and become available about midday). The second problem of the persistence forecast is that it needs actual data; i.e. the scarcity of pollen network limits its coverage. Thirdly, persistence loses its skills very fast: the day +2 forecast has no superiority over the optimal ensemble, whereas day +3 and +4 persistence-based predictions are useless. Finally, at local scale, state-of-the-art statistical models can outperform it -see discussion in Ritenberga et al. (2016).
One should, however, point out that the 1-day predicting power of the persistence forecast (or more sophisticated statistical models based on it) can be a strong argument for the future real-time online pollen monitoring. It's delay can be as short as 1 h (Crouzy et al., 2016;Oteros et al., 2015b). These data have good potential to be used for next-day predictions for the vicinity of the monitor.

Sensitivity of the simulations to model and source term parameters
The above-presented results show that arguably the most significant uncertainty was due to shifting the start and the end of the season. It originated from the long heat-sum accumulation (since 1 January), where even a small systematic difference between the meteorology driving the multi-annual fitting simulations and that used for operational forecasts causes a significant season shift by late spring. In some areas, the resolution of the NWP model plays a role as well: the complex terrain in the north of Spain and in Italy requires dense grids with which to resolve the valleys. Other possible sources of uncertainties might need attention.
To understand the importance of some key parameters, a series of perturbed runs of SILAM was made: Figure 12. Sensitivity of optimised ensemble to the length of assimilation window. Upper row: optimal weights of the individual models and ensemble score over the 1-(a) and 15-(b) day-long assimilation windows; lower row: RMSE of the of individual models and the optimal ensemble forecasts against those of individual models. Note different time axes. Forecasts are available earlier for the 1-day analysis window.
-os100 and os150 runs with the season starting threshold increased by 100 and 150 degree days (the os150 run is referred in the above discussion as SILAMos150), -ERA run with ERA-Interim meteorological fields, which were used for the source parameters fitting, series of three runs with reduced vertical mixing within the ABL and the free troposphere, smlpoll run with 20 µm size of the pollen grain, -smlpoll_coarse run with 20 µm pollen size and coarse computational grid (0.2 • × 0.2 • ).
The ERA simulations with ERA-Interim reduced the shift of the start of the season by 2 days but increased the shift of the end by 3 days, i.e. making the season shorter by 5 days. At the same time, the os150 run showed that a sim-ple increase of the heat-sum threshold by ∼ 10 % (150 degree days) essentially eliminates the mean shift for 2014, but it remains unclear whether this adjustment is valid for other years.
Variations of the mixing parameterisation (perturbing the formula for the K z eddy diffusivity) did not lead to significant changes: all scores stayed within 10 % of the reference SILAM simulations.
Evaluation of the impact of deposition parameterisations was more difficult since they are model specific. Higher deposition intensity causes both reduction of the transport distance and absolute concentrations. This issue might be behind the low values reported by LOTOS-EUROS and, conversely, the high concentrations of EURAD-IM and MOCAGE. Its importance was confirmed by the SILAM sensitivity simulations with smaller pollen size, smlpoll, and smlpoll_coarse. Both runs resulted in the mean concentra-tions more than doubling but with a marginal effect on temporal correlation. They also differed slightly from each other.
Variations of the fusion parameters showed a certain effect. For a short averaging window (5 days or less), the variations of weighting coefficients increased and the time series became noisier (Fig. 12). On return, the correlation increased almost up to 0.8-0.9 for some analysis intervals, but stayed the same for other periods. Also, the 1-day forecast RMSE decreased for some days but little difference was found for longer predictions.

Main challenges for the future studies
The current study is the first application of numerical models to olive pollen dispersion in Europe. One of its objectives was to identify the most pressing limitations of the current approach and the extent to which the ensemble and data fusion technologies can help in improving the forecasts.
The most evident issue highlighted by the exercise is the shift of the pollen season in some key regions, which is similar in all models, suggesting some unresolved inconsistencies between the heat-sum calculations of the source term and the features of the temperature predictions by the weather model. The issue suggests some factor(s) currently not included or misinterpreted in the source term. One of the candidate processes is the chilling-sum accumulation suggested by some studies, e.g. Aguilera et al. (2014). A switch to different types of phenological models with genetic differentiation of the populations following Chuine and Belmonte (2004) is another promising option.
The second issue refers to the underestimation of the pollen concentration in France, which probably originates from a comparatively large number of olive trees spread, for example, in private gardens but not accounted for in the agricultural maps of olive plantations.
The third set of questions refers to the pollen load prediction, i.e. the possibility of forecasting the overall season severity before it starts. Several statistical models have been presented in the literature, e.g. Ben Dhiab et al. (2017) for total annual load and Chuine and Belmonte (2004) for relative load. Their evaluation and implementation in the context of dispersion models is important.
An issue, mostly addressing the long-term horizon rather than the short-term forecasts, is the validity of the developed models in the conditions of changing climate. The models have to be robust to the trends in meteorological forcing. Purely statistical models are among the most vulnerable in this respect, because they just quantify the apparent correlations observed under certain conditions but do not explore the processes behind these relations.
Finally, the first steps towards the ensemble-based fusion of the model forecasts and pollen observations showed a strong positive effect. Further development of these techniques combined with progress towards near-real-time pollen data has very high potential for improving the forecasts.

Summary
An ensemble of six CAMS models was run through the olive flowering season of 2014 and compared with observational data of eight countries of European Aeroallergen Network (EAN). The simulations showed a decent level of reproduction of the short-term phenomena but also demonstrated a shift of the whole season by about 8 days (∼ 20 % of the overall pollination period). An ad-hoc adjustment of the starting heatsum threshold by ∼ 10 % (150 degree days) on average resolves the issue and strongly improves the model skills, but its regional features and validity for other years and meteorological drivers remain unclear.
The ensemble members showed quite diverse pictures, demonstrating substantial variability, especially in areas remote from the main olive plantations. Nevertheless, the observation rank histogram still suggested a certain understatement of the ensemble variability in comparison with the observations. This partly originates from the synchronised source-term formulations and meteorological input used by all but one model.
Simple ensemble treatments, such as the arithmetic average and median, resulted in a more robust performance, but they did not outrun the best models over significant parts of the season. The arithmetic average turned out to be better than the median.
A data-fusion approach, which creates the optimalensemble model using the observations over preceding days for an optimal combination of the ensemble members, is suggested and evaluated. It was based on an optimal linear combination of the individual ensemble members and showed strong skills, routinely outperforming all individual models and simple ensemble approaches. It also showed strong forecasting skills, which allowed an application of the past-time model weighting coefficients over several days in the future. The only approach outperforming this fusion ensemble was the 1-day persistence-based forecast, which has no practical value due to the manual pollen observations and limited network density. It can, however, be used in the future when reliable online pollen observations become available.
A series of sensitivity simulations highlighted the importance of a meteorological driver, especially its temperature representation, and deposition mechanisms. The data fusion procedure was quite robust with regard to analysis window, still requiring 5-7 days to eliminate the noise in the model weighting coefficients.
Data availability. The model simulations presented in the paper are freely available on-request from the FMI team. The archive of 1.1 TB size is stored in the long-term FMI tape archive.