The work here complements the overview analysis of the modelling systems participating in the third phase of the Air Quality Model Evaluation International Initiative (AQMEII3) by focusing on the performance of two modelling systems, Chimere for Europe and CMAQ for North America, in simulating hourly surface ozone.

The evaluation strategy developed over the three phases of the AQMEII activity, which aims to build a diagnostic methodology for model evaluation, is pursued here, and novel diagnostic methods are proposed. In addition to evaluating the “base case” simulation, in which all model components are configured in their standard mode, the analysis also makes use of sensitivity simulations in which the lateral boundary conditions, the emissions of anthropogenic precursors, and ozone dry deposition have been altered and/or zeroed.

To help understand the causes of model deficiencies, the error components
(bias, variance, and covariance) of the base case and of the sensitivity
runs are analysed in conjunction with timescale considerations and error
modelling using the available error fields of temperature, wind speed, and
NO

The results reveal the effectiveness and diagnostic power of the methods
devised (which remains the main aim of this study), allowing the detection
of the timescales and the fields to which the two models are most sensitive.
The representation of planetary boundary layer (PBL) dynamics is pivotal to
both models. In particular, (i) the fluctuations slower than

The vast majority of the research and applications related to the evaluation of geophysical models make use of aggregate statistical metrics to quantify, in some averaged sense, the properties of the residuals obtained by comparing observations with modelled output (typically time series of the variable of interest). This practice is rooted in linear regression analysis and the assumption of normally distributed residuals, and it has proven reliable when dealing with simple, deterministic, and low-order models. Driven by the rapidly improving understanding of the underlying physics, the paradigm has, however, changed: models have grown in complexity and in non-linear interactions, and they now require more powerful and direct diagnostic methods (Wagener and Gupta, 2005; Gupta et al., 2008; Dennis et al., 2010; Solazzo and Galmarini, 2016).

Evaluation of geophysical models is typically carried out under the theoretical umbrella proposed by Murphy in the early 1990s for assessing the dimensions of goodness of a forecast: consistency (“the correspondence between forecasters' judgments and their forecasts”), quality (“the correspondence between the forecasts and the matching observations”), and value (“the incremental benefits realised by decision makers through the use of the forecasts”) (Murphy, 1993). Since 2010, the Air Quality Model Evaluation International Initiative (AQMEII, Rao et al., 2011) has focused on the quality dimension – the one most relevant to science, according to Weijs et al. (2010) – of air quality model hindcast products, aiming to build an evaluation strategy that is informative for modellers as well as for users.

Our claim is that the

Operational metrics usually employed in air quality evaluation (see Simon et al., 2012, for a review) have several limitations, as summarised by Tian et al. (2016): interdependence (they are related to each other and are redundant in the type of information they provide), underdetermination (they do not describe unique error features), and incompleteness (how many of these metrics are required to fully characterise the error?). Furthermore, they offer little diagnostic power for addressing the quality dimension introduced above. Gauging (average) model performance through model-to-observation distance leaves open several questions, such as (a) how much information is contained in the error? In other words, what remains wrong with our underlying hypotheses and modelling practice? (b) Is the model providing the correct response for the correct reason? (c) What degree of system complexity can models actually match? These questions have a direct, very practical impact on the use of models, the return they provide (the value), and their credibility. Answers to these questions are also relevant to the widespread practice of bias correction, which aims to adjust the modelled value to the observed value rather than to correct the causes of the bias, which may stem from systematic, cumulative errors.

The main aims of this study are to move towards tools devised to enable
diagnostic interpretation of model errors, following the approach of Gupta et
al. (2008, 2009), Solazzo and Galmarini (2016), and Kioutsioukis et
al. (2016), and to advance the evaluation strategy outlined in the course of
the three phases of AQMEII. In particular, the work presented here is meant
to complement the overview analysis of the modelling systems participating in
AQMEII3 (summarised by Solazzo et al., 2017) by concentrating on the
performance for surface ozone modelled by two modelling systems: Chimere for
Europe (EU) and CMAQ for North America (NA). This study attempts to

identify the timescales (or frequencies) of the error of modelled ozone;

attribute each type of error to processes by utilising modelling runs with modified fluxes at the boundaries (anthropogenic emissions and deposition at the surface and boundary conditions at the bounding planes of the domain) and by breaking down the mean square error (MSE) into bias, variance, and covariance – this analysis allows us to diagnose the nature of the error and to determine whether it is caused by external conditions or by missing or biased parameterisations or process representations;

investigate the periodicity of the ozone error, which can be symptomatic of recurrent (either random or systematic) model deficiencies;

determine the role of the error of precursor or meteorological fields in explaining the ozone error. The significance (or the non-significance) of a correlation between the ozone error and that of one of the explanatory variables can help to understand the impact (or lack of impact) of the latter on the ozone error as well as the timescale of the process(es) causing the error.

The data, model features, and error decomposition methodology are summarised in Sect. 2. Results of the aggregate time series and error decomposition analyses are presented in Sect. 3, and results of the diagnostic error investigation through wavelet, autocorrelation, and multiple regression analysis are presented in Sect. 4. Discussion, conclusions, and final remarks are presented in Sects. 5 and 6.

Unless otherwise specified, analyses are carried out and results are
presented for the rural receptors of three subregions over each continental
area as shown in Fig. 1. The three subregions have been selected based on
similarity analysis of the observed ozone fluctuations slower than

Continental domains and subregions used for analysis. The networks of ozone receptors are also shown.

The stations used for the analysis are part of European (European
Monitoring and Evaluation Programme: EMEP;

Following the approach used in previous AQMEII investigations, modelled
hourly concentrations in the lowest model layer (

For the analyses conducted in this study, the spatial average of the observed
and modelled ozone time series has been computed prior to any time
aggregation; i.e. the spatial average is created by averaging the hourly
values over all rural stations in each region. Missing values in the time
series have not been imputed prior to the spatial averaging. The analysis
is restricted to stations with a data completeness above 75 %
and located below 1000 m above sea level. Time series with more than 335
consecutive missing records (14 days) have also been discarded. The number of
rural receptors
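The screening and averaging steps described above can be sketched as follows. This is a minimal illustration, assuming that each station's hourly series is stored as one row of a NumPy array with NaN marking missing hours; the function names (`regional_hourly_mean`, `longest_missing_run`) are hypothetical, while the thresholds are the ones stated in the text.

```python
import numpy as np


def longest_missing_run(x):
    """Length of the longest run of consecutive NaN values in a 1-D array."""
    longest = current = 0
    for missing in np.isnan(x):
        current = current + 1 if missing else 0
        longest = max(longest, current)
    return longest


def regional_hourly_mean(series, elevations, min_completeness=0.75,
                         max_elevation=1000.0, max_gap=335):
    """Average hourly ozone over the stations that pass the screening criteria.

    series: array (n_stations, n_hours); NaN marks a missing hour.
    elevations: station heights above sea level (m).
    Returns the regional hourly mean and the indices of the retained stations.
    """
    series = np.asarray(series, dtype=float)
    keep = []
    for i, x in enumerate(series):
        completeness = np.mean(~np.isnan(x))
        if (completeness > min_completeness
                and elevations[i] < max_elevation
                and longest_missing_run(x) <= max_gap):
            keep.append(i)
    # Missing values are not imputed: each hour is averaged over the
    # retained stations that reported a value at that hour.
    return np.nanmean(series[keep], axis=0), keep
```

Averaging before any time aggregation, as done here, means that an hour missing at one station simply reduces the number of contributing stations for that hour.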

The configuration of the CMAQ and Chimere modelling systems for AQMEII3 is extensively discussed in Solazzo et al. (2017) with respect to resolution, parameterisations, and inputs of emissions, meteorology, land use, and boundary conditions. For completeness a short summary is provided hereafter.

The CMAQ model (Byun and Schere, 2006) is configured with a horizontal grid spacing of 12 km and 35 vertical layers (up to 50 hPa) and uses the widely applied CB05-TUCL chemical mechanism (carbon bond mechanism; Whitten et al., 2010) for the representation of gas-phase chemistry. Emissions from natural sources are calculated by the Biogenic Emissions Inventory System (BEIS) model. The meteorology is calculated by the Weather Research and Forecast (WRF) model (Skamarock et al., 2008) with nudging of temperature, wind, and humidity above the planetary boundary layer (PBL) height. In CMAQ, dry deposition is used as a flux boundary condition for the vertical diffusion equation. A review of the CMAQ dry deposition model, as well as of other approaches, is provided in Pleim and Ran (2011).

Chimere (Menut et al., 2013) is configured with a grid of 0.25

Both models are widely used worldwide in a range of applications such as scenario analysis, forecasting, ensemble modelling, and model intercomparison studies.

The Chimere and CMAQ models have been used to perform a series of sensitivity simulations aiming for a better understanding of the causes of differences between the base model simulations and observed data. In particular, the following set of sensitivity runs was performed:

Average monthly (column

One annual run with zeroed anthropogenic emissions provided an indication of the amount of regional ozone due to boundary conditions and biogenic emissions (referred to as “zero emi”).

One annual run with a constant value of ozone (zero for NA and 35 ppb for EU) at the lateral boundaries of the model domain provided an indication of the amount of ozone formed due to anthropogenic and biogenic emissions within the domain (in addition to the constant value for EU) (referred to as “zero BC” and “const BC”). All species other than ozone had boundary condition values of zero for both NA and EU in these sensitivity simulations.

One annual run was performed in which the anthropogenic emissions were reduced by 20 %. In addition, the boundary conditions for this run were prepared from a C-IFS simulation (details in Galmarini et al., 2017, and references therein) in which global anthropogenic emissions were also reduced by 20 % (referred to as “20 % red”).

One run with ozone dry deposition velocity set to zero was available for the months of January and July (referred to as “zero dep”).

To aid diagnostic interpretation, the mean square (or quadratic) error
(MSE

Average monthly (column

The MSE is a quadratic, parametric metric that is widely applied in many contexts; it is non-zero because the model does not account for information that could produce a more accurate estimate. Put in an information theory context, the MSE provides a measure of the information about the observation that is missing from a Gaussian model centred at a deterministic prediction (Nearing et al., 2016). Ideally, the deviation of a perfect model from the observation should be zero or simply white noise (uncorrelated, zero mean, constant variance). Various flavours of MSE decomposition have been exploited in several geophysical contexts (Entekhabi et al., 2010; Murphy, 1988; Wilks, 2011; Willmott, 1981; Gupta et al., 2009), all stemming from the consideration that the bias, the variance, and the covariance characterise different (although not complementary and not exhaustive) properties of the error – accuracy, precision, and correspondence, respectively.
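As an illustration, the three-term decomposition MSE = (m̄ − ō)² + (σm − σo)² + 2σmσo(1 − r), where m̄ and ō are the means, σm and σo the standard deviations, and r the correlation coefficient of the modelled and observed series, can be computed as below, together with the corresponding Theil coefficients (each term divided by the MSE). This is a sketch assuming that this standard form corresponds to the decomposition used in this study; the function name is hypothetical.

```python
import numpy as np


def mse_decomposition(model, obs):
    """Split the MSE into bias, variance, and covariance terms plus Theil coefficients.

    Uses population statistics (ddof=0), for which the identity
    MSE = (mean_m - mean_o)**2 + (sd_m - sd_o)**2 + 2*sd_m*sd_o*(1 - r)
    holds exactly. Both series must be non-constant for r to be defined.
    """
    m = np.asarray(model, dtype=float)
    o = np.asarray(obs, dtype=float)
    mse = np.mean((m - o) ** 2)
    sd_m, sd_o = m.std(), o.std()
    r = np.corrcoef(m, o)[0, 1]
    bias = (m.mean() - o.mean()) ** 2            # accuracy
    variance = (sd_m - sd_o) ** 2                # precision (amplitude of fluctuations)
    covariance = 2.0 * sd_m * sd_o * (1.0 - r)   # correspondence (timing and pattern)
    return {
        "mse": mse,
        "bias": bias,
        "variance": variance,
        "covariance": covariance,
        # Theil coefficients: fraction of the MSE due to each component
        "U_bias": bias / mse,
        "U_variance": variance / mse,
        "U_covariance": covariance / mse,
    }
```

The bias and variance terms do not depend on how the values are ordered in time, whereas the covariance term depends on the pairing of modelled and observed values, which is why it captures timing errors.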

The relative contribution of each of the MSE components to the overall MSE is
summarised by the Theil coefficients (Theil, 1961):

Figures 2 and 3 show monthly and diurnal curves for the base and sensitivity simulations over the three subregions in each continent. The monthly averaged curves of the zeroed emission runs peak in April in NA and in July in EU (May to July in EU1 are approximately the same), indicating the periods when the impact of background concentration (boundary conditions) and biogenic emissions on regional ozone is largest: springtime in NA and summer in EU. The monthly curves of “zero BC” and “zero emi” for NA are anticorrelated between April and July–August (“zero emi” curve decreasing and “zero BC” curve rising) and during autumn (“zero emi” curve rising and “zero BC” curve decreasing), framing the interplay between these two factors in terms of total ozone loading: boundary conditions dominate in autumn–winter, while biogenic plus anthropogenic emissions are more important during spring–summer. The springtime peak for the zero emissions case over NA is consistent with the springtime peak in Northern Hemisphere background ozone (Penkett and Brice, 1986; Logan, 1999) and with the predominant westerly and north-westerly inflow into the NA domain. The springtime peak in background ozone is thought to be caused by a combination of more frequent stratosphere–troposphere exchange and in situ photochemical production during that season (Atlas et al., 2003).

MSE decomposition for June–August hourly ozone into bias,
variance, and covariance for the three North American (NA) subregions. Results are presented
separately for daylight hours

The daily averaged profiles of mean ozone for NA show that the observed peak
(occurring between 16:00–18:00 LT in NA1 and NA2 and

The shape of the “zero BC” curve is similar in amplitude to that of the base run, suggesting that the effect of the regional/background ozone represented through boundary conditions in a limited area model is mainly to shift the mean concentration upwards, while it has no major effect on the frequency modulation. By contrast, the absence of anthropogenic emissions has a major effect on the amplitude of the signal as well as its magnitude (“zero emi” curve). As discussed in the next section, these considerations translate into the bias and/or variance type of error due to the boundary conditions and emissions.

As for EU (Fig. 3), the observed daily profiles in EU1 and EU2 are closely matched by the Chimere model between 11:00 and 23:00 LT (and underestimated outside these hours), while in EU3 the daily peak (observed at 19:00–20:00 LT) consistently occurs earlier in the model and its magnitude is overestimated. The morning transition occurs earlier in the model than in the observations and follows a significant model underprediction of nighttime and early-morning ozone due to difficulties in reproducing stable or near-stable conditions (Bessagnet et al., 2016). In EU3, the model displays the poorest performance, with significant underestimation between midnight and 09:00 LT (5–7 ppb) and overestimation in daylight conditions (7–9 ppb).

MSE decomposition for June–August hourly ozone into bias,
variance, and covariance for the three EU subregions (the zero dep data
refers to the month of July only). Results are presented separately for
daylight hours

As opposed to the CMAQ case for NA, the shape of the “zero emi” curve of Chimere closely follows that of the base case (even when considering only the stations classified as “urban”; Fig. S2 in the Supplement). Owing to the long averaging period (1 year), the daily profiles displayed in Figs. 2 and 3 do not provide information about the exact timing of the minima and maxima in each season. Figures S3 and S4 report the seasonal average diurnal profiles for the model predictions and the observations (network average over all stations) and show that the timing of the ozone diurnal cycle varies seasonally.

The plots in Figs. 4 (NA) and 5 (EU) show the MSE decomposition according to Eq. (1) for the summer months of June, July, and August for the base case simulation as well as the sensitivity simulations, distinguishing between daylight (from 05:00 to 09:00 LT) and nighttime hours (the remaining hours, from 10:00 to 04:00 LT). These plots are meant to aid the understanding of the relative impacts of potential errors in lateral boundary conditions, anthropogenic emissions, and the representation of ozone dry deposition on the total model error by comparing the magnitude and type of model error from these simulations against the model error for the base case.

CMAQ MSE breakdown for summer and winter for the base case (hourly
time series of ozone) over NA. The error coefficients

The plots in Figs. 6 to 15 are complementary to Figs. 4 and 5 and show the
error decomposition for both the summer and winter season in more detail,
including the error coefficients

As in Fig. 6 for the hourly time series of the “20 % reduction” scenario.

As in Fig. 6 for the hourly time series of the “zeroed anthropogenic emissions” scenario.

As in Fig. 6 for the hourly time series of the “zeroed boundary conditions” scenario.

As in Fig. 6 for the rolling average daily maximum 8 h ozone time series.

The CMAQ results for NA are presented in Figs. 4, 6–10, and 16 and can be
summarised as follows:

The MSE of the base case (MSE

The effect of zeroing the emissions of anthropogenic pollutants on the
summer MSE is a rise by a factor

Furthermore, all the error components deteriorate in the simulations with zero anthropogenic emissions, except for the bias in NA3. This is particularly true for the variance, signifying the fundamental role of emissions in shaping the diurnal variation of ozone. Indeed, this suggests that the absence of a variance error in the base case (see above) is due to the correct interplay between the temporal/spatial distribution of the emissions and, potentially, the variability induced by the meteorology.

The covariance share of the error also increases (although only slightly in NA2) for the zero emissions case, indicating that the emissions play a role in determining the timing of the modelled diurnal ozone signal; this increase is more pronounced during nighttime.

The zeroing of the input of ozone from the lateral boundaries has either no effect or only a limited effect (e.g. daylight summer in NA2; Fig. 4) on the variance and covariance shares of the error, while it has a profound impact on the bias portion. This impact is approximately equal during daylight and nighttime, as expected from the discussion of the daily cycle shown in Fig. 2.

The removal of ozone dry deposition from the model simulations (results based on July only) has the most profound impact, increasing the MSE of the base case by 1 order of magnitude, which is approximately double the combined effect of the emissions and boundary conditions perturbations. This sensitivity gives a gross indication of the relative strength of this process vs. external conditions during summer, while the “zero BC” case has a larger effect than the “zero dep” case in January (not shown). Similar to the “zero BC” case, the exclusion of ozone dry deposition from the model simulations acts as an additive term on the diurnal curve in NA1, leaving the shape and timing of the signal almost unaltered, while it impacts the variance and covariance error in the other two subregions. The small impact of the removal of dry deposition on the covariance error (timing of the ozone signal), together with the dominant bias offset, might suggest that a correct estimate of the deposition magnitude is more beneficial than, e.g., the time dependence of the surface resistance. The role of the variance is, however, unclear and deserves further analysis.

The instances where the “20 % red” bias error is lower than the error of the base case occur when the mean ozone concentrations were overestimated in the base case (e.g. daylight for all subregions and NA2 and NA3 over nighttime summer) as illustrated in Figs. 6 and 7.

The maps show that there are stations where the error is reduced with zero anthropogenic emissions (e.g. a reduction of 20–30 % in the southern coast of the US and in the far north-east during summer; Fig. 16d). This suggests the presence of other compensating model errors in both the base and sensitivity simulations that lead to better agreement with observations when prescribing an unrealistic emission scenario. The sources of these compensating errors need to be investigated in future work.

The “zero BC” run has profound negative effects over the whole continental area of NA during winter (Fig. 16e), while the effects are smaller during summer (Fig. 16f), especially over the southern coast, due to the relatively higher importance of photochemical formation of ozone during summer.

The error characteristics of the daily maximum 8 h rolling mean (DM8h, Fig. 10) resemble those of the daylight base case (Fig. 6, left column) but are reduced in magnitude during winter, with almost null variance error and the same sign of the bias as the base case. The summer DM8h standard deviations in NA1, NA2, and NA3 are 7.6, 5.2, and 8.1 ppb for the model and 7.6, 6.5, and 7 ppb for the observations, respectively. The model variability is therefore in line with the observed variability. The error of the DM8h for the sensitivity runs is reported in Fig. S5.
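The DM8h metric used above can be computed from an hourly series roughly as follows. This sketch makes simplifying assumptions that are not taken from the text: the series starts at 00:00 LT and covers whole days, each 8 h window is assigned to the day of its final hour, and no data-capture criteria are applied (regulatory definitions differ in these details). The function name is hypothetical.

```python
import numpy as np


def daily_max_8h_mean(hourly):
    """Daily maximum of the 8 h running mean of an hourly series.

    Assumes the series starts at 00:00 of the first day and spans whole days;
    each window is assigned to the day of its last hour, and only windows
    lying fully inside the series are used.
    """
    x = np.asarray(hourly, dtype=float)
    n_days = x.size // 24
    rolling = np.full(x.size, np.nan)
    # rolling[h] holds the mean of hours h-7 ... h (window ending at hour h)
    rolling[7:] = np.convolve(x, np.ones(8) / 8.0, mode="valid")
    return np.nanmax(rolling[: n_days * 24].reshape(n_days, 24), axis=1)
```

Standard deviations such as those quoted above would then be taken over the daily values returned for the summer days, separately for model and observations.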

On a network-wide average, removing anthropogenic emissions causes an RMSE increase of 25 % during summer and of 0 % (10 % at the 75th percentile) during winter, while zeroing out the input from the lateral boundaries causes an RMSE increase of 30 % during summer and of 180 % during winter (median values; Fig. 16).

Chimere MSE breakdown for summer and winter for the base case
(hourly time series of ozone) and sensitivity simulations over EU. The error
coefficients

As in Fig. 11 for the hourly time series of the “20 % reduction” scenario.

As in Fig. 11 for the hourly time series of the “zeroed anthropogenic emissions” scenario.

The summer daylight RMSE

Removing the anthropogenic emissions had almost no effect on the covariance share of the MSE (apart from a slight reduction with respect to the base case in EU2 and EU3, and also during nighttime), indicating that the error in the timing of the signal is influenced not by the emissions but by other processes. Moreover, the variance portion is left almost unchanged (1 ppb increase in EU1 and EU2), in contrast to the CMAQ results for NA. This indicates that the variability of the ozone concentration is hardly influenced by anthropogenic emissions in Chimere. The bias is the error component most sensitive to emission reductions, especially in EU2 and less so in EU3. This is in line with the discussion of the daily profiles of Fig. 3b (which showed similar shapes for the “zero emi” and the “base” profiles) and contrasts with the NA case, where the “zero emi” daily profiles are flatter than the base case.

The effect of imposing a constant ozone boundary condition value of 35 ppb
(and of zero for all other species) is a small net change in the variance of
the ozone error, together with a significant reduction of the covariance share
of the error in favour of the bias (Figs. 5 and 14). In EU2, the total MSE and
the bias are similar to those obtained by removing the anthropogenic emissions.
The “const BC” case exceeds the “zero emi” case in total MSE, bias, and
variance in EU3, and in the covariance and nighttime bias components in EU1.
We can infer that the variability of the boundary conditions has a significant
role in determining the timing of the ozone signal in EU1 (close to the western
boundary of the domain), as the correlation coefficient degrades from 0.89
(base case) to 0.66 (“const BC”) (Figs. 5, 11, and 14). That the bias stays
the same in EU1 during summer daylight hours reflects the magnitude of the
constant value (35 ppb was chosen here), which is in close agreement with that
of the base case, while the small variance error (

During summer in EU2 and EU3 changing the ozone boundary condition only influences the bias with marginal impacts on variance and covariance, while in winter (Fig. 14) there is also a significant reduction of the correlation coefficient, meaning that the boundary conditions modulate the timing of the signal. This also implies that the variability of the boundary conditions becomes more important in winter.

EU3 deserves special consideration as the RMSE

For the base case, the DM8h (Fig. 15) shows a reduced share of the covariance error compared with the mean ozone (Fig. 11), at the expense of an increase in the variance error; the timing error is now shifted towards seasonal timescales. The variability of the DM8h is governed by synoptic processes, which are likely responsible for its variance error. The summer DM8h standard deviations in EU1, EU2, and EU3 are 3, 6.2, and 8.6 ppb for the model and 6, 11, and 10.2 ppb for the observations, respectively. The model therefore underestimates the observed variability (as indicated by the “minus” sign in the variance share of the error in Fig. 15) by up to 50 % in EU1. A range of processes could be responsible for the lack of variability in Chimere, from emissions to chemistry to transport. The error of the DM8h for the sensitivity runs is reported in Fig. S6.

On a network-wide average, removing anthropogenic emissions causes an RMSE increase of 21 % during summer and 12 % during winter (median values; Fig. 17c, d).

Setting the dry deposition velocity of ozone to zero (July only, Fig. 5) increases not only the bias error but also the variance and covariance shares of the error. Thus, in Chimere, deposition not only acts as a shifting term on the modelled concentration but also influences the variability and timing of ozone more profoundly than in the CMAQ case examined earlier.

As in Fig. 11 for the hourly time series of the “constant boundary conditions” scenario.

As in Fig. 11 for the rolling average daily maximum 8 h ozone time series.

The focus of this section is

The coefficients of the ACFs (Appendix A) and the power spectral density form
a Fourier transform pair (Wiener–Khinchin theorem). Frequency analysis of a
signal is often performed by constructing the periodogram (or spectrogram; see
e.g. Chatfield, 2004). This approach has proven useful when dealing with
harmonic processes superimposed on a baseline signal (Mudelsee, 2014), but
periodograms often contain high noise. Therefore, examining a signal at
specific frequencies can be instructive, for instance by resorting to the
wavelet transform, which has the further advantage of enabling a 3-D
time–frequency–power visualisation. Whereas a power spectrum only shows the
strength of the variations of the signal as a function of frequency, the
wavelet transform also localises this information in physical time. Here,
wavelet analysis of the periodogram
of seasonal

From inspecting Fig. 18 (NA) it emerges that the highest values of spectral
energies for

Annual time series of differences between CMAQ and observed O

NA3 and to a lesser extent NA2 show a high spectral power of the error for
periodicities of 1–2 months and lasting from January to May with a weaker
wake extending up to the end of the year, potentially pointing to errors in
the characterisation of larger-scale background concentrations associated
with boundary conditions. NA3 also exhibits a high spectral power for errors
associated with a periodicity of

Except for the long-term variations of the model error with periodicities greater than 2 months discussed above, NA1 is the only subregion that shows only weak power associated with model errors of shorter periodicities from June to December. This suggests that fluctuations caused by variations in large-scale background and changing weather patterns are better captured in this region compared to the other two subregions.

The energy associated with the daily error is again higher and more
pronounced in NA3 than in the other subregions, where it is most pronounced
during summer (NA1) or between March and October (NA2). While during winter
and autumn the daily error is likely driven by difficulties in reproducing
stable PBL dynamics, during spring and summer it is also influenced by the
chemical production and destruction of ozone, a process entailing NO

For the EU (Fig. 19) a notable feature is the very high daily error energy in
EU3 that is present throughout the year and most pronounced in summer. Such
high energy suggests persistent problems in representing processes having a
periodicity of 1 day. Further, EU3 shows an area of high energy associated
with a period of 1 to 2 months and extending from February, peaking in
April and May, and ending in September (mostly model underestimation;
Fig. 19c), while the error of the winter months in EU3 receives high energy
from slower processes, acting on timescales of

Same as in Fig. 18 for Chimere over the three EU subregions.

The similarity of the wavelet spectra for NA3 (Fig. 18c) and EU1 (Fig. 19a) (both regions are located on the western edge of their domain) at the beginning of the year for periods of 1 to 2 months might be indicative of the periodicity of the bias induced by the boundary conditions. Compared to CMAQ, the error of the Chimere model is more concentrated during spring and early summer, with a periodicity of 10–20 days.
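The wavelet power spectra discussed in this section can be approximated with a continuous wavelet transform. The sketch below uses a Morlet mother wavelet (ω0 = 6) in the widely used Fourier-space formulation of Torrence and Compo (1998); the choice of wavelet, the normalisation, and the function name are assumptions and are not necessarily the configuration used for Figs. 18 and 19. Edge effects (cone of influence) and significance testing are not handled.

```python
import numpy as np


def morlet_cwt(x, dt, periods, omega0=6.0):
    """Continuous wavelet transform with a Morlet wavelet, computed in Fourier space.

    x: evenly sampled series (e.g. hourly model-minus-observation ozone error).
    dt: sampling interval (same time unit as periods).
    periods: Fourier periods at which to evaluate the transform.
    Returns complex coefficients with shape (len(periods), len(x)).
    """
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = x.size
    omega = 2.0 * np.pi * np.fft.fftfreq(n, d=dt)  # angular frequencies
    x_hat = np.fft.fft(x)
    # Ratio between the Fourier period and the wavelet scale for the Morlet wavelet
    fourier_factor = 4.0 * np.pi / (omega0 + np.sqrt(2.0 + omega0 ** 2))
    scales = np.asarray(periods, dtype=float) / fourier_factor
    coeffs = np.empty((scales.size, n), dtype=complex)
    for k, s in enumerate(scales):
        # Morlet wavelet in Fourier space, zero for non-positive frequencies
        psi_hat = np.pi ** -0.25 * np.exp(-0.5 * (s * omega - omega0) ** 2) * (omega > 0)
        norm = np.sqrt(2.0 * np.pi * s / dt)  # unit-energy normalisation at each scale
        coeffs[k] = np.fft.ifft(x_hat * norm * psi_hat)
    return coeffs
```

For a 1-year hourly error series, `np.abs(morlet_cwt(err, 1.0, periods)) ** 2` gives a time–period power map; `periods` could span from a few hours up to roughly 1440 h (2 months) to cover the daily to seasonal bands discussed above, bearing in mind that the longest periods are strongly affected by edge effects.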

Having identified some relevant timescales for the

In a recent study, Otero et al. (2016) analysed which synoptic and local variables best characterise the influence of large-scale circulation on daily maximum ozone over Europe. The authors found that the majority of the variance during spring over the entire European continent is accounted for by the 24 h lag autocorrelation, while during summer the maximum temperature is the principal explanatory variable over continental EU. Other influential variables were found to be the relative humidity, the solar radiation, and the geopotential height. Camalier et al. (2007) and Lemaire et al. (2016) found that the near-surface temperature and the incoming short-wave radiation were the two most influential drivers of ozone uncertainties.
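Relating the ozone error to the errors of explanatory fields (for example temperature or wind speed) can be sketched with ordinary least squares. This is a generic illustration, not the exact regression model of this study; the function name and inputs are hypothetical. Because hourly errors are strongly autocorrelated, the t statistics below, which assume independent residuals, overstate significance; in practice an effective sample size or autocorrelation-robust standard errors would be needed.

```python
import numpy as np


def regress_error(ozone_error, explanatory_errors):
    """Ordinary least squares fit of the ozone error on explanatory error fields.

    ozone_error: array (n,); explanatory_errors: array (n, p), one column per field.
    Returns the coefficients (intercept first) and their t statistics under the
    usual assumption of independent, identically distributed residuals.
    """
    y = np.asarray(ozone_error, dtype=float)
    X = np.column_stack([np.ones(y.size), np.asarray(explanatory_errors, dtype=float)])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = X.shape[0] - X.shape[1]
    sigma2 = resid @ resid / dof  # residual variance
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return beta, beta / se
```

A large |t| for one explanatory error and a small one for another would point to the former as a more likely contributor to the ozone error, subject to the autocorrelation caveat above.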

The ACFs and PACFs (partial autocorrelation function) of

CMAQ model: autocorrelation (ACF) and partial autocorrelation (PACF) functions for the differenced time series of ozone residuals (model − observations). Differencing is necessary to remove non-stationarity, so that the ACF and PACF values depend on the lag only.
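The correlograms of Figs. 20 to 23 are based on differenced residual series. A minimal sketch of that computation follows: the residual (model minus observations) is differenced once, its sample ACF is estimated, and the PACF is obtained from the ACF through the Durbin–Levinson recursion. The function names are hypothetical, and the series is assumed to have no missing values.

```python
import numpy as np


def acf(x, max_lag):
    """Sample autocorrelation function (biased estimator) for lags 0..max_lag."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = x.size
    denom = np.dot(x, x)
    return np.array([np.dot(x[: n - k], x[k:]) / denom for k in range(max_lag + 1)])


def pacf(x, max_lag):
    """Partial autocorrelation obtained from the ACF via the Durbin-Levinson recursion."""
    rho = acf(x, max_lag)
    out = np.empty(max_lag + 1)
    out[0] = 1.0
    phi = np.zeros(0)  # AR coefficients of the order-(k-1) fit
    for k in range(1, max_lag + 1):
        num = rho[k] - np.dot(phi, rho[k - 1:0:-1])
        den = 1.0 - np.dot(phi, rho[1:k])
        phi_kk = num / den
        phi = np.append(phi - phi_kk * phi[::-1], phi_kk)
        out[k] = phi_kk
    return out


def differenced_residual_correlograms(model, obs, max_lag=72):
    """ACF and PACF of the once-differenced residual series (model - obs)."""
    residual = np.asarray(model, dtype=float) - np.asarray(obs, dtype=float)
    d = np.diff(residual)  # first differencing removes trends and slow drifts
    return acf(d, max_lag), pacf(d, max_lag)
```

A PACF spike at lag 24 that persists after accounting for the shorter lags is the kind of signature interpreted in the text as an error arising at daily intervals rather than propagating from the previous hours.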

As in Fig. 20 for the differenced time series of ozone residuals obtained after filtering out the diurnal fluctuations from the modelled and observed time series.

The PACF plots confirm that the error is not simply due to propagation and
memory from previous hours but rather arises at 24 h intervals and hence stems
from daily processes. On average, for NA corr(

Chimere model: autocorrelation (ACF) and partial autocorrelation (PACF) functions for the differenced time series of ozone residuals (model − observations). Differencing is necessary to remove non-stationarity, so that the ACF and PACF values depend on the lag only.

As in Fig. 22 for the differenced time series of ozone residuals obtained after filtering out the diurnal fluctuations from the modelled and observed time series.

the ACF analysis repeated for the “zero emi” scenario (Fig. S9);

the ACF of

the ACF of primary species (PM

the ACF of ozone error for the “zero emi” scenario at three stations where isoprene emissions are low (Fig. S12). These stations were selected as locations where the isoprene emissions accumulated over the months of June, July, and August, as provided by the two models analysed here, are low.

Phase shift of the diurnal cycle (in hours). A positive phase shift indicates that the model peak is “late”, while a negative phase shift indicates that the modelled peak precedes the observed peak. This analysis includes urban and suburban stations in addition to rural stations.

Since the individual daily processes directly or indirectly affecting the PBL dynamics cannot be untangled, here “PBL error” is meant to encompass errors in the representation of the variables affecting boundary layer dynamics (i.e. radiation, surface description, surface energy balance, heat exchange processes, development or suppression of convection, shear-generated turbulence, and entrainment and detrainment processes at the boundary layer top for heat and any other scalar) and their non-linear interdependencies.

MSE (ppb

By removing the diurnal fluctuations (i.e. by screening out the frequencies
between 12 h and up to

The relative strength of the MSE for the undecomposed ozone time series, for
the ozone time series with the diurnal fluctuations removed, and for the
diurnal fluctuations alone is reported in Table 1. With the exception of NA1
and EU3, the baseline error (denoted “noDU”) accounts for
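
The split of the MSE into the undecomposed, “noDU”, and diurnal-only parts can be sketched as below. This is an illustration under assumptions, not the authors' code: the KZ windows (KZ(3,3) minus KZ(13,5) as the diurnal band-pass) are placeholders and not necessarily those used in this study, the reflective treatment of the series ends is our own choice, and all function names are hypothetical.

```python
import numpy as np


def kz_filter(x, m, k):
    """Kolmogorov-Zurbenko filter KZ(m, k): k passes of a centred moving average of odd width m.

    Series ends are handled by mirror reflection (an assumption of this sketch).
    """
    y = np.asarray(x, dtype=float)
    kernel = np.ones(m) / m
    half = m // 2
    for _ in range(k):
        y = np.convolve(np.pad(y, half, mode="reflect"), kernel, mode="valid")
    return y


def diurnal_component(x):
    """Hypothetical band-pass for the diurnal (DU) component: KZ(3,3) - KZ(13,5)."""
    return kz_filter(x, 3, 3) - kz_filter(x, 13, 5)


def mse(a, b):
    return float(np.mean((np.asarray(a, dtype=float) - np.asarray(b, dtype=float)) ** 2))


def mse_by_component(mod, obs):
    """MSE of the full series, of the series without DU ("noDU"), and of DU alone."""
    du_mod, du_obs = diurnal_component(mod), diurnal_component(obs)
    return {
        "full": mse(mod, obs),
        "noDU": mse(mod - du_mod, obs - du_obs),
        "DU": mse(du_mod, du_obs),
    }


# Hypothetical example: a model that reproduces the observed cycle exactly but with a +3 ppb bias
hours = np.arange(24 * 30)
obs = 40.0 + 10.0 * np.sin(2 * np.pi * hours / 24)
mod = obs + 3.0
shares = mse_by_component(mod, obs)
```

Here the constant bias ends up entirely in the “noDU” term, because an offset carries no diurnal variation. In general the full MSE is not the sum of the “noDU” and DU values, since a cross term between the two components remains.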

This section explores the nature of the covariance error, which occurs, among other reasons, when the two signals being compared are not in phase. The first and second moments of each signal are invariant with respect to a phase shift between the two signals (Murphy, 1995); i.e. a phase shift affects neither the mean of the signal nor the amplitude of its oscillations about the mean, and therefore has no impact on the bias and variance components of the error. The correlation coefficient, in contrast, decreases when one signal lags the other, producing a net increase of the covariance error.
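
The invariance argument can be checked numerically with the decomposition MSE = (μ_m − μ_o)² + (σ_m − σ_o)² + 2σ_mσ_o(1 − r), where μ and σ are the mean and standard deviation of the modelled (m) and observed (o) series and r is their correlation coefficient. The sketch below is illustrative only: it uses population statistics so that the three terms add up exactly to the MSE, and the idealised diurnal series and the function name are hypothetical.

```python
import numpy as np


def mse_decomposition(mod, obs):
    """Split the MSE into bias, variance and covariance terms (population statistics)."""
    mod = np.asarray(mod, dtype=float)
    obs = np.asarray(obs, dtype=float)
    s_m, s_o = mod.std(), obs.std()  # ddof=0, so the three terms sum exactly to the MSE
    r = np.corrcoef(mod, obs)[0, 1]
    return {
        "mse": float(np.mean((mod - obs) ** 2)),
        "bias": float((mod.mean() - obs.mean()) ** 2),
        "variance": float((s_m - s_o) ** 2),
        "covariance": float(2.0 * s_m * s_o * (1.0 - r)),
    }


# Idealised diurnal cycle (30 days, hourly) and a copy whose peak is 3 h late
hours = np.arange(24 * 30)
obs = 35.0 + 10.0 * np.sin(2 * np.pi * hours / 24)
mod = np.roll(obs, 3)

terms = mse_decomposition(mod, obs)
print(terms)
```

Because the lagged copy contains exactly the same values as the observations, its bias and variance terms vanish and the whole MSE appears as covariance error.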

The phase lag between the daily components of the modelled and observed cycles is analysed in Figs. 24 (NA) and 25 (EU), with winter and summer treated separately.

As in Fig. 24 for EU.

Normalised MSE produced by lagging the observed diurnal cycle with
respect to itself. The MSE produced by such a shift is entirely covariance
error. The plots are presented for EU2

To perform this analysis, the modelled and observed ozone time series are
first filtered with a KZ filter to isolate the diurnal component. The cross
covariance between the two filtered time series is then calculated, and the
lag at which the cross covariance reaches its maximum is taken as the phase
shift between the two signals. The method has an error of
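
The lag search can be sketched as below, assuming the diurnal components have already been extracted (for example with a KZ band-pass) and are sampled hourly. The function name and the ±12 h search window are hypothetical choices. With hourly data the estimate is resolved only to whole hours, and the sign convention follows the one used for the phase-shift figures (positive when the modelled peak is late).

```python
import numpy as np


def phase_shift_hours(mod_du, obs_du, max_lag=12):
    """Lag (h) at which the cross covariance of modelled and observed diurnal components peaks.

    mod[t] is paired with obs[t - lag], so a positive lag means the modelled peak is late
    and a negative lag means it is early.
    """
    mod_du = np.asarray(mod_du, dtype=float)
    obs_du = np.asarray(obs_du, dtype=float)
    n = mod_du.size
    best_lag, best_cov = 0, -np.inf
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            m, o = mod_du[lag:], obs_du[: n - lag]
        else:
            m, o = mod_du[: n + lag], obs_du[-lag:]
        cov = np.mean((m - m.mean()) * (o - o.mean()))
        if cov > best_cov:
            best_lag, best_cov = lag, cov
    return best_lag


# Idealised diurnal components (30 days, hourly); np.roll shifts the whole cycle
hours = np.arange(24 * 30)
obs_du = 10.0 * np.sin(2 * np.pi * hours / 24)
early_model = np.roll(obs_du, -2)  # model peak 2 h too early
late_model = np.roll(obs_du, 3)  # model peak 3 h too late
print(phase_shift_hours(early_model, obs_du), phase_shift_hours(late_model, obs_du))  # prints: -2 3
```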

In NA, the modelled diurnal peak occurs 1–2 h earlier than the observed diurnal peak at many stations and up to 3–4 h earlier at some Canadian stations. Taking into account the 0.5 h error of the estimate, the receptors at the western border (approximately corresponding to NA3) are the least affected by this timing error (especially in summer; Fig. 24b). Therefore, the covariance share of the error shown in Fig. 4 is not due to a daily phase shift in this region but is probably due to the shifting of longer (or shorter) periods, induced for example by errors in transport (wind speed and/or direction). Figure S13 in the Supplement reports the same analysis repeated for the “zero emi” and “zero BC” runs.

In the EU (Fig. 25), no phase shift (or a phase shift compatible with the
0.5 h estimation error) is observed in Romania, Germany, or the UK during
winter, while a significant phase shift (the modelled peak occurs up to 6 h
early) is observed in northern Italy and Austria. In France and Spain the
phase shift oscillates between +3 h (with model delays of up to 5 h south of
Madrid) and −5 to −6 h, with the net effect that the spatially aggregated
daily cycle is in phase with the observations (Fig. 3b). During summer the
phase shift is larger and also extends to the countries where it was null
during winter. Moreover, some country-wise grouping can be detected, for
example at the borders between Belgium and France, Spain and France, and
Finland and Sweden, possibly due to the different measurement techniques and
protocols among EU countries (e.g. Solazzo and Galmarini, 2015). Figure S14
in the Supplement reports the same analysis repeated for the “zero emi” run.
The difference between the time shifts of the base case and the zeroed
emission scenario (Fig. S15) reveals the effect of the timing of the
anthropogenic emissions on the covariance error. The effect is null over the
EU (median difference of zero) and very limited in NA (median value of zero
during summer and of

Percentage of variance explained by the regressors (the total

While errors in emission profiles can be one cause of the phase shift, and
thus of the covariance error of the modelled ozone signal, the representation
of boundary layer processes can clearly be a factor as well. As discussed in
e.g. Herwehe et al. (2011), the parameterisation of vertical mixing during
transitional periods of the day can cause a time shift in the modelled ozone
concentrations due to its effects on the near-surface
concentrations of NO

To quantify the importance of the covariance error caused by a phase shift
relative to other sources of error, Fig. 26 shows the curves of normalised
MSE as the observed ozone time series is shifted with respect to itself
between

The curves in Fig. 26 show that a phase lag in the diurnal cycle of

Therefore, a modelled ozone peak that occurs 4 to 5 h too early (a feature that is detected at some EU3 and Canadian stations) corresponds to a covariance error of 9.0 ppb (i.e. the standard deviation of the network-average ozone observations in summer in both EU and NA). This result also helps explain the large covariance error in EU3, which can be at least partially attributed to the large phase shift of the daily cycle.
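
The link between a lag of a few hours and an error of the order of the standard deviation can be illustrated with an idealised, purely sinusoidal diurnal cycle, an assumption that real ozone series do not satisfy. For a sinusoid lagged against itself by τ hours the correlation is cos(2πτ/24), so the normalised MSE (MSE divided by the variance) is 2(1 − cos(2πτ/24)), which reaches 1 (RMSE equal to the standard deviation) at τ = 4 h. The sketch below, with a hypothetical function name and a series built so that its standard deviation is 9 ppb, checks this numerically.

```python
import numpy as np


def normalised_mse_vs_lag(series, lags):
    """MSE between a series and a copy of itself lagged by each lag, divided by the series variance.

    np.roll wraps around, which is only appropriate for the idealised periodic series used here.
    """
    series = np.asarray(series, dtype=float)
    variance = series.var()
    return np.array([np.mean((np.roll(series, lag) - series) ** 2) / variance for lag in lags])


# Hypothetical diurnal cycle whose standard deviation is 9 ppb (amplitude 9 * sqrt(2))
hours = np.arange(24 * 30)
obs = 35.0 + 9.0 * np.sqrt(2.0) * np.sin(2 * np.pi * hours / 24)
lags = np.arange(0, 13)
nmse = normalised_mse_vs_lag(obs, lags)
analytic = 2.0 * (1.0 - np.cos(2 * np.pi * lags / 24))
rmse_at_4h = np.sqrt(nmse[4] * obs.var())  # equals the 9 ppb standard deviation
```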

Same as Fig. 27 for EU. The analysis encompasses 61 co-located
stations (the EU stations for ozone, NO, NO

In this section a simple linear regression model for the error of ozone

The available regressors (explanatory variables) are the errors of the
variables for which measurements have been collected within AQMEII, i.e. NO
(EU only), NO
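
A regression of this kind can be sketched as follows. Eq. (3) is not reproduced in this section, so the sketch assumes an ordinary least-squares fit of the ozone error on the errors of the regressors plus an intercept, and reports the percentage of variance explained (100 × R²). The regressor names, coefficients, and synthetic data are hypothetical.

```python
import numpy as np


def explained_variance(y, regressors):
    """OLS fit of y on the regressor columns plus an intercept.

    Returns the coefficients (intercept first) and the percentage of variance explained.
    """
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(y.size), np.asarray(regressors, dtype=float)])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    r2 = 1.0 - np.sum(resid**2) / np.sum((y - y.mean()) ** 2)
    return beta, 100.0 * r2


# Hypothetical hourly errors (model - observation) of three regressors
rng = np.random.default_rng(0)
n = 5000
err_temp = rng.normal(0.0, 1.0, n)  # temperature error
err_wind = rng.normal(0.0, 1.0, n)  # wind speed error
err_no2 = rng.normal(0.0, 1.0, n)  # NO2 error
err_o3 = 0.5 + 1.2 * err_temp - 2.0 * err_wind + 0.3 * err_no2 + rng.normal(0.0, 1.0, n)

beta, pct = explained_variance(err_o3, np.column_stack([err_temp, err_wind, err_no2]))
print(np.round(beta, 2), round(float(pct), 1))
```

In this synthetic case the unexplained part comes only from the added random noise; with real data it also absorbs the errors of variables that are not among the regressors.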

Linear correlation coefficient between the diurnal residuals of the
regressors of Eq. (3). The residuals are calculated by
removing fluctuations faster than

The errors of temperature and wind speed explain about a third of the
daylight winter ozone error of CMAQ, while

An evident limitation of Eq. (3) is that it assumes that successive values
of the error terms are independent, while in practice this is not the case.
Table 2 reports the correlation coefficient of the diurnal fluctuations
of the residuals, obtained by filtering out fluctuations faster than
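
Collinearity among the regressors can also be summarised with variance inflation factors (VIFs), a standard diagnostic that is not used in the text and is introduced here only as an illustration. The VIF of each regressor equals the corresponding diagonal element of the inverse of the regressors' correlation matrix; values well above 1 indicate that a regressor is largely explained by the others. The function name and inputs below are hypothetical.

```python
import numpy as np


def variance_inflation_factors(regressors):
    """VIF of each column: diagonal of the inverse of the correlation matrix of the columns."""
    corr = np.corrcoef(np.asarray(regressors, dtype=float), rowvar=False)
    return np.diag(np.linalg.inv(corr))


hours = np.arange(24 * 30)
daily_a = np.sin(2 * np.pi * hours / 24)  # hypothetical regressor error with a daily cycle
daily_b = np.cos(2 * np.pi * hours / 24)  # uncorrelated with daily_a over whole days
nearly_a = daily_a + 0.1 * daily_b  # strongly correlated with daily_a

vif_independent = variance_inflation_factors(np.column_stack([daily_a, daily_b]))
vif_collinear = variance_inflation_factors(np.column_stack([daily_a, nearly_a]))
print(np.round(vif_independent, 3), np.round(vif_collinear, 1))
```

Two regressors sharing the same daily cycle, as in the second case, inflate the variance of the fitted coefficients; filtering out the diurnal component before the regression is one way to reduce that shared signal.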

In addition to the collinearity issue, there are other endogenous variables
that are not part of the regression analysis but whose error contributes to
total

Linear correlation coefficient between the residuals of the
regressors of Eq. (3) when the diurnal fluctuations
are filtered out. The residuals are calculated by removing fluctuations faster than

However, since we are not in a position to estimate the errors associated
with PBL variables (radiation, temperature, turbulence), an alternative
approach is to filter out the diurnal process from the modelled and observed
time series and repeat the analysis based on Eq. (3) (Figs. S16 and S17). The
correlation coefficients of the residuals with the diurnal component filtered
out are reported in Table 3. The collinearity has been largely removed,
especially for NA, while for EU some strong correlation persists (

The

A strong daily error component is common to all variables investigated here.

This error manifests itself in the correlation coefficient and is thus due to
a variance/covariance type of error (otherwise, if it were a bias-type error,
the

Inspection of the “noDU” case shows that, at least in NA (Fig. S16), the bias
error discussed in Sect. 3 cannot be explained simply in terms of the fields
NO

The impact of

The application of several diagnostic techniques in conjunction with
sensitivity scenarios has allowed in-depth analysis of the timescale
properties of the ozone error of CMAQ and Chimere, two widely applied
modelling systems. The main results, as stemming from various aspects of the
investigation, are that the largest share of MSE (

By excluding other plausible causes, and assuming that the observational data are “correct” (not affected by systematic errors), we can conclude on the basis of multiple indicators that the dynamics of the boundary layer (which in turn depend on the representation of radiation, surface characteristics, surface energy balance, heat exchange processes, development or suppression of convection, shear-generated turbulence, and entrainment and detrainment processes at the boundary layer top for heat and any other scalars) are responsible for the recurring daily error. The most revealing indicator is the ACF and PACF analysis of the time series of ozone residuals, which shows a daily periodicity: errors 24 h apart are highly correlated throughout the year; i.e. the error repeats itself with daily regularity. This could be caused by multiple processes occurring on a daily timescale, such as chemical transformations, the timing of the emissions, and PBL dynamics. However, the analyses of the error periodicity of primary species (to exclude the role of chemical transformations) and of the scenario with zeroed anthropogenic emissions (to exclude the role of emissions) show the same error structure, pointing to PBL processes as the main cause of the daily error.

Due to the spatial aggregation of these analyses and the non-linearity of the models' components, it is possible that, at specific sites, the periodicity of the error is due to a combination of multiple processes. However, the absence of a spatial or emission dependence and the persistence of the daily periodicity indicate that the main cause of the daily error stems from PBL dynamics. Furthermore, the similarity between the time shifts of the diurnal component in the base and zeroed-emission cases suggests that the timing error (pure covariance error) is not caused by anthropogenic emissions (with the possible exception of winter in NA, where some small differences are present).

This study contributes to the AQMEII goal of promoting innovative insights into the evaluation of regional air quality models. It is primarily meant to introduce innovative evaluation methods that move towards diagnosing the causes of model error, and it focuses on the diagnosis of the error produced by CMAQ and Chimere applied to calculate hourly surface ozone mixing ratios over North America and Europe.

We argue that the current widespread practice (although with several exceptions) of using time-aggregate metrics merely to quantify the average distance (in a metric space) between models and observations has clear limitations and does not help target the causes of model error. We therefore propose to move towards the qualification of the error components (bias, variance, covariance) and to assess each of them with relevant diagnostic methods. At the core of the diagnostic methods we have devised over the years within AQMEII is the quality of the information that can be extracted from models and measurements to aid understanding of the causes of model error, thus providing more useful information to model developers and users than can be gained from aggregate metrics. Applying such approaches on a routine basis would help boost confidence in using model predictions for various applications. At the current stage, the methods we propose help identify the timescale of the error and its periodicity. Linking the error to specific processes is only possible by integrating the analysis with sensitivity model runs. For instance, we can infer that the timing error of the diurnal component is (at least partially) associated with the dynamics of the PBL, but further analyses are necessary to isolate the components of the PBL responsible for that error.

We stress that the analyses carried out are not meant to compare the two
models but rather to show how the two models, applied to different areas and
using different emissions, respond to changes. With this in mind, the main
conclusions of this study are as follows:

While the zeroing/modification of the ozone input from the lateral boundaries causes a shift of the ozone diurnal cycle in both CMAQ and Chimere, the two models respond very differently to a modification of the anthropogenic emission and deposition fluxes. For CMAQ, removing anthropogenic emissions causes a shift and a flattening of the diurnal curve (bias and variance error), while for Chimere the effect is restricted to a shift. In contrast, setting the ozone dry deposition velocity to zero causes a shift (bias error) for CMAQ, while for Chimere it profoundly changes the error structure, with significant impacts not only on the bias but also on the variance and covariance terms.

The response of the models to variations in anthropogenic emissions and boundary conditions shows a pronounced spatial heterogeneity, while its seasonal variability is less marked. Only during the winter season does zeroing the boundary values for North America produce a spatially uniform deterioration of the model accuracy across the majority of the continent.

Fluctuations slower than

A recurring, systematic error with daily periodicity is detected in both models; it is responsible for 10–20 % of the total quadratic error and is possibly associated with the dynamics of the PBL.

The timing of the modelled daily ozone peak generally reproduces that of the observed one, although with significant exceptions in France, Italy, and Austria for Chimere and in Canada and some areas of the eastern US for CMAQ. Assuming that the observational data in these regions are accurate, the modelled peak occurs up to 6 h early, causing a covariance error as large as 9 ppb. The analysis suggests that the timing of the anthropogenic emissions is not responsible for the phase error of the ozone peaks and instead indicates that it might be caused by the dynamics of the PBL (although the role of biogenic emissions and chemistry cannot be ruled out).

The ozone error in CMAQ has a weak/negligible dependence on the error of
NO

The modelling and observational data generated for the
AQMEII exercise are accessible through the ENSEMBLE data platform
(

We gratefully acknowledge the contribution of various groups to the third Air Quality Model Evaluation International Initiative (AQMEII) activity. The following agencies have prepared the data sets used in this study: US EPA (North American emissions processing and gridded meteorology); US EPA, Environment Canada, Mexican Secretariat of the Environment and Natural Resources (Secretaría de Medio Ambiente y Recursos Naturales-SEMARNAT), and National Institute of Ecology (Instituto Nacional de Ecología-INE) (North American national emissions inventories); TNO (European emissions processing); ECMWF/MACC (chemical boundary conditions). Ambient North American concentration measurements were extracted from Environment Canada's National Atmospheric Chemistry Database (NAtChem) PM database and provided by several US and Canadian agencies (AQS, CAPMoN, CASTNet, IMPROVE, NAPS, SEARCH, and STN networks); North American precipitation chemistry measurements were extracted from NAtChem's precipitation chemistry database and were provided by several US and Canadian agencies (CAPMoN, NADP, NBPMN, NSPSN, and REPQ networks); the WMO World Ozone and Ultraviolet Data Centre (WOUDC) and its data-contributing agencies provided North American and European ozonesonde profiles; NASA's AErosol RObotic NETwork (AeroNet) and its data-contributing agencies provided North American and European AOD measurements; the MOZAIC Data Centre and its contributing airlines provided North American and European aircraft takeoff and landing vertical profiles. For European air quality data the following data centres were used: EMEP/EBAS and European Environment Agency/European Topic Center on Air and Climate Change/Air Quality e-reporting provided European air and precipitation chemistry data; the Finnish Meteorological Institute provided biomass burning emission data for Europe.
Data from meteorological station monitoring networks were provided by NOAA and Environment Canada (for the US and Canadian meteorological network data) and the National Center for Atmospheric Research (NCAR) data support section. The Joint Research Center Ispra/Institute for Environment and Sustainability provided its ENSEMBLE system for model output harmonisation, analysis, and evaluation.

Edited by: Bruce Rolstad Denby
Reviewed by: three anonymous referees