<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing with OASIS Tables v3.0 20080202//EN" "https://jats.nlm.nih.gov/nlm-dtd/publishing/3.0/journalpub-oasis3.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:oasis="http://docs.oasis-open.org/ns/oasis-exchange/table" xml:lang="en" dtd-version="3.0" article-type="research-article">
  <front>
    <journal-meta><journal-id journal-id-type="publisher">ACP</journal-id><journal-title-group>
    <journal-title>Atmospheric Chemistry and Physics</journal-title>
    <abbrev-journal-title abbrev-type="publisher">ACP</abbrev-journal-title><abbrev-journal-title abbrev-type="nlm-ta">Atmos. Chem. Phys.</abbrev-journal-title>
  </journal-title-group><issn pub-type="epub">1680-7324</issn><publisher>
    <publisher-name>Copernicus Publications</publisher-name>
    <publisher-loc>Göttingen, Germany</publisher-loc>
  </publisher></journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.5194/acp-22-11603-2022</article-id><title-group><article-title>Model output statistics (MOS) applied to Copernicus Atmospheric Monitoring Service (CAMS) O<sub>3</sub> forecasts: trade-offs between continuous and categorical skill scores</article-title><alt-title>MOS correction of CAMS O<sub>3</sub> forecasts</alt-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author" corresp="yes" rid="aff1">
          <name><surname>Petetin</surname><given-names>Hervé</given-names></name>
          <email>herve.petetin@bsc.es</email>
        <ext-link>https://orcid.org/0000-0001-5746-6504</ext-link></contrib>
        <contrib contrib-type="author" corresp="no" rid="aff1">
          <name><surname>Bowdalo</surname><given-names>Dene</given-names></name>
          
        <ext-link>https://orcid.org/0000-0003-2434-2892</ext-link></contrib>
        <contrib contrib-type="author" corresp="no" rid="aff1">
          <name><surname>Bretonnière</surname><given-names>Pierre-Antoine</given-names></name>
          
        <ext-link>https://orcid.org/0000-0002-3066-6685</ext-link></contrib>
        <contrib contrib-type="author" corresp="no" rid="aff1">
          <name><surname>Guevara</surname><given-names>Marc</given-names></name>
          
        <ext-link>https://orcid.org/0000-0001-9727-8583</ext-link></contrib>
        <contrib contrib-type="author" corresp="no" rid="aff1">
          <name><surname>Jorba</surname><given-names>Oriol</given-names></name>
          
        <ext-link>https://orcid.org/0000-0001-5872-0244</ext-link></contrib>
        <contrib contrib-type="author" corresp="no" rid="aff1">
          <name><surname>Mateu Armengol</surname><given-names>Jan</given-names></name>
          
        <ext-link>https://orcid.org/0000-0001-5440-0673</ext-link></contrib>
        <contrib contrib-type="author" corresp="no" rid="aff1">
          <name><surname>Samso Cabre</surname><given-names>Margarida</given-names></name>
          
        </contrib>
        <contrib contrib-type="author" corresp="no" rid="aff1">
          <name><surname>Serradell</surname><given-names>Kim</given-names></name>
          
        <ext-link>https://orcid.org/0000-0001-8230-4347</ext-link></contrib>
        <contrib contrib-type="author" corresp="no" rid="aff1">
          <name><surname>Soret</surname><given-names>Albert</given-names></name>
          
        <ext-link>https://orcid.org/0000-0002-1962-2972</ext-link></contrib>
        <contrib contrib-type="author" corresp="no" rid="aff1 aff2">
          <name><surname>Pérez Garcia-Pando</surname><given-names>Carlos</given-names></name>
          
        <ext-link>https://orcid.org/0000-0002-4456-0697</ext-link></contrib>
        <aff id="aff1"><label>1</label><institution>Barcelona Supercomputing Center, Barcelona, Spain</institution>
        </aff>
        <aff id="aff2"><label>2</label><institution>ICREA, Catalan Institution for Research and Advanced Studies, Barcelona, Spain</institution>
        </aff>
      </contrib-group>
      <author-notes><corresp id="corr1">Hervé Petetin (herve.petetin@bsc.es)</corresp></author-notes><pub-date><day>8</day><month>September</month><year>2022</year></pub-date>
      
      <volume>22</volume>
      <issue>17</issue>
      <fpage>11603</fpage><lpage>11630</lpage>
      <history>
        <date date-type="received"><day>18</day><month>October</month><year>2021</year></date>
           <date date-type="rev-request"><day>1</day><month>December</month><year>2021</year></date>
           <date date-type="rev-recd"><day>2</day><month>June</month><year>2022</year></date>
           <date date-type="accepted"><day>4</day><month>July</month><year>2022</year></date>
      </history>
      <permissions>
        <copyright-statement>Copyright: © 2022 Hervé Petetin et al.</copyright-statement>
        <copyright-year>2022</copyright-year>
      <license license-type="open-access"><license-p>This work is licensed under the Creative Commons Attribution 4.0 International License. To view a copy of this licence, visit <ext-link ext-link-type="uri" xlink:href="https://creativecommons.org/licenses/by/4.0/">https://creativecommons.org/licenses/by/4.0/</ext-link></license-p></license></permissions><self-uri xlink:href="https://acp.copernicus.org/articles/acp-22-11603-2022.html">This article is available from https://acp.copernicus.org/articles/acp-22-11603-2022.html</self-uri><self-uri xlink:href="https://acp.copernicus.org/articles/acp-22-11603-2022.pdf">The full text article is available as a PDF file from https://acp.copernicus.org/articles/acp-22-11603-2022.pdf</self-uri>
      <abstract><title>Abstract</title>

      <p id="d2e194">Air quality (AQ) forecasting systems are usually built upon physics-based numerical models that are affected by a number of uncertainty sources. In order to reduce forecast errors, first and foremost the bias, they are often coupled with model output statistics (MOS) modules. MOS methods are statistical techniques used to correct raw forecasts at surface monitoring station locations, where AQ observations are available. In this study, we investigate the extent to which AQ forecasts can be improved using a variety of MOS methods, including moving average, quantile mapping, Kalman filter, analogs and gradient boosting machine methods, and we also consider the persistence method as a reference. We apply our analysis to the Copernicus Atmosphere Monitoring Service (CAMS) regional ensemble median O<sub>3</sub> forecasts over the Iberian Peninsula during 2018–2019. A key aspect of our study is the evaluation, which is performed using a comprehensive set of continuous and categorical metrics at various timescales, along different lead times and using different meteorological input datasets.</p>

      <p id="d2e206">Our results show that O<sub>3</sub> forecasts can be substantially improved using such MOS corrections and that improvements go well beyond the correction of the systematic bias. Depending on the timescale and lead time, root mean square errors decreased from 20 %–40 % to 10 %–30 %, while Pearson correlation coefficients increased from 0.7–0.8 to 0.8–0.9. Although the improvement typically affects all lead times, some MOS methods appear more adversely impacted by the lead time. The MOS methods relying on meteorological data were found to provide relatively similar performance with two different meteorological inputs. Importantly, our results also clearly show the trade-offs between continuous and categorical skills and their dependencies on the MOS method. The most sophisticated MOS methods better reproduce O<sub>3</sub> mixing ratios overall, with the lowest errors and highest correlations. However, they are not necessarily the best in predicting the peak O<sub>3</sub> episodes, for which simpler MOS methods can achieve better results. Although the complex impact of MOS methods on the distribution of and variability in raw forecasts can only be comprehended through an extended set of complementary statistical metrics, our study shows that optimally implementing MOS in AQ forecast systems crucially requires selecting the appropriate skill score to be optimized for the forecast application of interest.</p>
  </abstract>
    
<funding-group>
<award-group id="gs1">
<funding-source>Agencia Estatal de Investigación</funding-source>
<award-id>PID2020-116324RA-I00 / AEI / 10.13039/501100011033</award-id>
</award-group>
<award-group id="gs2">
<funding-source>H2020 Marie Skłodowska-Curie Actions</funding-source>
<award-id>H2020-MSCA-COFUND-2016-754433</award-id>
</award-group>
</funding-group>
</article-meta>
  </front>
<body>
      

<sec id="Ch1.S1" sec-type="intro">
  <label>1</label><title>Introduction</title>
      <p id="d2e245">Air pollution is recognized as a major health and environmental issue <xref ref-type="bibr" rid="bib1.bibx32" id="paren.1"/>. Mitigating its negative impacts on health requires reducing both pollutant concentrations and population exposure. Air quality (AQ) forecasts can be used to warn the population of the potential occurrence of a pollution episode while allowing the implementation of temporary emission reductions, including, for example, traffic restrictions, shutdown of industries and bans on the use of fertilizers in the agricultural sector.</p>
      <p id="d2e251">AQ forecasting systems are typically based on regional chemistry-transport models (CTMs), which remain subject to numerous uncertainty sources, leading to persistent systematic and random errors, especially for ozone (O<sub>3</sub>) and particulate matter (PM) (e.g., <xref ref-type="bibr" rid="bib1.bibx22 bib1.bibx23" id="altparen.2"/>). More importantly, they often largely underestimate the strongest episodes that exert the worst impacts upon health. In addition to the error sources related to the models themselves and the input data, part of the discrepancy between in situ observations and geophysical forecasts is due to inherent representativeness issues since concentrations measured at a specific location are not always comparable to the concentrations simulated over a relatively large volume.</p>
      <p id="d2e266">To overcome these limitations, operational AQ forecasting systems based on geophysical models often rely on so-called model output statistics (MOS) methods for statistically correcting the raw forecasts at monitoring stations. The basic idea of MOS methods is to combine raw forecasts with past observations, and possibly with other ancillary data, at a given station in order to produce a better forecast, preferably at a reasonable computational cost. As these MOS methods often significantly reduce systematic errors, bringing mean biases close to zero, they are also commonly referred to as bias-correction or bias-adjustment methods, although they may not be aimed at directly reducing this specific metric. MOS methods relying on local data (first and foremost the local observations) can also be seen as so-called downscaling methods since they allow some of the local features that cannot be reproduced at typical CTM spatial resolution to be captured.</p>
      <p id="d2e269">Over the last decades, several MOS methods have been proposed for correcting weather forecasts, before their more recent application to AQ forecasts, mainly for O<sub>3</sub> and fine particulate matter (PM<sub>2.5</sub>, with aerodynamic diameter smaller than 2.5 <inline-formula><mml:math id="M10" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">µ</mml:mi></mml:mrow></mml:math></inline-formula>m). A very simple approach consists of subtracting the mean bias (or multiplying by a mean ratio to avoid negative values in the corrected forecasts) calculated from past data <xref ref-type="bibr" rid="bib1.bibx29" id="paren.3"/>. A more adaptive version consists of correcting the forecast by the model bias calculated over the previous days, which assumes some persistence in the errors <xref ref-type="bibr" rid="bib1.bibx11" id="paren.4"/>. Other authors proposed fitting linear regression models between chemical concentration errors and meteorological parameters (e.g., <xref ref-type="bibr" rid="bib1.bibx20 bib1.bibx31" id="altparen.5"/>). <xref ref-type="bibr" rid="bib1.bibx27" id="text.6"/> applied a set of autoregressive integrated moving average (ARIMA) models to improve Community Multiscale Air Quality (CMAQ) model forecasts. The Kalman filter (KF) method is a more sophisticated approach, yet still relatively simple to implement, based on signal processing theory (e.g., <xref ref-type="bibr" rid="bib1.bibx7 bib1.bibx25 bib1.bibx26 bib1.bibx1 bib1.bibx11 bib1.bibx12 bib1.bibx28" id="altparen.7"/>). Initially employed for correcting meteorological forecasts <xref ref-type="bibr" rid="bib1.bibx8 bib1.bibx18" id="paren.8"/>, the ANalogs (AN) method provides an observation-based forecast using historical forecasts and has recently provided encouraging results for correcting PM<sub>2.5</sub> CMAQ forecasts over the United States <xref ref-type="bibr" rid="bib1.bibx12 bib1.bibx21" id="paren.9"/>.</p>
      <p id="d2e330">A common limitation in the aforementioned studies is that MOS corrections are assessed mainly in terms of continuous variables (i.e., pollutant mixing ratios), while less attention is typically paid to the parallel impact on categorical variables (i.e., exceedances of given thresholds), although predicting such exceedances is one of the primary goals of AQ forecasting systems. This can give a partial, if not misleading, view of the advantages and disadvantages of the different MOS approaches proposed in the literature.</p>
      <p id="d2e333">The present study aims at providing a comprehensive assessment of the impact of different MOS approaches upon AQ forecasts. We consider a representative set of MOS methods, including some already proposed in the recent literature and another one based on machine learning (ML). These MOS corrective methods are applied to the Copernicus Atmosphere Monitoring Service (CAMS) regional ensemble O<sub>3</sub> forecasts, focusing on the Iberian Peninsula (Spain and Portugal) during the period 2018–2019. The MOS methods are evaluated for a comprehensive set of continuous and categorical metrics, at various timescales (hourly to daily) and along different lead times (1 to 4 d), with different meteorological input data (forecast vs. reanalyzed), in order to provide a more complete view of their behavior.</p>
      <p id="d2e345">The paper is organized as follows: Sect. <xref ref-type="sec" rid="Ch1.S2"/> first describes the data and MOS methods used in this study; Sect. <xref ref-type="sec" rid="Ch1.S3"/> includes the evaluation of the raw (uncorrected) CAMS regional ensemble O<sub>3</sub> forecast over the Iberian Peninsula, along with a detailed assessment of the MOS results and some sensitivity analyses; and a broader discussion and conclusion are provided in Sect. <xref ref-type="sec" rid="Ch1.S4"/>.</p>
</sec>
<sec id="Ch1.S2">
  <label>2</label><title>Data and methods</title>
<sec id="Ch1.S2.SS1">
  <label>2.1</label><title>Data</title>
<sec id="Ch1.S2.SS1.SSS1">
  <label>2.1.1</label><title>Ozone observations</title>
      <p id="d2e385">Hourly O<sub>3</sub> measurements over 2018–2019 are taken from the European Environment Agency (EEA) AQ e-Reporting <xref ref-type="bibr" rid="bib1.bibx13" id="paren.10"/> and accessed through GHOST (Globally Harmonised Observational Surface Treatment). GHOST is a project developed at the Earth Sciences Department of the Barcelona Supercomputing Center that aims at harmonizing global surface atmospheric observations and metadata, for the purpose of facilitating quality-assured comparisons between observations and models within the atmospheric chemistry community <xref ref-type="bibr" rid="bib1.bibx2" id="paren.11"/>. On top of the public datasets it ingests, GHOST provides numerous data flags that are used here for quality assurance screening (see Appendix <xref ref-type="sec" rid="App1.Ch1.S1"/>). In this study, the daily mean, daily 1 h maximum and daily 8 h maximum (hereafter respectively referred to as d, d1max and d8max) are computed only when at least 75 % of the hourly data are available (i.e., 18 out of 24 h). Note that, despite this availability criterion, large data gaps may still occur at some stations on some days, mainly during daytime (for instance due to maintenance operations that typically occur during working hours). Considering all stations and days with at least 18 h of data, the frequency of data gaps exceeding 4 h between 08:00 and 15:00 UTC was found to be only 0.6 % (<inline-formula><mml:math id="M15" display="inline"><mml:mrow><mml:mn mathvariant="normal">1854</mml:mn><mml:mo>/</mml:mo><mml:mn mathvariant="normal">314</mml:mn><mml:mspace linebreak="nobreak" width="0.25em"/><mml:mn mathvariant="normal">005</mml:mn></mml:mrow></mml:math></inline-formula>). 
Such situations occur with a similarly low frequency on days exceeding the target threshold (<inline-formula><mml:math id="M16" display="inline"><mml:mrow><mml:mn mathvariant="normal">77</mml:mn><mml:mo>/</mml:mo><mml:mn mathvariant="normal">13</mml:mn><mml:mspace linebreak="nobreak" width="0.25em"/><mml:mn mathvariant="normal">221</mml:mn></mml:mrow></mml:math></inline-formula> or 0.6 %) and never occur on days exceeding the information threshold.</p>
      <p id="d2e436">Our study focuses on the Iberian Peninsula, over a domain ranging from 10° W to 5° E longitude and from 35 to 44° N latitude that includes Spain, Portugal and part of southwestern France. In total, 455 O<sub>3</sub> monitoring stations are included, which represents an observational dataset of 7 437 862 hourly O<sub>3</sub> measurements, corresponding to an hourly data availability of 93 %.</p>
</sec>
<sec id="Ch1.S2.SS1.SSS2">
  <label>2.1.2</label><title>CAMS regional ensemble forecast</title>
      <p id="d2e465">The benefit of MOS corrections is investigated on the CAMS regional ensemble forecasts. As one of the six Copernicus services, CAMS provides AQ forecast and reanalysis data at both regional and global scales (<uri>https://www.regional.atmosphere.copernicus.eu/</uri>, last access: 20 November 2020). At the regional scale, nine state-of-the-art CTMs developed by European research institutions are currently participating in the operational ensemble AQ forecasts (CHIMERE from INERIS, EMEP from MET Norway, EURAD-IM from University of Cologne, LOTOS-EUROS from KNMI and TNO, MATCH from SMHI, MOCAGE from METEO-FRANCE, SILAM from FMI, DEHM from Aarhus University, GEM-AQ from IEP-NRI). In addition, MONARCH from BSC and MINNI from ENEA will join the ensemble soon. The ensemble forecast is computed as the median of all individual forecasts. Note that, due to possible technical failures, not all nine forecasts are always available for computing the full ensemble. The CAMS regional forecasts are provided over 4 lead days, hereafter referred to as <inline-formula><mml:math id="M19" display="inline"><mml:mrow><mml:mi>D</mml:mi><mml:mo>+</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:math></inline-formula>, <inline-formula><mml:math id="M20" display="inline"><mml:mrow><mml:mi>D</mml:mi><mml:mo>+</mml:mo><mml:mn mathvariant="normal">2</mml:mn></mml:mrow></mml:math></inline-formula>, <inline-formula><mml:math id="M21" display="inline"><mml:mrow><mml:mi>D</mml:mi><mml:mo>+</mml:mo><mml:mn mathvariant="normal">3</mml:mn></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M22" display="inline"><mml:mrow><mml:mi>D</mml:mi><mml:mo>+</mml:mo><mml:mn mathvariant="normal">4</mml:mn></mml:mrow></mml:math></inline-formula> (starting at 00:00 UTC).</p>
</sec>
<sec id="Ch1.S2.SS1.SSS3">
  <label>2.1.3</label><title>HRES and ERA5 meteorological data</title>
      <p id="d2e528">Some MOS methods rely on meteorological data. In this study, meteorological data are taken from the Atmospheric Model high-resolution 10 d forecast (HRES) (<uri>https://www.ecmwf.int/en/forecasts/datasets/set-i</uri>, last access: 1 September 2020) provided by the European Centre for Medium-Range Weather Forecasts (ECMWF). HRES has a native spatial resolution of about 9 <inline-formula><mml:math id="M23" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">km</mml:mi></mml:mrow></mml:math></inline-formula> and 137 vertical levels. In addition, to investigate the sensitivity to the meteorological input data, we replicated all our experiments with the ERA5 reanalysis dataset (<xref ref-type="bibr" rid="bib1.bibx5" id="altparen.12"/>; <uri>https://www.ecmwf.int/en/forecasts/datasets/reanalysis-datasets/era5</uri>, last access: 1 September 2020). ERA5 data have a native spatial resolution of about 31 <inline-formula><mml:math id="M24" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">km</mml:mi></mml:mrow></mml:math></inline-formula> and 137 vertical levels, although data were downloaded on a 0.25° <inline-formula><mml:math id="M25" display="inline"><mml:mo>×</mml:mo></mml:math></inline-formula> 0.25° regular longitude-latitude grid from the Climate Data Store. At all surface O<sub>3</sub> monitoring stations, for both HRES and ERA5, we extracted the following variables at the hourly scale: 2 m temperature (code 167), 10 m surface wind speed (207), normalized 10 m zonal and meridional wind speed components (165 and 166), surface pressure (134), total cloud cover (164), surface net solar radiation (176), surface solar radiation downwards (169), downward UV radiation at the surface (57), boundary layer height (159), and geopotential at 500 hPa (129).</p>
</sec>
</sec>
<sec id="Ch1.S2.SS2">
  <label>2.2</label><title>Applying MOS under restrictive operational conditions</title>
      <p id="d2e582">A novel aspect of this study is that we provide a comparison of a set of MOS methods under potentially restrictive training conditions in an operational context. To mimic such restrictions we assume that (1) no past data, neither modeled nor observed, are available for training at the beginning of the period of study (here 1 January 2018) and (2) the amount of modeled and observed data continuously grows with time along the period of study (here 2018–2019). On a given day, the MOS methods can therefore only rely on the historical data accumulated since the beginning of the period. Our approach aims at understanding the behavior of the different MOS methods in a worst-case scenario where a new or upgraded operational AQ forecasting system is implemented together with a MOS module for which there are few or no hindcast data. We believe that such a strategy allows the different MOS methods to be compared in a balanced way given the operational context. As described in detail in the next section, some MOS methods require very limited prior information to achieve their optimal performance, while others need a larger amount of training data. In an operational context, the first category of methods might thus be advantaged at the beginning before being gradually supplanted by the second category. We note, however, that methods relying on limited past data may respond better to an abrupt change in environmental conditions, as experienced for instance during the COVID-19 lockdowns. Although not covered by the present study, we acknowledge here that in an operational context, the relationship between the length of past training data and the performance of the corresponding MOS prediction is an interesting aspect to investigate, as is the quantification of the spin-up time beyond which the MOS method might not significantly improve. 
Only some insights will be given by comparing the performance obtained in 2019 with and without using the data available in 2018. Similarly, our study does not investigate how potential issues (delays) in the near-real-time availability of the observations can impact the performance of the MOS methods, although this might be another important aspect to take into account in operational conditions; to the best of our knowledge, EEA observations are typically available with a 2 h lag, but some sporadic technical failures can induce extended delays.</p>
</sec>
<sec id="Ch1.S2.SS3">
  <label>2.3</label><title>Description of the model output statistics (MOS) methods</title>
      <p id="d2e593">This section describes the different MOS methods implemented for correcting the raw forecasts (hereafter referred to as RAW), namely moving average (MA), Kalman filter (KF), quantile mapping (QM), analogs (AN) and gradient boosting machine (GBM) methods. All MOS methods are applied independently at each monitoring station. The skill of these different forecasts (including the RAW) is assessed relative to the persistence (PERS) reference method, which uses the previously observed concentration values at a specific hour of the day (averaged over 1 or several days) as the predicted value. As a first approach, we use a time window of 1 single day (hereafter referred to as PERS(1)).</p>
<sec id="Ch1.S2.SS3.SSS1">
  <label>2.3.1</label><title>Moving average (MA) method</title>
      <p id="d2e603">We first consider the moving average (MA) method, by which the raw CAMS forecast bias in the previous day(s) is used to correct the forecast. As a first approach, we use a time window of a single day (hereafter referred to as MA(1)). The sensitivity to the time window is discussed in Sect. <xref ref-type="sec" rid="Ch1.S3.SS4"/>.</p>
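      <p>Schematically, and in a notation introduced here only for illustration (these symbols are not used elsewhere in the paper), a moving-average correction over a window of <inline-formula><tex-math><![CDATA[$n$]]></tex-math></inline-formula> days can be sketched as
      <disp-formula><tex-math><![CDATA[\[
\hat{m}_{d,h} = m_{d,h} - \frac{1}{n} \sum_{k=1}^{n} \left( m_{d-k,h} - o_{d-k,h} \right),
\]]]></tex-math></disp-formula>
      where <inline-formula><tex-math><![CDATA[$m_{d,h}$]]></tex-math></inline-formula> and <inline-formula><tex-math><![CDATA[$o_{d,h}$]]></tex-math></inline-formula> denote the raw forecast and the observed O<sub>3</sub> mixing ratio on day <inline-formula><tex-math><![CDATA[$d$]]></tex-math></inline-formula> at hour <inline-formula><tex-math><![CDATA[$h$]]></tex-math></inline-formula>, and <inline-formula><tex-math><![CDATA[$\hat{m}_{d,h}$]]></tex-math></inline-formula> is the corrected forecast. This sketch assumes an additive bias evaluated at the same hour of the day and leaves aside how longer lead times are handled; with <inline-formula><tex-math><![CDATA[$n = 1$]]></tex-math></inline-formula> it reduces to subtracting the bias observed on the previous day, which is the MA(1) case.</p>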
</sec>
<sec id="Ch1.S2.SS3.SSS2">
  <label>2.3.2</label><title>Quantile mapping (QM) method</title>
      <p id="d2e616">The quantile mapping (QM) method aims at adjusting the distribution of the forecast concentrations to the distribution of observed concentrations. For a given day, the QM method consists of (1) computing two cumulative distribution functions (CDFs), corresponding to past modeled and observed O<sub>3</sub> mixing ratios, respectively; (2) locating the current O<sub>3</sub> forecast in the model CDF; and (3) identifying the corresponding O<sub>3</sub> value in the observation CDF and using it as the QM-corrected O<sub>3</sub> forecast. For instance, if the current O<sub>3</sub> forecast gives a value corresponding to the 95th percentile, the QM-corrected O<sub>3</sub> forecast will correspond to the 95th percentile of the observed O<sub>3</sub> mixing ratios. This approach thus aims at correcting all quantiles of the distribution, not only the mean.</p>
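      <p>The three steps above can be condensed into a single expression. In a notation introduced here only for illustration, with <inline-formula><tex-math><![CDATA[$F_{\mathrm{m}}$]]></tex-math></inline-formula> and <inline-formula><tex-math><![CDATA[$F_{\mathrm{o}}$]]></tex-math></inline-formula> the empirical CDFs of the past modeled and observed O<sub>3</sub> mixing ratios, the QM-corrected value <inline-formula><tex-math><![CDATA[$\hat{m}$]]></tex-math></inline-formula> of a raw forecast <inline-formula><tex-math><![CDATA[$m$]]></tex-math></inline-formula> reads
      <disp-formula><tex-math><![CDATA[\[
\hat{m} = F_{\mathrm{o}}^{-1}\!\left( F_{\mathrm{m}}(m) \right),
\]]]></tex-math></disp-formula>
      where <inline-formula><tex-math><![CDATA[$F_{\mathrm{o}}^{-1}$]]></tex-math></inline-formula> is the quantile function of the observations. In the example given above, <inline-formula><tex-math><![CDATA[$F_{\mathrm{m}}(m) = 0.95$]]></tex-math></inline-formula> and therefore <inline-formula><tex-math><![CDATA[$\hat{m} = F_{\mathrm{o}}^{-1}(0.95)$]]></tex-math></inline-formula>. How the empirical CDFs are interpolated, and how forecasts falling outside the range of past modeled values are treated, are implementation choices that this expression does not specify.</p>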
      <p id="d2e683">In the operational-like context in which this study is conducted (Sect. <xref ref-type="sec" rid="Ch1.S2.SS2"/>), the first QM corrections are computed once 30 d of data have been accumulated, to ensure a minimum representativeness of the model and observation CDFs. For computational reasons, both CDFs are updated every 30 d (although an update frequency of a single day would be optimal in a real operational context). The choice of a 30 d update frequency only aims at reducing the computational cost of running all MOS methods at all stations during the 2-year period. In a real operational context, only 1 d would have to be run at a time, which would allow the CDFs to be updated every day, ensuring that the entire observational dataset available at a given time is exploited.</p>
</sec>
<sec id="Ch1.S2.SS3.SSS3">
  <label>2.3.3</label><title>Kalman filter (KF) method</title>
      <p id="d2e696">The Kalman filter (KF) is an optimal recursive data processing algorithm with numerous science and engineering applications (see <xref ref-type="bibr" rid="bib1.bibx30" id="altparen.13"/>, for an introduction). In atmospheric sciences, it offers a popular framework for sophisticated data assimilation applications (e.g., <xref ref-type="bibr" rid="bib1.bibx17 bib1.bibx10" id="altparen.14"/>) but can also be used as a simple yet powerful MOS method for correcting forecasts (e.g., <xref ref-type="bibr" rid="bib1.bibx7 bib1.bibx25 bib1.bibx6" id="altparen.15"/>). The KF-based MOS method aims at recursively estimating the unknown forecast bias (here taken as the state variable of interest), combining previous forecast bias estimates with forecast bias observations. The updated forecast bias estimate is computed as a weighted average of these two terms, both being considered to be uncertain, i.e., affected by a noise with zero mean and a given variance. A detailed description of the KF algorithm can be found in Appendix <xref ref-type="sec" rid="App1.Ch1.S2"/>, but an important aspect to be mentioned here is that each of these two terms is weighted according to the value of the so-called Kalman gain that intrinsically depends on the ratio of both variances (hereafter referred to as the variance ratio). The value chosen for this internal parameter substantially affects the behavior of the KF, and thus the obtained MOS corrections. A variance ratio close to zero induces a Kalman gain close to 0. In such situations, the estimated forecast bias corresponds to the estimated forecast bias of the previous day, independently of the forecast error. A very high (infinite) variance ratio gives a Kalman gain close to 1. In this case, the estimated forecast bias corresponds to the observed forecast bias of the previous day, which thus makes it equivalent to the MA(1) method.</p>
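      <p>As a sketch in a notation introduced here only for illustration (Appendix <xref ref-type="sec" rid="App1.Ch1.S2"/> remains the reference for the formulation actually used), let <inline-formula><tex-math><![CDATA[$\hat{x}_{t}$]]></tex-math></inline-formula> be the bias estimate used to correct the forecast of day <inline-formula><tex-math><![CDATA[$t$]]></tex-math></inline-formula>, <inline-formula><tex-math><![CDATA[$y_{t-1}$]]></tex-math></inline-formula> the forecast bias observed on the previous day and <inline-formula><tex-math><![CDATA[$K_{t}$]]></tex-math></inline-formula> the Kalman gain. The weighted average described above can then be written as
      <disp-formula><tex-math><![CDATA[\[
\hat{x}_{t} = \left( 1 - K_{t} \right) \hat{x}_{t-1} + K_{t}\, y_{t-1}, \qquad 0 \le K_{t} \le 1 .
\]]]></tex-math></disp-formula>
      In the standard scalar Kalman filter with a random-walk state, which we assume here and which may differ in detail from the appendix, the gain and the error variance <inline-formula><tex-math><![CDATA[$p_{t}$]]></tex-math></inline-formula> of the estimate evolve as
      <disp-formula><tex-math><![CDATA[\[
K_{t} = \frac{p_{t-1} + \sigma_{\eta}^{2}}{p_{t-1} + \sigma_{\eta}^{2} + \sigma_{\varepsilon}^{2}}, \qquad
p_{t} = \left( p_{t-1} + \sigma_{\eta}^{2} \right) \left( 1 - K_{t} \right),
\]]]></tex-math></disp-formula>
      where <inline-formula><tex-math><![CDATA[$\sigma_{\eta}^{2}$]]></tex-math></inline-formula> and <inline-formula><tex-math><![CDATA[$\sigma_{\varepsilon}^{2}$]]></tex-math></inline-formula> are the variances of the noise affecting the bias evolution and the bias observation, respectively, so that the gain depends on the variance ratio <inline-formula><tex-math><![CDATA[$r = \sigma_{\eta}^{2} / \sigma_{\varepsilon}^{2}$]]></tex-math></inline-formula>. The two limits discussed above follow directly: <inline-formula><tex-math><![CDATA[$K_{t} \to 0$]]></tex-math></inline-formula> gives <inline-formula><tex-math><![CDATA[$\hat{x}_{t} = \hat{x}_{t-1}$]]></tex-math></inline-formula>, while <inline-formula><tex-math><![CDATA[$K_{t} \to 1$]]></tex-math></inline-formula> gives <inline-formula><tex-math><![CDATA[$\hat{x}_{t} = y_{t-1}$]]></tex-math></inline-formula>, i.e., the MA(1) correction.</p>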
      <p id="d2e710">In this study, the variance ratio is adjusted dynamically and updated regularly in order to optimize a specific statistical metric, in our case the RMSE (the corresponding approach being hereafter referred to as KF(RMSE)). The different steps are as follows: (1) on a given update day, the KF corrections over the entire historical dataset are computed for different values of the variance ratio, from 0.001 to 100 in a logarithmic progression; (2) the RMSE is computed for each of the corrected historical time series obtained; and (3) the variance ratio associated with the lowest RMSE is retained and used until the next update. Other choices of metrics to optimize are explored in Sect. <xref ref-type="sec" rid="Ch1.S3.SS4"/>.</p>
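      <p>In compact form, and again in a notation introduced here only for illustration, this selection can be sketched as
      <disp-formula><tex-math><![CDATA[\[
r^{\ast} = \underset{r \in \mathcal{R}}{\arg\min}\; \sqrt{ \frac{1}{T} \sum_{t=1}^{T} \left( m_{t} - \hat{x}_{t}(r) - o_{t} \right)^{2} },
\]]]></tex-math></disp-formula>
      where <inline-formula><tex-math><![CDATA[$\mathcal{R}$]]></tex-math></inline-formula> is a logarithmically spaced set of candidate variance ratios between <inline-formula><tex-math><![CDATA[$10^{-3}$]]></tex-math></inline-formula> and <inline-formula><tex-math><![CDATA[$10^{2}$]]></tex-math></inline-formula>, <inline-formula><tex-math><![CDATA[$T$]]></tex-math></inline-formula> is the number of historical time steps available on the update day, <inline-formula><tex-math><![CDATA[$m_{t}$]]></tex-math></inline-formula> and <inline-formula><tex-math><![CDATA[$o_{t}$]]></tex-math></inline-formula> are the raw forecast and the observation, and <inline-formula><tex-math><![CDATA[$\hat{x}_{t}(r)$]]></tex-math></inline-formula> is the bias estimated by the KF run with variance ratio <inline-formula><tex-math><![CDATA[$r$]]></tex-math></inline-formula>. The number of candidate values in <inline-formula><tex-math><![CDATA[$\mathcal{R}$]]></tex-math></inline-formula> is not specified here and is left open in this sketch.</p>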
      <p id="d2e715">As for QM, for computational reasons, the update frequency is set to 30 d in this study (although, again, an update frequency of 1 single day would be optimal).</p>
</sec>
<sec id="Ch1.S2.SS3.SSS4">
  <label>2.3.4</label><title>Analogs (AN) method</title>
      <p id="d2e727">The analogs method (AN) implemented here consists of (1) comparing the current forecast to all past forecasts available, (2) identifying the past days with the most similar forecast (hereafter referred to as analog days or analogs) and (3) using the corresponding past observed concentrations to estimate the AN-corrected O<sub>3</sub> forecast (e.g., <xref ref-type="bibr" rid="bib1.bibx8 bib1.bibx9 bib1.bibx12 bib1.bibx21" id="altparen.16"/>). The current forecast is compared to each individual past forecast in order to identify which ones are the most similar. Based on a set of features including the raw O<sub>3</sub> mixing ratio forecast from the AQ model and the 10 m wind speed, 2 m temperature, surface pressure and boundary layer height forecast from the meteorological model, the distance metric proposed by <xref ref-type="bibr" rid="bib1.bibx8" id="text.17"/> and previously used in <xref ref-type="bibr" rid="bib1.bibx12" id="text.18"/> (see the formula in Appendix <xref ref-type="sec" rid="App1.Ch1.S3"/>) is used to compute the distance (i.e., to quantify the similarity) of each individual past forecast with respect to the current forecast. Then, as a first approach, the 10 best analog days that correspond here to the 10 most similar past forecasts are identified (hereafter referred to as AN(10); other values are tested in Sect. <xref ref-type="sec" rid="Ch1.S3.SS4"/>). From those best analog days, the MOS-corrected forecast is computed as the weighted average of the corresponding observed concentrations, where weights are taken as the inverse of the distance metric previously computed. Compared with an unweighted average, introducing the weights is expected to slightly reduce the dependence upon the number of analog days chosen.</p>
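      <p>In a notation introduced here only for illustration, the last step can be written as
      <disp-formula><tex-math><![CDATA[\[
\hat{m} = \frac{\sum_{i=1}^{N} w_{i}\, o_{i}}{\sum_{i=1}^{N} w_{i}}, \qquad w_{i} = \frac{1}{\delta_{i}},
\]]]></tex-math></disp-formula>
      where <inline-formula><tex-math><![CDATA[$N$]]></tex-math></inline-formula> is the number of analog days retained (<inline-formula><tex-math><![CDATA[$N = 10$]]></tex-math></inline-formula> for AN(10)), <inline-formula><tex-math><![CDATA[$o_{i}$]]></tex-math></inline-formula> is the O<sub>3</sub> mixing ratio observed on the <inline-formula><tex-math><![CDATA[$i$]]></tex-math></inline-formula>th analog day and <inline-formula><tex-math><![CDATA[$\delta_{i}$]]></tex-math></inline-formula> is the distance between the corresponding past forecast and the current one. This expression assumes strictly positive distances; the treatment of an exact match (<inline-formula><tex-math><![CDATA[$\delta_{i} = 0$]]></tex-math></inline-formula>) is not covered by this sketch.</p>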
      <p id="d2e762">Therefore, in the analogs paradigm, the past days of similar chemical and/or meteorological conditions are identified in the forecast (i.e., model) space, while the output (i.e., the AN-corrected forecast) is taken from the observation space. The AQ model thus only serves to identify the past observed situations that look similar to the current one.</p>
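      <p>The three steps above can be sketched as follows. This is a minimal illustration and not the implementation used in this study: the function name analog_forecast is hypothetical, and the distance used here (a Euclidean distance on features scaled by their past standard deviation) is only a placeholder assumption standing in for the metric given in the Appendix.</p>
<preformat preformat-type="code">
```python
import math


def analog_forecast(current, past_forecasts, past_obs, n_analogs=10, eps=1e-12):
    """Inverse-distance weighted mean of the observations on the closest past days.

    current        : list of feature values for the forecast to correct
    past_forecasts : list of feature lists, one per past day
    past_obs       : observed concentration on each past day
    """
    n_feat = len(current)

    # Spread of each feature over the past forecasts, used to put all
    # features on a comparable scale.
    scales = []
    for j in range(n_feat):
        col = [row[j] for row in past_forecasts]
        mean = sum(col) / len(col)
        std = math.sqrt(sum((v - mean) ** 2 for v in col) / len(col))
        scales.append(std if std > 0 else 1.0)

    # Placeholder distance (assumption): scaled Euclidean distance.
    dists = [
        math.sqrt(sum(((row[j] - current[j]) / scales[j]) ** 2 for j in range(n_feat)))
        for row in past_forecasts
    ]

    # Keep the n_analogs most similar past days.
    ranked = sorted(range(len(dists)), key=lambda i: dists[i])[:n_analogs]

    # Weights are the inverse of the distance; eps avoids a division by zero
    # when a past forecast is identical to the current one.
    weights = [1.0 / (dists[i] + eps) for i in ranked]
    return sum(w * past_obs[i] for w, i in zip(weights, ranked)) / sum(weights)
```
</preformat>
      <p>With equal distances the result reduces to the simple average of the selected observations, whereas a past forecast identical to the current one dominates the weighted average.</p>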
</sec>
<sec id="Ch1.S2.SS3.SSS5">
  <label>2.3.5</label><title>Machine-learning-based MOS method</title>
      <p id="d2e773">We also explore the use of ML algorithms as an innovative MOS approach for correcting AQ forecasts. In ML terms, it corresponds to a supervised regression problem where an ML model is trained to predict the observed concentrations, hereafter referred to as the target or output, based on multiple ancillary variables, hereafter referred to as the features or inputs, taken from meteorological and chemistry-transport geophysical models and/or past observations. In this context, the use of ML is of potential interest because (i) we suspect that some relationships exist between the target variable and at least some of these features, (ii) these relationships are likely too complex to be modeled in an analytical way, and (iii) data are available for extracting (learning) information about them. In recent years, ML algorithms have become very popular for many types of predictions, notably due to their ability to model complex (typically non-linear and multi-variable) relationships with good prediction skills. Among the many ML algorithms developed so far, we focus on the decision-tree-based ensemble methods, and more specifically on the gradient boosting machine (GBM), which often ranks among the algorithms with the best prediction skills (as shown in various ML competitions and model intercomparisons; e.g., <xref ref-type="bibr" rid="bib1.bibx3" id="altparen.19"/>).</p>
      <p id="d2e779">At each monitoring station, a single ML model is trained to forecast O<sub>3</sub> concentrations at all lead hours (from 1 to 96) or days (from 1 to 4), depending on the timescale used (see Sect. <xref ref-type="sec" rid="Ch1.S2.SS4"/>). The features taken into account include a set of chemical features (raw forecast O<sub>3</sub> concentration, O<sub>3</sub> concentration observed 1 d before), meteorological features (2 m temperature, 10 m surface wind speed, normalized 10 m zonal and meridional wind speed components, surface pressure, total cloud cover, surface net solar radiation, surface solar radiation downwards, downward UV radiation at the surface, boundary layer height, and geopotential at 500 hPa, all forecast by the meteorological model) and time features (day of year, day of week, lead hour). Although the past O<sub>3</sub> observed concentration corresponds to recursive information that will not be available for all forecast lead days, the same value is used here for all lead days. The tuning of the GBM models is described in Appendix <xref ref-type="sec" rid="App1.Ch1.S4"/>.</p>
      <p id="d2e823">As for QM, the GBM model is first trained (and tuned) only after 30 d to accumulate enough data and then retrained every 30 d based on all historical data available.</p>
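      <p>The update schedule described above (a first training once 30 d of data have been accumulated, then a retraining every 30 d on all historical data) can be sketched as follows. The function names and the 0-based day indexing are illustrative assumptions, and the training and tuning of the model itself are deliberately left out.</p>
<preformat preformat-type="code">
```python
def training_days(n_days, warmup=30, period=30):
    """Day indices (0-based) on which the model is (re)trained.

    On each returned day, the model is assumed to be fitted on all data
    from the days strictly before it.
    """
    return list(range(warmup, n_days, period))


def last_training_day(day, warmup=30, period=30):
    """Most recent training day usable for a forecast issued on `day`.

    Returns None during the warm-up period, when no model exists yet.
    """
    if warmup > day:
        return None
    return warmup + ((day - warmup) // period) * period
```
</preformat>
      <p>Under these assumptions, a forecast issued on day 59 still relies on the model trained on day 30, while a forecast issued on day 60 uses a freshly retrained model.</p>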
</sec>
</sec>
<sec id="Ch1.S2.SS4">
  <label>2.4</label><title>Timescales of MOS corrections</title>
      <p id="d2e835">Current AQ standards are defined according to pollutant-dependent timescales, e.g., daily 8 h maximum (d8max) concentration in the case of O<sub>3</sub>. In the literature, MOS corrections are typically applied to hourly concentrations, providing hourly corrected concentrations from which the value at the appropriate timescale can then be computed. Following this approach, for a given MOS method <inline-formula><mml:math id="M41" display="inline"><mml:mi>X</mml:mi></mml:math></inline-formula>, corrections in this study are first computed based on hourly time series (hereafter referred to as <inline-formula><mml:math id="M42" display="inline"><mml:mrow><mml:msub><mml:mi>X</mml:mi><mml:mi mathvariant="normal">h</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>), from which daily 24 h average (<inline-formula><mml:math id="M43" display="inline"><mml:mrow><mml:msub><mml:mi>X</mml:mi><mml:mtext>d</mml:mtext></mml:msub></mml:mrow></mml:math></inline-formula>), daily 1 h maximum (<inline-formula><mml:math id="M44" display="inline"><mml:mrow><mml:msub><mml:mi>X</mml:mi><mml:mtext>d1max</mml:mtext></mml:msub></mml:mrow></mml:math></inline-formula>) and daily 8 h maximum (<inline-formula><mml:math id="M45" display="inline"><mml:mrow><mml:msub><mml:mi>X</mml:mi><mml:mrow><mml:mi mathvariant="normal">d</mml:mi><mml:mn mathvariant="normal">8</mml:mn><mml:mi mathvariant="normal">max</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula>) corrected concentrations are then deduced. 
In addition, MOS corrections are computed directly on daily 24 h average (<inline-formula><mml:math id="M46" display="inline"><mml:mrow><mml:msub><mml:mi>X</mml:mi><mml:mtext>dd</mml:mtext></mml:msub></mml:mrow></mml:math></inline-formula>, the additional “d” indicating that the MOS method is applied directly on daily rather than hourly time series), daily 1 h maximum (<inline-formula><mml:math id="M47" display="inline"><mml:mrow><mml:msub><mml:mi>X</mml:mi><mml:mtext>dd1max</mml:mtext></mml:msub></mml:mrow></mml:math></inline-formula>) and daily 8 h maximum (<inline-formula><mml:math id="M48" display="inline"><mml:mrow><mml:msub><mml:mi>X</mml:mi><mml:mrow><mml:mi mathvariant="normal">dd</mml:mi><mml:mn mathvariant="normal">8</mml:mn><mml:mi mathvariant="normal">max</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula>) time series, respectively. When needed, meteorological features are used at the same timescale. This is done to investigate whether applying the MOS correction directly at the regulatory timescale can help to achieve better performance.</p>
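      <p>As an illustration of how the daily values are deduced from an hourly series, the sketch below computes the 24 h average, the 1 h maximum and the 8 h maximum for a single day. It assumes, for simplicity, that the 8 h running means are restricted to the 17 windows lying entirely within the day; regulatory definitions may also include windows that start on the previous day. The function name daily_aggregates is hypothetical.</p>
<preformat preformat-type="code">
```python
def daily_aggregates(hourly):
    """Daily mean, 1 h maximum and 8 h maximum from 24 hourly values."""
    if len(hourly) != 24:
        raise ValueError("expected exactly 24 hourly values")

    d = sum(hourly) / 24.0   # daily 24 h average
    d1max = max(hourly)      # daily 1 h maximum

    # 8 h running means over windows fully contained in the day
    # (simplifying assumption: 17 windows, starting at hours 0 to 16).
    running = [sum(hourly[i:i + 8]) / 8.0 for i in range(17)]
    d8max = max(running)     # daily 8 h maximum

    return {"d": d, "d1max": d1max, "d8max": d8max}
```
</preformat>
      <p>Applying a MOS method to the hourly series and then calling such a function corresponds to the first approach described above, whereas the second approach aggregates the raw forecasts and observations first and applies the MOS method to the resulting daily series.</p>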
</sec>
<sec id="Ch1.S2.SS5">
  <label>2.5</label><title>Evaluation metrics and skill scores</title>
      <p id="d2e951">In this study, O<sub>3</sub> forecasts are evaluated using an extended panel of continuous and categorical metrics to provide a comprehensive view of the impact of the different MOS methods on the predictions. Continuous metrics used to evaluate the O<sub>3</sub> concentrations include the following. <list list-type="bullet"><list-item>
      <p id="d2e974">nMB: normalized mean bias</p></list-item><list-item>
      <p id="d2e978">nRMSE: normalized root mean square error</p></list-item><list-item>
      <p id="d2e982">PCC: Pearson correlation coefficient</p></list-item><list-item>
      <p id="d2e986">slope: slope of the linear regression of predicted versus observed O<sub>3</sub> mixing ratios to quantify how well the lowest and highest O<sub>3</sub> concentrations are predicted</p></list-item><list-item>
      <p id="d2e1008">nMSDB: normalized mean standard deviation bias to investigate how well the O<sub>3</sub> variability is reproduced by the forecast</p></list-item></list> Categorical metrics used to evaluate the O<sub>3</sub> exceedances beyond certain thresholds include the following. <list list-type="bullet"><list-item>
      <p id="d2e1032"><inline-formula><mml:math id="M55" display="inline"><mml:mi>H</mml:mi></mml:math></inline-formula>: hit rate to quantify the proportion of observed exceedances that are correctly detected</p></list-item><list-item>
      <p id="d2e1042"><inline-formula><mml:math id="M56" display="inline"><mml:mi>F</mml:mi></mml:math></inline-formula>: false alarm rate to quantify the proportion of observed non-exceedances erroneously forecast as exceedances</p></list-item><list-item>
      <p id="d2e1052">FB: frequency bias to investigate the extent to which the forecast is predicting the same number of exceedances as observed (no matter if they are predicted on the correct days)</p></list-item><list-item>
      <p id="d2e1056">SR: success ratio to quantify the proportion of predicted exceedances that are actually observed</p></list-item><list-item>
      <p id="d2e1060">CSI: critical success index to quantify the proportion of correctly predicted exceedances when discarding all the correct rejections</p></list-item><list-item>
      <p id="d2e1064">PSS: Peirce skill score to investigate the extent to which the forecast is able to separate exceedances from non-exceedances</p></list-item><list-item>
      <p id="d2e1068">AUC: area under the receiver operating characteristic (ROC) curve to quantify the probability that the forecast predicts higher O<sub>3</sub> concentrations during a situation of exceedance compared to a situation of non-exceedance</p></list-item></list> The formulas of these different metrics can be found in Appendix <xref ref-type="sec" rid="App1.Ch1.S5"/>. Each of them highlights a specific aspect of the performance. Regarding categorical metrics, <xref ref-type="bibr" rid="bib1.bibx24" id="text.20"/> gave a detailed explanation of the different metric properties desirable for assessing the quality of a forecasting system <xref ref-type="bibr" rid="bib1.bibx24" id="paren.21"><named-content content-type="pre">see Table 3.4 in</named-content></xref>. In this framework, PSS can be considered to be one of the most interesting metrics for assessing the accuracy of the different RAW and MOS-corrected forecasts, given that it gathers numerous valuable properties: (i) truly equitable (all random and fixed-value forecasting systems are awarded the same score, which provides a single no-skill baseline), (ii) not trivial to hedge (the forecaster cannot cheat on their forecast in order to increase PSS), (iii) base-rate-independent (PSS only depends on <inline-formula><mml:math id="M58" display="inline"><mml:mi>H</mml:mi></mml:math></inline-formula> and <inline-formula><mml:math id="M59" display="inline"><mml:mi>F</mml:mi></mml:math></inline-formula>, which makes it invariant to natural variations in climate, which is particularly interesting in the framework of AQ forecasting, where AQ standards and subsequently the base rate can also change) and (iv) bounded (values lie within a fixed range). It is worth noting that no perfect metric exists, and PSS (like most other metrics) lacks the property of non-degeneracy (it tends towards meaningless values for rare events).</p>
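      <p>As an illustration, most of the categorical metrics listed above can be derived from the 2 x 2 contingency table of forecast versus observed exceedances. The sketch below uses the standard textbook definitions, which are assumed (not verified here) to match the formulas given in the Appendix; the function name and the letters a, b, c and d (hits, false alarms, misses and correct rejections) are illustrative, and AUC is omitted because it requires scanning over all thresholds.</p>
<preformat preformat-type="code">
```python
def categorical_scores(obs, fcst, threshold):
    """Categorical metrics for exceedances of `threshold`.

    a: hits, b: false alarms, c: misses, d: correct rejections.
    Raises ZeroDivisionError if there is no observed exceedance,
    no observed non-exceedance or no forecast exceedance.
    """
    a = b = c = d = 0
    for o, f in zip(obs, fcst):
        o_exc = o > threshold
        f_exc = f > threshold
        if o_exc and f_exc:
            a += 1
        elif f_exc:
            b += 1
        elif o_exc:
            c += 1
        else:
            d += 1

    hit_rate = a / (a + c)          # H
    false_alarm_rate = b / (b + d)  # F
    return {
        "H": hit_rate,
        "F": false_alarm_rate,
        "FB": (a + b) / (a + c),            # frequency bias
        "SR": a / (a + b),                  # success ratio
        "CSI": a / (a + b + c),             # critical success index
        "PSS": hit_rate - false_alarm_rate  # Peirce skill score
    }
```
</preformat>
      <p>Written this way, PSS depends only on the hit rate and the false alarm rate, which is the reason for the base-rate independence mentioned above.</p>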
      <p id="d2e1106">Results are also discussed in terms of skill scores, using the 1 <inline-formula><mml:math id="M60" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">d</mml:mi></mml:mrow></mml:math></inline-formula> persistence (PERS(1)) as the reference forecast. Skill scores aim at measuring the accuracy of a forecast relative to the accuracy of a chosen reference forecast (e.g., persistence, climatology, random choice). They can be computed as <inline-formula><mml:math id="M61" display="inline"><mml:mrow><mml:mi>S</mml:mi><mml:mo>(</mml:mo><mml:mi>X</mml:mi><mml:mo>)</mml:mo><mml:mo>=</mml:mo><mml:mo>(</mml:mo><mml:mi>X</mml:mi><mml:mo>-</mml:mo><mml:msub><mml:mi>X</mml:mi><mml:mi mathvariant="normal">reference</mml:mi></mml:msub><mml:mo>)</mml:mo><mml:mo>/</mml:mo><mml:mo>(</mml:mo><mml:msub><mml:mi>X</mml:mi><mml:mi mathvariant="normal">perfect</mml:mi></mml:msub><mml:mo>-</mml:mo><mml:msub><mml:mi>X</mml:mi><mml:mi mathvariant="normal">reference</mml:mi></mml:msub><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>, with <inline-formula><mml:math id="M62" display="inline"><mml:mi>X</mml:mi></mml:math></inline-formula> the score of the forecast, <inline-formula><mml:math id="M63" display="inline"><mml:mrow><mml:msub><mml:mi>X</mml:mi><mml:mi mathvariant="normal">reference</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> the score of the PERS(1) reference forecast and <inline-formula><mml:math id="M64" display="inline"><mml:mrow><mml:msub><mml:mi>X</mml:mi><mml:mi mathvariant="normal">perfect</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> the score expected with a perfect forecast. Skill scores indicate whether a given forecast has a perfect skill (value of 1), a better skill than the reference forecast (value between 0 and 1), a skill equivalent to the reference forecast (value of 0) or a worse skill than the reference (value below 0, unbounded). 
To be converted into skill scores, the aforementioned metrics of interest need to be transformed into scores following the rule “the higher the better” (to constrain the skill score to values below 1). For the different metrics <inline-formula><mml:math id="M65" display="inline"><mml:mi>M</mml:mi></mml:math></inline-formula>, the corresponding score <inline-formula><mml:math id="M66" display="inline"><mml:mrow><mml:mi>X</mml:mi><mml:mo>(</mml:mo><mml:mi>M</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> is obtained by applying the following transformations: <inline-formula><mml:math id="M67" display="inline"><mml:mrow><mml:mi>X</mml:mi><mml:mo>(</mml:mo><mml:mi>M</mml:mi><mml:mo>)</mml:mo><mml:mo>=</mml:mo><mml:mo>-</mml:mo><mml:mi>M</mml:mi></mml:mrow></mml:math></inline-formula> for nRMSE and <inline-formula><mml:math id="M68" display="inline"><mml:mrow><mml:mi>X</mml:mi><mml:mo>(</mml:mo><mml:mi>M</mml:mi><mml:mo>)</mml:mo><mml:mo>=</mml:mo><mml:mo>-</mml:mo><mml:mo>|</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>-</mml:mo><mml:mi>M</mml:mi><mml:mo>|</mml:mo></mml:mrow></mml:math></inline-formula> for slope; no transformations are required for the other metrics (<inline-formula><mml:math id="M69" display="inline"><mml:mi>H</mml:mi></mml:math></inline-formula>, <inline-formula><mml:math id="M70" display="inline"><mml:mi>F</mml:mi></mml:math></inline-formula>, SR, CSI, PSS and AUC). Note that, as indicated by its name, PSS is already intrinsically defined as a skill score (where the reference corresponds to a climatology or random choice, both giving PSS values tending toward 0), but it does not prevent it from being converted into a skill score related to the persistence forecast.</p>
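      <p>The conversion of a metric into a score and then into a skill score can be sketched as follows. The function names are hypothetical, and the perfect score is an input that must be chosen consistently with the transformation (0 for the transformed nRMSE and slope, 1 for metrics such as PCC or the hit rate).</p>
<preformat preformat-type="code">
```python
def to_score(metric, value):
    """Transform a metric into a score following 'the higher the better'."""
    if metric == "nRMSE":
        return -value
    if metric == "slope":
        return -abs(1.0 - value)
    # No transformation for the other metrics listed in the text.
    return value


def skill_score(x, x_reference, x_perfect):
    """S(X) = (X - X_reference) / (X_perfect - X_reference)."""
    return (x - x_reference) / (x_perfect - x_reference)


# Illustrative values only: an nRMSE of 0.38 for the forecast and of 0.36
# for the reference gives a slightly negative skill score, i.e., the
# forecast performs worse than the reference on this metric.
s = skill_score(to_score("nRMSE", 0.38), to_score("nRMSE", 0.36), 0.0)
```
</preformat>
      <p>A forecast that matches the perfect score yields a skill score of 1, and a forecast identical to the reference yields 0, in agreement with the interpretation given above.</p>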
      <p id="d2e1278">In order to ensure fair comparisons between observations and all the different forecasts, O<sub>3</sub> values at a given hour are discarded when at least one of these different datasets does not have data. Over the 2018–2019 period, the resulting data availability exceeds 94 % regardless of the timescale considered. Note that about 4 % of the data are missing here due to the aforementioned minimum of 30 d (i.e., January 2018) of accumulated historical data required to start computing the corrected forecasts with some MOS methods.</p>
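      <p>The filtering rule described above (an hour is kept only if the observations and all forecasts have data) can be sketched as follows. The function name, the dictionary layout and the use of None or NaN to flag missing values are illustrative assumptions.</p>
<preformat preformat-type="code">
```python
import math


def common_valid_hours(series_by_name):
    """Indices of the hours at which every dataset has a valid value.

    series_by_name maps a dataset name (observations or a given forecast)
    to a list of hourly values of equal length; missing values are
    assumed to be encoded as None or NaN.
    """
    names = list(series_by_name)
    n_hours = len(series_by_name[names[0]])
    keep = []
    for t in range(n_hours):
        values = [series_by_name[name][t] for name in names]
        if all(v is not None and not math.isnan(v) for v in values):
            keep.append(t)
    return keep
```
</preformat>
      <p>Computing all metrics on this common set of hours ensures that every forecast method is evaluated on exactly the same sample.</p>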
</sec>
</sec>
<sec id="Ch1.S3">
  <label>3</label><title>Results</title>
      <p id="d2e1299">We first briefly describe the O<sub>3</sub> pollution over the Iberian Peninsula as observed by the monitoring stations and simulated by the CAMS regional ensemble forecast (Sect. <xref ref-type="sec" rid="Ch1.S3.SS1"/>). Then, we investigate the performance of the MOS methods on both continuous (Sect. <xref ref-type="sec" rid="Ch1.S3.SS2"/>) and categorical (Sect. <xref ref-type="sec" rid="Ch1.S3.SS3"/>) O<sub>3</sub> forecasts. Different sensitivity tests on the MOS methods are performed in Sect. <xref ref-type="sec" rid="Ch1.S3.SS4"/>, including a test on the impact of the input meteorological data on the MOS performance.</p>
<sec id="Ch1.S3.SS1">
  <label>3.1</label><title>Ozone pollution over the Iberian Peninsula</title>
      <p id="d2e1336">The European Union sets different standards regarding O<sub>3</sub> pollution, including (1) a target threshold of 60 <inline-formula><mml:math id="M75" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">ppbv</mml:mi></mml:mrow></mml:math></inline-formula> for the daily 8 h maximum, with 25 exceedances per year allowed on average over 3 years; (2) an information threshold of 90 <inline-formula><mml:math id="M76" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">ppbv</mml:mi></mml:mrow></mml:math></inline-formula> for the daily 1 h maximum; and (3) an alert threshold of 120 <inline-formula><mml:math id="M77" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">ppbv</mml:mi></mml:mrow></mml:math></inline-formula> for the daily 1 h maximum. In this study, we focus on the first two thresholds and exclude the last one mainly because exceedances of the alert threshold are extremely rare (only 13 exceedances over 314 005 points, i.e., 0.004 %). With such a low frequency of occurrence, such events remain extremely difficult to predict (without predicting too many false alarms).</p>
      <p id="d2e1372">The mean O<sub>3</sub> mixing ratios, as well as the annual number of exceedances, are shown in Fig. <xref ref-type="fig" rid="Ch1.F1"/> for both observations and raw CAMS ensemble forecasts. The time series at the different timescales are shown in Fig. <xref ref-type="fig" rid="Ch1.F2"/>. Over the Iberian Peninsula, annual mean O<sub>3</sub> mixing ratios range between 10 and 50 <inline-formula><mml:math id="M80" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">ppbv</mml:mi></mml:mrow></mml:math></inline-formula>, depending on the type of monitoring station (urban traffic, urban background, rural background), with typically higher levels on the Mediterranean coast compared to the Atlantic one. Over the entire domain and time period, the target (d8max <inline-formula><mml:math id="M81" display="inline"><mml:mo>&gt;</mml:mo></mml:math></inline-formula> 60 <inline-formula><mml:math id="M82" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">ppbv</mml:mi></mml:mrow></mml:math></inline-formula>) and information (d1max <inline-formula><mml:math id="M83" display="inline"><mml:mo>&gt;</mml:mo></mml:math></inline-formula> 90 <inline-formula><mml:math id="M84" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">ppbv</mml:mi></mml:mrow></mml:math></inline-formula>) thresholds have been exceeded 13 221 and 274 times, respectively (i.e., 4 % and 0.08 % of the 314 005 points, respectively). These exceedances are well distributed in time along the 2018–2019 period, with 404 d out of 730 d (55 %) with at least one station exceeding the target threshold, and 78 d out of 730 d (11 %) with at least one station exceeding the information threshold. 
These exceedances are observed over a large part of the peninsula, but with a higher frequency in specific locations, including the surroundings (typically downwind) of the largest cities (e.g., Madrid, Barcelona, Valencia, Lisbon, Porto) and close to industrial areas (e.g., Puertollano, a major industrial hot spot 200 <inline-formula><mml:math id="M85" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">km</mml:mi></mml:mrow></mml:math></inline-formula> south of Madrid).</p>

      <fig id="Ch1.F1"><label>Figure 1</label><caption><p id="d2e1446">Overview of the O<sub>3</sub> pollution over the Iberian Peninsula, as observed by monitoring stations <bold>(a, c, e)</bold> and as simulated by the CAMS regional ensemble <inline-formula><mml:math id="M87" display="inline"><mml:mrow><mml:mi>D</mml:mi><mml:mo>+</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:math></inline-formula> forecasts <bold>(b, d, f)</bold>, showing the mean O<sub>3</sub> mixing ratios <bold>(a, b)</bold> and the number of exceedances of the target threshold (d8max <inline-formula><mml:math id="M89" display="inline"><mml:mo>&gt;</mml:mo></mml:math></inline-formula> 60 <inline-formula><mml:math id="M90" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">ppbv</mml:mi></mml:mrow></mml:math></inline-formula>; <bold>c, d</bold>) and information threshold (d1max <inline-formula><mml:math id="M91" display="inline"><mml:mo>&gt;</mml:mo></mml:math></inline-formula> 90 <inline-formula><mml:math id="M92" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">ppbv</mml:mi></mml:mrow></mml:math></inline-formula>; <bold>e, f</bold>) over the period 2018–2019. In order to limit the overlap, stations are plotted here in order of increasing value and with decreasing symbol size (lowest values with largest symbols but in background, highest values with smallest symbols but in foreground). For clarity, the stations without any observed or simulated exceedance are omitted.</p></caption>
          <graphic xlink:href="https://acp.copernicus.org/articles/22/11603/2022/acp-22-11603-2022-f01.png"/>

        </fig>

      <fig id="Ch1.F2" specific-use="star"><label>Figure 2</label><caption><p id="d2e1535">Time series of the mean O<sub>3</sub> mixing ratios over the Iberian Peninsula, as observed by monitoring stations (in black) and as simulated by the raw CAMS regional ensemble <inline-formula><mml:math id="M94" display="inline"><mml:mrow><mml:mi>D</mml:mi><mml:mo>+</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:math></inline-formula> forecasts (in yellow). Time series are shown at the hourly <bold>(h)</bold>, daily mean <bold>(d)</bold>, daily 1 h maximum (d1max) and daily 8 h maximum (d8max) timescales. O<sub>3</sub> mixing ratios are averaged over all surface stations of the domain.</p></caption>
          <graphic xlink:href="https://acp.copernicus.org/articles/22/11603/2022/acp-22-11603-2022-f02.png"/>

        </fig>


</sec>
<sec id="Ch1.S3.SS2">
  <label>3.2</label><title>Performance on continuous forecasts</title>
<sec id="Ch1.S3.SS2.SSS1">
  <label>3.2.1</label><title>RAW forecasts</title>
      <p id="d2e1597">Considering the annual mean O<sub>3</sub> mixing ratios at all 455 stations (Fig. <xref ref-type="fig" rid="Ch1.F1"/>), the raw CAMS ensemble forecast reproduces moderately well the spatial distribution of annual O<sub>3</sub> over the Iberian Peninsula (PCC of 0.54 for <inline-formula><mml:math id="M98" display="inline"><mml:mrow><mml:mi>D</mml:mi><mml:mo>+</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:math></inline-formula> forecasts) and strongly underestimates the spatial variability (nMSDB of <inline-formula><mml:math id="M99" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">42</mml:mn></mml:mrow></mml:math></inline-formula> %). At least part of these errors are due to the fact that all station types are taken into account here, including traffic stations where local road transport NO<sub><italic>x</italic></sub> emissions can strongly reduce the O<sub>3</sub> levels (titration by NO), which cannot be properly represented by models at 10 <inline-formula><mml:math id="M102" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">km</mml:mi></mml:mrow></mml:math></inline-formula> spatial resolution. In this study, all station types are included because we are ultimately interested in predicting O<sub>3</sub> exceedances at all locations where they can be observed (and thus where air quality standards apply). It is worth noting that the impact of the MOS methods on the different metrics might vary from one type of station to another, although this aspect is beyond the scope of our study. The raw CAMS ensemble forecast correctly identifies regions where most exceedances of the target threshold occur but often with underestimated frequency, especially around Madrid, in southern Spain (inland part of the Andalusia region) and along the Mediterranean coast. 
More severe deficiencies are found with the information threshold that is almost never reached by the CAMS ensemble (with one single exception around Porto).</p>
      <p id="d2e1678">The overall statistical results are shown in Fig. <xref ref-type="fig" rid="Ch1.F3"/> for the different forecast methods, and a subset of these statistics is given in Table <xref ref-type="table" rid="Ch1.T1"/> (and in Table S1 in the Supplement for additional timescales). For a given lead day and timescale, statistics are computed here after aggregating data from all monitoring stations; therefore, statistics of <inline-formula><mml:math id="M104" display="inline"><mml:mrow><mml:mi>D</mml:mi><mml:mo>+</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:math></inline-formula> O<sub>3</sub> forecasts at the hourly scale can be based on 730 <inline-formula><mml:math id="M106" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">d</mml:mi></mml:mrow></mml:math></inline-formula> <inline-formula><mml:math id="M107" display="inline"><mml:mo>×</mml:mo></mml:math></inline-formula> 24 <inline-formula><mml:math id="M108" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">h</mml:mi></mml:mrow></mml:math></inline-formula> <inline-formula><mml:math id="M109" display="inline"><mml:mo>×</mml:mo></mml:math></inline-formula> 455 stations <inline-formula><mml:math id="M110" display="inline"><mml:mo>=</mml:mo></mml:math></inline-formula> 7 971 600 points if there are no data gaps. The RAW forecast moderately overestimates the O<sub>3</sub> mixing ratios, especially at hourly and daily timescales, but shows a reasonable correlation at all timescales (above 0.75). However, its main deficiency lies in the underestimated variability (nMSDB around <inline-formula><mml:math id="M112" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">30</mml:mn></mml:mrow></mml:math></inline-formula> %), which is reflected in the low model-versus-observation linear slope obtained (around 0.5–0.6). 
The performance of the raw CAMS forecasts deteriorates only slightly with lead time, with hourly scale <inline-formula><mml:math id="M113" display="inline"><mml:mi mathvariant="normal">nRMSE</mml:mi></mml:math></inline-formula> and <inline-formula><mml:math id="M114" display="inline"><mml:mi mathvariant="normal">PCC</mml:mi></mml:math></inline-formula> changing from <inline-formula><mml:math id="M115" display="inline"><mml:mrow><mml:mn mathvariant="normal">38</mml:mn><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mi mathvariant="italic">%</mml:mi></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M116" display="inline"><mml:mn mathvariant="normal">0.75</mml:mn></mml:math></inline-formula> at <inline-formula><mml:math id="M117" display="inline"><mml:mrow><mml:mi>D</mml:mi><mml:mo>+</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:math></inline-formula> to <inline-formula><mml:math id="M118" display="inline"><mml:mrow><mml:mn mathvariant="normal">39</mml:mn><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mi mathvariant="italic">%</mml:mi></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M119" display="inline"><mml:mn mathvariant="normal">0.72</mml:mn></mml:math></inline-formula> at <inline-formula><mml:math id="M120" display="inline"><mml:mrow><mml:mi>D</mml:mi><mml:mo>+</mml:mo><mml:mn mathvariant="normal">4</mml:mn></mml:mrow></mml:math></inline-formula>, potentially due to their relatively coarse spatial resolution.</p>
      <p id="d2e1839">As expected (by construction), the PERS(1) reference forecast gives unbiased O<sub>3</sub> forecasts. Due to the temporal auto-correlation of O<sub>3</sub> concentrations, reasonable results are obtained at <inline-formula><mml:math id="M123" display="inline"><mml:mrow><mml:mi>D</mml:mi><mml:mo>+</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:math></inline-formula> (<inline-formula><mml:math id="M124" display="inline"><mml:mi mathvariant="normal">nRMSE</mml:mi></mml:math></inline-formula>, <inline-formula><mml:math id="M125" display="inline"><mml:mi mathvariant="normal">PCC</mml:mi></mml:math></inline-formula> and <inline-formula><mml:math id="M126" display="inline"><mml:mi mathvariant="normal">slope</mml:mi></mml:math></inline-formula> of <inline-formula><mml:math id="M127" display="inline"><mml:mrow><mml:mn mathvariant="normal">36</mml:mn><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mi mathvariant="italic">%</mml:mi></mml:mrow></mml:math></inline-formula>, <inline-formula><mml:math id="M128" display="inline"><mml:mn mathvariant="normal">0.74</mml:mn></mml:math></inline-formula> and <inline-formula><mml:math id="M129" display="inline"><mml:mn mathvariant="normal">0.74</mml:mn></mml:math></inline-formula>) but quickly deteriorate with the lead time (down to <inline-formula><mml:math id="M130" display="inline"><mml:mrow><mml:mn mathvariant="normal">42</mml:mn><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mi mathvariant="italic">%</mml:mi></mml:mrow></mml:math></inline-formula>, <inline-formula><mml:math id="M131" display="inline"><mml:mn mathvariant="normal">0.65</mml:mn></mml:math></inline-formula> and <inline-formula><mml:math id="M132" display="inline"><mml:mn mathvariant="normal">0.64</mml:mn></mml:math></inline-formula> at <inline-formula><mml:math id="M133" display="inline"><mml:mrow><mml:mi>D</mml:mi><mml:mo>+</mml:mo><mml:mn mathvariant="normal">4</mml:mn></mml:mrow></mml:math></inline-formula>). 
A subset of skill scores with PERS(1) as the reference is shown in Fig. <xref ref-type="fig" rid="Ch1.F4"/>. Apart from the slope that is always better reproduced by PERS(1), the RAW forecast outperforms PERS(1) on both the nRMSE and PCC but only beyond <inline-formula><mml:math id="M134" display="inline"><mml:mrow><mml:mi>D</mml:mi><mml:mo>+</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:math></inline-formula> (with skill scores typically ranging between 0 and 0.2), and not at all timescales (for instance, PERS(1) systematically shows better nRMSE than RAW at the daily scale).</p>

<table-wrap id="Ch1.T1" specific-use="star"><label>Table 1</label><caption><p id="d2e1975">Evaluation of the different forecast methods on continuous metrics, at <inline-formula><mml:math id="M135" display="inline"><mml:mrow><mml:mi>D</mml:mi><mml:mo>+</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:math></inline-formula> (and <inline-formula><mml:math id="M136" display="inline"><mml:mrow><mml:mi>D</mml:mi><mml:mo>+</mml:mo><mml:mn mathvariant="normal">4</mml:mn></mml:mrow></mml:math></inline-formula> in parentheses), for the <inline-formula><mml:math id="M137" display="inline"><mml:mi mathvariant="normal">h</mml:mi></mml:math></inline-formula>, <inline-formula><mml:math id="M138" display="inline"><mml:mi mathvariant="normal">d</mml:mi></mml:math></inline-formula>, <inline-formula><mml:math id="M139" display="inline"><mml:mrow><mml:mi mathvariant="normal">d</mml:mi><mml:mn mathvariant="normal">1</mml:mn><mml:mi mathvariant="normal">max</mml:mi></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M140" display="inline"><mml:mrow><mml:mi mathvariant="normal">d</mml:mi><mml:mn mathvariant="normal">8</mml:mn><mml:mi mathvariant="normal">max</mml:mi></mml:mrow></mml:math></inline-formula> timescales (see Table S1 in the Supplement for the evaluation results at <inline-formula><mml:math id="M141" display="inline"><mml:mi mathvariant="normal">dd</mml:mi></mml:math></inline-formula>, <inline-formula><mml:math id="M142" display="inline"><mml:mrow><mml:mi mathvariant="normal">dd</mml:mi><mml:mn mathvariant="normal">1</mml:mn><mml:mi mathvariant="normal">max</mml:mi></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M143" display="inline"><mml:mrow><mml:mi mathvariant="normal">dd</mml:mi><mml:mn mathvariant="normal">8</mml:mn><mml:mi mathvariant="normal">max</mml:mi></mml:mrow></mml:math></inline-formula> timescales).</p></caption><oasis:table frame="topbot"><oasis:tgroup cols="8">
     <oasis:colspec colnum="1" colname="col1" align="left"/>
     <oasis:colspec colnum="2" colname="col2" align="left"/>
     <oasis:colspec colnum="3" colname="col3" align="right"/>
     <oasis:colspec colnum="4" colname="col4" align="right"/>
     <oasis:colspec colnum="5" colname="col5" align="right"/>
     <oasis:colspec colnum="6" colname="col6" align="right"/>
     <oasis:colspec colnum="7" colname="col7" align="right"/>
     <oasis:colspec colnum="8" colname="col8" align="right"/>
     <oasis:thead>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">Timescale</oasis:entry>
         <oasis:entry colname="col2">Forecast</oasis:entry>
         <oasis:entry colname="col3">nMB</oasis:entry>
         <oasis:entry colname="col4">nRMSE</oasis:entry>
         <oasis:entry colname="col5">PCC</oasis:entry>
         <oasis:entry colname="col6">slope</oasis:entry>
         <oasis:entry colname="col7">nMSDB</oasis:entry>
         <oasis:entry colname="col8"><inline-formula><mml:math id="M144" display="inline"><mml:mi>N</mml:mi></mml:math></inline-formula></oasis:entry>
       </oasis:row>
     </oasis:thead>
     <oasis:tbody>
       <oasis:row>
         <oasis:entry colname="col1">h</oasis:entry>
         <oasis:entry colname="col2">GBM</oasis:entry>
         <oasis:entry colname="col3"><inline-formula><mml:math id="M145" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula>0 % (<inline-formula><mml:math id="M146" display="inline"><mml:mo lspace="0mm">-</mml:mo></mml:math></inline-formula>1 %)</oasis:entry>
         <oasis:entry colname="col4">25 % (28 %)</oasis:entry>
         <oasis:entry colname="col5">0.87 (0.83)</oasis:entry>
         <oasis:entry colname="col6">0.75 (0.71)</oasis:entry>
         <oasis:entry colname="col7"><inline-formula><mml:math id="M147" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula>13 % (<inline-formula><mml:math id="M148" display="inline"><mml:mo lspace="0mm">-</mml:mo></mml:math></inline-formula>15 %)</oasis:entry>
         <oasis:entry colname="col8">7 067 085</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1"/>
         <oasis:entry colname="col2">AN(10)</oasis:entry>
         <oasis:entry colname="col3">0 % (0 %)</oasis:entry>
         <oasis:entry colname="col4">26 % (28 %)</oasis:entry>
         <oasis:entry colname="col5">0.86 (0.82)</oasis:entry>
         <oasis:entry colname="col6">0.75 (0.70)</oasis:entry>
         <oasis:entry colname="col7"><inline-formula><mml:math id="M149" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula>13 % (<inline-formula><mml:math id="M150" display="inline"><mml:mo lspace="0mm">-</mml:mo></mml:math></inline-formula>15 %)</oasis:entry>
         <oasis:entry colname="col8">7 067 085</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1"/>
         <oasis:entry colname="col2">KF(RMSE)</oasis:entry>
         <oasis:entry colname="col3">0 % (<inline-formula><mml:math id="M151" display="inline"><mml:mo lspace="0mm">-</mml:mo></mml:math></inline-formula>0 %)</oasis:entry>
         <oasis:entry colname="col4">25 % (28 %)</oasis:entry>
         <oasis:entry colname="col5">0.86 (0.83)</oasis:entry>
         <oasis:entry colname="col6">0.78 (0.74)</oasis:entry>
         <oasis:entry colname="col7"><inline-formula><mml:math id="M152" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula>10 % (<inline-formula><mml:math id="M153" display="inline"><mml:mo lspace="0mm">-</mml:mo></mml:math></inline-formula>11 %)</oasis:entry>
         <oasis:entry colname="col8">7 067 085</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1"/>
         <oasis:entry colname="col2">QM</oasis:entry>
         <oasis:entry colname="col3">3 % (3 %)</oasis:entry>
         <oasis:entry colname="col4">31 % (33 %)</oasis:entry>
         <oasis:entry colname="col5">0.81 (0.78)</oasis:entry>
         <oasis:entry colname="col6">0.81 (0.78)</oasis:entry>
         <oasis:entry colname="col7">0 % (<inline-formula><mml:math id="M154" display="inline"><mml:mo lspace="0mm">-</mml:mo></mml:math></inline-formula>1 %)</oasis:entry>
         <oasis:entry colname="col8">7 067 085</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1"/>
         <oasis:entry colname="col2">MA(1)</oasis:entry>
         <oasis:entry colname="col3"><inline-formula><mml:math id="M155" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula>0 % (<inline-formula><mml:math id="M156" display="inline"><mml:mo lspace="0mm">-</mml:mo></mml:math></inline-formula>1 %)</oasis:entry>
         <oasis:entry colname="col4">31 % (36 %)</oasis:entry>
         <oasis:entry colname="col5">0.81 (0.74)</oasis:entry>
         <oasis:entry colname="col6">0.82 (0.75)</oasis:entry>
         <oasis:entry colname="col7">2 % (0 %)</oasis:entry>
         <oasis:entry colname="col8">7 067 085</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1"/>
         <oasis:entry colname="col2">PERS(1)</oasis:entry>
         <oasis:entry colname="col3">0 % (0 %)</oasis:entry>
         <oasis:entry colname="col4">36 % (42 %)</oasis:entry>
         <oasis:entry colname="col5">0.75 (0.65)</oasis:entry>
         <oasis:entry colname="col6">0.75 (0.65)</oasis:entry>
         <oasis:entry colname="col7">0 % (<inline-formula><mml:math id="M157" display="inline"><mml:mo lspace="0mm">-</mml:mo></mml:math></inline-formula>0 %)</oasis:entry>
         <oasis:entry colname="col8">7 067 085</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1"/>
         <oasis:entry colname="col2">RAW</oasis:entry>
         <oasis:entry colname="col3">18 % (17 %)</oasis:entry>
         <oasis:entry colname="col4">38 % (39 %)</oasis:entry>
         <oasis:entry colname="col5">0.75 (0.72)</oasis:entry>
         <oasis:entry colname="col6">0.53 (0.50)</oasis:entry>
         <oasis:entry colname="col7"><inline-formula><mml:math id="M158" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula>29 % (<inline-formula><mml:math id="M159" display="inline"><mml:mo lspace="0mm">-</mml:mo></mml:math></inline-formula>30 %)</oasis:entry>
         <oasis:entry colname="col8">7 067 085</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">d</oasis:entry>
         <oasis:entry colname="col2">GBM</oasis:entry>
         <oasis:entry colname="col3"><inline-formula><mml:math id="M160" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula>1 % (<inline-formula><mml:math id="M161" display="inline"><mml:mo lspace="0mm">-</mml:mo></mml:math></inline-formula>1 %)</oasis:entry>
         <oasis:entry colname="col4">16 % (18 %)</oasis:entry>
         <oasis:entry colname="col5">0.91 (0.88)</oasis:entry>
         <oasis:entry colname="col6">0.84 (0.80)</oasis:entry>
         <oasis:entry colname="col7"><inline-formula><mml:math id="M162" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula>7 % (<inline-formula><mml:math id="M163" display="inline"><mml:mo lspace="0mm">-</mml:mo></mml:math></inline-formula>9 %)</oasis:entry>
         <oasis:entry colname="col8">295 617</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1"/>
         <oasis:entry colname="col2">AN(10)</oasis:entry>
         <oasis:entry colname="col3">0 % (0 %)</oasis:entry>
         <oasis:entry colname="col4">16 % (19 %)</oasis:entry>
         <oasis:entry colname="col5">0.90 (0.86)</oasis:entry>
         <oasis:entry colname="col6">0.78 (0.73)</oasis:entry>
         <oasis:entry colname="col7"><inline-formula><mml:math id="M164" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula>13 % (<inline-formula><mml:math id="M165" display="inline"><mml:mo lspace="0mm">-</mml:mo></mml:math></inline-formula>15 %)</oasis:entry>
         <oasis:entry colname="col8">295 617</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1"/>
         <oasis:entry colname="col2">KF(RMSE)</oasis:entry>
         <oasis:entry colname="col3">0 % (<inline-formula><mml:math id="M166" display="inline"><mml:mo lspace="0mm">-</mml:mo></mml:math></inline-formula>0 %)</oasis:entry>
         <oasis:entry colname="col4">15 % (18 %)</oasis:entry>
         <oasis:entry colname="col5">0.91 (0.88)</oasis:entry>
         <oasis:entry colname="col6">0.85 (0.80)</oasis:entry>
         <oasis:entry colname="col7"><inline-formula><mml:math id="M167" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula>7 % (<inline-formula><mml:math id="M168" display="inline"><mml:mo lspace="0mm">-</mml:mo></mml:math></inline-formula>9 %)</oasis:entry>
         <oasis:entry colname="col8">295 617</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1"/>
         <oasis:entry colname="col2">QM</oasis:entry>
         <oasis:entry colname="col3">3 % (2 %)</oasis:entry>
         <oasis:entry colname="col4">20 % (22 %)</oasis:entry>
         <oasis:entry colname="col5">0.86 (0.84)</oasis:entry>
         <oasis:entry colname="col6">0.91 (0.87)</oasis:entry>
         <oasis:entry colname="col7">5 % (4 %)</oasis:entry>
         <oasis:entry colname="col8">295 617</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1"/>
         <oasis:entry colname="col2">MA(1)</oasis:entry>
         <oasis:entry colname="col3"><inline-formula><mml:math id="M169" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula>0 % (<inline-formula><mml:math id="M170" display="inline"><mml:mo lspace="0mm">-</mml:mo></mml:math></inline-formula>1 %)</oasis:entry>
         <oasis:entry colname="col4">16 % (22 %)</oasis:entry>
         <oasis:entry colname="col5">0.91 (0.82)</oasis:entry>
         <oasis:entry colname="col6">0.92 (0.81)</oasis:entry>
         <oasis:entry colname="col7">1 % (<inline-formula><mml:math id="M171" display="inline"><mml:mo lspace="0mm">-</mml:mo></mml:math></inline-formula>2 %)</oasis:entry>
         <oasis:entry colname="col8">295 617</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1"/>
         <oasis:entry colname="col2">PERS(1)</oasis:entry>
         <oasis:entry colname="col3">0 % (0 %)</oasis:entry>
         <oasis:entry colname="col4">20 % (29 %)</oasis:entry>
         <oasis:entry colname="col5">0.85 (0.70)</oasis:entry>
         <oasis:entry colname="col6">0.85 (0.70)</oasis:entry>
         <oasis:entry colname="col7"><inline-formula><mml:math id="M172" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula>0 % (<inline-formula><mml:math id="M173" display="inline"><mml:mo lspace="0mm">-</mml:mo></mml:math></inline-formula>0 %)</oasis:entry>
         <oasis:entry colname="col8">295 617</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1"/>
         <oasis:entry colname="col2">RAW</oasis:entry>
         <oasis:entry colname="col3">18 % (17 %)</oasis:entry>
         <oasis:entry colname="col4">30 % (30 %)</oasis:entry>
         <oasis:entry colname="col5">0.76 (0.74)</oasis:entry>
         <oasis:entry colname="col6">0.55 (0.52)</oasis:entry>
         <oasis:entry colname="col7"><inline-formula><mml:math id="M174" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula>28 % (<inline-formula><mml:math id="M175" display="inline"><mml:mo lspace="0mm">-</mml:mo></mml:math></inline-formula>29 %)</oasis:entry>
         <oasis:entry colname="col8">295 617</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">d1max</oasis:entry>
         <oasis:entry colname="col2">GBM</oasis:entry>
         <oasis:entry colname="col3"><inline-formula><mml:math id="M176" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula>8 % (<inline-formula><mml:math id="M177" display="inline"><mml:mo lspace="0mm">-</mml:mo></mml:math></inline-formula>8 %)</oasis:entry>
         <oasis:entry colname="col4">16 % (18 %)</oasis:entry>
         <oasis:entry colname="col5">0.86 (0.83)</oasis:entry>
         <oasis:entry colname="col6">0.80 (0.75)</oasis:entry>
         <oasis:entry colname="col7"><inline-formula><mml:math id="M178" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula>8 % (<inline-formula><mml:math id="M179" display="inline"><mml:mo lspace="0mm">-</mml:mo></mml:math></inline-formula>10 %)</oasis:entry>
         <oasis:entry colname="col8">295 617</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1"/>
         <oasis:entry colname="col2">AN(10)</oasis:entry>
         <oasis:entry colname="col3"><inline-formula><mml:math id="M180" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula>4 % (<inline-formula><mml:math id="M181" display="inline"><mml:mo lspace="0mm">-</mml:mo></mml:math></inline-formula>4 %)</oasis:entry>
         <oasis:entry colname="col4">15 % (17 %)</oasis:entry>
         <oasis:entry colname="col5">0.86 (0.82)</oasis:entry>
         <oasis:entry colname="col6">0.74 (0.70)</oasis:entry>
         <oasis:entry colname="col7"><inline-formula><mml:math id="M182" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula>14 % (<inline-formula><mml:math id="M183" display="inline"><mml:mo lspace="0mm">-</mml:mo></mml:math></inline-formula>15 %)</oasis:entry>
         <oasis:entry colname="col8">295 617</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1"/>
         <oasis:entry colname="col2">KF(RMSE)</oasis:entry>
         <oasis:entry colname="col3"><inline-formula><mml:math id="M184" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula>3 % (<inline-formula><mml:math id="M185" display="inline"><mml:mo lspace="0mm">-</mml:mo></mml:math></inline-formula>4 %)</oasis:entry>
         <oasis:entry colname="col4">13 % (15 %)</oasis:entry>
         <oasis:entry colname="col5">0.89 (0.85)</oasis:entry>
         <oasis:entry colname="col6">0.81 (0.77)</oasis:entry>
         <oasis:entry colname="col7"><inline-formula><mml:math id="M186" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula>8 % (<inline-formula><mml:math id="M187" display="inline"><mml:mo lspace="0mm">-</mml:mo></mml:math></inline-formula>10 %)</oasis:entry>
         <oasis:entry colname="col8">295 617</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1"/>
         <oasis:entry colname="col2">QM</oasis:entry>
         <oasis:entry colname="col3"><inline-formula><mml:math id="M188" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula>1 % (<inline-formula><mml:math id="M189" display="inline"><mml:mo lspace="0mm">-</mml:mo></mml:math></inline-formula>1 %)</oasis:entry>
         <oasis:entry colname="col4">17 % (18 %)</oasis:entry>
         <oasis:entry colname="col5">0.82 (0.80)</oasis:entry>
         <oasis:entry colname="col6">0.83 (0.80)</oasis:entry>
         <oasis:entry colname="col7">1 % (<inline-formula><mml:math id="M190" display="inline"><mml:mo lspace="0mm">-</mml:mo></mml:math></inline-formula>0 %)</oasis:entry>
         <oasis:entry colname="col8">295 617</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1"/>
         <oasis:entry colname="col2">MA(1)</oasis:entry>
         <oasis:entry colname="col3">3 % (2 %)</oasis:entry>
         <oasis:entry colname="col4">15 % (18 %)</oasis:entry>
         <oasis:entry colname="col5">0.86 (0.79)</oasis:entry>
         <oasis:entry colname="col6">0.87 (0.77)</oasis:entry>
         <oasis:entry colname="col7">1 % (<inline-formula><mml:math id="M191" display="inline"><mml:mo lspace="0mm">-</mml:mo></mml:math></inline-formula>2 %)</oasis:entry>
         <oasis:entry colname="col8">295 617</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1"/>
         <oasis:entry colname="col2">PERS(1)</oasis:entry>
         <oasis:entry colname="col3">0 % (0 %)</oasis:entry>
         <oasis:entry colname="col4">17 % (23 %)</oasis:entry>
         <oasis:entry colname="col5">0.82 (0.67)</oasis:entry>
         <oasis:entry colname="col6">0.82 (0.67)</oasis:entry>
         <oasis:entry colname="col7"><inline-formula><mml:math id="M192" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula>0 % (<inline-formula><mml:math id="M193" display="inline"><mml:mo lspace="0mm">-</mml:mo></mml:math></inline-formula>1 %)</oasis:entry>
         <oasis:entry colname="col8">295 617</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1"/>
         <oasis:entry colname="col2">RAW</oasis:entry>
         <oasis:entry colname="col3">2 % (2 %)</oasis:entry>
         <oasis:entry colname="col4">19 % (19 %)</oasis:entry>
         <oasis:entry colname="col5">0.76 (0.74)</oasis:entry>
         <oasis:entry colname="col6">0.55 (0.52)</oasis:entry>
         <oasis:entry colname="col7"><inline-formula><mml:math id="M194" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula>28 % (<inline-formula><mml:math id="M195" display="inline"><mml:mo lspace="0mm">-</mml:mo></mml:math></inline-formula>29 %)</oasis:entry>
         <oasis:entry colname="col8">295 617</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">d8max</oasis:entry>
         <oasis:entry colname="col2">GBM</oasis:entry>
         <oasis:entry colname="col3"><inline-formula><mml:math id="M196" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula>4 % (<inline-formula><mml:math id="M197" display="inline"><mml:mo lspace="0mm">-</mml:mo></mml:math></inline-formula>5 %)</oasis:entry>
         <oasis:entry colname="col4">15 % (17 %)</oasis:entry>
         <oasis:entry colname="col5">0.89 (0.86)</oasis:entry>
         <oasis:entry colname="col6">0.83 (0.79)</oasis:entry>
         <oasis:entry colname="col7"><inline-formula><mml:math id="M198" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula>7 % (<inline-formula><mml:math id="M199" display="inline"><mml:mo lspace="0mm">-</mml:mo></mml:math></inline-formula>8 %)</oasis:entry>
         <oasis:entry colname="col8">295 617</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1"/>
         <oasis:entry colname="col2">AN(10)</oasis:entry>
         <oasis:entry colname="col3"><inline-formula><mml:math id="M200" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula>1 % (<inline-formula><mml:math id="M201" display="inline"><mml:mo lspace="0mm">-</mml:mo></mml:math></inline-formula>2 %)</oasis:entry>
         <oasis:entry colname="col4">15 % (17 %)</oasis:entry>
         <oasis:entry colname="col5">0.88 (0.85)</oasis:entry>
         <oasis:entry colname="col6">0.78 (0.73)</oasis:entry>
         <oasis:entry colname="col7"><inline-formula><mml:math id="M202" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula>12 % (<inline-formula><mml:math id="M203" display="inline"><mml:mo lspace="0mm">-</mml:mo></mml:math></inline-formula>14 %)</oasis:entry>
         <oasis:entry colname="col8">295 617</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1"/>
         <oasis:entry colname="col2">KF(RMSE)</oasis:entry>
         <oasis:entry colname="col3"><inline-formula><mml:math id="M204" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula>1 % (<inline-formula><mml:math id="M205" display="inline"><mml:mo lspace="0mm">-</mml:mo></mml:math></inline-formula>2 %)</oasis:entry>
         <oasis:entry colname="col4">13 % (15 %)</oasis:entry>
         <oasis:entry colname="col5">0.91 (0.88)</oasis:entry>
         <oasis:entry colname="col6">0.85 (0.81)</oasis:entry>
         <oasis:entry colname="col7"><inline-formula><mml:math id="M206" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula>7 % (<inline-formula><mml:math id="M207" display="inline"><mml:mo lspace="0mm">-</mml:mo></mml:math></inline-formula>8 %)</oasis:entry>
         <oasis:entry colname="col8">295 617</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1"/>
         <oasis:entry colname="col2">QM</oasis:entry>
         <oasis:entry colname="col3">1 % (2 %)</oasis:entry>
         <oasis:entry colname="col4">17 % (19 %)</oasis:entry>
         <oasis:entry colname="col5">0.85 (0.83)</oasis:entry>
         <oasis:entry colname="col6">0.88 (0.84)</oasis:entry>
         <oasis:entry colname="col7">3 % (1 %)</oasis:entry>
         <oasis:entry colname="col8">295 617</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1"/>
         <oasis:entry colname="col2">MA(1)</oasis:entry>
         <oasis:entry colname="col3">1 % (0 %)</oasis:entry>
         <oasis:entry colname="col4">15 % (18 %)</oasis:entry>
         <oasis:entry colname="col5">0.89 (0.83)</oasis:entry>
         <oasis:entry colname="col6">0.89 (0.81)</oasis:entry>
         <oasis:entry colname="col7">0 % (<inline-formula><mml:math id="M208" display="inline"><mml:mo lspace="0mm">-</mml:mo></mml:math></inline-formula>2 %)</oasis:entry>
         <oasis:entry colname="col8">295 617</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1"/>
         <oasis:entry colname="col2">PERS(1)</oasis:entry>
         <oasis:entry colname="col3">0 % (0 %)</oasis:entry>
         <oasis:entry colname="col4">18 % (24 %)</oasis:entry>
         <oasis:entry colname="col5">0.84 (0.70)</oasis:entry>
         <oasis:entry colname="col6">0.84 (0.70)</oasis:entry>
         <oasis:entry colname="col7"><inline-formula><mml:math id="M209" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula>0 % (<inline-formula><mml:math id="M210" display="inline"><mml:mo lspace="0mm">-</mml:mo></mml:math></inline-formula>1 %)</oasis:entry>
         <oasis:entry colname="col8">295 617</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1"/>
         <oasis:entry colname="col2">RAW</oasis:entry>
         <oasis:entry colname="col3">7 % (7 %)</oasis:entry>
         <oasis:entry colname="col4">21 % (22 %)</oasis:entry>
         <oasis:entry colname="col5">0.79 (0.76)</oasis:entry>
         <oasis:entry colname="col6">0.57 (0.54)</oasis:entry>
         <oasis:entry colname="col7"><inline-formula><mml:math id="M211" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula>27 % (<inline-formula><mml:math id="M212" display="inline"><mml:mo lspace="0mm">-</mml:mo></mml:math></inline-formula>29 %)</oasis:entry>
         <oasis:entry colname="col8">295 617</oasis:entry>
       </oasis:row>
     </oasis:tbody>
   </oasis:tgroup></oasis:table></table-wrap>

      <fig id="Ch1.F3" specific-use="star"><label>Figure 3</label><caption><p id="d2e3345">Statistical performance of RAW and MOS-corrected CAMS O<sub>3</sub> forecasts for continuous metrics (top panels) and categorical metrics related to the exceedance of the target (intermediate panels) and information threshold (bottom panels). The different symbols depict results obtained at different timescales (h: hourly; d: daily mean; <inline-formula><mml:math id="M214" display="inline"><mml:mrow><mml:mi mathvariant="normal">d</mml:mi><mml:mn mathvariant="normal">1</mml:mn><mml:mi mathvariant="normal">max</mml:mi><mml:mo>/</mml:mo><mml:mi mathvariant="normal">dd</mml:mi><mml:mn mathvariant="normal">1</mml:mn><mml:mi mathvariant="normal">max</mml:mi></mml:mrow></mml:math></inline-formula>: daily 1 h maximum; <inline-formula><mml:math id="M215" display="inline"><mml:mrow><mml:mi mathvariant="normal">d</mml:mi><mml:mn mathvariant="normal">8</mml:mn><mml:mi mathvariant="normal">max</mml:mi><mml:mo>/</mml:mo><mml:mi mathvariant="normal">dd</mml:mi><mml:mn mathvariant="normal">8</mml:mn><mml:mi mathvariant="normal">max</mml:mi></mml:mrow></mml:math></inline-formula>: daily 8 h maximum). In each panel, results are shown for the different methods (each with a given color). The overlaying symbols of decreasing transparency show the results at the different lead days from <inline-formula><mml:math id="M216" display="inline"><mml:mrow><mml:mi>D</mml:mi><mml:mo>+</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:math></inline-formula> (most transparent) to <inline-formula><mml:math id="M217" display="inline"><mml:mrow><mml:mi>D</mml:mi><mml:mo>+</mml:mo><mml:mn mathvariant="normal">4</mml:mn></mml:mrow></mml:math></inline-formula> (most opaque). 
Metrics: normalized mean bias (nMB, in %), normalized root mean square error (nRMSE, in %), Pearson correlation coefficient (PCC), slope (unitless), normalized mean standard deviation bias (nMSDB, in %), hit rate (<inline-formula><mml:math id="M218" display="inline"><mml:mi>H</mml:mi></mml:math></inline-formula>), false alarm rate (<inline-formula><mml:math id="M219" display="inline"><mml:mi>F</mml:mi></mml:math></inline-formula>), frequency bias (FB), success ratio (SR), critical success index (CSI), Peirce skill score (PSS) and area under the ROC curve (AUC). See Sects. <xref ref-type="sec" rid="Ch1.S2.SS4"/> and <xref ref-type="sec" rid="Ch1.S2.SS5"/> for details on timescales and metrics, respectively.</p></caption>
            <graphic xlink:href="https://acp.copernicus.org/articles/22/11603/2022/acp-22-11603-2022-f03.png"/>

          </fig>

      <fig id="Ch1.F4" specific-use="star"><label>Figure 4</label><caption><p id="d2e3448">Similar to Fig. <xref ref-type="fig" rid="Ch1.F3"/> but for skill scores (see Sect. <xref ref-type="sec" rid="Ch1.S2.SS5"/> for details on the calculation of these skill scores). For clarity, the largest negative values (mostly obtained with RAW and/or at the shortest lead times) are truncated but can be seen in Fig. S1 in the Supplement. </p></caption>
            <graphic xlink:href="https://acp.copernicus.org/articles/22/11603/2022/acp-22-11603-2022-f04.png"/>

          </fig>

</sec>
<sec id="Ch1.S3.SS2.SSS2">
  <label>3.2.2</label><title>MOS-corrected forecasts</title>
      <p id="d2e3469">The MA(1) method removes most of the bias of O<sub>3</sub> concentrations and variability. Some residual biases appear when computing the daily 1 <inline-formula><mml:math id="M221" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">h</mml:mi></mml:mrow></mml:math></inline-formula> maximum from the MOS-corrected hourly O<sub>3</sub> concentrations (i.e., d1max scale) but can be removed by applying the MA(1) method directly at this timescale (i.e., dd1max scale). The MA(1) method substantially improves the other metrics for all lead days, with hourly scale nRMSE, PCC and slope of <inline-formula><mml:math id="M223" display="inline"><mml:mrow><mml:mn mathvariant="normal">31</mml:mn><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mi mathvariant="italic">%</mml:mi></mml:mrow></mml:math></inline-formula>, <inline-formula><mml:math id="M224" display="inline"><mml:mn mathvariant="normal">0.81</mml:mn></mml:math></inline-formula> and <inline-formula><mml:math id="M225" display="inline"><mml:mn mathvariant="normal">0.82</mml:mn></mml:math></inline-formula> at <inline-formula><mml:math id="M226" display="inline"><mml:mrow><mml:mi>D</mml:mi><mml:mo>+</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M227" display="inline"><mml:mrow><mml:mn mathvariant="normal">36</mml:mn><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mi mathvariant="italic">%</mml:mi></mml:mrow></mml:math></inline-formula>, <inline-formula><mml:math id="M228" display="inline"><mml:mn mathvariant="normal">0.74</mml:mn></mml:math></inline-formula> and <inline-formula><mml:math id="M229" display="inline"><mml:mn mathvariant="normal">0.75</mml:mn></mml:math></inline-formula> at <inline-formula><mml:math id="M230" display="inline"><mml:mrow><mml:mi>D</mml:mi><mml:mo>+</mml:mo><mml:mn mathvariant="normal">4</mml:mn></mml:mrow></mml:math></inline-formula>. 
Thus, the performance still deteriorates with lead time, but slightly less dramatically than with PERS(1). In terms of skill scores, even an approach as simple as MA(1) is found to strongly improve on the skill obtained with RAW alone, whatever the timescale or lead time. Skill scores range from 0.1 to 0.3 for nRMSE and from 0.3 to 0.4 for PCC and slope, with slightly higher values at daily and d8max scales. The variations in skill with lead time differ between nRMSE/PCC (lowest and highest skills typically obtained at <inline-formula><mml:math id="M231" display="inline"><mml:mrow><mml:mi>D</mml:mi><mml:mo>+</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M232" display="inline"><mml:mrow><mml:mi>D</mml:mi><mml:mo>+</mml:mo><mml:mn mathvariant="normal">2</mml:mn></mml:mrow></mml:math></inline-formula>, <inline-formula><mml:math id="M233" display="inline"><mml:mrow><mml:mi>D</mml:mi><mml:mo>+</mml:mo><mml:mn mathvariant="normal">3</mml:mn></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M234" display="inline"><mml:mrow><mml:mi>D</mml:mi><mml:mo>+</mml:mo><mml:mn mathvariant="normal">4</mml:mn></mml:mrow></mml:math></inline-formula>, respectively) and slope (skills tend to decrease progressively, although slightly, from <inline-formula><mml:math id="M235" display="inline"><mml:mrow><mml:mi>D</mml:mi><mml:mo>+</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:math></inline-formula> to <inline-formula><mml:math id="M236" display="inline"><mml:mrow><mml:mi>D</mml:mi><mml:mo>+</mml:mo><mml:mn mathvariant="normal">4</mml:mn></mml:mrow></mml:math></inline-formula>).</p>
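      <p>As a rough illustration of how such skill scores relate to the values in Table 1, the following minimal sketch assumes the generic form (score minus reference score) divided by (perfect score minus reference score), with PERS(1) at the same lead day taken as the reference forecast. Both the formula and the choice of reference are assumptions made here for illustration; the actual calculation is the one described in Sect. <xref ref-type="sec" rid="Ch1.S2.SS5"/>. The function name <monospace>skill_score</monospace> and the dictionary layout are hypothetical.</p>
      <preformat preformat-type="code">
```python
def skill_score(score, ref_score, perfect_score):
    """Generic skill score: 1 for a perfect forecast, 0 when equal to the reference."""
    return (score - ref_score) / (perfect_score - ref_score)


# Hourly-scale values from Table 1 at D+1 (nRMSE in %, PCC unitless).
table1_hourly_d1 = {
    "RAW": {"nRMSE": 38.0, "PCC": 0.75},
    "MA(1)": {"nRMSE": 31.0, "PCC": 0.81},
    "KF(RMSE)": {"nRMSE": 25.0, "PCC": 0.86},
    "PERS(1)": {"nRMSE": 36.0, "PCC": 0.75},
}
# Value each metric takes for a perfect forecast.
perfect = {"nRMSE": 0.0, "PCC": 1.0}

# Assumption: PERS(1) at the same lead day is the reference forecast.
reference = table1_hourly_d1["PERS(1)"]
skills = {
    method: {
        metric: skill_score(values[metric], reference[metric], perfect[metric])
        for metric in perfect
    }
    for method, values in table1_hourly_d1.items()
}
for method, result in skills.items():
    print(method, {metric: round(value, 2) for metric, value in result.items()})
```
      </preformat>
      <p>With these rounded inputs, the sketch gives nRMSE skills of about 0.14 for MA(1) and 0.31 for KF(RMSE) and a slightly negative nRMSE skill for RAW. Because the inputs are rounded table values, the results are only indicative.</p>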
      <p id="d2e3646">The QM method gives results quite similar to those of the MA(1) method, but usually with worse (better) performance at short (long) lead times. Thus, the deterioration of the performance with lead time tends to be slower with QM than with MA(1). Biases in O<sub>3</sub> concentrations and O<sub>3</sub> variability are often slightly higher with QM but remain relatively low (within <inline-formula><mml:math id="M239" display="inline"><mml:mrow><mml:mo>±</mml:mo><mml:mn mathvariant="normal">5</mml:mn></mml:mrow></mml:math></inline-formula> %). The strongest improvements of QM compared to MA(1) are found at the hourly scale for the longest lead times. On these continuous metrics, the skills of the QM method are only slightly positive or even negative at <inline-formula><mml:math id="M240" display="inline"><mml:mrow><mml:mi>D</mml:mi><mml:mo>+</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:math></inline-formula> (except at the hourly scale, where skill scores are always positive) but are much higher between <inline-formula><mml:math id="M241" display="inline"><mml:mrow><mml:mi>D</mml:mi><mml:mo>+</mml:mo><mml:mn mathvariant="normal">2</mml:mn></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M242" display="inline"><mml:mrow><mml:mi>D</mml:mi><mml:mo>+</mml:mo><mml:mn mathvariant="normal">4</mml:mn></mml:mrow></mml:math></inline-formula> and often slightly better than those of MA(1).</p>
      <p id="d2e3714">Compared to the previous MOS methods, the KF method provides a substantial improvement on both nRMSE and PCC, leading to skill scores of 0.3–0.4 and 0.4–0.6, respectively. However, this comes at the cost of an underestimation of the variability (nMSDB around <inline-formula><mml:math id="M243" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">10</mml:mn></mml:mrow></mml:math></inline-formula> %, still much better than the <inline-formula><mml:math id="M244" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">30</mml:mn></mml:mrow></mml:math></inline-formula> % of nMSDB found in RAW). As for the previous methods, some small biases appear at d1max scale and to a lesser extent at d8max scale, but applying this MOS method directly on d1max or d8max O<sub>3</sub> mixing ratios rather than hourly data (i.e., dd1max and dd8max scales) mitigates the issue.</p>
      <p id="d2e3746">Overall, comparable results are found with AN and GBM methods, but the aforementioned issues are typically exacerbated. The negative biases at d1max and d8max timescales are much higher, especially for GBM, but can be removed at dd1max and dd8max scales. Similarly, the underestimation of the variability is much more pronounced, with nMSDB values around <inline-formula><mml:math id="M246" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">15</mml:mn></mml:mrow></mml:math></inline-formula> % and <inline-formula><mml:math id="M247" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">10</mml:mn></mml:mrow></mml:math></inline-formula> % for AN and GBM, respectively. These two MOS methods thus perform well in predicting the central part of the distribution of O<sub>3</sub> mixing ratios but have more difficulty in capturing the lowest and highest O<sub>3</sub> concentrations observed on the tails of this distribution. Besides the negative nMSDB, this typically leads to lower slopes compared to the other MOS methods. Skill scores on nRMSE and PCC span a relatively large range of values depending on the timescale and the lead time. They are typically the lowest at short lead times and/or at specific timescales (e.g., d1max) but can be among the highest values (although slightly lower than KF), for instance with GBM at the hourly and daily scales at <inline-formula><mml:math id="M250" display="inline"><mml:mrow><mml:mi>D</mml:mi><mml:mo>+</mml:mo><mml:mn mathvariant="normal">2</mml:mn></mml:mrow></mml:math></inline-formula>, <inline-formula><mml:math id="M251" display="inline"><mml:mrow><mml:mi>D</mml:mi><mml:mo>+</mml:mo><mml:mn mathvariant="normal">3</mml:mn></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M252" display="inline"><mml:mrow><mml:mi>D</mml:mi><mml:mo>+</mml:mo><mml:mn mathvariant="normal">4</mml:mn></mml:mrow></mml:math></inline-formula>. 
Concerning the slope, the aforementioned issues are illustrated here by the typically low skills of both AN and (to a slightly lesser extent) GBM methods, often worse than the other MOS methods.</p>
      <p id="d2e3825">Therefore, on this set of continuous metrics, the impact of the MOS corrections on the performance strongly varies with the method considered. Among the different MOS methods, KF seems to give the most balanced improvement with biases mostly removed, errors and correlation substantially improved, and variability not too strongly underestimated. However, it is worth noting that since some MOS methods (namely QM, AN and GBM) can ingest increasing quantities of input data over time, we can expect their performance to improve between the beginning of the period, when very limited past data are available, and the end of the period, when more past data have been accumulated. Investigating this aspect would ideally require a proper analysis, comparing the performance obtained over a given period using a varying amount of past input data. Here, we simply provide some insights by comparing the relative difference in performance of these MOS methods against RAW (1) when evaluated over the entire 2018–2019 period (i.e., including the beginning of the period of study when MOS methods can only rely on limited past data) and (2) when evaluated only over the year 2019 (i.e., when the first year is discarded). In the first case (evaluation over 2018–2019), QM, AN and GBM show nRMSE 31 %, 41 % and 44 % lower than RAW, respectively. In the second case (evaluation over 2019), these MOS methods give nRMSE 33 %, 44 % and 49 % lower than RAW. Therefore, this basic comparison suggests that these MOS methods can indeed benefit from a larger amount of past data. Here, the change is more pronounced for GBM, which suggests that this MOS method is the one benefiting the most from more past training data. For GBM, this improvement is mainly due to the relatively poor predictions made during the very first months of 2018, when the training dataset was the most limited (see time series in Fig. 
<xref ref-type="fig" rid="App1.Ch1.S6.F7"/> in Appendix <xref ref-type="sec" rid="App1.Ch1.S6"/>).</p>
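      <p>As a rough illustration of the comparison above, the following sketch recomputes the change in nRMSE reduction (in percentage points) between the two evaluation periods from the values quoted in the text. The variable names are hypothetical and only serve this illustration; this is not the code used in the study.</p>
<preformat preformat-type="code">
```python
# nRMSE reduction relative to RAW (in %), as quoted in the text.
reduction_2018_2019 = {"QM": 31, "AN": 41, "GBM": 44}
reduction_2019_only = {"QM": 33, "AN": 44, "GBM": 49}

# Gain (in percentage points) obtained when the first year is discarded.
gain = {
    method: reduction_2019_only[method] - reduction_2018_2019[method]
    for method in reduction_2018_2019
}

# Method showing the largest gain between the two evaluation periods.
largest_gain_method = max(gain, key=gain.get)

print(gain)
print(largest_gain_method)
```
</preformat>
      <p>With the quoted values, the gain is 2, 3 and 5 percentage points for QM, AN and GBM, respectively, which is consistent with GBM being the method that benefits most.</p>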
</sec>
</sec>
<sec id="Ch1.S3.SS3">
  <label>3.3</label><title>Performance on categorical forecasts</title>
<sec id="Ch1.S3.SS3.SSS1">
  <label>3.3.1</label><title>RAW forecasts</title>
      <p id="d2e3848">Focusing now on the performance for detecting target and information thresholds, Fig. <xref ref-type="fig" rid="Ch1.F3"/> (middle and bottom panels) shows a comprehensive set of metrics, where the most interesting ones are probably CSI and PSS, followed by SR and AUC.</p>
      <p id="d2e3853">The RAW forecast shows low <inline-formula><mml:math id="M253" display="inline"><mml:mi>H</mml:mi></mml:math></inline-formula> and <inline-formula><mml:math id="M254" display="inline"><mml:mi>F</mml:mi></mml:math></inline-formula> (very few true positives and false positives). With an intermediate SR (0.45; i.e., only 45 % of the exceedances predicted by RAW indeed occur), it can be seen as a moderately “conservative” forecast for target thresholds (d8max O<sub>3</sub> above 60 <inline-formula><mml:math id="M256" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">ppbv</mml:mi></mml:mrow></mml:math></inline-formula>); the term “conservative” here refers to forecasting systems that predict exceedances only with strong evidence (it thus predicts very few exceedances but with a moderate confidence). Despite showing a reasonably good AUC, the RAW forecast strongly fails at reproducing high O<sub>3</sub> mixing ratios, as illustrated by the low FB (0.25; i.e., RAW predicts 4 times fewer exceedances than are observed), and finally shows the worst performance in terms of CSI (0.10) and PSS (0.15). In comparison, the PERS(1) reference forecast provides better detection skills regarding target thresholds. 
This is especially true at short lead days, but the performance then quickly decreases with the lead time, with <inline-formula><mml:math id="M258" display="inline"><mml:mi mathvariant="normal">CSI</mml:mi></mml:math></inline-formula> and <inline-formula><mml:math id="M259" display="inline"><mml:mi mathvariant="normal">PSS</mml:mi></mml:math></inline-formula> reduced from about <inline-formula><mml:math id="M260" display="inline"><mml:mn mathvariant="normal">0.27</mml:mn></mml:math></inline-formula> and <inline-formula><mml:math id="M261" display="inline"><mml:mn mathvariant="normal">0.42</mml:mn></mml:math></inline-formula> at <inline-formula><mml:math id="M262" display="inline"><mml:mrow><mml:mi>D</mml:mi><mml:mo>+</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:math></inline-formula> to about <inline-formula><mml:math id="M263" display="inline"><mml:mn mathvariant="normal">0.14</mml:mn></mml:math></inline-formula> and <inline-formula><mml:math id="M264" display="inline"><mml:mn mathvariant="normal">0.23</mml:mn></mml:math></inline-formula> at <inline-formula><mml:math id="M265" display="inline"><mml:mrow><mml:mi>D</mml:mi><mml:mo>+</mml:mo><mml:mn mathvariant="normal">4</mml:mn></mml:mrow></mml:math></inline-formula>. Except FB, all categorical metrics show a similarly strong sensitivity to the lead time. With PERS(1) taken as a reference, the skill scores of RAW clearly show negative and positive values for <inline-formula><mml:math id="M266" display="inline"><mml:mi>H</mml:mi></mml:math></inline-formula> and <inline-formula><mml:math id="M267" display="inline"><mml:mi>F</mml:mi></mml:math></inline-formula>, respectively (i.e., it predicts fewer true exceedances but produces fewer false alarms). The consequence in terms of SR skills is positive but only beyond <inline-formula><mml:math id="M268" display="inline"><mml:mrow><mml:mi>D</mml:mi><mml:mo>+</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:math></inline-formula>. 
With positive skills on AUC, RAW is able to discriminate exceedances and non-exceedances slightly better than PERS(1), but only beyond <inline-formula><mml:math id="M269" display="inline"><mml:mrow><mml:mi>D</mml:mi><mml:mo>+</mml:mo><mml:mn mathvariant="normal">2</mml:mn></mml:mrow></mml:math></inline-formula>. However, its skills on the important CSI and PSS metrics are strongly negative at all lead times, which highlights its overall deficiency for correctly predicting the exceedances of the target threshold (i.e., without too many false alarms).</p>
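      <p>To make the categorical metrics discussed above more concrete, the sketch below computes them from a 2 × 2 contingency table, assuming the standard definitions (hit rate, false alarm rate, success ratio, critical success index, Peirce skill score and frequency bias). The function names and the toy d8max values are hypothetical and are not taken from the study.</p>
<preformat preformat-type="code">
```python
import math


def safe_ratio(numerator, denominator):
    """Return numerator / denominator, or nan when the denominator is zero."""
    return numerator / denominator if denominator else math.nan


def categorical_metrics(obs, fcst, threshold):
    """Contingency-table metrics for exceedances of `threshold`."""
    pairs = list(zip(obs, fcst))
    hits = sum(1 for o, f in pairs if (o > threshold) and (f > threshold))
    false_alarms = sum(1 for o, f in pairs if not (o > threshold) and (f > threshold))
    misses = sum(1 for o, f in pairs if (o > threshold) and not (f > threshold))
    correct_negatives = len(pairs) - hits - false_alarms - misses

    hit_rate = safe_ratio(hits, hits + misses)
    false_alarm_rate = safe_ratio(false_alarms, false_alarms + correct_negatives)
    return {
        "H": hit_rate,
        "F": false_alarm_rate,
        "SR": safe_ratio(hits, hits + false_alarms),
        "CSI": safe_ratio(hits, hits + false_alarms + misses),
        "PSS": hit_rate - false_alarm_rate,
        "FB": safe_ratio(hits + false_alarms, hits + misses),
    }


# Toy d8max values (ppbv), purely illustrative.
obs = [65, 70, 55, 50, 62, 40, 45, 58]
fcst = [61, 55, 63, 50, 57, 42, 44, 52]
metrics = categorical_metrics(obs, fcst, threshold=60)
print(metrics)

# A forecast that never predicts an exceedance has an undefined SR.
never = categorical_metrics([65, 50], [40, 40], threshold=60)
print(never["SR"])
```
</preformat>
      <p>In this toy case, one of the three observed exceedances is detected and one of the two predicted exceedances is a false alarm, so FB is below 1 (the forecast predicts fewer exceedances than are observed). When no exceedance is predicted at all, SR is undefined and the sketch returns nan.</p>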
      <p id="d2e4002">Exceedances of the information threshold (d1max O<sub>3</sub> above 90 <inline-formula><mml:math id="M271" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">ppbv</mml:mi></mml:mrow></mml:math></inline-formula>) appear even more difficult to capture for the RAW forecast, with CSI and PSS typically below 0.02. However, given that it is also more difficult for PERS(1) to capture these exceedances, the skills of RAW on these two metrics are substantially better (although still negative) on this information threshold compared to the target threshold. Results also show much better SR, especially at the longest lead times (i.e., most of the predicted exceedances indeed occur), but this apparently good result has to be weighed against the extremely low <inline-formula><mml:math id="M272" display="inline"><mml:mi>H</mml:mi></mml:math></inline-formula> (i.e., RAW almost never predicts exceedances).</p>
</sec>
<sec id="Ch1.S3.SS3.SSS2">
  <label>3.3.2</label><title>MOS-corrected forecasts</title>
      <p id="d2e4037">Although the RAW forecast alone shows quite limited skills for predicting high O<sub>3</sub> exceedances, its potential usefulness is nicely illustrated by the results obtained when it is combined with observations, as in MA(1), QM or KF(RMSE). When considering the target threshold exceedances, CSI and PSS are indeed greatly improved with these last MOS methods and to a lesser extent by the two other methods, AN(10) and GBM. KF(RMSE), AN(10) and GBM clearly appear as the most “conservative” MOS approaches here, with relatively low <inline-formula><mml:math id="M274" display="inline"><mml:mi>H</mml:mi></mml:math></inline-formula> and <inline-formula><mml:math id="M275" display="inline"><mml:mi>F</mml:mi></mml:math></inline-formula> but strong SR. In other words, they predict fewer exceedances but with a higher reliability. In terms of skill scores, all these MOS-corrected forecasts always have better skills than RAW. However, only MA(1) beats PERS(1) at all lead times, while the other MOS methods provide positive skills only beyond <inline-formula><mml:math id="M276" display="inline"><mml:mrow><mml:mi>D</mml:mi><mml:mo>+</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:math></inline-formula> or <inline-formula><mml:math id="M277" display="inline"><mml:mrow><mml:mi>D</mml:mi><mml:mo>+</mml:mo><mml:mn mathvariant="normal">2</mml:mn></mml:mrow></mml:math></inline-formula>. This MA(1) method thus clearly outperforms the other methods at <inline-formula><mml:math id="M278" display="inline"><mml:mrow><mml:mi>D</mml:mi><mml:mo>+</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:math></inline-formula>, while differences in performance are reduced when considering longer lead times. 
At longer lead times, the ranking between these different MOS methods varies substantially depending on the considered metric, with MA(1), KF(RMSE) and GBM showing the best skills on CSI and MA(1) and QM showing the best skills on PSS.</p>
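      <p>The sign of the skill scores discussed above can be illustrated with a generic skill score of the form (score - reference) / (perfect - reference). This form is an assumption made here for illustration and may differ in detail from the definition used in this study. The function name is hypothetical, and the input values are the d8max entries of Table 2.</p>
<preformat preformat-type="code">
```python
def skill_score(score, reference_score, perfect_score=1.0):
    """Generic skill score (assumed form): fraction of the possible
    improvement over the reference that the forecast achieves."""
    return (score - reference_score) / (perfect_score - reference_score)


# CSI at D+1 for the target threshold (d8max), from Table 2.
csi_pers, csi_raw, csi_ma = 0.34, 0.14, 0.42
raw_csi_skill = skill_score(csi_raw, csi_pers)
ma_csi_skill = skill_score(csi_ma, csi_pers)

# F at D+4 for RAW (0.01) against PERS(1) (0.03); a perfect F is 0,
# so a lower false alarm rate gives a positive skill.
raw_f_skill = skill_score(0.01, 0.03, perfect_score=0.0)

print(round(raw_csi_skill, 2), round(ma_csi_skill, 2), round(raw_f_skill, 2))
```
</preformat>
      <p>Under this assumed definition, RAW has a negative CSI skill against PERS(1) at <inline-formula><mml:math display="inline"><mml:mrow><mml:mi>D</mml:mi><mml:mo>+</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:math></inline-formula>, MA(1) has a positive one, and the lower false alarm rate of RAW translates into a positive skill on <inline-formula><mml:math display="inline"><mml:mi>F</mml:mi></mml:math></inline-formula>.</p>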
      <p id="d2e4100">However, when considering the detection of the information threshold, the KF(RMSE), AN(10) and GBM methods still benefit from a strong SR but are missing too many of the observed exceedances, which leads to a dramatic deterioration of both CSI and PSS. As for RAW, this means that there is a high chance that an exceedance predicted by these methods indeed occurs, but such exceedances are too rarely predicted. Most of their skill scores on PSS are found to be negative, while only a few positive skills are obtained on CSI for specific timescales in KF and GBM methods. For detecting such high O<sub>3</sub> values, the best method is MA(1) for the shortest lead times. At longer lead times, the skills of MA(1) quickly deteriorate, and the best skills are finally obtained for QM. Both methods reproduce fairly well the geographical distribution of high-O<sub>3</sub> episodes (PERS(1) reproduces it perfectly, by construction), as shown in Fig. <xref ref-type="fig" rid="Ch1.F5"/>, but still with very low SR (below 0.25 for exceedances of the information threshold).</p>

      <fig id="Ch1.F5"><label>Figure 5</label><caption><p id="d2e4125">Similar to Fig. <xref ref-type="fig" rid="Ch1.F1"/> but for observations and <inline-formula><mml:math id="M281" display="inline"><mml:mrow><mml:mi>D</mml:mi><mml:mo>+</mml:mo><mml:mn mathvariant="normal">4</mml:mn></mml:mrow></mml:math></inline-formula> O<sub>3</sub> forecasts corrected with MA(1) and QM methods.</p></caption>
            <graphic xlink:href="https://acp.copernicus.org/articles/22/11603/2022/acp-22-11603-2022-f05.png"/>

          </fig>

<table-wrap id="Ch1.T2" specific-use="star"><label>Table 2</label><caption><p id="d2e4161">Evaluation of the different forecast methods on categorical metrics, at <inline-formula><mml:math id="M283" display="inline"><mml:mrow><mml:mi>D</mml:mi><mml:mo>+</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:math></inline-formula> (and <inline-formula><mml:math id="M284" display="inline"><mml:mrow><mml:mi>D</mml:mi><mml:mo>+</mml:mo><mml:mn mathvariant="normal">4</mml:mn></mml:mrow></mml:math></inline-formula> in parentheses), for both target and information thresholds.</p></caption><oasis:table frame="topbot"><oasis:tgroup cols="9">
     <oasis:colspec colnum="1" colname="col1" align="left"/>
     <oasis:colspec colnum="2" colname="col2" align="left"/>
     <oasis:colspec colnum="3" colname="col3" align="right"/>
     <oasis:colspec colnum="4" colname="col4" align="right"/>
     <oasis:colspec colnum="5" colname="col5" align="right"/>
     <oasis:colspec colnum="6" colname="col6" align="right"/>
     <oasis:colspec colnum="7" colname="col7" align="right"/>
     <oasis:colspec colnum="8" colname="col8" align="right"/>
     <oasis:colspec colnum="9" colname="col9" align="right"/>
     <oasis:thead>
       <oasis:row>
         <oasis:entry colname="col1">Timescale</oasis:entry>
         <oasis:entry colname="col2">Forecast</oasis:entry>
         <oasis:entry colname="col3"><inline-formula><mml:math id="M285" display="inline"><mml:mi>H</mml:mi></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col4"><inline-formula><mml:math id="M286" display="inline"><mml:mi>F</mml:mi></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col5">SR</oasis:entry>
         <oasis:entry colname="col6">CSI</oasis:entry>
         <oasis:entry colname="col7">PSS</oasis:entry>
         <oasis:entry colname="col8">AUC</oasis:entry>
         <oasis:entry colname="col9"><inline-formula><mml:math id="M287" display="inline"><mml:mi>N</mml:mi></mml:math></inline-formula></oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">and threshold</oasis:entry>
         <oasis:entry colname="col2"/>
         <oasis:entry colname="col3"/>
         <oasis:entry colname="col4"/>
         <oasis:entry colname="col5"/>
         <oasis:entry colname="col6"/>
         <oasis:entry colname="col7"/>
         <oasis:entry colname="col8"/>
         <oasis:entry colname="col9"/>
       </oasis:row>
     </oasis:thead>
     <oasis:tbody>
       <oasis:row>
         <oasis:entry colname="col1">d8max <inline-formula><mml:math id="M288" display="inline"><mml:mo>&gt;</mml:mo></mml:math></inline-formula> 60</oasis:entry>
         <oasis:entry colname="col2">GBM</oasis:entry>
         <oasis:entry colname="col3">0.30 (0.23)</oasis:entry>
         <oasis:entry colname="col4">0.01 (0.01)</oasis:entry>
         <oasis:entry colname="col5">0.72 (0.67)</oasis:entry>
         <oasis:entry colname="col6">0.27 (0.21)</oasis:entry>
         <oasis:entry colname="col7">0.29 (0.23)</oasis:entry>
         <oasis:entry colname="col8">0.95 (0.93)</oasis:entry>
         <oasis:entry colname="col9">295 617</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1"/>
         <oasis:entry colname="col2">AN(10)</oasis:entry>
         <oasis:entry colname="col3">0.31 (0.24)</oasis:entry>
         <oasis:entry colname="col4">0.01 (0.01)</oasis:entry>
         <oasis:entry colname="col5">0.73 (0.66)</oasis:entry>
         <oasis:entry colname="col6">0.28 (0.22)</oasis:entry>
         <oasis:entry colname="col7">0.30 (0.24)</oasis:entry>
         <oasis:entry colname="col8">0.95 (0.94)</oasis:entry>
         <oasis:entry colname="col9">295 617</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1"/>
         <oasis:entry colname="col2">KF(RMSE)</oasis:entry>
         <oasis:entry colname="col3">0.40 (0.30)</oasis:entry>
         <oasis:entry colname="col4">0.01 (0.01)</oasis:entry>
         <oasis:entry colname="col5">0.74 (0.67)</oasis:entry>
         <oasis:entry colname="col6">0.35 (0.26)</oasis:entry>
         <oasis:entry colname="col7">0.39 (0.29)</oasis:entry>
         <oasis:entry colname="col8">0.97 (0.95)</oasis:entry>
         <oasis:entry colname="col9">295 617</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1"/>
         <oasis:entry colname="col2">QM</oasis:entry>
         <oasis:entry colname="col3">0.47 (0.40)</oasis:entry>
         <oasis:entry colname="col4">0.02 (0.02)</oasis:entry>
         <oasis:entry colname="col5">0.47 (0.43)</oasis:entry>
         <oasis:entry colname="col6">0.31 (0.26)</oasis:entry>
         <oasis:entry colname="col7">0.44 (0.37)</oasis:entry>
         <oasis:entry colname="col8">0.94 (0.92)</oasis:entry>
         <oasis:entry colname="col9">295 617</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1"/>
         <oasis:entry colname="col2">MA(1)</oasis:entry>
         <oasis:entry colname="col3">0.62 (0.39)</oasis:entry>
         <oasis:entry colname="col4">0.02 (0.02)</oasis:entry>
         <oasis:entry colname="col5">0.57 (0.44)</oasis:entry>
         <oasis:entry colname="col6">0.42 (0.26)</oasis:entry>
         <oasis:entry colname="col7">0.59 (0.36)</oasis:entry>
         <oasis:entry colname="col8">0.96 (0.92)</oasis:entry>
         <oasis:entry colname="col9">295 617</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1"/>
         <oasis:entry colname="col2">PERS(1)</oasis:entry>
         <oasis:entry colname="col3">0.51 (0.27)</oasis:entry>
         <oasis:entry colname="col4">0.02 (0.03)</oasis:entry>
         <oasis:entry colname="col5">0.51 (0.27)</oasis:entry>
         <oasis:entry colname="col6">0.34 (0.15)</oasis:entry>
         <oasis:entry colname="col7">0.49 (0.23)</oasis:entry>
         <oasis:entry colname="col8">0.95 (0.84)</oasis:entry>
         <oasis:entry colname="col9">295 617</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1"/>
         <oasis:entry colname="col2">RAW</oasis:entry>
         <oasis:entry colname="col3">0.17 (0.13)</oasis:entry>
         <oasis:entry colname="col4">0.01 (0.01)</oasis:entry>
         <oasis:entry colname="col5">0.45 (0.41)</oasis:entry>
         <oasis:entry colname="col6">0.14 (0.11)</oasis:entry>
         <oasis:entry colname="col7">0.16 (0.12)</oasis:entry>
         <oasis:entry colname="col8">0.90 (0.88)</oasis:entry>
         <oasis:entry colname="col9">295 617</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">dd8max <inline-formula><mml:math id="M289" display="inline"><mml:mo>&gt;</mml:mo></mml:math></inline-formula> 60</oasis:entry>
         <oasis:entry colname="col2">GBM</oasis:entry>
         <oasis:entry colname="col3">0.39 (0.33)</oasis:entry>
         <oasis:entry colname="col4">0.01 (0.01)</oasis:entry>
         <oasis:entry colname="col5">0.65 (0.60)</oasis:entry>
         <oasis:entry colname="col6">0.32 (0.27)</oasis:entry>
         <oasis:entry colname="col7">0.38 (0.32)</oasis:entry>
         <oasis:entry colname="col8">0.95 (0.94)</oasis:entry>
         <oasis:entry colname="col9">286 803</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1"/>
         <oasis:entry colname="col2">AN(10)</oasis:entry>
         <oasis:entry colname="col3">0.36 (0.29)</oasis:entry>
         <oasis:entry colname="col4">0.01 (0.01)</oasis:entry>
         <oasis:entry colname="col5">0.69 (0.62)</oasis:entry>
         <oasis:entry colname="col6">0.31 (0.25)</oasis:entry>
         <oasis:entry colname="col7">0.35 (0.28)</oasis:entry>
         <oasis:entry colname="col8">0.96 (0.94)</oasis:entry>
         <oasis:entry colname="col9">286 803</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1"/>
         <oasis:entry colname="col2">KF(RMSE)</oasis:entry>
         <oasis:entry colname="col3">0.46 (0.34)</oasis:entry>
         <oasis:entry colname="col4">0.01 (0.01)</oasis:entry>
         <oasis:entry colname="col5">0.71 (0.62)</oasis:entry>
         <oasis:entry colname="col6">0.39 (0.28)</oasis:entry>
         <oasis:entry colname="col7">0.46 (0.33)</oasis:entry>
         <oasis:entry colname="col8">0.97 (0.95)</oasis:entry>
         <oasis:entry colname="col9">286 803</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1"/>
         <oasis:entry colname="col2">QM</oasis:entry>
         <oasis:entry colname="col3">0.44 (0.38)</oasis:entry>
         <oasis:entry colname="col4">0.02 (0.02)</oasis:entry>
         <oasis:entry colname="col5">0.47 (0.43)</oasis:entry>
         <oasis:entry colname="col6">0.29 (0.25)</oasis:entry>
         <oasis:entry colname="col7">0.42 (0.35)</oasis:entry>
         <oasis:entry colname="col8">0.94 (0.92)</oasis:entry>
         <oasis:entry colname="col9">286 803</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1"/>
         <oasis:entry colname="col2">MA(1)</oasis:entry>
         <oasis:entry colname="col3">0.60 (0.38)</oasis:entry>
         <oasis:entry colname="col4">0.02 (0.02)</oasis:entry>
         <oasis:entry colname="col5">0.59 (0.46)</oasis:entry>
         <oasis:entry colname="col6">0.42 (0.26)</oasis:entry>
         <oasis:entry colname="col7">0.58 (0.36)</oasis:entry>
         <oasis:entry colname="col8">0.97 (0.92)</oasis:entry>
         <oasis:entry colname="col9">286 803</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1"/>
         <oasis:entry colname="col2">PERS(1)</oasis:entry>
         <oasis:entry colname="col3">0.51 (0.27)</oasis:entry>
         <oasis:entry colname="col4">0.02 (0.04)</oasis:entry>
         <oasis:entry colname="col5">0.50 (0.27)</oasis:entry>
         <oasis:entry colname="col6">0.34 (0.16)</oasis:entry>
         <oasis:entry colname="col7">0.49 (0.24)</oasis:entry>
         <oasis:entry colname="col8">0.95 (0.84)</oasis:entry>
         <oasis:entry colname="col9">286 803</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1"/>
         <oasis:entry colname="col2">RAW</oasis:entry>
         <oasis:entry colname="col3">0.14 (0.11)</oasis:entry>
         <oasis:entry colname="col4">0.01 (0.01)</oasis:entry>
         <oasis:entry colname="col5">0.45 (0.42)</oasis:entry>
         <oasis:entry colname="col6">0.12 (0.09)</oasis:entry>
         <oasis:entry colname="col7">0.14 (0.10)</oasis:entry>
         <oasis:entry colname="col8">0.89 (0.88)</oasis:entry>
         <oasis:entry colname="col9">286 803</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">d1max <inline-formula><mml:math id="M290" display="inline"><mml:mo>&gt;</mml:mo></mml:math></inline-formula> 90</oasis:entry>
         <oasis:entry colname="col2">GBM</oasis:entry>
         <oasis:entry colname="col3">0.00 (0.00)</oasis:entry>
         <oasis:entry colname="col4">0.00 (0.00)</oasis:entry>
         <oasis:entry colname="col5">1.00 (nan)</oasis:entry>
         <oasis:entry colname="col6">0.00 (0.00)</oasis:entry>
         <oasis:entry colname="col7">0.00 (0.00)</oasis:entry>
         <oasis:entry colname="col8">0.93 (0.92)</oasis:entry>
         <oasis:entry colname="col9">295 617</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1"/>
         <oasis:entry colname="col2">AN(10)</oasis:entry>
         <oasis:entry colname="col3">0.00 (0.00)</oasis:entry>
         <oasis:entry colname="col4">0.00 (0.00)</oasis:entry>
         <oasis:entry colname="col5">0.50 (1.00)</oasis:entry>
         <oasis:entry colname="col6">0.00 (0.00)</oasis:entry>
         <oasis:entry colname="col7">0.00 (0.00)</oasis:entry>
         <oasis:entry colname="col8">0.95 (0.91)</oasis:entry>
         <oasis:entry colname="col9">295 617</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1"/>
         <oasis:entry colname="col2">KF(RMSE)</oasis:entry>
         <oasis:entry colname="col3">0.02 (0.01)</oasis:entry>
         <oasis:entry colname="col4">0.00 (0.00)</oasis:entry>
         <oasis:entry colname="col5">0.50 (1.00)</oasis:entry>
         <oasis:entry colname="col6">0.01 (0.01)</oasis:entry>
         <oasis:entry colname="col7">0.02 (0.01)</oasis:entry>
         <oasis:entry colname="col8">0.96 (0.95)</oasis:entry>
         <oasis:entry colname="col9">295 617</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1"/>
         <oasis:entry colname="col2">QM</oasis:entry>
         <oasis:entry colname="col3">0.13 (0.11)</oasis:entry>
         <oasis:entry colname="col4">0.00 (0.00)</oasis:entry>
         <oasis:entry colname="col5">0.19 (0.19)</oasis:entry>
         <oasis:entry colname="col6">0.09 (0.08)</oasis:entry>
         <oasis:entry colname="col7">0.13 (0.11)</oasis:entry>
         <oasis:entry colname="col8">0.94 (0.93)</oasis:entry>
         <oasis:entry colname="col9">295 617</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1"/>
         <oasis:entry colname="col2">MA(1)</oasis:entry>
         <oasis:entry colname="col3">0.24 (0.08)</oasis:entry>
         <oasis:entry colname="col4">0.00 (0.00)</oasis:entry>
         <oasis:entry colname="col5">0.21 (0.13)</oasis:entry>
         <oasis:entry colname="col6">0.12 (0.05)</oasis:entry>
         <oasis:entry colname="col7">0.24 (0.07)</oasis:entry>
         <oasis:entry colname="col8">0.96 (0.94)</oasis:entry>
         <oasis:entry colname="col9">295 617</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1"/>
         <oasis:entry colname="col2">PERS(1)</oasis:entry>
         <oasis:entry colname="col3">0.12 (0.06)</oasis:entry>
         <oasis:entry colname="col4">0.00 (0.00)</oasis:entry>
         <oasis:entry colname="col5">0.12 (0.06)</oasis:entry>
         <oasis:entry colname="col6">0.07 (0.03)</oasis:entry>
         <oasis:entry colname="col7">0.12 (0.06)</oasis:entry>
         <oasis:entry colname="col8">0.95 (0.82)</oasis:entry>
         <oasis:entry colname="col9">295 617</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1"/>
         <oasis:entry colname="col2">RAW</oasis:entry>
         <oasis:entry colname="col3">0.00 (0.00)</oasis:entry>
         <oasis:entry colname="col4">0.00 (0.00)</oasis:entry>
         <oasis:entry colname="col5">0.00 (1.00)</oasis:entry>
         <oasis:entry colname="col6">0.00 (0.00)</oasis:entry>
         <oasis:entry colname="col7"><inline-formula><mml:math id="M291" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">0.00</mml:mn></mml:mrow></mml:math></inline-formula> (0.00)</oasis:entry>
         <oasis:entry colname="col8">0.93 (0.92)</oasis:entry>
         <oasis:entry colname="col9">295 617</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">dd1max <inline-formula><mml:math id="M292" display="inline"><mml:mo>&gt;</mml:mo></mml:math></inline-formula> 90</oasis:entry>
         <oasis:entry colname="col2">GBM</oasis:entry>
         <oasis:entry colname="col3">0.07 (0.02)</oasis:entry>
         <oasis:entry colname="col4">0.00 (0.00)</oasis:entry>
         <oasis:entry colname="col5">0.57 (0.67)</oasis:entry>
         <oasis:entry colname="col6">0.06 (0.02)</oasis:entry>
         <oasis:entry colname="col7">0.07 (0.02)</oasis:entry>
         <oasis:entry colname="col8">0.96 (0.95)</oasis:entry>
         <oasis:entry colname="col9">288 980</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1"/>
         <oasis:entry colname="col2">AN(10)</oasis:entry>
         <oasis:entry colname="col3">0.02 (0.01)</oasis:entry>
         <oasis:entry colname="col4">0.00 (0.00)</oasis:entry>
         <oasis:entry colname="col5">0.67 (1.00)</oasis:entry>
         <oasis:entry colname="col6">0.02 (0.01)</oasis:entry>
         <oasis:entry colname="col7">0.02 (0.01)</oasis:entry>
         <oasis:entry colname="col8">0.96 (0.93)</oasis:entry>
         <oasis:entry colname="col9">288 980</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1"/>
         <oasis:entry colname="col2">KF(RMSE)</oasis:entry>
         <oasis:entry colname="col3">0.09 (0.02)</oasis:entry>
         <oasis:entry colname="col4">0.00 (0.00)</oasis:entry>
         <oasis:entry colname="col5">0.68 (0.50)</oasis:entry>
         <oasis:entry colname="col6">0.09 (0.02)</oasis:entry>
         <oasis:entry colname="col7">0.09 (0.02)</oasis:entry>
         <oasis:entry colname="col8">0.96 (0.95)</oasis:entry>
         <oasis:entry colname="col9">288 980</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1"/>
         <oasis:entry colname="col2">QM</oasis:entry>
         <oasis:entry colname="col3">0.17 (0.14)</oasis:entry>
         <oasis:entry colname="col4">0.00 (0.00)</oasis:entry>
         <oasis:entry colname="col5">0.19 (0.18)</oasis:entry>
         <oasis:entry colname="col6">0.10 (0.08)</oasis:entry>
         <oasis:entry colname="col7">0.17 (0.14)</oasis:entry>
         <oasis:entry colname="col8">0.93 (0.92)</oasis:entry>
         <oasis:entry colname="col9">288 980</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1"/>
         <oasis:entry colname="col2">MA(1)</oasis:entry>
         <oasis:entry colname="col3">0.25 (0.06)</oasis:entry>
         <oasis:entry colname="col4">0.00 (0.00)</oasis:entry>
         <oasis:entry colname="col5">0.24 (0.11)</oasis:entry>
         <oasis:entry colname="col6">0.14 (0.04)</oasis:entry>
         <oasis:entry colname="col7">0.25 (0.06)</oasis:entry>
         <oasis:entry colname="col8">0.96 (0.94)</oasis:entry>
         <oasis:entry colname="col9">288 980</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1"/>
         <oasis:entry colname="col2">PERS(1)</oasis:entry>
         <oasis:entry colname="col3">0.13 (0.06)</oasis:entry>
         <oasis:entry colname="col4">0.00 (0.00)</oasis:entry>
         <oasis:entry colname="col5">0.12 (0.06)</oasis:entry>
         <oasis:entry colname="col6">0.07 (0.03)</oasis:entry>
         <oasis:entry colname="col7">0.13 (0.06)</oasis:entry>
         <oasis:entry colname="col8">0.95 (0.84)</oasis:entry>
         <oasis:entry colname="col9">288 980</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1"/>
         <oasis:entry colname="col2">RAW</oasis:entry>
         <oasis:entry colname="col3">0.00 (0.00)</oasis:entry>
         <oasis:entry colname="col4">0.00 (0.00)</oasis:entry>
         <oasis:entry colname="col5">nan (nan)</oasis:entry>
         <oasis:entry colname="col6">0.00 (0.00)</oasis:entry>
         <oasis:entry colname="col7">0.00 (0.00)</oasis:entry>
         <oasis:entry colname="col8">0.92 (0.91)</oasis:entry>
         <oasis:entry colname="col9">288 980</oasis:entry>
       </oasis:row>
     </oasis:tbody>
   </oasis:tgroup></oasis:table><table-wrap-foot><p id="d2e4188">nan: not a number.</p></table-wrap-foot></table-wrap>

</sec>
</sec>
<sec id="Ch1.S3.SS4">
  <label>3.4</label><title>Sensitivity tests</title>
      <p id="d2e5186">Each of the forecast methods considered in this study relies on a specific configuration, e.g., the time window of PERS or MA methods, the metric used internally in KF for optimizing the variance ratio, the number of analogs taken into account in AN, the choice of input features, or metrics used internally for fitting the ML model in GBM. This configuration can substantially influence their general performance, although in a different way depending on the metric used. In the previous sections, we evaluated the performance of these different methods considering a relatively simple baseline configuration. In this section, we discuss some of these choices and investigate their impact on the performance through different sensitivity tests. Corresponding statistical results on continuous and categorical metrics are given in the tables in the Supplement.</p>
<sec id="Ch1.S3.SS4.SSS1">
  <label>3.4.1</label><title>Persistence method</title>
      <p id="d2e5196">The persistence method with a 1 <inline-formula><mml:math id="M293" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">d</mml:mi></mml:mrow></mml:math></inline-formula> time window (PERS(1)) provides a reference forecast for assessing the skill scores on the different RAW and MOS-corrected forecasts. Here we explore how the time window, from 1 to 10 <inline-formula><mml:math id="M294" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">d</mml:mi></mml:mrow></mml:math></inline-formula> (hereafter referred to as PERS(<inline-formula><mml:math id="M295" display="inline"><mml:mi>n</mml:mi></mml:math></inline-formula>), with <inline-formula><mml:math id="M296" display="inline"><mml:mi>n</mml:mi></mml:math></inline-formula> the window in days), impacts the performance of this PERS forecast. Results are shown in Fig. <xref ref-type="fig" rid="App1.Ch1.S7.F8"/> in Appendix <xref ref-type="sec" rid="App1.Ch1.S7"/>.</p>
      <p id="d2e5234">Increasing the window leads to a growing negative bias on d1max and d8max scales that can be substantially reduced when working at dd1max and dd8max scales, i.e., when applying the PERS approach directly on daily 1 and 8 h maxima rather than on the hourly time series. The differences between the two approaches originate from the day-to-day variability in the hour of the day when O<sub>3</sub> mixing ratios peak. For illustration purposes, let us assume that O<sub>3</sub> peaks between 15 and 17 h; on a given day, O<sub>3</sub> mixing ratios at 15, 16 and 17 h reach 50, 60 and 50 <inline-formula><mml:math id="M300" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">ppbv</mml:mi></mml:mrow></mml:math></inline-formula> and on the following day 70, 70 and 80 <inline-formula><mml:math id="M301" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">ppbv</mml:mi></mml:mrow></mml:math></inline-formula>. Then, the PERS(2)<sub>dd1max</sub> O<sub>3</sub> would be 70 <inline-formula><mml:math id="M304" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">ppbv</mml:mi></mml:mrow></mml:math></inline-formula> (mean of 60 and 80 <inline-formula><mml:math id="M305" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">ppbv</mml:mi></mml:mrow></mml:math></inline-formula>), while the PERS(2)<sub>d1max</sub> O<sub>3</sub> would be only 65 <inline-formula><mml:math id="M308" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">ppbv</mml:mi></mml:mrow></mml:math></inline-formula> (maximum of the mean diurnal profile of these 2 d, in this case 60, 65 and 65 <inline-formula><mml:math id="M309" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">ppbv</mml:mi></mml:mrow></mml:math></inline-formula>). Conversely, both nRMSE and PCC can be slightly improved with longer windows, but at the cost of a growing underestimation of the variability. 
As a consequence, both <inline-formula><mml:math id="M310" display="inline"><mml:mi>H</mml:mi></mml:math></inline-formula> and <inline-formula><mml:math id="M311" display="inline"><mml:mi>F</mml:mi></mml:math></inline-formula> are slightly reduced, which means that PERS forecasts become more “conservative” with longer windows. The impact on SR for detecting exceedances of the target threshold is low for short lead times but positive for the longest ones. Interestingly, for information thresholds, the best SRs are obtained with windows of around 4–7 <inline-formula><mml:math id="M312" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">d</mml:mi></mml:mrow></mml:math></inline-formula>. More importantly, however, using longer windows deteriorates the general performance of the forecast, as shown by the decrease in both CSI and PSS, especially at short lead times. There are also important differences in terms of AUC for detecting exceedances of the target threshold depending on the lead day, ranging from a decrease in AUC with longer windows at <inline-formula><mml:math id="M313" display="inline"><mml:mrow><mml:mi>D</mml:mi><mml:mo>+</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:math></inline-formula> to an increase at <inline-formula><mml:math id="M314" display="inline"><mml:mrow><mml:mi>D</mml:mi><mml:mo>+</mml:mo><mml:mn mathvariant="normal">4</mml:mn></mml:mrow></mml:math></inline-formula>.</p>
      <p id="d2e5407">Therefore, for detecting exceedances, considering PSS and/or CSI as the most relevant metrics, the PERS method shows its best performance for a time window of 1 <inline-formula><mml:math id="M315" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">d</mml:mi></mml:mrow></mml:math></inline-formula>. However, it gives very “liberal” O<sub>3</sub> forecasts with rather poor SR. The term “liberal” is borrowed here from <xref ref-type="bibr" rid="bib1.bibx15" id="text.22"/> to designate forecasting systems that predict exceedances with weak evidence, as opposed to the aforementioned term “conservative”. Longer time windows can improve SR but result in a substantial deterioration of CSI and PSS, particularly for the shorter lead times (<inline-formula><mml:math id="M317" display="inline"><mml:mrow><mml:mi>D</mml:mi><mml:mo>+</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M318" display="inline"><mml:mrow><mml:mi>D</mml:mi><mml:mo>+</mml:mo><mml:mn mathvariant="normal">2</mml:mn></mml:mrow></mml:math></inline-formula>).</p>
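      <p>The d1max versus dd1max example given above can be checked with a few lines of code. The following Python sketch is purely illustrative: the function names <monospace>pers_d1max</monospace> and <monospace>pers_dd1max</monospace> are hypothetical and are not taken from the processing chain of this study, and only the three afternoon hours of the example are represented.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def pers_d1max(hourly_days):
    """PERS on the hourly series, then daily maximum.

    Averages the diurnal profiles of the days in the window hour by hour
    and returns the maximum of that mean profile.
    """
    n_days = len(hourly_days)
    mean_profile = [sum(values) / n_days for values in zip(*hourly_days)]
    return max(mean_profile)


def pers_dd1max(hourly_days):
    """PERS applied directly to the daily maxima (mean of the daily maxima)."""
    daily_maxima = [max(day) for day in hourly_days]
    return sum(daily_maxima) / len(daily_maxima)


# O3 mixing ratios (ppbv) at 15, 16 and 17 h on two consecutive days,
# using the values of the example in the text.
window = [[50, 60, 50], [70, 70, 80]]

print("PERS(2) dd1max:", pers_dd1max(window))  # mean of 60 and 80
print("PERS(2) d1max:", pers_d1max(window))    # max of the profile 60, 65, 65
```
]]></preformat>
      <p>With these values the two approaches give 70 and 65 ppbv, respectively. The 5 ppbv gap arises only because the peak hour differs between the 2 d, which is why the negative bias grows with the window length at the d1max and d8max scales.</p>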
</sec>
<sec id="Ch1.S3.SS4.SSS2">
  <label>3.4.2</label><title>Moving average method</title>
      <p id="d2e5462">Here, a sensitivity test is performed on MA with windows ranging between 1 and 10 <inline-formula><mml:math id="M319" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">d</mml:mi></mml:mrow></mml:math></inline-formula> (hereafter referred to as MA(<inline-formula><mml:math id="M320" display="inline"><mml:mi>n</mml:mi></mml:math></inline-formula>), with <inline-formula><mml:math id="M321" display="inline"><mml:mi>n</mml:mi></mml:math></inline-formula> the window in days). Results are shown in Fig. <xref ref-type="fig" rid="App1.Ch1.S7.F9"/> in Appendix <xref ref-type="sec" rid="App1.Ch1.S7"/>. Increasing the window length impacts the MA performance in a very similar way to PERS, especially for continuous metrics. Regarding the detection of the target threshold, the main noticeable difference is the absence of strong deterioration of some metrics like AUC, SR or CSI for shorter lead times. Regarding the detection of the information threshold, the clearest difference with PERS concerns the SR that substantially improves when considering longer windows. However, the deterioration of both CSI and PSS persists.</p>
      <p id="d2e5491">Therefore, the detection of O<sub>3</sub> exceedances with the MA method shows its best performance with the shortest windows (1 <inline-formula><mml:math id="M323" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">d</mml:mi></mml:mrow></mml:math></inline-formula>). As for PERS, the corresponding forecasts are quite liberal with low SR. However, in contrast to PERS, the SR associated with high thresholds can be substantially improved when using longer windows, which may be an interesting option if the corresponding deterioration of CSI and PSS is seen as acceptable.</p>
</sec>
<sec id="Ch1.S3.SS4.SSS3">
  <label>3.4.3</label><title>Kalman filter method</title>
      <p id="d2e5519">As explained in Sect. <xref ref-type="sec" rid="Ch1.S2.SS3.SSS3"/> (and Appendix <xref ref-type="sec" rid="App1.Ch1.S2"/>), the behavior of the KF intrinsically depends on the <inline-formula><mml:math id="M324" display="inline"><mml:mrow><mml:msubsup><mml:mi mathvariant="italic">σ</mml:mi><mml:mi mathvariant="italic">η</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msubsup><mml:mo>/</mml:mo><mml:msubsup><mml:mi mathvariant="italic">σ</mml:mi><mml:mi mathvariant="italic">ϵ</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msubsup></mml:mrow></mml:math></inline-formula> ratio chosen. So far, this parameter has been adjusted dynamically (and updated regularly) to optimize the RMSE of past data. Here, a sensitivity test is performed with alternative strategies in which the variance ratio is chosen to optimize the SR, CSI, PSS or AUC with threshold values of 60 or 90 <inline-formula><mml:math id="M325" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">ppbv</mml:mi></mml:mrow></mml:math></inline-formula> (hereafter referred to as SR-60, SR-90, CSI-60, CSI-90, PSS-60, PSS-90, AUC-60 and AUC-90). The objective is to investigate the extent to which tuning the KF algorithm with appropriate categorical metrics can improve the exceedance detection skills.</p>
      <p id="d2e5556">Results (Fig. <xref ref-type="fig" rid="App1.Ch1.S7.F10"/> in Appendix <xref ref-type="sec" rid="App1.Ch1.S7"/>) show that this tuning strategy barely impacts the performance obtained on continuous metrics, except for CSI-60 and PSS-60 that show slightly deteriorated RMSE and PCC. Only small differences are also found on target threshold exceedances, except again with these two methods that show slightly improved <inline-formula><mml:math id="M326" display="inline"><mml:mi mathvariant="normal">CSI</mml:mi></mml:math></inline-formula> and <inline-formula><mml:math id="M327" display="inline"><mml:mi mathvariant="normal">PSS</mml:mi></mml:math></inline-formula> at short lead times. Results on information threshold exceedances show more variability depending on the timescale, but both CSI and PSS can typically be improved when used internally in the KF procedure, although often only at short lead times. The choice of the threshold in this optimizing metric leads to more ambiguous results. For instance, besides giving the best PSS on the target threshold, KF(PSS-60) also gives better results than KF(PSS-90) on the information threshold. Reasons behind this behavior are not clear but may be due to some instabilities brought into PSS-90 by the rareness of such exceedances. Indeed, a common and well-known issue of PSS (as well as CSI and most other categorical metrics) is that it degenerates to trivial values (either 0 or 1) for rare events: as the frequency of the event decreases, the numbers of hits (a), false alarms (b) and missed exceedances (c) all decay toward zero but typically at different rates, which causes the metric to take meaningless values (either 0 or 1 in the case of PSS) <xref ref-type="bibr" rid="bib1.bibx24 bib1.bibx16" id="paren.23"/>.
All in all, the performance for detecting such high O<sub>3</sub> concentrations remains very poor, especially at long lead times, but this sensitivity test demonstrates that choosing an appropriate tuning strategy can help to slightly improve the detection skills at a potential cost in terms of continuous metrics.</p>
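      <p>The sensitivity of categorical scores to rare events can be illustrated with a small numerical sketch. It assumes the usual contingency-table definitions (a: hits, b: false alarms, c: missed exceedances, d: correct rejections), taken here to match those used in this study; the function name <monospace>categorical_scores</monospace> and the counts are hypothetical and chosen for illustration only.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def categorical_scores(a, b, c, d):
    """Scores from a 2x2 contingency table.

    a: hits, b: false alarms, c: missed exceedances, d: correct rejections.
    Undefined ratios (zero denominator) are returned as NaN.
    """
    def ratio(num, den):
        return num / den if den > 0 else float("nan")

    hit_rate = ratio(a, a + c)           # H
    false_alarm_rate = ratio(b, b + d)   # F
    return {
        "H": hit_rate,
        "F": false_alarm_rate,
        "SR": ratio(a, a + b),           # success ratio
        "CSI": ratio(a, a + b + c),      # critical success index
        "PSS": hit_rate - false_alarm_rate,
        "base_rate": ratio(a + c, a + b + c + d),
    }


# A reasonably frequent event (base rate of 5 %): scores vary smoothly.
print(categorical_scores(a=30, b=20, c=20, d=930))

# A rare event (base rate of 0.1 %): one single case decides everything.
print(categorical_scores(a=1, b=0, c=0, d=999))  # the only exceedance is caught
print(categorical_scores(a=0, b=0, c=1, d=999))  # the only exceedance is missed
```
]]></preformat>
      <p>In the first case, CSI and PSS are about 0.43 and 0.58. In the rare-event cases, both scores jump from 1 to 0 depending on a single forecast, which is the kind of instability suspected for the metrics computed with the 90 ppbv threshold.</p>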
</sec>
<sec id="Ch1.S3.SS4.SSS4">
  <label>3.4.4</label><title>Analog method</title>
      <p id="d2e5599">The AN method identifies the closest analog days to estimate the corresponding prediction and thus depends on the number of analog days taken into account. We performed a sensitivity test with 1, 5, 10, 15, 20, 25 and 30 analog days (hereafter referred to as <inline-formula><mml:math id="M329" display="inline"><mml:mrow><mml:mi mathvariant="normal">AN</mml:mi><mml:mo>(</mml:mo><mml:mi>N</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>, with <inline-formula><mml:math id="M330" display="inline"><mml:mi>N</mml:mi></mml:math></inline-formula> the number of analogs). Results are shown in Fig. <xref ref-type="fig" rid="App1.Ch1.S7.F11"/> in Appendix <xref ref-type="sec" rid="App1.Ch1.S7"/>.</p>
      <p id="d2e5627">Although the best slopes are found with the smallest number of analogs, the best nRMSE and PCC are obtained using around 5–15 analogs. Using too many analogs increases the underestimation of the variability and deteriorates the slope. Regarding the detection of target thresholds, increasing the number of analogs makes the forecast more “conservative” (lower <inline-formula><mml:math id="M331" display="inline"><mml:mi>H</mml:mi></mml:math></inline-formula> and <inline-formula><mml:math id="M332" display="inline"><mml:mi>F</mml:mi></mml:math></inline-formula>, higher SR) and deteriorates the CSI and PSS. When focusing on information threshold exceedances, the AN forecasts based on 10 analogs or more never reach such high O<sub>3</sub> values. The highest CSI and PSS are finally obtained with one single analog.</p>
      <p id="d2e5653">Therefore, similarly to PERS and MA methods that reached their best skills for the shortest time windows, with AN the best CSI and PSS skills are obtained when using the lowest number of analogs (with a cost in the continuous metrics, as for PERS and MA). Computing the AN-corrected O<sub>3</sub> mixing ratios based on a larger number of analogs gives smoother predictions, and our choice to weight the average by the distance to the different analogs is unable to substantially mitigate this issue.</p>
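      <p>The smoothing effect of using more analogs can be sketched as follows. The text only states that the average is weighted by the distance to the analogs; the inverse-distance weights used below, the function name <monospace>analog_prediction</monospace> and the numerical values are assumptions made for illustration and may differ from the implementation used in this study.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def analog_prediction(distances, observed_o3, n_analogs):
    """Weighted mean of the O3 observed on the n closest analog days.

    distances: distance of each past day to the day to predict (smaller = closer).
    observed_o3: O3 mixing ratio (ppbv) observed on each of these past days.
    Weights are the inverse of the distance (an assumption of this sketch).
    """
    closest = sorted(zip(distances, observed_o3))[:n_analogs]
    eps = 1e-9  # avoids a division by zero for a perfect analog
    weights = [1.0 / (dist + eps) for dist, _ in closest]
    weighted_sum = sum(w * o3 for w, (_, o3) in zip(weights, closest))
    return weighted_sum / sum(weights)


# Hypothetical past days: the closest analog is a strong O3 day (95 ppbv),
# the following ones are closer to typical conditions.
distances = [0.5, 1.0, 2.0, 4.0]
observed_o3 = [95.0, 70.0, 60.0, 55.0]

for n in (1, 2, 4):
    print(n, round(analog_prediction(distances, observed_o3, n), 1))
```
]]></preformat>
      <p>In this made-up case, the prediction exceeds 90 ppbv only with a single analog and falls to roughly 81 ppbv with four analogs, despite the distance weighting. This is consistent with the more conservative behavior found when the number of analogs increases.</p>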
</sec>
<sec id="Ch1.S3.SS4.SSS5">
  <label>3.4.5</label><title>Gradient boosting machine method</title>
      <p id="d2e5673">Although GBM gives among the best RMSE and PCC, it strongly underestimates the variability in O<sub>3</sub> mixing ratios, with critical consequences in terms of detection skills, especially for the highest thresholds (e.g., d1max <inline-formula><mml:math id="M336" display="inline"><mml:mo>&gt;</mml:mo></mml:math></inline-formula> 90 <inline-formula><mml:math id="M337" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">ppbv</mml:mi></mml:mrow></mml:math></inline-formula>). This is at least partly due to the low frequency of occurrence of such episodes and their corresponding low weight in the entire population of points used for the training. One way of mitigating this issue consists of assigning different weights to the different training instances. This aims at forcing the GBM model to better predict the instances of higher weight, at the cost of a potential deterioration of the performance on the instances of lower weight.</p>
      <p id="d2e5700">In order to assess the extent to which it may improve the performance of the GBM MOS method, we test here different weighting strategies. At each training phase, we compute the absolute distance <inline-formula><mml:math id="M338" display="inline"><mml:mi>D</mml:mi></mml:math></inline-formula> between all observed O<sub>3</sub> mixing ratio instances and the mean O<sub>3</sub> mixing ratio (averaged over the entire training dataset). Then several sensitivity tests are performed, weighting the training data by <inline-formula><mml:math id="M341" display="inline"><mml:mi>D</mml:mi></mml:math></inline-formula>, <inline-formula><mml:math id="M342" display="inline"><mml:mrow><mml:msup><mml:mi>D</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M343" display="inline"><mml:mrow><mml:msup><mml:mi>D</mml:mi><mml:mn mathvariant="normal">3</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula>, respectively (hereafter referred to as GBM(W), GBM(W2) and GBM(W3), respectively). Using such weights, we want the GBM model to better predict the lower and upper tails of the O<sub>3</sub> distribution in order to better represent the variability in the O<sub>3</sub> mixing ratios. Given that the O<sub>3</sub> mixing ratio distribution is typically positively skewed, the highest weights are put on the strongest positive deviations from the mean.</p>
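      <p>The three weighting strategies can be sketched in a few lines. The function name <monospace>tail_weights</monospace> and the sample values are hypothetical. The sketch only computes the weights; how they are passed to the GBM training routine depends on the library used and is not shown.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def tail_weights(observed_o3, power):
    """Training weights D**power, with D the absolute distance to the mean.

    power = 1, 2 and 3 correspond to the GBM(W), GBM(W2) and GBM(W3) tests.
    """
    mean_o3 = sum(observed_o3) / len(observed_o3)
    return [abs(value - mean_o3) ** power for value in observed_o3]


# Hypothetical, positively skewed sample of observed O3 (ppbv); its mean is 50.
observed_o3 = [30, 40, 40, 50, 90]

for power in (1, 2, 3):
    print(power, tail_weights(observed_o3, power))
```
]]></preformat>
      <p>In this sample, the 90 ppbv instance receives 4 times the weight of a 40 ppbv instance with D, 16 times with D<sup>2</sup> and 64 times with D<sup>3</sup>. Note that, as written here, an instance exactly at the mean gets a zero weight; whether an offset is added to avoid this is not specified in the text and is left out of the sketch.</p>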
      <p id="d2e5785">As a parallel sensitivity test, we explore the performance of these different ML models but remove the input feature corresponding to the previous (1 d before) observed O<sub>3</sub> mixing ratio (hereafter referred to as GBM(noO), GBM(noO,W), GBM(noO,W2) and GBM(noO,W3)). This additional test is of interest for operational purposes since O<sub>3</sub> observations are not always available in near real time. Results are shown in Fig. <xref ref-type="fig" rid="App1.Ch1.S7.F12"/> in Appendix <xref ref-type="sec" rid="App1.Ch1.S7"/>.</p>
      <p id="d2e5810">As expected, the results highlight a deterioration of the RMSE and PCC combined with an improvement in the slope and nMSDB. The negative bias affecting the variability with the unweighted GBM is substantially reduced when using weights, although too-strong weights (as in GBM(W3) for instance) can lead to a slight overestimation of the variability at specific timescales.</p>
      <p id="d2e5814">Regarding the skills for detecting target threshold exceedances, stronger weights typically increase both <inline-formula><mml:math id="M349" display="inline"><mml:mi>H</mml:mi></mml:math></inline-formula> and <inline-formula><mml:math id="M350" display="inline"><mml:mi>F</mml:mi></mml:math></inline-formula> and improve the (underestimated) FB but deteriorate the SR and AUC (the forecasts become more liberal). Regarding the more balanced metrics (of strongest interest here), adding more weights on the tails of the O<sub>3</sub> distribution typically has a positive although small impact on CSI and PSS. Regarding the detection of information threshold exceedances, both CSI and PSS can also be slightly improved by adding some weight into the GBM, but the performance for detecting such high O<sub>3</sub> values remains relatively low. The interest of using the O<sub>3</sub> concentration observed 1 d before is found here to be limited.</p>
      <p id="d2e5858">Therefore, adopting an appropriate weighting strategy is simple yet effective for achieving slightly better O<sub>3</sub> exceedance detection skills in exchange for a reasonable deterioration in RMSE and PCC. Overall, the improvements are relatively small, but still valuable given the initially very low detection skills for the strongest O<sub>3</sub> episodes.</p>
</sec>
<sec id="Ch1.S3.SS4.SSS6">
  <label>3.4.6</label><title>Influence of the meteorological input data in AN and GBM methods</title>
      <p id="d2e5887">In the previous sections, O<sub>3</sub> corrections with AN and GBM methods relied on HRES meteorological forecasts. Here, we investigate the impact of using alternative meteorological data, namely the ERA5 meteorological reanalysis. For both AN and GBM methods, the MOS-corrected O<sub>3</sub> mixing ratios obtained with these two meteorological datasets are very similar, with PCC above 0.95. The results obtained against observations are shown in Fig. <xref ref-type="fig" rid="App1.Ch1.S7.F13"/> in Appendix <xref ref-type="sec" rid="App1.Ch1.S7"/>, for the AN(1), AN(5), AN(10) and GBM methods. Since O<sub>3</sub> predictions are close, the statistical performance against observations is also very consistent between both meteorological datasets. For both continuous and categorical metrics, the performance obtained with HRES data is found to be slightly lower than with ERA5. Discrepancies between both meteorological datasets tend to increase with lead time, with GBM being slightly more sensitive to the meteorological input data than AN.</p>
      <p id="d2e5921">Therefore, this experiment highlights a relatively low sensitivity of both AN and GBM methods to the two meteorological datasets tested here. The very similar results obtained with HRES and ERA5 meteorological input data are likely not explained by the fact that both datasets give very similar values for the different meteorological variables, but rather by the intrinsic characteristics of both AN and GBM methods. The AN method makes use of the meteorological data only to identify past days with more or less similar meteorological conditions and can thus handle to some extent the presence of biases in meteorological variables as long as they are systematic (and thus do not impact the identification of the analogs). On the other hand, the GBM method uses past information to learn the complex relationship between O<sub>3</sub> mixing ratios and the other ancillary features. Although better input data increase the chances of fitting a reliable model for predicting O<sub>3</sub>, the GBM models can also indirectly learn at least part of the potential errors affecting some meteorological variables and how they relate to O<sub>3</sub> mixing ratios. Therefore, the presence of biases in some of the ancillary features is not expected to strongly impact the performance of the predictions.</p>
</sec>
</sec>
</sec>
<sec id="Ch1.S4" sec-type="conclusions">
  <label>4</label><title>Discussion and conclusions</title>
      <p id="d2e5961">We demonstrated the strong potential of MOS methods for enhancing raw CAMS O<sub>3</sub> forecasts, not only by removing potential systematic biases but also by correcting other issues related to the distribution of and/or variability in O<sub>3</sub> mixing ratios. All MOS approaches were indeed able to substantially improve at least some aspects of the RAW O<sub>3</sub> forecasts, first and foremost the RMSE and PCC, for which the strongest improvements are obtained with the most sophisticated MOS methods like KF, AN or GBM. However, although all MOS methods were able to increase the underestimated variability in O<sub>3</sub> mixing ratios of RAW, the strongest improvements in slope and nMSDB were obtained with simpler MOS methods like MA or QM. O<sub>3</sub> mixing ratios corrected with AN, GBM and to a lesser extent KF remained too smooth, and such a deficiency has a major impact on the detection skills for high O<sub>3</sub> thresholds. All in all, the best PSS and CSI are usually obtained with the simpler MOS methods. Therefore, there is a clear trade-off between the continuous and categorical skill scores, as also shown by the different sensitivity tests. The quality of a MOS-corrected forecast assessed solely based on metrics like RMSE or PCC thus tells little about the forecast value, here understood as information a user can benefit from to make better decisions, notably for mitigating O<sub>3</sub> short-term episodes.</p>
      <p id="d2e6028">More generally, our study highlights the complexity of identifying the “best” MOS method given the multiple dimensions of the problem. The relative performance of the MOS methods can vary depending on the metric used, the threshold considered in the case of categorical metrics (or more specifically the base rate), the timescale at which MOS corrections are computed and/or evaluated, or the lead time. Other dimensions not covered by this study, like the seasonality of the performance, may also shed a different light on the intercomparison.</p>
      <p id="d2e6031">Among the continuous metrics, both RMSE and PCC provide initial valuable information on the performance of a MOS method. However, a MOS method can give the best RMSE and PCC, yet the poorest high O<sub>3</sub> detection skills. This was the case of the unweighted GBM method. Continuous metrics like the model-versus-observation linear slope or nMSDB provide important complementary information, potentially less misleading, especially in a context where the final objective is to predict episodes of strong O<sub>3</sub>. Among the categorical metrics, although results were presented on a relatively large set of metrics, not all metrics share the same properties. PSS may be considered to be one of the most valuable, notably due to its independence from the base rate, in contrast to CSI. Such a property is particularly useful when comparing scores over different regions and/or time periods where the frequency of observed exceedances might vary, for instance due to different emission forcing and/or meteorological conditions. In an operational context where statistical metrics are continuously monitored, independence from the base rate is an interesting property because the base rate may change with time, which would otherwise prevent a consistent comparison between different periods. However, a well-known issue of both PSS and CSI (as well as many other categorical metrics) is that they degenerate to trivial values (either 0 or 1) as events become rarer <xref ref-type="bibr" rid="bib1.bibx24 bib1.bibx16" id="paren.24"/>, which should restrict their use to the detection of not-too-rare (and therefore not-too-high) O<sub>3</sub> episodes.
In this study, the base rate of the target threshold was likely sufficiently high (<inline-formula><mml:math id="M372" display="inline"><mml:mi>s</mml:mi></mml:math></inline-formula> around 5 %), but we were probably already at the limit regarding the information threshold (<inline-formula><mml:math id="M373" display="inline"><mml:mi>s</mml:mi></mml:math></inline-formula> around 0.1 %). All in all, the selection of the evaluation metrics depends on subjective choices and on the intended use and is fundamentally a cost–loss problem in which the user has to arbitrate between the cost of missing exceedances and the cost of predicting false alarms.</p>
      <p id="d2e6079">The performance of the RAW forecasts was found to be only slightly sensitive to the lead day, but this sensitivity was substantially stronger with some MOS methods (although lower than for the persistence method). This aspect is important, although different users may have different needs in terms of lead time, depending on the intended use of the AQ forecast. Forecasts at <inline-formula><mml:math id="M374" display="inline"><mml:mrow><mml:mi>D</mml:mi><mml:mo>+</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:math></inline-formula> may already be useful for some applications like warning the vulnerable population in advance so that they can adapt their outdoor activities. However, implementing short-term emission reduction measures at the local scale usually goes through decisions taken at different administrative and political levels and thus typically requires forecasts at least at <inline-formula><mml:math id="M375" display="inline"><mml:mrow><mml:mi>D</mml:mi><mml:mo>+</mml:mo><mml:mn mathvariant="normal">2</mml:mn></mml:mrow></mml:math></inline-formula>. If such measures had to be taken at a larger scale, the occurrence of O<sub>3</sub> episodes would probably need to be forecast even further in advance.</p>
      <p id="d2e6116">We saw that some forecast methods like PERS or MA can provide a reasonable performance at <inline-formula><mml:math id="M377" display="inline"><mml:mrow><mml:mi>D</mml:mi><mml:mo>+</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:math></inline-formula> but quickly deteriorate when looking further in the future (while other methods like GBM, AN or QM were less impacted by the lead time). Actually, the performance of our PERS(1) reference forecast obviously depends on the typical duration of O<sub>3</sub> episodes over the region of study; one (single) episode is defined here as a suite of successive days showing an exceedance of a given threshold at a given station. Over the Iberian Peninsula domain in 2018–2019, considering the target threshold (d8max <inline-formula><mml:math id="M379" display="inline"><mml:mo>&gt;</mml:mo></mml:math></inline-formula> 60 <inline-formula><mml:math id="M380" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">ppbv</mml:mi></mml:mrow></mml:math></inline-formula>), a total of 6540 such O<sub>3</sub> episodes were observed in the O<sub>3</sub> monitoring network with <inline-formula><mml:math id="M383" display="inline"><mml:mi mathvariant="normal">min</mml:mi></mml:math></inline-formula>, <inline-formula><mml:math id="M384" display="inline"><mml:mi mathvariant="normal">mean</mml:mi></mml:math></inline-formula> and <inline-formula><mml:math id="M385" display="inline"><mml:mi mathvariant="normal">max</mml:mi></mml:math></inline-formula> duration of 1, 2 and 27 <inline-formula><mml:math id="M386" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">d</mml:mi></mml:mrow></mml:math></inline-formula> (and <inline-formula><mml:math id="M387" display="inline"><mml:mrow><mml:mn mathvariant="normal">5</mml:mn><mml:mi mathvariant="normal">th</mml:mi></mml:mrow></mml:math></inline-formula>, <inline-formula><mml:math id="M388" display="inline"><mml:mrow><mml:mn 
mathvariant="normal">25</mml:mn><mml:mi mathvariant="normal">th</mml:mi></mml:mrow></mml:math></inline-formula>, <inline-formula><mml:math id="M389" display="inline"><mml:mrow><mml:mn mathvariant="normal">50</mml:mn><mml:mi mathvariant="normal">th</mml:mi></mml:mrow></mml:math></inline-formula>, <inline-formula><mml:math id="M390" display="inline"><mml:mrow><mml:mn mathvariant="normal">75</mml:mn><mml:mi mathvariant="normal">th</mml:mi></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M391" display="inline"><mml:mrow><mml:mn mathvariant="normal">95</mml:mn><mml:mi mathvariant="normal">th</mml:mi></mml:mrow></mml:math></inline-formula> percentiles of 1.0, 1.0, 1.0, 2.0 and 5.0 <inline-formula><mml:math id="M392" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">d</mml:mi></mml:mrow></mml:math></inline-formula>). Note the 27 <inline-formula><mml:math id="M393" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">d</mml:mi></mml:mrow></mml:math></inline-formula> long O<sub>3</sub> exceedance occurred in June–July 2019 about 30 <inline-formula><mml:math id="M395" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">km</mml:mi></mml:mrow></mml:math></inline-formula> north of Madrid (station code <italic>ES1802A</italic>). 
Considering the information threshold, 240 episodes were observed, with min, mean and max duration of 1, 1.1 and 5 <inline-formula><mml:math id="M396" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">d</mml:mi></mml:mrow></mml:math></inline-formula> (and <inline-formula><mml:math id="M397" display="inline"><mml:mrow><mml:mn mathvariant="normal">5</mml:mn><mml:mi mathvariant="normal">th</mml:mi></mml:mrow></mml:math></inline-formula>, <inline-formula><mml:math id="M398" display="inline"><mml:mrow><mml:mn mathvariant="normal">25</mml:mn><mml:mi mathvariant="normal">th</mml:mi></mml:mrow></mml:math></inline-formula>, <inline-formula><mml:math id="M399" display="inline"><mml:mrow><mml:mn mathvariant="normal">50</mml:mn><mml:mi mathvariant="normal">th</mml:mi></mml:mrow></mml:math></inline-formula>, <inline-formula><mml:math id="M400" display="inline"><mml:mrow><mml:mn mathvariant="normal">75</mml:mn><mml:mi mathvariant="normal">th</mml:mi></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M401" display="inline"><mml:mrow><mml:mn mathvariant="normal">95</mml:mn><mml:mi mathvariant="normal">th</mml:mi></mml:mrow></mml:math></inline-formula> percentiles of 1.0, 1.0, 1.0, 1.0 and 2.0 <inline-formula><mml:math id="M402" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">d</mml:mi></mml:mrow></mml:math></inline-formula>). This may partly explain why the deterioration of performance with lead time was stronger for target thresholds compared to information thresholds.</p>
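      <p>The episode definition used above (a suite of successive days exceeding a given threshold at a given station) amounts to a run-length computation on the daily series of each station. A minimal Python sketch is given below; the function name <monospace>episode_durations</monospace> and the sample series are hypothetical, and the series is assumed to be free of gaps (missing days would need a dedicated treatment that is not specified here).</p>
      <preformat preformat-type="code"><![CDATA[
```python
def episode_durations(daily_values, threshold):
    """Durations (in days) of the episodes found in a daily series.

    An episode is a run of consecutive days with a value above the threshold.
    """
    durations = []
    run_length = 0
    for value in daily_values:
        if value > threshold:
            run_length += 1
        elif run_length > 0:
            durations.append(run_length)  # the episode ended the day before
            run_length = 0
    if run_length > 0:
        durations.append(run_length)      # episode still ongoing at the end
    return durations


# Hypothetical d8max series (ppbv) at one station, target threshold of 60 ppbv.
d8max = [55, 62, 64, 58, 61, 59, 63, 66, 70]
durations = episode_durations(d8max, threshold=60)

print(durations)
print("mean duration:", sum(durations) / len(durations))
```
]]></preformat>
      <p>For this series, three episodes of 2, 1 and 3 d are found. Pooling the durations obtained at all stations would then give statistics of the kind reported above (minimum, mean, maximum and percentiles).</p>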
      <p id="d2e6358">For operational purposes, several important aspects are to be taken into account. A first aspect concerns the input data required by the MOS method. Does the MOS method rely on observations, models or a combination of both? When the method relies on observations, are they needed in near real time? How many historical data are required? When the method relies on historical data, to what extent does the length of the historical dataset impact the performance? Related to this last point, another essential aspect concerns the ability of the MOS method to handle progressive and/or abrupt changes in the AQ forecasting system (e.g., configuration, parameterizations, input data like emissions) and/or in the Earth's atmosphere (long-term trends, anomalous events like the COVID-19-related emission reduction, climate change). In this framework, the year 2020 obviously offers a unique large-scale case study to investigate the behavior of the different MOS methods.</p>
      <p id="d2e6361">MOS methods relying only on very recent data (namely MA and KF methods) are evidently more adaptable to rapid changes, which is a clear asset under changing atmospheric conditions or modeling system configurations. On the other hand, they naturally discard all the potentially useful information available within the historical dataset. Methods like QM, AN or GBM aim at extracting such information to produce better forecasts but implicitly rely on the assumption that these historical data are still up to date and thus representative of the current conditions, which can be too strong a hypothesis when the historical dataset is long, or the emission forcing and/or meteorological conditions are changing rapidly. In this study, we considered a relatively short 2-year dataset, but using a longer training dataset would likely require building specific methodologies to tackle this issue, either by identifying and discarding the potentially outdated data or by giving them a lower weight in the procedure.</p>
      <p id="d2e6364">In this study, we implemented a relatively simple ML-based MOS method. Although its performance on categorical metrics was found to be limited despite encouraging results on continuous metrics, there is likely room for improvement in near-future developments. To improve the detection of high O<sub>3</sub> concentrations, potentially interesting avenues include testing other types of ML models, customizing the loss function and/or cross-validation scores, designing specific weighting strategies and/or re-sampling approaches, and comparing regression and classification ML models for the detection of exceedances. During the preparation of this study, some of these options were investigated, but more effort is required to draw firm conclusions regarding their potential for better predicting O<sub>3</sub> episodes. Finally, we focused here on the CAMS regional ensemble, but including the individual CAMS models in the set of ML input features may help to achieve better performance if the ML model is able to learn how the strengths and weaknesses of each model vary (in time and space or during specific meteorological conditions) and to build its predictions on the most appropriate subset of individual models. More generally, the performance of the different MOS methods is expected to vary from one raw model to another. Investigating the performance and behavior of these methods on the different individual models might shed useful light on the results obtained here with the ensemble and eventually allow some of our conclusions to be generalized.</p>
</sec>

      
      </body>
    <back><app-group>

<app id="App1.Ch1.S1">
  <label>Appendix A</label><title>Quality assurance with GHOST</title>
      <p id="d2e6397">Using the metadata available in GHOST (Globally Harmonised Observational Surface Treatment), a quality assurance screening is applied to O<sub>3</sub> hourly observations, in which the following data are removed: missing measurements (GHOST's flag 0), infinite values (flag 1), negative measurements (flag 2), zero measurements (flag 4), measurements whose data quality flags given by the data provider are considered by the GHOST project architects to indicate substantial uncertainty or bias (flag 6), measurements for which no valid data remain to average in the temporal window after screening by key QA flags (flag 8), measurements showing persistently recurring values (rolling seven out of nine data points; flag 10), concentrations greater than a scientifically feasible limit (above 5000 <inline-formula><mml:math id="M406" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">ppbv</mml:mi></mml:mrow></mml:math></inline-formula>) (flag 12), measurements detected as distributional outliers using adjusted boxplot analysis (flag 13), measurements manually flagged as too extreme (flag 14), data with too coarse a reported measurement resolution (above 1.0 <inline-formula><mml:math id="M407" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">ppbv</mml:mi></mml:mrow></mml:math></inline-formula>) (flag 17), data with too coarse an empirically derived measurement resolution (above 1.0 <inline-formula><mml:math id="M408" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">ppbv</mml:mi></mml:mrow></mml:math></inline-formula>) (flag 18), measurements below the reported lower limit of detection (flag 22), measurements above the reported upper limit of detection (flag 25), measurements with inappropriate primary sampling for preparing NO<sub>2</sub> for subsequent measurement (flag 40), measurements with inappropriate sample preparation for preparing NO<sub>2</sub> for subsequent measurement (flag 41) and measurements with erroneous measurement methodology (flag 42).</p>
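      <p>As an illustration only, the screening described above can be sketched in a few lines of Python. The flag codes are those listed in this appendix; the data layout (one list of flag codes per hourly value) and the function name <italic>screen</italic> are hypothetical and do not reflect the actual GHOST data format or tooling.</p>
      <preformat preformat-type="code"><![CDATA[
```python
# Flag codes listed in Appendix A; any of them leads to removal of the value.
REJECT_FLAGS = frozenset({0, 1, 2, 4, 6, 8, 10, 12, 13, 14, 17, 18, 22, 25, 40, 41, 42})


def screen(values, flags_per_value):
    """Mask (set to None) every value carrying at least one rejected flag.

    `values` is a list of hourly concentrations and `flags_per_value` a list
    of flag-code lists of the same length (hypothetical layout).
    """
    screened = []
    for value, flags in zip(values, flags_per_value):
        if REJECT_FLAGS.intersection(flags):
            screened.append(None)   # at least one rejected flag: drop the value
        else:
            screened.append(value)  # no rejected flag: keep the value
    return screened


# Second value is negative (flag 2), fourth is zero and recurring (flags 4, 10).
hourly = screen([31.0, -2.0, 48.5, 0.0], [[], [2], [3], [4, 10]])
```
]]></preformat>
      <p>In this sketch, a flag code that is not part of the list above (such as the code 3 attached to the third value) does not trigger removal.</p>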
</app>

<app id="App1.Ch1.S2">
  <label>Appendix B</label><title>Kalman filter</title>
      <p id="d2e6460">In this section, we briefly describe the application of the Kalman filter as a MOS correction method. More details can be found for instance in <xref ref-type="bibr" rid="bib1.bibx7" id="text.25"/>, while <xref ref-type="bibr" rid="bib1.bibx30" id="text.26"/> provide a clear general introduction to the Kalman filter. CAMS forecasts are available over 4 lead days, from <inline-formula><mml:math id="M411" display="inline"><mml:mrow><mml:mi>D</mml:mi><mml:mo>+</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:math></inline-formula> to <inline-formula><mml:math id="M412" display="inline"><mml:mrow><mml:mi>D</mml:mi><mml:mo>+</mml:mo><mml:mn mathvariant="normal">4</mml:mn></mml:mrow></mml:math></inline-formula>. We define here the time <inline-formula><mml:math id="M413" display="inline"><mml:mi>t</mml:mi></mml:math></inline-formula> as the day <inline-formula><mml:math id="M414" display="inline"><mml:mi>D</mml:mi></mml:math></inline-formula> at a given hour of the day (<inline-formula><mml:math id="M415" display="inline"><mml:mrow><mml:mi>t</mml:mi><mml:mo>+</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:math></inline-formula> thus corresponds to <inline-formula><mml:math id="M416" display="inline"><mml:mrow><mml:mi>D</mml:mi><mml:mo>+</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:math></inline-formula> at this specific hour of the day). In an operational context, observations at this hour of the day are available only until time <inline-formula><mml:math id="M417" display="inline"><mml:mi>t</mml:mi></mml:math></inline-formula> (included). 
In this framework, our primary objective in this MOS approach is to estimate <inline-formula><mml:math id="M418" display="inline"><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mo>+</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>|</mml:mo><mml:mi>t</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula>, the true (unknown) forecast bias at time <inline-formula><mml:math id="M419" display="inline"><mml:mrow><mml:mi>t</mml:mi><mml:mo>+</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:math></inline-formula>, using the information available until <inline-formula><mml:math id="M420" display="inline"><mml:mi>t</mml:mi></mml:math></inline-formula> (included), which can then be used to correct the raw CAMS forecast. Here, <inline-formula><mml:math id="M421" display="inline"><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mo>+</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>|</mml:mo><mml:mi>t</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> can be referred to as the a priori forecast bias at time <inline-formula><mml:math id="M422" display="inline"><mml:mrow><mml:mi>t</mml:mi><mml:mo>+</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:math></inline-formula>, while <inline-formula><mml:math id="M423" display="inline"><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mo>+</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>|</mml:mo><mml:mi>t</mml:mi><mml:mo>+</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> can be referred to as the a posteriori forecast bias at time <inline-formula><mml:math id="M424" display="inline"><mml:mrow><mml:mi>t</mml:mi><mml:mo>+</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:math></inline-formula>, as it takes advantage of the information obtained at <inline-formula><mml:math id="M425" display="inline"><mml:mrow><mml:mi>t</mml:mi><mml:mo>+</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:math></inline-formula>.
We distinguish estimated values from true values using a hat (<inline-formula><mml:math id="M426" display="inline"><mml:mover accent="true"><mml:mspace linebreak="nobreak" width="0.25em"/><mml:mo mathvariant="normal" stretchy="false">^</mml:mo></mml:mover></mml:math></inline-formula>) (<inline-formula><mml:math id="M427" display="inline"><mml:mrow><mml:msub><mml:mover accent="true"><mml:mi>x</mml:mi><mml:mo stretchy="false" mathvariant="normal">^</mml:mo></mml:mover><mml:mrow><mml:mi>t</mml:mi><mml:mo>+</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>|</mml:mo><mml:mi>t</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> therefore corresponds to the estimated value of <inline-formula><mml:math id="M428" display="inline"><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mo>+</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>|</mml:mo><mml:mi>t</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula>). In its application as a MOS method, the Kalman filter considers the following <italic>process equations</italic> to describe the time evolution of the forecast bias:

              <disp-formula specific-use="gather" content-type="numbered"><mml:math id="M429" display="block"><mml:mtable displaystyle="true"><mml:mlabeledtr id="App1.Ch1.S2.E1"><mml:mtd><mml:mtext>B1</mml:mtext></mml:mtd><mml:mtd><mml:mrow><mml:mstyle class="stylechange" displaystyle="true"/><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mo>+</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>|</mml:mo><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mo>|</mml:mo><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:msub><mml:mi mathvariant="italic">η</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mo>+</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msub><mml:mspace width="0.25em" linebreak="nobreak"/><mml:mo>;</mml:mo><mml:mspace linebreak="nobreak" width="0.25em"/><mml:mo>(</mml:mo><mml:msub><mml:mover accent="true"><mml:mi>x</mml:mi><mml:mo mathvariant="normal" stretchy="false">^</mml:mo></mml:mover><mml:mrow><mml:mi>t</mml:mi><mml:mo>+</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>|</mml:mo><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mover accent="true"><mml:mi>x</mml:mi><mml:mo stretchy="false" mathvariant="normal">^</mml:mo></mml:mover><mml:mrow><mml:mi>t</mml:mi><mml:mo>|</mml:mo><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow></mml:mtd></mml:mlabeledtr><mml:mlabeledtr id="App1.Ch1.S2.E2"><mml:mtd><mml:mtext>B2</mml:mtext></mml:mtd><mml:mtd><mml:mrow><mml:mstyle displaystyle="true" class="stylechange"/><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mo>+</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>|</mml:mo><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mo>|</mml:mo><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:msubsup><mml:mi mathvariant="italic">σ</mml:mi><mml:mi 
mathvariant="italic">η</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msubsup><mml:mo>,</mml:mo></mml:mrow></mml:mtd></mml:mlabeledtr></mml:mtable></mml:math></disp-formula>

          where <inline-formula><mml:math id="M430" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="italic">η</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mo>+</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> represents the process noise and is assumed to be a white noise term with normal distribution, zero mean, variance <inline-formula><mml:math id="M431" display="inline"><mml:mrow><mml:msubsup><mml:mi mathvariant="italic">σ</mml:mi><mml:mi mathvariant="italic">η</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msubsup></mml:mrow></mml:math></inline-formula> and uncorrelated in time, and <inline-formula><mml:math id="M432" display="inline"><mml:mrow><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mo>+</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>|</mml:mo><mml:mi>t</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> is the a priori expected error variance of the forecast bias estimate. 
Our process equations here are thus quite simple as we assume that the a priori forecast bias at time <inline-formula><mml:math id="M433" display="inline"><mml:mrow><mml:mi>t</mml:mi><mml:mo>+</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:math></inline-formula>, <inline-formula><mml:math id="M434" display="inline"><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mo>+</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>|</mml:mo><mml:mi>t</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula>, is similar to the previous a posteriori forecast bias <inline-formula><mml:math id="M435" display="inline"><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mo>|</mml:mo><mml:mi>t</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> but with some uncertainty <inline-formula><mml:math id="M436" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="italic">η</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mo>+</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula>.</p>
      <p id="d2e6965">At time <inline-formula><mml:math id="M437" display="inline"><mml:mrow><mml:mi>t</mml:mi><mml:mo>+</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:math></inline-formula>, an observation of the forecast bias <inline-formula><mml:math id="M438" display="inline"><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mo>+</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula>, denoted <inline-formula><mml:math id="M439" display="inline"><mml:mrow><mml:msub><mml:mi>z</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mo>+</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula>, is available but with some uncertainty (since the measurement of the pollutant concentration necessarily comes with some uncertainty):

              <disp-formula id="App1.Ch1.S2.E3" content-type="numbered"><label>B3</label><mml:math id="M440" display="block"><mml:mrow><mml:mstyle displaystyle="true" class="stylechange"/><mml:msub><mml:mi>z</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mo>+</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mo>+</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:msub><mml:mi mathvariant="italic">ϵ</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mo>+</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>

        where <inline-formula><mml:math id="M441" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="italic">ϵ</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mo>+</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> represents the measurement noise and is assumed to be a white noise term with normal distribution, zero mean, variance <inline-formula><mml:math id="M442" display="inline"><mml:mrow><mml:msubsup><mml:mi mathvariant="italic">σ</mml:mi><mml:mi mathvariant="italic">ϵ</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msubsup></mml:mrow></mml:math></inline-formula>, uncorrelated in time and independent of the process noise <inline-formula><mml:math id="M443" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="italic">η</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mo>+</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula>. Then, the Kalman filter allows this observation <inline-formula><mml:math id="M444" display="inline"><mml:mrow><mml:msub><mml:mi>z</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mo>+</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> and the a priori estimate of the forecast bias <inline-formula><mml:math id="M445" display="inline"><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mo>+</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>|</mml:mo><mml:mi>t</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> to be fused in order to obtain an a posteriori estimate of the forecast bias <inline-formula><mml:math id="M446" display="inline"><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mo>+</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>|</mml:mo><mml:mi>t</mml:mi><mml:mo>+</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula>:

              <disp-formula specific-use="align" content-type="numbered"><mml:math id="M447" display="block"><mml:mtable displaystyle="true"><mml:mlabeledtr id="App1.Ch1.S2.E4"><mml:mtd><mml:mtext>B4</mml:mtext></mml:mtd><mml:mtd><mml:mstyle displaystyle="true" class="stylechange"/></mml:mtd><mml:mtd><mml:mrow><mml:mstyle displaystyle="true" class="stylechange"/><mml:msub><mml:mi>K</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mo>+</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mo>+</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>|</mml:mo><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mo>/</mml:mo><mml:mo>(</mml:mo><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mo>+</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>|</mml:mo><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:msubsup><mml:mi mathvariant="italic">σ</mml:mi><mml:mi mathvariant="italic">ϵ</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msubsup><mml:mo>)</mml:mo></mml:mrow></mml:mtd></mml:mlabeledtr><mml:mlabeledtr id="App1.Ch1.S2.E5"><mml:mtd><mml:mtext>B5</mml:mtext></mml:mtd><mml:mtd><mml:mstyle displaystyle="true" class="stylechange"/></mml:mtd><mml:mtd><mml:mrow><mml:mstyle class="stylechange" displaystyle="true"/><mml:msub><mml:mover accent="true"><mml:mi>x</mml:mi><mml:mo mathvariant="normal" stretchy="false">^</mml:mo></mml:mover><mml:mrow><mml:mi>t</mml:mi><mml:mo>+</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>|</mml:mo><mml:mi>t</mml:mi><mml:mo>+</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mover accent="true"><mml:mi>x</mml:mi><mml:mo mathvariant="normal" stretchy="false">^</mml:mo></mml:mover><mml:mrow><mml:mi>t</mml:mi><mml:mo>+</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>|</mml:mo><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:msub><mml:mi>K</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mo>+</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msub><mml:mo>(</mml:mo><mml:msub><mml:mi>z</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mo>+</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msub><mml:mo>-</mml:mo><mml:msub><mml:mover accent="true"><mml:mi>x</mml:mi><mml:mo mathvariant="normal" stretchy="false">^</mml:mo></mml:mover><mml:mrow><mml:mi>t</mml:mi><mml:mo>+</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>|</mml:mo><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow></mml:mtd></mml:mlabeledtr><mml:mlabeledtr id="App1.Ch1.S2.E6"><mml:mtd><mml:mtext>B6</mml:mtext></mml:mtd><mml:mtd><mml:mstyle displaystyle="true" class="stylechange"/></mml:mtd><mml:mtd><mml:mrow><mml:mstyle displaystyle="true" class="stylechange"/><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mo>+</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>|</mml:mo><mml:mi>t</mml:mi><mml:mo>+</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mo>+</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>|</mml:mo><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mo>(</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>-</mml:mo><mml:msub><mml:mi>K</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mo>+</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msub><mml:mo>)</mml:mo><mml:mo>,</mml:mo></mml:mrow></mml:mtd></mml:mlabeledtr></mml:mtable></mml:math></disp-formula>

          where <inline-formula><mml:math id="M448" display="inline"><mml:mrow><mml:msub><mml:mi>K</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mo>+</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> corresponds to the so-called Kalman gain used to weight the respective importance of the a priori forecast bias estimate (<inline-formula><mml:math id="M449" display="inline"><mml:mrow><mml:msub><mml:mover accent="true"><mml:mi>x</mml:mi><mml:mo mathvariant="normal" stretchy="false">^</mml:mo></mml:mover><mml:mrow><mml:mi>t</mml:mi><mml:mo>+</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>|</mml:mo><mml:mi>t</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula>) and its observed value (<inline-formula><mml:math id="M450" display="inline"><mml:mrow><mml:msub><mml:mi>z</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mo>+</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula>), and <inline-formula><mml:math id="M451" display="inline"><mml:mrow><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mo>+</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>|</mml:mo><mml:mi>t</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> corresponds to the expected error in the forecast bias estimate (i.e., the variance of the forecast bias error: <inline-formula><mml:math id="M452" display="inline"><mml:mrow><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mo>+</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>|</mml:mo><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mi>V</mml:mi><mml:mi>a</mml:mi><mml:mi>r</mml:mi><mml:mo>(</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mo>+</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>|</mml:mo><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mo>-</mml:mo><mml:msub><mml:mover accent="true"><mml:mi>x</mml:mi><mml:mo mathvariant="normal" 
stretchy="false">^</mml:mo></mml:mover><mml:mrow><mml:mi>t</mml:mi><mml:mo>+</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>|</mml:mo><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>).</p>
      <p id="d2e7566">In practice, the KF algorithm first requires initialization of the <inline-formula><mml:math id="M453" display="inline"><mml:mrow><mml:msub><mml:mover accent="true"><mml:mi>x</mml:mi><mml:mo stretchy="false" mathvariant="normal">^</mml:mo></mml:mover><mml:mrow><mml:mn mathvariant="normal">0</mml:mn><mml:mo>|</mml:mo><mml:mn mathvariant="normal">0</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M454" display="inline"><mml:mrow><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mn mathvariant="normal">0</mml:mn><mml:mo>|</mml:mo><mml:mn mathvariant="normal">0</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> values (any reasonable value can be chosen, given that the KF quickly converges). Then the algorithm starts its first iteration. As a first step, the a-priori-estimated value of the forecast bias <inline-formula><mml:math id="M455" display="inline"><mml:mrow><mml:msub><mml:mover accent="true"><mml:mi>x</mml:mi><mml:mo stretchy="false" mathvariant="normal">^</mml:mo></mml:mover><mml:mrow><mml:mn mathvariant="normal">1</mml:mn><mml:mo>|</mml:mo><mml:mn mathvariant="normal">0</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> is obtained from <inline-formula><mml:math id="M456" display="inline"><mml:mrow><mml:msub><mml:mover accent="true"><mml:mi>x</mml:mi><mml:mo stretchy="false" mathvariant="normal">^</mml:mo></mml:mover><mml:mrow><mml:mn mathvariant="normal">0</mml:mn><mml:mo>|</mml:mo><mml:mn mathvariant="normal">0</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> (in our problem, we simply have <inline-formula><mml:math id="M457" display="inline"><mml:mrow><mml:msub><mml:mover accent="true"><mml:mi>x</mml:mi><mml:mo stretchy="false" mathvariant="normal">^</mml:mo></mml:mover><mml:mrow><mml:mi>t</mml:mi><mml:mo>+</mml:mo><mml:mn 
mathvariant="normal">1</mml:mn><mml:mo>|</mml:mo><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mover accent="true"><mml:mi>x</mml:mi><mml:mo stretchy="false" mathvariant="normal">^</mml:mo></mml:mover><mml:mrow><mml:mi>t</mml:mi><mml:mo>|</mml:mo><mml:mi>t</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula>) and used to correct the raw forecast of CAMS. As a second step, after obtaining the observed pollutant concentration, one can deduce <inline-formula><mml:math id="M458" display="inline"><mml:mrow><mml:msub><mml:mi>z</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> and fuse it with <inline-formula><mml:math id="M459" display="inline"><mml:mrow><mml:msub><mml:mover accent="true"><mml:mi>x</mml:mi><mml:mo stretchy="false" mathvariant="normal">^</mml:mo></mml:mover><mml:mrow><mml:mn mathvariant="normal">1</mml:mn><mml:mo>|</mml:mo><mml:mn mathvariant="normal">0</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> using the Kalman filter equations, which gives us the a-posteriori-estimated value of the forecast bias <inline-formula><mml:math id="M460" display="inline"><mml:mrow><mml:msub><mml:mover accent="true"><mml:mi>x</mml:mi><mml:mo mathvariant="normal" stretchy="false">^</mml:mo></mml:mover><mml:mrow><mml:mn mathvariant="normal">1</mml:mn><mml:mo>|</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> that will be available for the second iteration. An overview of this workflow is given in Fig. <xref ref-type="fig" rid="App1.Ch1.S2.F6"/>.</p>

      <fig id="App1.Ch1.S2.F6"><label>Figure B1</label><caption><p id="d2e7735">Workflow of the Kalman filter method.</p></caption>
        <graphic xlink:href="https://acp.copernicus.org/articles/22/11603/2022/acp-22-11603-2022-f06.png"/>

      </fig>

      <p id="d2e7744">Solving these equations requires values to be assigned to both variances <inline-formula><mml:math id="M461" display="inline"><mml:mrow><mml:msubsup><mml:mi mathvariant="italic">σ</mml:mi><mml:mi mathvariant="italic">η</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msubsup></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M462" display="inline"><mml:mrow><mml:msubsup><mml:mi mathvariant="italic">σ</mml:mi><mml:mi mathvariant="italic">ϵ</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msubsup></mml:mrow></mml:math></inline-formula>. It can be demonstrated that, once <inline-formula><mml:math id="M463" display="inline"><mml:mrow><mml:msubsup><mml:mi mathvariant="italic">σ</mml:mi><mml:mi mathvariant="italic">ϵ</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msubsup></mml:mrow></mml:math></inline-formula> is set to a fixed value (any reasonable value can be chosen, for instance <inline-formula><mml:math id="M464" display="inline"><mml:mrow><mml:msubsup><mml:mi mathvariant="italic">σ</mml:mi><mml:mi mathvariant="italic">ϵ</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msubsup><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:math></inline-formula>), the KF results mainly depend on the <inline-formula><mml:math id="M465" display="inline"><mml:mrow><mml:msubsup><mml:mi mathvariant="italic">σ</mml:mi><mml:mi mathvariant="italic">η</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msubsup><mml:mo>/</mml:mo><mml:msubsup><mml:mi mathvariant="italic">σ</mml:mi><mml:mi mathvariant="italic">ϵ</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msubsup></mml:mrow></mml:math></inline-formula> variance ratio. Various strategies can be used to choose an appropriate value for this variance ratio. This aspect is discussed in Sect. <xref ref-type="sec" rid="Ch1.S2.SS3.SSS3"/>.</p>
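      <p>The iteration described in this appendix can be illustrated with a minimal Python sketch of a scalar Kalman filter applied to the forecast bias. It is not the implementation used in this study. It assumes that the bias is defined as forecast minus observation, that the gain takes the standard scalar form (the a priori variance of Eq. B2 divided by the sum of that variance and the measurement noise variance), and that a missing observation simply propagates the a priori estimate. The function name, its arguments and the default variance ratio of 0.1 are hypothetical choices made for illustration.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def kalman_bias_correction(raw_forecasts, observations,
                           variance_ratio=0.1, var_eps=1.0,
                           x0=0.0, p0=1.0):
    """Correct raw forecasts with a scalar Kalman filter on the forecast bias.

    The bias is taken as forecast minus observation (an assumption of this
    sketch). An observation equal to None leaves the bias estimate unchanged.
    """
    var_eta = variance_ratio * var_eps   # process noise variance
    x_post, p_post = x0, p0              # initial a posteriori estimates
    corrected = []
    for forecast, obs in zip(raw_forecasts, observations):
        # Prediction step: the bias persists, its error variance grows.
        x_prior = x_post
        p_prior = p_post + var_eta
        # The raw forecast is corrected before the observation is known.
        corrected.append(forecast - x_prior)
        if obs is None:
            x_post, p_post = x_prior, p_prior
            continue
        # Update step: fuse the observed bias with the a priori estimate.
        z = forecast - obs
        gain = p_prior / (p_prior + var_eps)
        x_post = x_prior + gain * (z - x_prior)
        p_post = p_prior * (1.0 - gain)
    return corrected


# A forecast with a constant +10 bias relative to the observations.
corrected = kalman_bias_correction([60.0] * 50, [50.0] * 50)
```
]]></preformat>
      <p>In this toy example, the first corrected value equals the raw forecast (the initial bias estimate is zero), and the corrected values then converge towards the observations within a few iterations. A larger variance ratio makes the filter react faster to recent errors, at the cost of noisier corrections.</p>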
</app>

<app id="App1.Ch1.S3">
  <label>Appendix C</label><title>Analogs norm</title>
      <p id="d2e7836">The analogs (AN) method requires identification of which past forecast days are the most similar to the current one. Given a set of features to take into account, this similarity is computed using the norm introduced by <xref ref-type="bibr" rid="bib1.bibx7" id="text.27"/>:
          <disp-formula id="App1.Ch1.S3.E7" content-type="numbered"><label>C1</label><mml:math id="M466" display="block"><mml:mrow><mml:mfenced open="∥" close="∥"><mml:mrow><mml:msub><mml:mi>F</mml:mi><mml:mi>t</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>A</mml:mi><mml:mrow><mml:msup><mml:mi>t</mml:mi><mml:mo>′</mml:mo></mml:msup></mml:mrow></mml:msub></mml:mrow></mml:mfenced><mml:mo>=</mml:mo><mml:munderover><mml:mo movablelimits="false">∑</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow><mml:mi>N</mml:mi></mml:munderover><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mrow><mml:msub><mml:mi>w</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mrow><mml:msub><mml:mi mathvariant="italic">σ</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:mfrac></mml:mstyle><mml:msqrt><mml:mrow><mml:munderover><mml:mo movablelimits="false">∑</mml:mo><mml:mrow><mml:mi>k</mml:mi><mml:mo>=</mml:mo><mml:mo>-</mml:mo><mml:mi>T</mml:mi></mml:mrow><mml:mi>T</mml:mi></mml:munderover><mml:msup><mml:mfenced open="(" close=")"><mml:mrow><mml:msub><mml:mi>F</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>t</mml:mi><mml:mo>+</mml:mo><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo>-</mml:mo><mml:msub><mml:mi>A</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:msup><mml:mi>t</mml:mi><mml:mo>′</mml:mo></mml:msup><mml:mo>+</mml:mo><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mfenced><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:msqrt><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>
        with <inline-formula><mml:math id="M467" display="inline"><mml:mrow><mml:msub><mml:mi>F</mml:mi><mml:mi>t</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> the raw forecast at time <inline-formula><mml:math id="M468" display="inline"><mml:mi>t</mml:mi></mml:math></inline-formula>, <inline-formula><mml:math id="M469" display="inline"><mml:mrow><mml:msub><mml:mi>A</mml:mi><mml:mrow><mml:msup><mml:mi>t</mml:mi><mml:mo>′</mml:mo></mml:msup></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> an analog forecast at time <inline-formula><mml:math id="M470" display="inline"><mml:mrow><mml:msup><mml:mi>t</mml:mi><mml:mo>′</mml:mo></mml:msup></mml:mrow></mml:math></inline-formula>, <inline-formula><mml:math id="M471" display="inline"><mml:mi>N</mml:mi></mml:math></inline-formula> the number of features taken into account, <inline-formula><mml:math id="M472" display="inline"><mml:mrow><mml:msub><mml:mi>w</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> the weight of the feature <inline-formula><mml:math id="M473" display="inline"><mml:mi>i</mml:mi></mml:math></inline-formula>, <inline-formula><mml:math id="M474" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="italic">σ</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> its standard deviation calculated over past forecasts and <inline-formula><mml:math id="M475" display="inline"><mml:mi>T</mml:mi></mml:math></inline-formula> the half-width of the time window over which to compute the metric (i.e., a value <inline-formula><mml:math id="M476" display="inline"><mml:mrow><mml:mi>T</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">2</mml:mn></mml:mrow></mml:math></inline-formula> means that the squared difference between the forecast and the analog will be computed over a <inline-formula><mml:math id="M477" display="inline"><mml:mrow><mml:mo>±</mml:mo><mml:mn mathvariant="normal">2</mml:mn></mml:mrow></mml:math></inline-formula> h time 
window). In our study, we used weights of 1 for all features (wind speed, wind direction, temperature, surface pressure) and <inline-formula><mml:math id="M478" display="inline"><mml:mrow><mml:mi>T</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:math></inline-formula>.</p>
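      <p>As an illustration, the metric of Eq. (C1) can be sketched in a few lines of Python. The function name <italic>analog_distance</italic>, the dictionary-based data layout and the toy values are assumptions made for this example only and do not reproduce the code used in the study; wind direction would in addition require a circular difference, which is ignored here.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import math


def analog_distance(forecast, analog, t, t_prime, sigma, weights=None, half_width=1):
    """Distance between the forecast at index t and a past forecast at index t_prime.

    forecast, analog: dicts mapping a feature name to a list of hourly values.
    sigma: dict mapping a feature name to its standard deviation over past forecasts.
    weights: optional dict of feature weights (all equal to 1 when omitted).
    half_width: half-width T of the time window, in time steps.
    """
    total = 0.0
    for name, series in forecast.items():
        weight = 1.0 if weights is None else weights[name]
        squared = 0.0
        # Inner sum over the 2T + 1 time steps centred on t and t_prime
        for k in range(-half_width, half_width + 1):
            diff = series[t + k] - analog[name][t_prime + k]
            squared += diff * diff
        total += weight / sigma[name] * math.sqrt(squared)
    return total


# Toy data (hypothetical values): two features, four hourly time steps
forecast = {"temperature": [290.0, 291.0, 293.0, 294.0], "wind_speed": [2.0, 3.0, 4.0, 3.0]}
past = {"temperature": [289.0, 291.0, 292.0, 296.0], "wind_speed": [2.0, 3.0, 2.0, 5.0]}
sigma = {"temperature": 2.0, "wind_speed": 1.0}

distance = analog_distance(forecast, past, t=1, t_prime=1, sigma=sigma)
print(round(distance, 4))
```
]]></preformat>
      <p>In this sketch the caller is responsible for keeping the whole window inside both series, since Python would otherwise silently wrap negative indices.</p>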
</app>

<app id="App1.Ch1.S4">
  <label>Appendix D</label><title>Tuning of the GBM models</title>
      <p id="d2e8081">The GBM models are tuned using a so-called <italic>randomized search</italic>, in which a range of values is specified for each hyperparameter of interest, together with the total number of hyperparameter combinations to test. After fixing the learning rate to 0.05 (<italic>learning_rate</italic> in the <italic>scikit-learn</italic> Python package), the GBM models were tuned over the following set of hyperparameters: the maximum tree depth (<italic>max_depth</italic>: from 1 to 5 by 1), the subsampling fraction (<italic>subsample</italic>: from 0.3 to 1.0 by 0.1), the number of trees (<italic>n_estimators</italic>: from 50 to 1000 by 50) and the minimum number of samples required at a leaf node (<italic>min_samples_leaf</italic>: from 1 to 50). Because we are dealing with time series, this tuning is conducted through a rolling-origin cross-validation in which the validation data always come after the training data.</p>
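      <p>The two ingredients of this tuning, random sampling of hyperparameter combinations and rolling-origin splits, can be sketched with the Python standard library alone. The helper names, the number of candidates, the number of splits and the expanding-window layout are assumptions made for illustration; the <italic>scikit-learn</italic> package offers comparable functionality through <italic>RandomizedSearchCV</italic> and <italic>TimeSeriesSplit</italic>, but whether these exact classes were used is not stated here.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import random

# Search space following the ranges quoted in the text
SEARCH_SPACE = {
    "max_depth": list(range(1, 6)),
    "subsample": [i / 10 for i in range(3, 11)],
    "n_estimators": list(range(50, 1001, 50)),
    "min_samples_leaf": list(range(1, 51)),
}
FIXED_PARAMS = {"learning_rate": 0.05}


def sample_candidates(n_candidates, seed=0):
    """Draw random hyperparameter combinations from the search space."""
    rng = random.Random(seed)
    candidates = []
    for _ in range(n_candidates):
        params = {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}
        params.update(FIXED_PARAMS)
        candidates.append(params)
    return candidates


def rolling_origin_splits(n_samples, n_splits):
    """Yield (train, validation) index lists; validation always follows training.

    Assumption: an expanding training window with equal-sized validation folds.
    """
    fold = n_samples // (n_splits + 1)
    for i in range(1, n_splits + 1):
        train_end = i * fold
        yield list(range(train_end)), list(range(train_end, train_end + fold))


candidates = sample_candidates(n_candidates=5)
splits = list(rolling_origin_splits(n_samples=100, n_splits=4))
for train, validation in splits:
    print(len(train), validation[0], validation[-1])
```
]]></preformat>
      <p>Each candidate would then be fitted on every training window and scored on the corresponding validation window, keeping the combination with the best average score.</p>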
</app>

<app id="App1.Ch1.S5">
  <label>Appendix E</label><title>Evaluation metrics</title>
      <p id="d2e8114">The continuous metrics used in this study are defined as follows: 

              <disp-formula id="App1.Ch1.S5.E8" specific-use="gather" content-type="subnumberedsingle"><mml:math id="M479" display="block"><mml:mtable displaystyle="true"><mml:mlabeledtr id="App1.Ch1.S5.E8.9"><mml:mtd><mml:mtext>E1a</mml:mtext></mml:mtd><mml:mtd><mml:mrow><mml:mstyle class="stylechange" displaystyle="true"/><mml:mi mathvariant="normal">MB</mml:mi><mml:mo>=</mml:mo><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mn mathvariant="normal">1</mml:mn><mml:mi>N</mml:mi></mml:mfrac></mml:mstyle><mml:munderover><mml:mo movablelimits="false">∑</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow><mml:mi>N</mml:mi></mml:munderover><mml:msub><mml:mi>m</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>-</mml:mo><mml:msub><mml:mi>o</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:mtd></mml:mlabeledtr><mml:mlabeledtr id="App1.Ch1.S5.E8.10"><mml:mtd><mml:mtext>E1b</mml:mtext></mml:mtd><mml:mtd><mml:mrow><mml:mstyle class="stylechange" displaystyle="true"/><mml:mi mathvariant="normal">nMB</mml:mi><mml:mo>=</mml:mo><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mi mathvariant="normal">MB</mml:mi><mml:mover accent="true"><mml:mi>o</mml:mi><mml:mo mathvariant="normal">‾</mml:mo></mml:mover></mml:mfrac></mml:mstyle></mml:mrow></mml:mtd></mml:mlabeledtr><mml:mlabeledtr id="App1.Ch1.S5.E8.11"><mml:mtd><mml:mtext>E1c</mml:mtext></mml:mtd><mml:mtd><mml:mrow><mml:mstyle displaystyle="true" class="stylechange"/><mml:mi mathvariant="normal">RMSE</mml:mi><mml:mo>=</mml:mo><mml:msqrt><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mrow><mml:msubsup><mml:mo>∑</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow><mml:mi>N</mml:mi></mml:msubsup><mml:mo>(</mml:mo><mml:msub><mml:mi>m</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>-</mml:mo><mml:msub><mml:mi>o</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:msup><mml:mo>)</mml:mo><mml:mn 
mathvariant="normal">2</mml:mn></mml:msup></mml:mrow><mml:mi>N</mml:mi></mml:mfrac></mml:mstyle></mml:msqrt></mml:mrow></mml:mtd></mml:mlabeledtr><mml:mlabeledtr id="App1.Ch1.S5.E8.12"><mml:mtd><mml:mtext>E1d</mml:mtext></mml:mtd><mml:mtd><mml:mrow><mml:mstyle class="stylechange" displaystyle="true"/><mml:mi mathvariant="normal">nRMSE</mml:mi><mml:mo>=</mml:mo><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mi mathvariant="normal">RMSE</mml:mi><mml:mover accent="true"><mml:mi>o</mml:mi><mml:mo mathvariant="normal">‾</mml:mo></mml:mover></mml:mfrac></mml:mstyle></mml:mrow></mml:mtd></mml:mlabeledtr><mml:mlabeledtr id="App1.Ch1.S5.E8.13"><mml:mtd><mml:mtext>E1e</mml:mtext></mml:mtd><mml:mtd><mml:mrow><mml:mstyle class="stylechange" displaystyle="true"/><mml:mi mathvariant="normal">PCC</mml:mi><mml:mo>=</mml:mo><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mn mathvariant="normal">1</mml:mn><mml:mrow><mml:mi>N</mml:mi><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:mfrac></mml:mstyle><mml:munderover><mml:mo movablelimits="false">∑</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow><mml:mi>N</mml:mi></mml:munderover><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>m</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>-</mml:mo><mml:mover accent="true"><mml:mi>m</mml:mi><mml:mo mathvariant="normal">‾</mml:mo></mml:mover><mml:mo>)</mml:mo><mml:mo>(</mml:mo><mml:msub><mml:mi>o</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>-</mml:mo><mml:mover accent="true"><mml:mi>o</mml:mi><mml:mo mathvariant="normal">‾</mml:mo></mml:mover><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:msub><mml:mi mathvariant="italic">σ</mml:mi><mml:mi>m</mml:mi></mml:msub><mml:msub><mml:mi mathvariant="italic">σ</mml:mi><mml:mi>o</mml:mi></mml:msub></mml:mrow></mml:mfrac></mml:mstyle></mml:mrow></mml:mtd></mml:mlabeledtr><mml:mlabeledtr 
id="App1.Ch1.S5.E8.14"><mml:mtd><mml:mtext>E1f</mml:mtext></mml:mtd><mml:mtd><mml:mrow><mml:mstyle class="stylechange" displaystyle="true"/><mml:mi mathvariant="normal">nMSDB</mml:mi><mml:mo>=</mml:mo><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mrow><mml:msub><mml:mi mathvariant="italic">σ</mml:mi><mml:mi>m</mml:mi></mml:msub><mml:mo>-</mml:mo><mml:msub><mml:mi mathvariant="italic">σ</mml:mi><mml:mi>o</mml:mi></mml:msub></mml:mrow><mml:msub><mml:mi mathvariant="italic">σ</mml:mi><mml:mi>o</mml:mi></mml:msub></mml:mfrac></mml:mstyle><mml:mo>,</mml:mo></mml:mrow></mml:mtd></mml:mlabeledtr></mml:mtable></mml:math></disp-formula>

          with <inline-formula><mml:math id="M480" display="inline"><mml:mrow><mml:msub><mml:mi>m</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M481" display="inline"><mml:mrow><mml:msub><mml:mi>o</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> the predicted and observed mixing ratios, <inline-formula><mml:math id="M482" display="inline"><mml:mover accent="true"><mml:mi>m</mml:mi><mml:mo mathvariant="normal">‾</mml:mo></mml:mover></mml:math></inline-formula> and <inline-formula><mml:math id="M483" display="inline"><mml:mover accent="true"><mml:mi>o</mml:mi><mml:mo mathvariant="normal">‾</mml:mo></mml:mover></mml:math></inline-formula> their corresponding means, <inline-formula><mml:math id="M484" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="italic">σ</mml:mi><mml:mi>m</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M485" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="italic">σ</mml:mi><mml:mi>o</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> their corresponding standard deviations, and <inline-formula><mml:math id="M486" display="inline"><mml:mi>N</mml:mi></mml:math></inline-formula> the number of points.</p>
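      <p>A minimal Python sketch of the first five continuous metrics (MB, nMB, RMSE, nRMSE and PCC) is given here for illustration. The function name and the toy mixing ratios are hypothetical, and the standard deviations are assumed to be sample standard deviations (division by N − 1), consistent with the 1/(N − 1) factor of the PCC.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import math


def continuous_metrics(m, o):
    """Return MB, nMB, RMSE, nRMSE and PCC for predictions m and observations o."""
    n = len(m)
    mean_m = sum(m) / n
    mean_o = sum(o) / n
    mb = sum(mi - oi for mi, oi in zip(m, o)) / n
    rmse = math.sqrt(sum((mi - oi) ** 2 for mi, oi in zip(m, o)) / n)
    # Sample standard deviations (assumption: N - 1 in the denominator)
    sd_m = math.sqrt(sum((mi - mean_m) ** 2 for mi in m) / (n - 1))
    sd_o = math.sqrt(sum((oi - mean_o) ** 2 for oi in o) / (n - 1))
    covariance_sum = sum((mi - mean_m) * (oi - mean_o) for mi, oi in zip(m, o))
    pcc = covariance_sum / ((n - 1) * sd_m * sd_o)
    return {
        "MB": mb,
        "nMB": mb / mean_o,
        "RMSE": rmse,
        "nRMSE": rmse / mean_o,
        "PCC": pcc,
    }


# Toy example (hypothetical mixing ratios)
predicted = [40.0, 50.0, 60.0, 70.0]
observed = [30.0, 50.0, 70.0, 90.0]
scores = continuous_metrics(predicted, observed)
for name, value in scores.items():
    print(name, round(value, 3))
```
]]></preformat>
      <p>In this toy case the predictions are perfectly correlated with the observations but underestimate their variability, which illustrates why the PCC alone does not characterize forecast quality.</p>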
      <p id="d2e8450">The performance of the categorical forecasts of exceedances beyond a certain threshold can primarily be described through a contingency table (Table <xref ref-type="table" rid="App1.Ch1.S5.T3"/>). Based on these individual numbers <inline-formula><mml:math id="M487" display="inline"><mml:mi>a</mml:mi></mml:math></inline-formula> (hits), <inline-formula><mml:math id="M488" display="inline"><mml:mi>b</mml:mi></mml:math></inline-formula> (false alarms), <inline-formula><mml:math id="M489" display="inline"><mml:mi>c</mml:mi></mml:math></inline-formula> (misses) and <inline-formula><mml:math id="M490" display="inline"><mml:mi>d</mml:mi></mml:math></inline-formula> (correct rejections), a large number of verification metrics have been proposed in the literature, often with inconsistent nomenclature. To avoid confusion, all metrics used in this paper systematically follow the nomenclature given in the reference book of <xref ref-type="bibr" rid="bib1.bibx24" id="text.28"/>.</p>
      <p id="d2e8487">For a given total number of data points <inline-formula><mml:math id="M491" display="inline"><mml:mi>n</mml:mi></mml:math></inline-formula> (<inline-formula><mml:math id="M492" display="inline"><mml:mrow><mml:mo>=</mml:mo><mml:mi>a</mml:mi><mml:mo>+</mml:mo><mml:mi>b</mml:mi><mml:mo>+</mml:mo><mml:mi>c</mml:mi><mml:mo>+</mml:mo><mml:mi>d</mml:mi></mml:mrow></mml:math></inline-formula>), the <inline-formula><mml:math id="M493" display="inline"><mml:mrow><mml:mn mathvariant="normal">2</mml:mn><mml:mo>×</mml:mo><mml:mn mathvariant="normal">2</mml:mn></mml:mrow></mml:math></inline-formula> contingency table can be fully described by three independent measures, namely the base rate <inline-formula><mml:math id="M494" display="inline"><mml:mi>s</mml:mi></mml:math></inline-formula>, which is independent of the forecasting system (total proportion of observed exceedances, also known as the climatological probability of an exceedance), the hit rate <inline-formula><mml:math id="M495" display="inline"><mml:mi>H</mml:mi></mml:math></inline-formula> (proportion of the observed exceedances that are correctly detected) and the false alarm rate <inline-formula><mml:math id="M496" display="inline"><mml:mi>F</mml:mi></mml:math></inline-formula> (proportion of the observed non-exceedances erroneously forecast as exceedances, to be distinguished from the false alarm ratio). These metrics as well as the other categorical metrics used in this study – frequency bias (FB), success ratio (SR), critical success index (CSI) and Peirce skill score (PSS) – are defined as follows:

              <disp-formula id="App1.Ch1.S5.E15" specific-use="gather" content-type="subnumberedsingle"><mml:math id="M497" display="block"><mml:mtable displaystyle="true"><mml:mlabeledtr id="App1.Ch1.S5.E15.16"><mml:mtd><mml:mtext>E2a</mml:mtext></mml:mtd><mml:mtd><mml:mrow><mml:mstyle displaystyle="true" class="stylechange"/><mml:mi>s</mml:mi><mml:mo>=</mml:mo><mml:mo>(</mml:mo><mml:mi>a</mml:mi><mml:mo>+</mml:mo><mml:mi>c</mml:mi><mml:mo>)</mml:mo><mml:mo>/</mml:mo><mml:mi>n</mml:mi></mml:mrow></mml:mtd></mml:mlabeledtr><mml:mlabeledtr id="App1.Ch1.S5.E15.17"><mml:mtd><mml:mtext>E2b</mml:mtext></mml:mtd><mml:mtd><mml:mrow><mml:mstyle class="stylechange" displaystyle="true"/><mml:mi>H</mml:mi><mml:mo>=</mml:mo><mml:mi>a</mml:mi><mml:mo>/</mml:mo><mml:mo>(</mml:mo><mml:mi>a</mml:mi><mml:mo>+</mml:mo><mml:mi>c</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mtd></mml:mlabeledtr><mml:mlabeledtr id="App1.Ch1.S5.E15.18"><mml:mtd><mml:mtext>E2c</mml:mtext></mml:mtd><mml:mtd><mml:mrow><mml:mstyle class="stylechange" displaystyle="true"/><mml:mi>F</mml:mi><mml:mo>=</mml:mo><mml:mi>b</mml:mi><mml:mo>/</mml:mo><mml:mo>(</mml:mo><mml:mi>b</mml:mi><mml:mo>+</mml:mo><mml:mi>d</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mtd></mml:mlabeledtr><mml:mlabeledtr id="App1.Ch1.S5.E15.19"><mml:mtd><mml:mtext>E2d</mml:mtext></mml:mtd><mml:mtd><mml:mrow><mml:mstyle class="stylechange" displaystyle="true"/><mml:mi mathvariant="normal">PC</mml:mi><mml:mo>=</mml:mo><mml:mo>(</mml:mo><mml:mi>a</mml:mi><mml:mo>+</mml:mo><mml:mi>d</mml:mi><mml:mo>)</mml:mo><mml:mo>/</mml:mo><mml:mi>n</mml:mi><mml:mo>=</mml:mo><mml:mo>(</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>-</mml:mo><mml:mi>s</mml:mi><mml:mo>)</mml:mo><mml:mo>(</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>-</mml:mo><mml:mi>F</mml:mi><mml:mo>)</mml:mo><mml:mo>+</mml:mo><mml:mi>s</mml:mi><mml:mi>H</mml:mi></mml:mrow></mml:mtd></mml:mlabeledtr><mml:mlabeledtr 
id="App1.Ch1.S5.E15.20"><mml:mtd><mml:mtext>E2e</mml:mtext></mml:mtd><mml:mtd><mml:mrow><mml:mstyle class="stylechange" displaystyle="true"/><mml:mi mathvariant="normal">FB</mml:mi><mml:mo>=</mml:mo><mml:mo>(</mml:mo><mml:mi>a</mml:mi><mml:mo>+</mml:mo><mml:mi>b</mml:mi><mml:mo>)</mml:mo><mml:mo>/</mml:mo><mml:mo>(</mml:mo><mml:mi>a</mml:mi><mml:mo>+</mml:mo><mml:mi>c</mml:mi><mml:mo>)</mml:mo><mml:mo>=</mml:mo><mml:mo>(</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>-</mml:mo><mml:mi>s</mml:mi><mml:mo>)</mml:mo><mml:mi>F</mml:mi><mml:mo>/</mml:mo><mml:mi>s</mml:mi><mml:mo>+</mml:mo><mml:mi>H</mml:mi></mml:mrow></mml:mtd></mml:mlabeledtr><mml:mlabeledtr id="App1.Ch1.S5.E15.21"><mml:mtd><mml:mtext>E2f</mml:mtext></mml:mtd><mml:mtd><mml:mrow><mml:mstyle displaystyle="true" class="stylechange"/><mml:mi mathvariant="normal">SR</mml:mi><mml:mo>=</mml:mo><mml:mo>(</mml:mo><mml:mi>a</mml:mi><mml:mo>)</mml:mo><mml:mo>/</mml:mo><mml:mo>(</mml:mo><mml:mi>a</mml:mi><mml:mo>+</mml:mo><mml:mi>b</mml:mi><mml:mo>)</mml:mo><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>-</mml:mo><mml:msup><mml:mfenced close="]" open="["><mml:mrow><mml:mn mathvariant="normal">1</mml:mn><mml:mo>+</mml:mo><mml:mfenced close=")" open="("><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mi>s</mml:mi><mml:mrow><mml:mn mathvariant="normal">1</mml:mn><mml:mo>-</mml:mo><mml:mi>s</mml:mi></mml:mrow></mml:mfrac></mml:mstyle></mml:mfenced><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mi>H</mml:mi><mml:mi>F</mml:mi></mml:mfrac></mml:mstyle></mml:mrow></mml:mfenced><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:mtd></mml:mlabeledtr><mml:mlabeledtr id="App1.Ch1.S5.E15.22"><mml:mtd><mml:mtext>E2g</mml:mtext></mml:mtd><mml:mtd><mml:mrow><mml:mstyle displaystyle="true" class="stylechange"/><mml:mi 
mathvariant="normal">CSI</mml:mi><mml:mo>=</mml:mo><mml:mi>a</mml:mi><mml:mo>/</mml:mo><mml:mo>(</mml:mo><mml:mi>a</mml:mi><mml:mo>+</mml:mo><mml:mi>b</mml:mi><mml:mo>+</mml:mo><mml:mi>c</mml:mi><mml:mo>)</mml:mo><mml:mo>=</mml:mo><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mi>H</mml:mi><mml:mrow><mml:mn mathvariant="normal">1</mml:mn><mml:mo>+</mml:mo><mml:mi>F</mml:mi><mml:mo>(</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>-</mml:mo><mml:mi>s</mml:mi><mml:mo>)</mml:mo><mml:mo>/</mml:mo><mml:mi>s</mml:mi></mml:mrow></mml:mfrac></mml:mstyle></mml:mrow></mml:mtd></mml:mlabeledtr><mml:mlabeledtr id="App1.Ch1.S5.E15.23"><mml:mtd><mml:mtext>E2h</mml:mtext></mml:mtd><mml:mtd><mml:mrow><mml:mstyle displaystyle="true" class="stylechange"/><mml:mi mathvariant="normal">PSS</mml:mi><mml:mo>=</mml:mo><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mrow><mml:mi>a</mml:mi><mml:mi>d</mml:mi><mml:mo>-</mml:mo><mml:mi>b</mml:mi><mml:mi>c</mml:mi></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mi>b</mml:mi><mml:mo>+</mml:mo><mml:mi>d</mml:mi><mml:mo>)</mml:mo><mml:mo>(</mml:mo><mml:mi>a</mml:mi><mml:mo>+</mml:mo><mml:mi>c</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mfrac></mml:mstyle><mml:mo>=</mml:mo><mml:mi>H</mml:mi><mml:mo>-</mml:mo><mml:mi>F</mml:mi><mml:mo>.</mml:mo></mml:mrow></mml:mtd></mml:mlabeledtr></mml:mtable></mml:math></disp-formula>

          Note that, as shown in these formulas, any categorical metric that is initially a function of <inline-formula><mml:math id="M498" display="inline"><mml:mi>a</mml:mi></mml:math></inline-formula>, <inline-formula><mml:math id="M499" display="inline"><mml:mi>b</mml:mi></mml:math></inline-formula>, <inline-formula><mml:math id="M500" display="inline"><mml:mi>c</mml:mi></mml:math></inline-formula> and <inline-formula><mml:math id="M501" display="inline"><mml:mi>d</mml:mi></mml:math></inline-formula> can be expressed in terms of <inline-formula><mml:math id="M502" display="inline"><mml:mi>s</mml:mi></mml:math></inline-formula>, <inline-formula><mml:math id="M503" display="inline"><mml:mi>H</mml:mi></mml:math></inline-formula> and <inline-formula><mml:math id="M504" display="inline"><mml:mi>F</mml:mi></mml:math></inline-formula>. One advantage of this <inline-formula><mml:math id="M505" display="inline"><mml:mi>s</mml:mi></mml:math></inline-formula>–<inline-formula><mml:math id="M506" display="inline"><mml:mi>H</mml:mi></mml:math></inline-formula>–<inline-formula><mml:math id="M507" display="inline"><mml:mi>F</mml:mi></mml:math></inline-formula> framework (so-called likelihood–base rate factorization; see chapter 3 of <xref ref-type="bibr" rid="bib1.bibx24" id="altparen.29"/>, for a detailed description) is that, since the forecaster does not have any influence on <inline-formula><mml:math id="M508" display="inline"><mml:mi>s</mml:mi></mml:math></inline-formula>, the three-dimensional problem reduces to a two-dimensional one (<inline-formula><mml:math id="M509" display="inline"><mml:mi>H</mml:mi></mml:math></inline-formula> and <inline-formula><mml:math id="M510" display="inline"><mml:mi>F</mml:mi></mml:math></inline-formula>). 
Since it is easy to maximize <inline-formula><mml:math id="M511" display="inline"><mml:mi>H</mml:mi></mml:math></inline-formula> (by always predicting an exceedance) or to minimize <inline-formula><mml:math id="M512" display="inline"><mml:mi>F</mml:mi></mml:math></inline-formula> (by always predicting a non-exceedance), neither of these two metrics taken individually is a good and balanced measure of the quality of a forecasting system; only combinations of both (possibly with <inline-formula><mml:math id="M513" display="inline"><mml:mi>s</mml:mi></mml:math></inline-formula>), such as those used in this study, can properly assess these detection skills.</p><table-wrap id="App1.Ch1.S5.T3"><label>Table E1</label><caption><p id="d2e9048">Schematic contingency table for deterministic forecasts of binary exceedances of the regulatory limit values.</p></caption><oasis:table frame="topbot"><oasis:tgroup cols="4">
     <oasis:colspec colnum="1" colname="col1" align="left"/>
     <oasis:colspec colnum="2" colname="col2" align="left"/>
     <oasis:colspec colnum="3" colname="col3" align="left"/>
     <oasis:colspec colnum="4" colname="col4" align="left"/>
     <oasis:thead>
       <oasis:row>
         <oasis:entry colname="col1"/>
         <oasis:entry rowsep="1" namest="col2" nameend="col4" align="center">Exceedance observed </oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">Exceedance forecast</oasis:entry>
         <oasis:entry colname="col2">Yes</oasis:entry>
         <oasis:entry colname="col3">No</oasis:entry>
         <oasis:entry colname="col4">Total</oasis:entry>
       </oasis:row>
     </oasis:thead>
     <oasis:tbody>
       <oasis:row>
         <oasis:entry colname="col1">Yes</oasis:entry>
         <oasis:entry colname="col2"><inline-formula><mml:math id="M514" display="inline"><mml:mi>a</mml:mi></mml:math></inline-formula> (hits)</oasis:entry>
         <oasis:entry colname="col3"><inline-formula><mml:math id="M515" display="inline"><mml:mi>b</mml:mi></mml:math></inline-formula> (false alarms)</oasis:entry>
         <oasis:entry colname="col4"><inline-formula><mml:math id="M516" display="inline"><mml:mrow><mml:mi>a</mml:mi><mml:mo>+</mml:mo><mml:mi>b</mml:mi></mml:mrow></mml:math></inline-formula></oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">No</oasis:entry>
         <oasis:entry colname="col2"><inline-formula><mml:math id="M517" display="inline"><mml:mi>c</mml:mi></mml:math></inline-formula> (misses)</oasis:entry>
         <oasis:entry colname="col3"><inline-formula><mml:math id="M518" display="inline"><mml:mi>d</mml:mi></mml:math></inline-formula> (correct rejections)</oasis:entry>
         <oasis:entry colname="col4"><inline-formula><mml:math id="M519" display="inline"><mml:mrow><mml:mi>c</mml:mi><mml:mo>+</mml:mo><mml:mi>d</mml:mi></mml:mrow></mml:math></inline-formula></oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Total</oasis:entry>
         <oasis:entry colname="col2"><inline-formula><mml:math id="M520" display="inline"><mml:mrow><mml:mi>a</mml:mi><mml:mo>+</mml:mo><mml:mi>c</mml:mi></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col3"><inline-formula><mml:math id="M521" display="inline"><mml:mrow><mml:mi>b</mml:mi><mml:mo>+</mml:mo><mml:mi>d</mml:mi></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col4"><inline-formula><mml:math id="M522" display="inline"><mml:mrow><mml:mi>a</mml:mi><mml:mo>+</mml:mo><mml:mi>b</mml:mi><mml:mo>+</mml:mo><mml:mi>c</mml:mi><mml:mo>+</mml:mo><mml:mi>d</mml:mi><mml:mo>=</mml:mo><mml:mi>n</mml:mi></mml:mrow></mml:math></inline-formula></oasis:entry>
       </oasis:row>
     </oasis:tbody>
   </oasis:tgroup></oasis:table></table-wrap>
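      <p>The categorical metrics can be computed directly from the four counts of the contingency table, as in the following Python sketch. The function name and the counts are hypothetical, and no guard is included against empty categories (which would cause a division by zero). The final lines check numerically, for this toy case, that PC, FB and SR can equivalently be obtained from the base rate, the hit rate and the false alarm rate.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def categorical_metrics(a, b, c, d):
    """Categorical scores from hits a, false alarms b, misses c, correct rejections d."""
    n = a + b + c + d
    base_rate = (a + c) / n
    hit_rate = a / (a + c)
    false_alarm_rate = b / (b + d)
    return {
        "s": base_rate,
        "H": hit_rate,
        "F": false_alarm_rate,
        "PC": (a + d) / n,
        "FB": (a + b) / (a + c),
        "SR": a / (a + b),
        "CSI": a / (a + b + c),
        "PSS": hit_rate - false_alarm_rate,
    }


# Toy contingency table (hypothetical counts)
scores = categorical_metrics(a=30, b=20, c=10, d=140)
s, H, F = scores["s"], scores["H"], scores["F"]

# Same scores rebuilt from s, H and F only
pc_from_shf = (1 - s) * (1 - F) + s * H
fb_from_shf = (1 - s) * F / s + H
sr_from_shf = 1 - 1 / (1 + (s / (1 - s)) * (H / F))

print(round(scores["PC"], 3), round(pc_from_shf, 3))
print(round(scores["FB"], 3), round(fb_from_shf, 3))
print(round(scores["SR"], 3), round(sr_from_shf, 3))
```
]]></preformat>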

</app>

<app id="App1.Ch1.S6">
  <label>Appendix F</label><title>Time series</title>

      <fig id="App1.Ch1.S6.F7"><label>Figure F1</label><caption><p id="d2e9236">Time series of the mean O<sub>3</sub> mixing ratios over the Iberian Peninsula, as observed by monitoring stations (in black) and as simulated by CAMS <inline-formula><mml:math id="M524" display="inline"><mml:mrow><mml:mi>D</mml:mi><mml:mo>+</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:math></inline-formula> forecasts corrected with the GBM MOS method (in yellow). Time series are shown at the hourly (h), daily mean (d), daily 1 h maximum (d1max) and daily 8 h maximum (d8max) timescales. O<sub>3</sub> mixing ratios are averaged over all surface stations of the domain.</p></caption>
        
        <graphic xlink:href="https://acp.copernicus.org/articles/22/11603/2022/acp-22-11603-2022-f07.png"/>

      </fig>


</app>

<app id="App1.Ch1.S7">
  <label>Appendix G</label><title>Sensitivity tests</title>

      <fig id="App1.Ch1.S7.F8"><label>Figure G1</label><caption><p id="d2e9289">Similar to Fig. <xref ref-type="fig" rid="Ch1.F3"/> for sensitivity tests on the PERS method.</p></caption>
        
        <graphic xlink:href="https://acp.copernicus.org/articles/22/11603/2022/acp-22-11603-2022-f08.png"/>

      </fig>

<fig id="App1.Ch1.S7.F9"><label>Figure G2</label><caption><p id="d2e9305">Similar to Fig. <xref ref-type="fig" rid="Ch1.F3"/> for sensitivity tests on the MA method.</p></caption>
        
        <graphic xlink:href="https://acp.copernicus.org/articles/22/11603/2022/acp-22-11603-2022-f09.png"/>

      </fig>

<fig id="App1.Ch1.S7.F10"><label>Figure G3</label><caption><p id="d2e9322">Similar to Fig. <xref ref-type="fig" rid="Ch1.F3"/> for sensitivity tests on the KF method.</p></caption>
        
        <graphic xlink:href="https://acp.copernicus.org/articles/22/11603/2022/acp-22-11603-2022-f10.png"/>

      </fig>

<fig id="App1.Ch1.S7.F11"><label>Figure G4</label><caption><p id="d2e9338">Similar to Fig. <xref ref-type="fig" rid="Ch1.F3"/> for sensitivity tests on the AN method.</p></caption>
        
        <graphic xlink:href="https://acp.copernicus.org/articles/22/11603/2022/acp-22-11603-2022-f11.png"/>

      </fig>

<fig id="App1.Ch1.S7.F12"><label>Figure G5</label><caption><p id="d2e9354">Similar to Fig. <xref ref-type="fig" rid="Ch1.F3"/> for sensitivity tests on the GBM method.</p></caption>
        
        <graphic xlink:href="https://acp.copernicus.org/articles/22/11603/2022/acp-22-11603-2022-f12.png"/>

      </fig>

<fig id="App1.Ch1.S7.F13"><label>Figure G6</label><caption><p id="d2e9371">Similar to Fig. <xref ref-type="fig" rid="Ch1.F3"/> for sensitivity tests on the meteorological data (HRES versus ERA5) used in the AN and GBM methods.</p></caption>
        
        <graphic xlink:href="https://acp.copernicus.org/articles/22/11603/2022/acp-22-11603-2022-f13.png"/>

      </fig>

</app>
  </app-group><notes notes-type="dataavailability"><title>Data availability</title>

      <p id="d2e9388">The EEA AQ e-Reporting dataset is publicly available (<uri>https://www.eea.europa.eu/data-and-maps/data/aqereporting-2</uri>; <xref ref-type="bibr" rid="bib1.bibx14" id="altparen.30"/>), as well as the ERA5 meteorological dataset (<ext-link xlink:href="https://doi.org/10.24381/cds.adbb2d47" ext-link-type="DOI">10.24381/cds.adbb2d47</ext-link>; <xref ref-type="bibr" rid="bib1.bibx19" id="altparen.31"/>) and the CAMS regional forecasts (<uri>https://atmosphere.copernicus.eu/catalogue#/</uri>; <xref ref-type="bibr" rid="bib1.bibx4" id="altparen.32"/>).</p>
  </notes><app-group>
        <supplementary-material position="anchor"><p id="d2e9410">The supplement related to this article is available online at <inline-supplementary-material xlink:href="https://doi.org/10.5194/acp-22-11603-2022-supplement" xlink:title="pdf">https://doi.org/10.5194/acp-22-11603-2022-supplement</inline-supplementary-material>.</p></supplementary-material>
        </app-group><notes notes-type="authorcontribution"><title>Author contributions</title>

      <p id="d2e9421">HP contributed to the conception and design of the study. PAB and MSC were responsible for downloading the CAMS and meteorological data. KS was responsible for installing the Python packages and other useful modules on the MareNostrum supercomputer. DB was responsible for the acquisition and preprocessing of the air quality data through the GHOST project. HP carried out the analysis. HP, CPGP, OJ, AS, MG, JMA and DB contributed to the interpretation of results. HP was responsible for writing the article, with a careful review from CPGP and JMA.</p>
  </notes><notes notes-type="competinginterests"><title>Competing interests</title>

      <p id="d2e9427">The contact author has declared that none of the authors has any competing interests.</p>
  </notes><notes notes-type="disclaimer"><title>Disclaimer</title>

      <p id="d2e9433">Publisher’s note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.</p>
  </notes><ack><title>Acknowledgements</title><p id="d2e9439">This research has been funded by the European Union's Horizon 2020 research and innovation program under the Marie Skłodowska-Curie grant agreement H2020-MSCA-COFUND-2016-754433, as well as the MITIGATE project (PID2020-113840RA-I00 funded by MCIN/AEI/10.13039/501100011033) from the Agencia Estatal de Investigación (AEI). We also acknowledge support by the AXA Research Fund and Red Temática ACTRIS España (CGL2017-90884-REDT), the BSC-CNS “Centro de Excelencia Severo Ochoa 2015-2019” program (SEV-2015-0493), PRACE, and RES for awarding us access to the MareNostrum supercomputer in the Barcelona Supercomputing Center as well as H2020 ACTRIS IMP (no. 871115). We also acknowledge support from the VITALISE project (PID2019-108086RA-I00) funded by MCIN/AEI/10.13039/501100011033.</p></ack><notes notes-type="financialsupport"><title>Financial support</title>

      <p id="d2e9444">This research has been supported by the Agencia Estatal de Investigación (MITIGATE project, grant no. PID2020-113840RA-I00 funded by MCIN/AEI/10.13039/501100011033) and the H2020 Marie Skłodowska-Curie Actions (grant no. H2020-MSCA-COFUND-2016-754433).</p>
  </notes><notes notes-type="reviewstatement"><title>Review statement</title>

      <p id="d2e9450">This paper was edited by Pedro Jimenez-Guerrero and reviewed by three anonymous referees.</p>
  </notes><ref-list>
    <title>References</title>

      <ref id="bib1.bibx1"><label>Borrego et al.(2011)</label><mixed-citation>Borrego, C., Monteiro, A., Pay, M., Ribeiro, I., Miranda, A., Basart, S., and Baldasano, J.: How bias-correction can improve air quality forecasts over Portugal, Atmos. Environ., 45, 6629–6641, <ext-link xlink:href="https://doi.org/10.1016/j.atmosenv.2011.09.006" ext-link-type="DOI">10.1016/j.atmosenv.2011.09.006</ext-link>, 2011.</mixed-citation></ref>
      <ref id="bib1.bibx2"><label>Bowdalo(2022)</label><mixed-citation> Bowdalo, D.: Globally Harmonised Observational Surface Treatment: Database of global surface gas observations, in preparation, 2022.</mixed-citation></ref>
      <ref id="bib1.bibx3"><label>Caruana and Niculescu-Mizil(2005)</label><mixed-citation> Caruana, R. and Niculescu-Mizil, A.: An empirical comparison of supervised learning algorithms using different performance metrics, Tech. rep., Technical Report TR2005-1973, Cornell University, 2005.</mixed-citation></ref>
      <ref id="bib1.bibx4"><label>Copernicus(2020)</label><mixed-citation>Copernicus: Catalogue, Copernicus [data set], <uri>https://atmosphere.copernicus.eu/catalogue#/</uri>, last access: 20 November 2020.</mixed-citation></ref>
      <ref id="bib1.bibx5"><label>Copernicus Climate Change Service (C3S)(2017)</label><mixed-citation> Copernicus Climate Change Service (C3S): ERA5: Fifth generation of ECMWF atmospheric reanalyses of the global climate, 2017.</mixed-citation></ref>
      <ref id="bib1.bibx6"><label>De Ridder et al.(2012)</label><mixed-citation>De Ridder, K., Kumar, U., Lauwaet, D., Blyth, L., and Lefebvre, W.: Kalman filter-based air quality forecast adjustment, Atmos. Environ., 50, 381–384, <ext-link xlink:href="https://doi.org/10.1016/j.atmosenv.2012.01.032" ext-link-type="DOI">10.1016/j.atmosenv.2012.01.032</ext-link>, 2012.</mixed-citation></ref>
      <ref id="bib1.bibx7"><label>Delle Monache et al.(2006)</label><mixed-citation>Delle Monache, L., Nipen, T., Deng, X., Zhou, Y., and Stull, R.: Ozone ensemble forecasts: 2. A Kalman filter predictor bias correction, J. Geophys. Res., 111, D05308, <ext-link xlink:href="https://doi.org/10.1029/2005JD006311" ext-link-type="DOI">10.1029/2005JD006311</ext-link>, 2006.</mixed-citation></ref>
      <ref id="bib1.bibx8"><label>Delle Monache et al.(2011)</label><mixed-citation>Delle Monache, L., Nipen, T., Liu, Y., Roux, G., and Stull, R.: Kalman Filter and Analog Schemes to Postprocess Numerical Weather Predictions, Mon. Weather Rev., 139, 3554–3570, <ext-link xlink:href="https://doi.org/10.1175/2011MWR3653.1" ext-link-type="DOI">10.1175/2011MWR3653.1</ext-link>, 2011.</mixed-citation></ref>
      <ref id="bib1.bibx9"><label>Delle Monache et al.(2013)</label><mixed-citation>Delle Monache, L., Eckel, F. A., Rife, D. L., Nagarajan, B., and Searight, K.: Probabilistic Weather Prediction with an Analog Ensemble, Mon. Weather Rev., 141, 3498–3516, <ext-link xlink:href="https://doi.org/10.1175/MWR-D-12-00281.1" ext-link-type="DOI">10.1175/MWR-D-12-00281.1</ext-link>, 2013.</mixed-citation></ref>
      <ref id="bib1.bibx10"><label>Di Tomaso et al.(2017)</label><mixed-citation>Di Tomaso, E., Schutgens, N. A. J., Jorba, O., and Pérez García-Pando, C.: Assimilation of MODIS Dark Target and Deep Blue observations in the dust aerosol component of NMMB-MONARCH version 1.0, Geosci. Model Dev., 10, 1107–1129, <ext-link xlink:href="https://doi.org/10.5194/gmd-10-1107-2017" ext-link-type="DOI">10.5194/gmd-10-1107-2017</ext-link>, 2017.</mixed-citation></ref>
      <ref id="bib1.bibx11"><label>Djalalova et al.(2010)</label><mixed-citation>Djalalova, I., Wilczak, J., McKeen, S., Grell, G., Peckham, S., Pagowski, M., DelleMonache, L., McQueen, J., Tang, Y., and Lee, P.: Ensemble and bias-correction techniques for air quality model forecasts of surface O<sub>3</sub> and PM<sub>2.5</sub> during the TEXAQS-II experiment of 2006, Atmos. Environ., 44, 455–467, <ext-link xlink:href="https://doi.org/10.1016/j.atmosenv.2009.11.007" ext-link-type="DOI">10.1016/j.atmosenv.2009.11.007</ext-link>, 2010.</mixed-citation></ref>
      <ref id="bib1.bibx12"><label>Djalalova et al.(2015)</label><mixed-citation>Djalalova, I., Delle Monache, L., and Wilczak, J.: PM<sub>2.5</sub> analog forecast and Kalman filter post-processing for the Community Multiscale Air Quality (CMAQ) model, Atmos. Environ., 108, 76–87, <ext-link xlink:href="https://doi.org/10.1016/j.atmosenv.2015.02.021" ext-link-type="DOI">10.1016/j.atmosenv.2015.02.021</ext-link>, 2015.</mixed-citation></ref>
      <ref id="bib1.bibx13"><label>EEA(2020)</label><mixed-citation>EEA: Air Quality e-Reporting Database, European Environment Agency, <uri>https://www.eea.europa.eu/data-and-maps/data/aqereporting-9</uri>, last access: 1 May 2020.</mixed-citation></ref>
      <ref id="bib1.bibx14"><label>EEA(2021)</label><mixed-citation>EEA: Air Quality e-Reporting (AQ e-Reporting), EEA [data set], <uri>https://www.eea.europa.eu/data-and-maps/data/aqereporting-2</uri>, last access: 10 May 2021.</mixed-citation></ref>
      <ref id="bib1.bibx15"><label>Fawcett(2006)</label><mixed-citation>Fawcett, T.: An introduction to ROC analysis, Pattern Recogn. Lett., 27, 861–874, <ext-link xlink:href="https://doi.org/10.1016/j.patrec.2005.10.010" ext-link-type="DOI">10.1016/j.patrec.2005.10.010</ext-link>, 2006.</mixed-citation></ref>
      <ref id="bib1.bibx16"><label>Ferro and Stephenson(2011)</label><mixed-citation>Ferro, C. A. T. and Stephenson, D. B.: Extremal Dependence Indices: Improved Verification Measures for Deterministic Forecasts of Rare Binary Events, Weather Forecast., 26, 699–713, 2011.</mixed-citation></ref>
      <ref id="bib1.bibx17"><label>Gaubert et al.(2014)</label><mixed-citation>Gaubert, B., Coman, A., Foret, G., Meleux, F., Ung, A., Rouil, L., Ionescu, A., Candau, Y., and Beekmann, M.: Regional scale ozone data assimilation using an ensemble Kalman filter and the CHIMERE chemical transport model, Geosci. Model Dev., 7, 283–302, <ext-link xlink:href="https://doi.org/10.5194/gmd-7-283-2014" ext-link-type="DOI">10.5194/gmd-7-283-2014</ext-link>, 2014.</mixed-citation></ref>
      <ref id="bib1.bibx18"><label>Hamill and Whitaker(2006)</label><mixed-citation>Hamill, T. M. and Whitaker, J. S.: Probabilistic Quantitative Precipitation Forecasts Based on Reforecast Analogs: Theory and Application, Mon. Weather Rev., 134, 3209–3229, <ext-link xlink:href="https://doi.org/10.1175/MWR3237.1" ext-link-type="DOI">10.1175/MWR3237.1</ext-link>, 2006.</mixed-citation></ref>
      <ref id="bib1.bibx19"><label>Hersbach et al.(2018)</label><mixed-citation>Hersbach, H., Bell, B., Berrisford, P., Biavati, G., Horányi, A., Muñoz Sabater, J., Nicolas, J., Peubey, C., Radu, R., Rozum, I., Schepers, D., Simmons, A., Soci, C., Dee, D., and Thépaut, J.-N.: ERA5 hourly data on single levels from 1959 to present, Copernicus Climate Change Service (C3S) Climate Data Store (CDS) [data set], <ext-link xlink:href="https://doi.org/10.24381/cds.adbb2d47" ext-link-type="DOI">10.24381/cds.adbb2d47</ext-link>, 2018.</mixed-citation></ref>
      <ref id="bib1.bibx20"><label>Honoré et al.(2008)</label><mixed-citation>Honoré, C., Rouïl, L., Vautard, R., Beekmann, M., Bessagnet, B., Dufour, A., Elichegaray, C., Flaud, J.-M., Malherbe, L., Meleux, F., Menut, L., Martin, D., Peuch, A., Peuch, V.-H., and Poisson, N.: Predictability of European air quality: Assessment of 3 years of operational forecasts and analyses by the PREV'AIR system, J. Geophys. Res., 113, D04301, <ext-link xlink:href="https://doi.org/10.1029/2007JD008761" ext-link-type="DOI">10.1029/2007JD008761</ext-link>, 2008.</mixed-citation></ref>
      <ref id="bib1.bibx21"><label>Huang et al.(2017)</label><mixed-citation>Huang, J., McQueen, J., Wilczak, J., Djalalova, I., Stajner, I., Shafran, P., Allured, D., Lee, P., Pan, L., Tong, D., Huang, H.-C., DiMego, G., Upadhayay, S., and Delle Monache, L.: Improving NOAA NAQFC PM<sub>2.5</sub> Predictions with a Bias Correction Approach, Weather Forecast., 32, 407–421, <ext-link xlink:href="https://doi.org/10.1175/WAF-D-16-0118.1" ext-link-type="DOI">10.1175/WAF-D-16-0118.1</ext-link>, 2017.</mixed-citation></ref>
      <ref id="bib1.bibx22"><label>Im et al.(2015a)</label><mixed-citation>Im, U., Bianconi, R., Solazzo, E., Kioutsioukis, I., Badia, A., Balzarini, A., Baró, R., Bellasio, R., Brunner, D., Chemel, C., Curci, G., Denier van der Gon, H., Flemming, J., Forkel, R., Giordano, L., Jiménez-Guerrero, P., Hirtl, M., Hodzic, A., Honzak, L., Jorba, O., Knote, C., Makar, P. A., Manders-Groot, A., Neal, L., Pérez, J. L., Pirovano, G., Pouliot, G., San Jose, R., Savage, N., Schroder, W., Sokhi, R. S., Syrakov, D., Torian, A., Tuccella, P., Wang, K., Werhahn, J., Wolke, R., Zabkar, R., Zhang, Y., Zhang, J., Hogrefe, C., and Galmarini, S.: Evaluation of operational online-coupled regional air quality models over Europe and North America in the context of AQMEII phase 2. Part II: Particulate matter, Atmos. Environ., 115, 421–441, <ext-link xlink:href="https://doi.org/10.1016/j.atmosenv.2014.08.072" ext-link-type="DOI">10.1016/j.atmosenv.2014.08.072</ext-link>, 2015a.</mixed-citation></ref>
      <ref id="bib1.bibx23"><label>Im et al.(2015b)</label><mixed-citation>Im, U., Bianconi, R., Solazzo, E., Kioutsioukis, I., Badia, A., Balzarini, A., Baró, R., Bellasio, R., Brunner, D., Chemel, C., Curci, G., Flemming, J., Forkel, R., Giordano, L., Jiménez-Guerrero, P., Hirtl, M., Hodzic, A., Honzak, L., Jorba, O., Knote, C., Kuenen, J. J., Makar, P. A., Manders-Groot, A., Neal, L., Pérez, J. L., Pirovano, G., Pouliot, G., San Jose, R., Savage, N., Schroder, W., Sokhi, R. S., Syrakov, D., Torian, A., Tuccella, P., Werhahn, J., Wolke, R., Yahya, K., Zabkar, R., Zhang, Y., Zhang, J., Hogrefe, C., and Galmarini, S.: Evaluation of operational on-line-coupled regional air quality models over Europe and North America in the context of AQMEII phase 2. Part I: Ozone, Atmos. Environ., 115, 404–420, <ext-link xlink:href="https://doi.org/10.1016/j.atmosenv.2014.09.042" ext-link-type="DOI">10.1016/j.atmosenv.2014.09.042</ext-link>, 2015b.</mixed-citation></ref>
      <ref id="bib1.bibx24"><label>Jolliffe and Stephenson(2011)</label><mixed-citation>Jolliffe, I. T. and Stephenson, D. B. (Eds.): Forecast Verification: A Practitioner's Guide in Atmospheric Science, 2nd Edn., J. Wiley, Chichester, United Kingdom, ISBN 9780470660713, 2011.</mixed-citation></ref>
      <ref id="bib1.bibx25"><label>Kang et al.(2008)</label><mixed-citation>Kang, D., Mathur, R., Rao, S. T., and Yu, S.: Bias adjustment techniques for improving ozone air quality forecasts, J. Geophys. Res., 113, D23308, <ext-link xlink:href="https://doi.org/10.1029/2008JD010151" ext-link-type="DOI">10.1029/2008JD010151</ext-link>, 2008.</mixed-citation></ref>
      <ref id="bib1.bibx26"><label>Kang et al.(2010)</label><mixed-citation>Kang, D., Mathur, R., and Trivikrama Rao, S.: Real-time bias-adjusted O<sub>3</sub> and PM<sub>2.5</sub> air quality index forecasts and their performance evaluations over the continental United States, Atmos. Environ., 44, 2203–2212, <ext-link xlink:href="https://doi.org/10.1016/j.atmosenv.2010.03.017" ext-link-type="DOI">10.1016/j.atmosenv.2010.03.017</ext-link>, 2010.</mixed-citation></ref>
      <ref id="bib1.bibx27"><label>Liu et al.(2018)</label><mixed-citation>Liu, T., Lau, A. K. H., Sandbrink, K., and Fung, J. C. H.: Time Series Forecasting of Air Quality Based On Regional Numerical Modeling in Hong Kong, J. Geophys. Res.-Atmos., 123, 4175–4196, <ext-link xlink:href="https://doi.org/10.1002/2017JD028052" ext-link-type="DOI">10.1002/2017JD028052</ext-link>, 2018.</mixed-citation></ref>
      <ref id="bib1.bibx28"><label>Ma et al.(2018)</label><mixed-citation>Ma, C., Wang, T., Zang, Z., and Li, Z.: Comparisons of Three-Dimensional Variational Data Assimilation and Model Output Statistics in Improving Atmospheric Chemistry Forecasts, Adv. Atmos. Sci., 35, 813–825, <ext-link xlink:href="https://doi.org/10.1007/s00376-017-7179-y" ext-link-type="DOI">10.1007/s00376-017-7179-y</ext-link>, 2018.</mixed-citation></ref>
      <ref id="bib1.bibx29"><label>McKeen et al.(2005)</label><mixed-citation>McKeen, S., Wilczak, J., Grell, G., Djalalova, I., Peckham, S., Hsie, E.-Y., Gong, W., Bouchet, V., Menard, S., Moffet, R., McHenry, J., McQueen, J., Tang, Y., Carmichael, G. R., Pagowski, M., Chan, A., Dye, T., Frost, G., Lee, P., and Mathur, R.: Assessment of an ensemble of seven real-time ozone forecasts over eastern North America during the summer of 2004, J. Geophys. Res., 110, D21307, <ext-link xlink:href="https://doi.org/10.1029/2005JD005858" ext-link-type="DOI">10.1029/2005JD005858</ext-link>, 2005.</mixed-citation></ref>
      <ref id="bib1.bibx30"><label>Pei et al.(2017)</label><mixed-citation>Pei, Y., Biswas, S., Fussell, D. S., and Pingali, K.: An Elementary Introduction to Kalman Filtering, arXiv [preprint], <ext-link xlink:href="https://doi.org/10.48550/ARXIV.1710.04055" ext-link-type="DOI">10.48550/ARXIV.1710.04055</ext-link>, 2017.</mixed-citation></ref>
      <ref id="bib1.bibx31"><label>Struzewska et al.(2016)</label><mixed-citation>Struzewska, J., Kaminski, J., and Jefimow, M.: Application of model output statistics to the GEM-AQ high resolution air quality forecast, Atmos. Res., 181, 186–199, <ext-link xlink:href="https://doi.org/10.1016/j.atmosres.2016.06.012" ext-link-type="DOI">10.1016/j.atmosres.2016.06.012</ext-link>, 2016.</mixed-citation></ref>
      <ref id="bib1.bibx32"><label>World Health Organization(2016)</label><mixed-citation>World Health Organization: Ambient air pollution: a global assessment of exposure and burden of disease, Tech. rep., <uri>https://apps.who.int/iris/bitstream/handle/10665/250141/9789241511353-eng.pdf?sequence=1</uri> (last access: 1 September 2021), 2016.</mixed-citation></ref>

  </ref-list></back>
</article>
