Articles | Volume 24, issue 10
Technical note
24 May 2024

Technical note: An assessment of the performance of statistical bias correction techniques for global chemistry–climate model surface ozone fields

Christoph Staehle, Harald E. Rieder, Arlene M. Fiore, and Jordan L. Schnell

State-of-the-art chemistry–climate models (CCMs) still show biases compared to ground-level ozone observations, illustrating the difficulties and challenges remaining in the simulation of atmospheric processes governing ozone production and loss. Therefore, CCM output is frequently bias-corrected in studies seeking to explore the health or environmental impacts from changing air quality burdens. Here, we assess four statistical bias correction techniques of varying complexities and their application to surface ozone fields simulated with four CCMs and evaluate their performance against gridded observations in the EU and US. We focus on two time periods (2005–2009 and 2010–2014), where the first period is used for development and training and the second to evaluate the performance of techniques when applied to model projections. We find that all methods are capable of significantly reducing the model bias. However, biases are lowest when we apply more complex approaches such as quantile mapping and delta functions. We also highlight the sensitivity of the correction techniques to individual CCM skill at reproducing the observed distributional change in surface ozone. Ensemble simulations available for one CCM indicate that model ozone biases are likely more sensitive to the process representation embedded in chemical mechanisms than to meteorology.

1 Introduction

Surface ozone (O3) is both an air pollutant and a greenhouse gas, formed in photochemical reactions involving precursor substances such as nitrogen oxides (NOx) and volatile organic compounds (VOCs) of both anthropogenic and non-anthropogenic origin (e.g., Checa-Garcia et al., 2018; Lelieveld and Dentener, 2000; Monks et al., 2015). In addition to the availability of precursor gases, the NOx to VOC ratio, solar radiation and ambient air temperature, which control emissions of biogenic VOCs (BVOCs) and chemical reaction rates, play a crucial role in O3 formation (Chameides et al., 1988; Sillman, 1999; Sillman et al., 1990). Tropospheric O3 abundance is also substantially influenced by stratospheric intrusions, which can, in certain regions or during specific events, alter concentrations significantly (Akritidis et al., 2010; Lin et al., 2015; Stohl et al., 2003). O3 is associated with a variety of detrimental human health effects, especially in the context of the respiratory and cardiovascular systems, resulting in about 5 %–20 % of premature deaths attributable to ambient air pollution (Gu et al., 2023; Malashock et al., 2022; Monks et al., 2015; Murray et al., 2020; Pozzer et al., 2023; Zhang et al., 2019). In addition to its negative health effects, O3 can compromise the metabolism of plants through stomatal uptake and cause damage to leaf surfaces, thereby affecting biomass and crop production (Da et al., 2022; EEA, 2020; Fleming et al., 2018; Mills et al., 2018; Monks et al., 2015). Consequently, a large body of studies examines past, present and future development of surface O3 burdens as well as resulting health and ecological impacts on both regional and global scales (e.g., Da et al., 2022; Meehl et al., 2018; Nolte et al., 2018; Westervelt et al., 2019).

Studies exploring future changes in surface O3 burdens and their implications for human health and the biosphere rely on simulated fields of chemistry–climate models (CCMs) and chemistry-transport models (CTMs). However, despite ongoing development, these models show deficiencies in the adequate representation of ground-level O3 on regional to local scales and changes therein when compared to observations (e.g., Griffiths et al., 2021; Karlický et al., 2024; Turnock et al., 2020; Young et al., 2018). This shortcoming raises questions regarding the reliability of the simulated surface ozone response to changes in precursors and ambient climate. The number of possible reasons for the deviation of model output from observations increases with the complexity of the models. However, the published literature commonly suggests issues with emissions fed into the models, the applied chemical mechanism, meteorology and deposition in addition to uncertainties associated with the spatial resolution (e.g., Archibald et al., 2020; Liu et al., 2022; Young et al., 2018). To overcome these issues, and because individual experiments are, as in climate studies, computationally expensive, statistical bias correction techniques of different complexities are frequently applied to correct global model fields. Such corrections allow for diagnosis of changes in ambient meteorological conditions and ozone in isolation or combination and for investigation of related impacts on human health. Machine learning approaches are increasingly being used for correction purposes (e.g., Liu et al., 2022). These methods, however, usually have the disadvantage of behaving like a “black box”; i.e., algorithms lack traceability and thus physical insights as to the root cause of biases. To date, no detailed comparison of different statistical bias correction techniques for surface ozone burdens has been performed, and the present study aims to close this gap.

Here, we analyze historical simulations from three different global CCMs contributing to the Coupled Model Intercomparison Project Phase 6 (CMIP6), as well as a 13-member ensemble of the Community Earth System Model 2 – Whole Atmosphere Community Climate Model version 6 (CESM2-WACCM6) for the European (EU) and contiguous United States of America (US) domains. For an assessment of model performance, we compare model outputs with gridded observational datasets available for both domains. First, we evaluate the ozone fields of the individual CCMs against observations and contrast the magnitude, sign and seasonality of the bias among CCMs. Thereafter, we apply a set of statistical bias correction techniques aiming for a reduction of the initial bias, independent of its origin, and evaluate the performance of these methods to identify whether a particular correction technique is preferable across models.

Since the model simulations are “free running” and thus create their own meteorology internally, a direct day-to-day comparison with the observations is not meaningful. Hence, our analysis primarily aims to evaluate the distribution of the O3 fields in a statistical sense. Given the detrimental impact of ozone on human health, we focus on the upper tail of the maximum daily 8 h average (MDA8) O3 distribution and the frequency of occurrence of exceedance of health-related target values for Europe and the US.

2 Data and methods

2.1 Model and observational data

The O3 datasets explored in our analysis are hourly surface O3 outputs from three CCMs (GFDL-ESM4, UKESM1-0-LL and EC-Earth3) contributing to CMIP6 and a 13-member ensemble simulation created with CESM2-WACCM6. For most of our study, we use only the first ensemble member of CESM2-WACCM6 to be analogous with the other CCMs, given the overall heterogeneity in the number of members available per model. In Sect. 3.4, we focus on the chemical vs. meteorological driving of model biases and utilize the entire CESM2-WACCM6 ensemble. We also obtain observed MDA8 O3 with a spatial resolution of 1° × 1° per grid cell for both the European domain and the US domain using an extended dataset constructed using the methods of Schnell et al. (2014, 2015) and Schnell and Prather (2017), which was designed specifically to compare against gridded CCMs. The dataset was constructed using an inverse distance weighted interpolation method that includes a de-clustering component similar to kriging; i.e., clustered (within 100 km) observation weights are reduced such that those stations (often located around urban centers) are not disproportionately used in the interpolation. For the US domain, point-based observations that are used in the interpolation include the US Environmental Protection Agency (EPA) Air Quality System (AQS), the US EPA Clean Air Status and Trends Network (CASTNET), and the Environment Canada National Air Pollution Surveillance Program (NAPS); for the European domain we include the European Monitoring and Evaluation Programme (EMEP) and the European Environment Agency (EEA) AirBase network (excluding stations designated as traffic). The exponent for the distance component is 2.5, and a maximum distance of 500 km is used for the weights. Parameters were estimated using a leave-N-out cross-validation technique. Estimations are made at 25 equally spaced points within each 1° × 1° cell and trapezoidally averaged. 
Other recent work has used this extended dataset (e.g., Ducker et al., 2018; Garrido-Perez et al., 2019; Guo et al., 2018). Schnell et al. (2014) estimated an RMSE of 6–9 ppb for individual stations and 0–3 ppb for the grid cell averages; Ducker et al. (2018) estimated a mean bias of 5–10 ppb with the updated dataset over their study locations. For the analysis here, the interpolation is performed on hourly abundances, and the MDA8 O3 is estimated using the interpolated hourly fields. Note that we apply the nomenclature of the European Union for the calculation of the MDA8 O3 values in both domains; i.e., the 8 h average for a given hour is derived using the data of that specific hour and the preceding 7 h (EUR-LEX, 2008). For convenience, the data are provided on a public repository; see “Data availability” section at the end of the paper.
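
The trailing-window convention described above (the 8 h average for a given hour uses that hour and the preceding 7 h) can be illustrated with a minimal Python sketch. The function name `mda8` is hypothetical, and the sketch assumes a gap-free hourly series that starts at the first hour of a day; windows that cross midnight are assigned to the day of their final hour, which is one possible reading of the convention and not necessarily how the dataset was processed.

```python
import numpy as np

def mda8(hourly, hours_per_day=24, window=8):
    """Maximum daily 8 h average (MDA8) from a gap-free hourly series.

    The 8 h average assigned to an hour uses that hour and the preceding
    7 h (trailing window), so the first 7 hours of the series are undefined.
    """
    hourly = np.asarray(hourly, dtype=float)
    if hourly.size % hours_per_day != 0:
        raise ValueError("series must contain whole days")
    kernel = np.ones(window) / window
    running = np.full(hourly.size, np.nan)
    # Each running mean is stored at the index of the last hour in its window.
    running[window - 1:] = np.convolve(hourly, kernel, mode="valid")
    daily = running.reshape(-1, hours_per_day)
    # nanmax skips the undefined first hours of the first day.
    return np.nanmax(daily, axis=1)
```

Applied to several days of hourly values, the function returns one MDA8 value per day; handling of missing hours (not covered here) would need an explicit data-completeness rule.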

To allow for an optimal comparison, the model data are re-gridded using an ordinary inverse distance weighting algorithm to match the spatial extent of the observations. In addition, all datasets are harmonized regarding their temporal resolution by removing days not included in any of the other datasets, resulting in a 358 d calendar (30 d per month except for February). MDA8 O3 is derived for each dataset and time step according to the European nomenclature as mentioned above. For the historical analysis, we use 2005–2009 to evaluate the baseline bias of the individual CCMs and establish the performance of individual bias correction techniques. The time slice 2010–2014 is used subsequently to evaluate the performance of our methods for model projections.
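
As an illustration of the re-gridding step, the following Python sketch implements a plain inverse distance weighting of model grid points onto target cell centers. It is a sketch under stated assumptions: the function names, the weighting exponent (`power=2.0`) and the number of neighbors (`n_neighbors=4`) are hypothetical choices, since the text does not specify them for the model re-gridding.

```python
import numpy as np

def haversine_km(lon1, lat1, lon2, lat2, radius_km=6371.0):
    """Great-circle distance in km between points given in degrees."""
    lon1, lat1, lon2, lat2 = map(np.radians, (lon1, lat1, lon2, lat2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * radius_km * np.arcsin(np.sqrt(a))

def idw_regrid(src_lon, src_lat, src_val, dst_lon, dst_lat,
               power=2.0, n_neighbors=4):
    """Interpolate source values to target points with inverse distance weights."""
    src_lon = np.asarray(src_lon, dtype=float)
    src_lat = np.asarray(src_lat, dtype=float)
    src_val = np.asarray(src_val, dtype=float)
    dst_lon = np.atleast_1d(np.asarray(dst_lon, dtype=float))
    dst_lat = np.atleast_1d(np.asarray(dst_lat, dtype=float))
    out = np.empty(dst_lon.shape)
    for i in range(dst_lon.size):
        d = haversine_km(src_lon, src_lat, dst_lon.flat[i], dst_lat.flat[i])
        nearest = np.argsort(d)[:n_neighbors]
        if d[nearest[0]] < 1e-9:
            # Target coincides with a source point: use its value directly.
            out.flat[i] = src_val[nearest[0]]
            continue
        w = 1.0 / d[nearest] ** power
        out.flat[i] = np.sum(w * src_val[nearest]) / np.sum(w)
    return out
```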

2.2 Bias correction methods

For statistical bias correction, we apply four different techniques that are detailed below. Here, Mq and Oq denote quantiles (q = 1, …, N; 1 = min, N = max) of the model and observational distributions, respectively. The running index j marks individual MDA8 O3 model values. Additionally, we use the indices “hist” and “proj” to differentiate between historical and projected data. Primed terms indicate the bias-corrected model outputs.

2.2.1 Mean bias correction (MB)


The MB is a commonly used approach assuming a constant offset between the model and observations. As an initial step, we derive the average difference of the historical model and observational percentiles. Alternatively, the difference between the mean values of both empirical cumulative distribution functions (ECDFs) can be computed. Subsequently, we subtract the result of Eq. (1) from each quantile of the projected model distribution to retrieve a bias-corrected model ECDF (Eq. 2).
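
The two steps described above can be sketched in Python as follows. This is a minimal illustration written from the verbal description, not a reproduction of Eqs. (1) and (2); the function name and the choice of 100 evenly spaced quantiles are assumptions.

```python
import numpy as np

def mean_bias_correction(model_hist, obs_hist, model_proj, n_quantiles=100):
    """Mean bias (MB) correction of the projected model quantiles."""
    probs = np.linspace(0.0, 1.0, n_quantiles)
    m_hist_q = np.quantile(model_hist, probs)
    o_q = np.quantile(obs_hist, probs)
    m_proj_q = np.quantile(model_proj, probs)
    # Step 1: average difference between historical model and observed quantiles.
    mb = np.mean(m_hist_q - o_q)
    # Step 2: subtract the constant offset from every projected quantile.
    return m_proj_q - mb
```

Because the offset is a single number per grid cell and month, the shape of the model distribution is left unchanged; only its position shifts.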

2.2.2 Relative bias correction (RB)


Here, similar to the MB method, we assume that the model and observations differ by a constant factor. In contrast to the MB correction, however, we derive the average of the relative deviation of the historic model and observational percentiles (Eq. 3). The bias-corrected model projection (Eq. 4) is then calculated as the difference between the raw model and the observed quantiles times the correction term established in Eq. (3).
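
A minimal Python sketch of this description is given below. It reads the correction as "raw projected quantile minus (observed quantile times the average relative deviation)", which is an interpretation of the sentence above and not a verbatim reproduction of Eqs. (3) and (4); the function name and the number of quantiles are likewise assumptions. Observed quantiles must be non-zero for the relative deviation to be defined.

```python
import numpy as np

def relative_bias_correction(model_hist, obs_hist, model_proj, n_quantiles=100):
    """Relative bias (RB) correction of the projected model quantiles."""
    probs = np.linspace(0.0, 1.0, n_quantiles)
    m_hist_q = np.quantile(model_hist, probs)
    o_q = np.quantile(obs_hist, probs)
    m_proj_q = np.quantile(model_proj, probs)
    # Step 1: average relative deviation of historical model quantiles
    # from the observed quantiles (requires o_q != 0).
    rb = np.mean((m_hist_q - o_q) / o_q)
    # Step 2 (assumed reading): remove the observed quantile scaled by rb.
    return m_proj_q - o_q * rb
```

Under this reading, a model that overestimates the observations by a constant 20 % in the base period is mapped back onto the observed quantiles when the correction is applied to the base period itself.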

2.2.3 Delta correction (DC)


The DC approach follows the methodology detailed in Rieder et al. (2018). In contrast to the MB and RB methods, it is assumed that while the individual model values may be biased, the system response (i.e., change between two time periods) is represented adequately by the model. Therefore the deviation between future and base period model data is calculated for all quantiles individually (Eq. 5). Finally, the corrected model projection is derived as the observed distribution plus the initially computed model change (Eq. 6).
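
The quantile-wise delta described above can be sketched in Python as follows. The function name and the number of quantiles are assumptions made for illustration; the sketch follows the verbal description rather than reproducing Eqs. (5) and (6).

```python
import numpy as np

def delta_correction(model_hist, obs_hist, model_proj, n_quantiles=100):
    """Delta correction (DC): observed quantiles plus the modeled change."""
    probs = np.linspace(0.0, 1.0, n_quantiles)
    m_hist_q = np.quantile(model_hist, probs)
    o_q = np.quantile(obs_hist, probs)
    m_proj_q = np.quantile(model_proj, probs)
    # Step 1: modeled change between projection and base period, per quantile.
    delta_q = m_proj_q - m_hist_q
    # Step 2: add the modeled change to the observed distribution.
    return o_q + delta_q
```

If the projection period equals the base period, the delta is zero at every quantile and the function returns the observed quantiles unchanged, regardless of how biased the raw model is.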

2.2.4 Quantile mapping (QM)


The term “quantile mapping” summarizes a variety of similar bias correction approaches used within the climate research community (e.g., Lehner et al., 2023). Here, however, we follow the method described for CCM outputs in Rieder et al. (2015). In contrast to the other methods used in this study, the QM is a multistep approach. The first steps, illustrated in Eqs. (7) to (9), consist of the computation of a bias-corrected historic model distribution. Next, the result is used to create a bias-corrected future ECDF, similar to the DC method (Eqs. 10 and 11), which is then employed to derive the bias-corrected future model data (Eqs. 12 to 14). In contrast to Rieder et al. (2015), however, who suggested a fixed apportionment for the quantiles to avoid non-meaningful results by executing undefined operations, especially in Eqs. (7) and (12) (i.e., when the denominator equals zero or both denominator and numerator equal zero), we employ here a variable algorithm to select the optimal number of percentiles for each individual realization of the QM method. This is achieved by fixing the minimum and maximum values of the model ECDF and allowing for all quantiles with unique values within this range; i.e., if several quantiles share the same value, which might be the case especially for narrow distributions, only the first quantile is used.
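
The variable selection of quantiles described above (keep the minimum and maximum, and for tied quantile values keep only the first) can be sketched in Python as follows. The function name and the candidate set of 101 quantiles are assumptions for illustration only.

```python
import numpy as np

def unique_quantiles(values, n_quantiles=101):
    """Return probabilities and quantile values with ties removed.

    If several quantiles share the same value, only the first one is kept,
    so the returned quantile values are strictly increasing.
    """
    probs = np.linspace(0.0, 1.0, n_quantiles)
    q = np.quantile(values, probs)
    # np.unique reports the index of the first occurrence of each value.
    _, first = np.unique(q, return_index=True)
    first = np.sort(first)
    return probs[first], q[first]
```

Strictly increasing quantile values guarantee that differences between neighboring quantiles are non-zero, which is what avoids divisions by zero in ratio-based mapping steps.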

All four methods are applied to the ECDFs of the individual CCM datasets (1) on a monthly basis within the base time interval, (2) for each grid cell individually (in contrast to Rieder et al., 2015, who used a regional approach), and (3) for both the EU domain and US domain. While it is implied that the model data differ from the observations by a constant factor for the MB and RB methods, the DC and QM techniques assume that the difference between the future and reference periods is represented adequately in the individual models, independent of the prevailing model bias. In contrast to the QM method, which provides the opportunity to directly correct individual daily MDA8 O3 values, the application of the MB, RB (according to the methodology detailed above) and DC techniques solely results in new model ECDFs. The mapping algorithm detailed in Eqs. (7) to (9) is therefore applied further to the outputs of these three correction methods. The model data are thereby mapped onto the bias-corrected ECDFs, allowing for an optimal comparison of original and bias-corrected model data with the observations and the results from the other correction techniques under investigation here.
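
Mapping individual daily values onto a bias-corrected ECDF can be illustrated with a generic empirical transfer function. The Python sketch below uses piecewise-linear interpolation between matching quantiles; it is a common textbook form of quantile mapping and is not claimed to reproduce the mapping algorithm of Eqs. (7) to (9). The function name is hypothetical, and values outside the range of the raw quantiles are clamped to the end points.

```python
import numpy as np

def map_to_corrected_ecdf(values, raw_q, corrected_q):
    """Map raw model values onto a corrected ECDF.

    raw_q and corrected_q hold the raw and bias-corrected quantile values
    at the same probabilities; raw_q must be strictly increasing.
    """
    raw_q = np.asarray(raw_q, dtype=float)
    corrected_q = np.asarray(corrected_q, dtype=float)
    if raw_q.shape != corrected_q.shape:
        raise ValueError("quantile arrays must have the same shape")
    if np.any(np.diff(raw_q) <= 0):
        raise ValueError("raw_q must be strictly increasing")
    # Each value keeps its position within the raw ECDF and is assigned
    # the corrected value found at that same position.
    return np.interp(values, raw_q, corrected_q)
```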

To quantify the initial biases as well as the remaining bias after application of the individual correction techniques, we derive the number of days above the target value for the protection of human health (120 µg m−3 in the EU – approximately 60 ppb – and 70 ppb in the US) and the residual bias of the ECDFs on seasonal and annual timescales (EPA, 2015; EUR-LEX, 2008, 2011).
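
The exceedance-day metric can be sketched in Python as follows. The thresholds of 60 ppb (EU) and 70 ppb (US) are taken from the text; the function name, the array layout (time along the first axis, grid cells along the remaining axes) and the strict "greater than" comparison are assumptions.

```python
import numpy as np

EU_TARGET_PPB = 60.0  # approx. 120 µg m-3, as stated in the text
US_TARGET_PPB = 70.0

def exceedance_days_per_year(mda8, threshold_ppb, n_years):
    """Average annual number of days with MDA8 O3 above the threshold.

    mda8 has time along axis 0; the result has one value per grid cell.
    """
    mda8 = np.asarray(mda8, dtype=float)
    return np.sum(mda8 > threshold_ppb, axis=0) / n_years
```

The bias in exceedance days for a grid cell is then the difference between the value obtained from (raw or corrected) model data and the value obtained from observations.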

3 Results

3.1 Model evaluation

We start by evaluating the performance of the global models in representing the MDA8 O3 burden for the historical time period (2005–2009). Figure 1a and b show the pooled MDA8 O3 probability density function for the models and gridded observations for the EU and US domains. Pronounced differences emerge between the individual models and observations for both domains. Generally, the models show a high bias compared to observations, and the amplitude of the bias varies substantially among models. One exception in this regard is the EC-Earth3 model, which shows a high bias compared to observations across the majority of the MDA8 O3 distribution but in contrast to other models has a low bias at the upper tail.

We further investigate the magnitude of the model biases in Fig. 1c and d by contrasting the annual average number (and seasonal partitioning) of days above the target value to protect human health, defined as 60 and 70 ppb for the EU and US domains, respectively. For the observations, we find a domain average number of exceedance days of the target values of 8 d (5 for summer and 3 for spring) and 3 d (2 for summer and 1 for spring) for the EU and US domains in 2005–2009. While the models agree with observations regarding a more frequent occurrence of non-attainment days in summer, all models but EC-Earth3 substantially overestimate the occurrence frequency of exceedance days. The domain average bias in non-attainment days for the EU ranges between 5 d in EC-Earth3 and 113 d in UKESM1-0-LL. In contrast, values for the US vary between 2 and 79 d. Overall, our findings indicate slightly better agreement in CCMs regarding the policy-relevant metrics in the US than in the EU, a fact which has to be taken with caution also given the regional difference in the MDA8 O3 target value. Assuming the same target threshold as for Europe, we find that the number of exceedance days ranges between 20 and 174. Table 1 provides a summary of the occurrence frequency of MDA8 O3 extremes for models and observations on annual and seasonal bases (note that fall and winter are grouped together (FW) due to the small number of exceedance days derived for these seasons).

Table 1. Average number of exceedance days (i.e., the number of days above the target threshold of 60 ppb (EU) and 70 ppb (US), respectively) per grid cell derived from observations and individual raw model data for the EU and US (given in parentheses) for spring (MAM), summer (JJA), fall and winter (FW), and annually. Note that numbers in italics in the parentheses were derived by applying the EU threshold to the US.


Next, we turn to model biases in the spatial domain. Figure 2 shows the difference in the average number of days above the target value for individual models and observations (note that grey-shaded areas indicate a marginal difference of up to ± 2 d). The spatial distribution of differences confirms the biases detailed above, showing regionally varying but distinct biases of the models examined. Of the models examined, the EC-Earth3 model performs best in both domains, with a domain average bias of +7 (EU) and +3 d (US). While pronounced differences in the magnitude of the bias between individual models occur, the spatial patterns in biases are quite similar. In particular, a north-to-south gradient emerges in the European domain with significantly higher model biases in the Mediterranean region and small to negligible biases in Scandinavia and the UK. For the US, we find less pronounced biases across models in the Midwest, while substantial biases emerge in the North, Southeast and Southwest.

To investigate the consistency of the spatial bias in models compared to observations, we expand the analysis to the 2010–2014 time period (Figs. S1 and S2). Although slight variations are found for individual seasons, overall the result for this time period resembles the results obtained for 2005–2009 in both the US domain and the EU domain (see Fig. S1). This result provides further confidence in the robustness of our assessment of general model biases in the MDA8 O3 distribution and the modeled frequency of non-attainment days.

Figure 1. Probability density function (PDF) of observed (black) and modeled (colored) MDA8 O3 during 2005–2009 in the EU (a) and US (b) domains. Average number of days above the MDA8 O3 target value per grid cell for summer (JJA – red) and spring (MAM – orange) as well as fall and winter months (FW – yellow) during 2005–2009 in the EU (c) and US (d) domains. In panels (a) and (b), dashed vertical lines indicate the target value for the protection of human health. The annual average number of exceedance days in (c) and (d) is given by the sum of the individual segments, i.e., the total height of the bars.


Figure 2. Difference in the average number of days above the MDA8 O3 target value in CCM simulations (EC-Earth3 – a and e, CESM2-WACCM6 – b and f, GFDL-ESM4 – c and g, and UKESM1-0-LL – d and h) compared to gridded observations for the EU (left) and US domains (right). All panels show differences during 2005–2009. Red numbers in the upper- or lower-left corner indicate the grid cell average anomaly. Grey shading indicates differences within ± 2 exceedance days.

3.2 Bias correction for the base period, 2005–2009

Having illustrated the model biases for the past, we turn next to bias correction. To this end, we apply the individual bias correction methods to model outputs for 2005–2009 and evaluate their performance for the MDA8 O3 distribution and the number of non-attainment days. The DC method represents an exception in this case, as applying this method would, by definition, yield “perfect” agreement with the observational ECDF. Accordingly, any potential deviations from the observed ECDF would merely result from uncertainties associated with implementation, in particular with the mapping algorithm and rounding, and thereby would not represent the performance of the DC method in the context of the base dataset. The performance of the DC method will, however, be assessed along with the other methods when applied to the evaluation period of 2010–2014.

Figure 3 shows the distribution of the grid-cell-level bias in the number of exceedance days for the European (panels a–d) and US (panels e–h) domains. All methods reduce the bias substantially. The MB and RB methods yield similar results. Both methods tend to overcorrect the bias, yielding residual biases for individual grid cells varying between −22 and +8 d (EU) and −10 and +6 d (US), with MB performing slightly better. In contrast, the QM method yields almost perfect agreement (comparable to the DC method as detailed above) with observations. Residual biases are between −2 and +1 d for Europe and 0 d for the US.

Spatial distributions of the anomaly on exceedance days are illustrated in Figs. S3 and S4. We find that the application of a particular method yields similar spatial patterns of improvement independent of the model to which it is applied and independent of the initial model bias. For the MB and RB approaches, the spatial gradient in the bias identified in the raw models remains for the EU domain, although with reversed sign for the majority of applications, i.e., stronger overcorrection in central Europe and the Mediterranean than in the northern parts of the EU domain. For the US, the MB and RB methods perform better compared to Europe. This finding, however, is attributable to the higher target threshold rather than to the actual performance of these methods, as shown in Sect. 3.1. The QM method best captures the observations in both domains.

We examine the PDFs of the bias-corrected model data for conformity with the observations (see Fig. S5). While all correction methods lower the bias across the whole distribution, the MB and RB approaches still deviate from the observations. In contrast, the distribution of the QM-corrected data is almost perfectly aligned with the observational PDF, independent of the model and domain. In summary, our evaluation for the baseline period indicates a clear preference for the QM (or the DC) method.

Figure 3. Box plots of the average residual bias in exceedance days pooled across grid cells in the individual CCMs in 2005–2009: (a) EC-Earth3, (b) CESM2-WACCM6, (c) GFDL-ESM4 and (d) UKESM1-0-LL models in the EU domain; panels (e)–(h) are as (a)–(d) but for the US domain. Blue, green and red colors indicate the MB, RB and QM correction methods, respectively.


3.3 Bias correction performance in the evaluation period, 2010–2014

Next, we turn the focus to the results obtained with individual bias correction techniques during the evaluation time period (2010–2014). We apply the adjustment methods to the MDA8 O3 outputs of the individual models but treat the data as independent realizations in order to assess the method performances for their applicability to future projections (see Sect. 2).

Figure 4 shows the distribution of the residual grid-cell-level mean bias compared to observations for the number of exceedance days of the target value. Here we find a larger residual bias in the European domain, ranging between −17 and +11 exceedance days (Fig. 4a–d), than in the US (Fig. 4e–h), where the bias after correction varies between −5 and +5 d across grid cells. Furthermore, contrasting the performance of the individual bias correction techniques yields a curious result, as we no longer identify an individual correction technique as optimal across models and spatial domains.

We further explore the spatial distribution of the residual bias. Compared to the base period, the MB- and RB-corrected (see panels a–d and e–h in Figs. S6–S7) models show improved agreement compared to observations. While the spatial patterns of bias distributions are similar to the 2005–2009 period (except for the GFDL-ESM4 model), an improvement compared to the base period is found for northern and eastern European countries as well as for the Southeast US. The residual bias worsens in the central EU and the Mediterranean as well as the Southwest US when applied to the GFDL-ESM4 model. For the DC and QM approaches (see panels i–l and m–p in Figs. S6–S7), on the other hand, we find a significantly increased residual bias (of both positive and negative sign) independent of model and domain.

Although all methods applied are still capable of significantly reducing the bias, these results, in contrast to those for the base period, no longer allow the identification of a sole ideal correction method, indicating changes in the underlying processes contributing to the bias. Our findings show that the correction approach yielding the lowest residual bias varies strongly across models and spatial domains. For example, while the QM method performs best for CESM2-WACCM6 in the EU domain (Fig. 4b), the RB method yields a smaller residual bias in the US (Fig. 4f).

These results are supported by the analysis of the PDFs of the bias-corrected model output (Fig. S8). While conformity with observations remains largely similar for the majority of the distribution, the adjustment of the high tail yields slightly better results for the MB and RB methods when compared to the base period. In contrast, the distributions of both the DC and QM methods show good agreement with the low tail and the midsection of the observational PDF. The performance, however, deteriorates towards the high tail, partially resulting in an overestimation of the monitored distribution, especially in the European domain.

To further investigate this curious result, we examine, on a quantile basis across the MDA8 O3 distributions, (i) the error resulting from the initial bias correction of the base period (EB) and (ii) the error resulting from the deviation of the model change between the base and evaluation periods when compared to observations (EΔ).


The results of this analysis are shown, as an example, in Fig. 5 for the CESM2-WACCM6 model for the EU (panels a–c) and the US (panels d–f; note that the illustrations for the other models are included in the Supplement as Figs. S9–S10). Here, the red shading indicates the minimum to maximum range of the residual bias across grid cells for the base period after bias correction (EB; Eq. 15), and the solid red line shows the domain average of this bias in the individual quantiles concerned. In contrast, the grey shading illustrates the minimum to maximum range of the differences in the change between the base and evaluation period of the raw model and observations, respectively (EΔ; Eq. 16). The solid black line marks the domain average of this bias in individual quantiles. The residual bias for the evaluation period EF (or by analogy, any other future time period) comprises the sum of these errors (base bias and response bias) and is illustrated for the domain average as the dashed yellow line. We note that as the DC method by definition yields no initial error in the base period, only EΔ is relevant in the evaluation period, which is illustrated by the grey shading and the solid black line in all panels of Fig. 5 (as well as Figs. S9–S10).
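
The decomposition of the evaluation-period error into a base-period component and a response component can be sketched in Python as follows. The sketch is written from the verbal definitions of EB, EΔ and EF; the sign convention (model minus observations), the function name and the argument names are assumptions, and all inputs are arrays of quantile values at matching probabilities.

```python
import numpy as np

def error_components(corr_hist_q, obs_hist_q, raw_hist_q, raw_proj_q, obs_proj_q):
    """Return (E_B, E_delta, E_F) per quantile.

    E_B     : residual bias of the corrected model in the base period.
    E_delta : modeled change minus observed change between the two periods.
    E_F     : their sum, the residual bias in the evaluation period.
    """
    corr_hist_q, obs_hist_q, raw_hist_q, raw_proj_q, obs_proj_q = (
        np.asarray(a, dtype=float)
        for a in (corr_hist_q, obs_hist_q, raw_hist_q, raw_proj_q, obs_proj_q)
    )
    e_b = corr_hist_q - obs_hist_q
    e_delta = (raw_proj_q - raw_hist_q) - (obs_proj_q - obs_hist_q)
    return e_b, e_delta, e_b + e_delta
```

For a correction that shifts all quantiles by a constant (as in the MB method), the sum E_B + E_delta equals the difference between the corrected projected quantiles and the observed quantiles of the evaluation period, which is consistent with the additive picture described above.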

For the base period, it is apparent that the QM correction technique, in contrast to the RB and MB corrections, yields only minor differences across the MDA8 O3 distribution when compared to the observations in both spatial domains. For the evaluation period, we see that the difference in response between models and observations dominates the raw performance of the individual correction techniques and that the residual bias depends strongly on the region and model concerned (see Figs. 4–5 and Supplement Figs. S9–S10). Given this result, we assume that the correction performance depends strongly on models being able to represent precursor emission changes over time as seen in observations.

All models show distinct biases in reproducing observed ozone changes between the two time periods, with a particularly pronounced magnitude in the tails of the distributions. Although both error terms and the resulting net error are found to be rather small in the domain average (roughly ± 5 ppb), they might have a strong influence on the individual grid-cell level (see shading). Especially for the MB and RB techniques, the individual errors might compensate for each other, as illustrated by the improved results relative to the base period. The DC and QM approaches, on the other hand, strongly depend on the quality of the model response in time. Here, we find that pronounced errors in the model change offset (at least in part) the benefits illustrated for the base period (see Fig. 4).

Figure 4. As Fig. 3 but for the 2010–2014 time period. Blue, green, orange and red colors indicate the MB, RB, DC and QM correction methods, respectively.


Figure 5. Error components of the CESM2-WACCM6 model during the evaluation period for the MB (a, d), RB (b, e) and QM (c, f) methods in the EU (left column) and the US (right column) domains. The red shading gives the minimum to maximum range, and the solid red line is the domain average of the residual bias in the base period (EB). The grey shading gives the minimum to maximum range, and the solid black line is the domain average of the differences in the change between the base and evaluation period of the raw model and observations (EΔ). The resulting domain average error of the evaluation period (EF) is indicated by the dashed yellow line (note that for the DC method EB = 0 and hence EF = EΔ).


3.4 The influence of meteorology on the bias in the CESM2-WACCM6 ensemble

Having illustrated the MDA8 O3 biases of various CMIP6 models, the performance of various statistical bias correction techniques and the influence of the model response to changes in, for instance, emissions on the performance of bias correction, we turn our focus to shedding light on the underlying cause of biased MDA8 O3 model outputs. To this end, we analyze the 13 members of the CESM2-WACCM6 ensemble in more detail in order to examine the individual realizations for consistency as well as to find a possible dominant cause for the bias in the modeled surface ozone fields. Here, two likely prime candidates exist: (1) issues with the sensitivity in chemical mechanisms to local and/or regional precursor emissions (note that anthropogenic emissions are consistent across individual models) and (2) issues in meteorology simulated by the free-running CCM. For the latter, we further include three key climatological drivers of ozone production/accumulation in our analysis, i.e., daily maximum temperature (TSMX), daily average downwelling shortwave radiation (FSDS) and daily average wind speed (WSPD), in order to discern whether the bias is predominantly driven by sensitivity to meteorology or to chemistry. As chemical covariates, we include monthly averages of NO, NO2 and HCHO; the latter we consider a bulk proxy for VOCs (e.g., Shen et al., 2019; Zhu et al., 2017).

Figures 6 and 7 illustrate the PDFs of MDA8 O3, NO, NO2, HCHO, TSMX, FSDS and WSPD for the individual ensemble members during spring and summer in 2005–2009 (the PDFs for 2010–2014 are shown in Supplement Figs. S11 and S12). MAM and JJA MDA8 O3 (Fig. 6a and e) show very similar distributions across ensemble members for both domains. For example, the median MDA8 O3 value across ensemble members ranges roughly between 50 and 52 ppb (MAM) and between 45 and 47 ppb (JJA) in the EU. For the US, the median MDA8 O3 values are slightly higher than in the EU, but the differences within the ensemble lie in the same narrow range (53 to 55 ppb for MAM and 54 to 55 ppb for JJA). Similarly compact PDFs across the ensemble are found for NO, NO2 and HCHO. Interestingly, differences emerge for HCHO in the US but not in Europe, which likely reflects a larger influence of biogenic emissions in the US.

Similar results are found for the meteorological variables. Although slight variations occur for surface temperature, radiation and wind speed (which one would expect from a model generating its own meteorology), the PDFs are largely homogeneous across the ensemble, thereby explaining the similarity of surface ozone distributions within the ensemble in both domains (as all ensemble members are driven by the same set of precursor emissions). The analysis of the MDA8 O3, NO, NO2, HCHO, TSMX, FSDS and WSPD distributions over the second time period (2010–2014; Figs. S11 and S12) yields similar results, thereby providing confidence in the robustness of our findings.
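The kind of consistency check described above can be sketched in a few lines. The example below is only an illustration under stated assumptions: the function name `ensemble_median_spread` is hypothetical, and the 13 "members" are synthetic random series, not CESM2-WACCM6 output. It computes the median of each realization and the range of those medians across the ensemble, which is the quantity quoted in the text for MDA8 O3.

```python
import numpy as np


def ensemble_median_spread(members):
    """Return per-member medians and their max-minus-min range.

    members: array of shape (n_members, n_samples), one row per realization.
    """
    medians = np.median(members, axis=1)
    return medians, float(medians.max() - medians.min())


# Synthetic stand-in for a 13-member ensemble (NOT model output):
# five spring seasons of 92 days each, drawn from one common distribution.
rng = np.random.default_rng(0)
n_members, n_days = 13, 5 * 92
synthetic = rng.normal(loc=51.0, scale=8.0, size=(n_members, n_days))

medians, spread = ensemble_median_spread(synthetic)
print(f"range of member medians: {spread:.2f} ppb")
```

A small range relative to the model-to-observation bias would point, as argued in the text, away from internal meteorological variability as the dominant source of that bias.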

The strong similarity across ensemble members indicates that the MDA8 O3 bias identified in CESM2-WACCM6 most likely stems from sensitivities in the chemical mechanism and/or emissions and not from meteorological drivers and their variability. As the models use the same anthropogenic emissions, the differences are more likely to stem from the chemistry, which could include different mixes of emitted VOCs. Previous research has shown that temperature biases are rather small and that a significant overestimation of the temperature in the troposphere occurs solely in the Southern Hemisphere polar region, a region which is not investigated here (Danabasoglu et al., 2020; Gettelman et al., 2019). Nevertheless, we note that small deviations in temperature have been found to explain biases of 5–15 ppb for surface O3 in former model generations (Rasmussen et al., 2012). While the presented ensemble analysis is, due to data availability, only possible for CESM2-WACCM6, the results provide a first-order estimate of the dominant model component responsible for surface ozone biases. Future work should confirm that this finding holds for other global models, and thus an ensemble strategy for model experiments is recommended for future model intercomparison activities such as CCMI and CMIP.

Figure 6. CESM2-WACCM6 spring (MAM) and summer (JJA) PDFs of (a) MDA8 O3, (b) NO, (c) NO2 and (d) HCHO for the EU domain in 2005–2009. Panels (e)–(h) are as (a)–(d) but for the US domain. Note that a value of 2 has been added to summertime concentrations of NO, NO2 and HCHO to allow for visual separation of the seasonal PDFs.


Figure 7. CESM2-WACCM6 spring (MAM) and summer (JJA) PDFs of (a) TSMX, (b) FSDS and (c) WSPD for the EU domain in 2005–2009. Panels (d)–(f) are as (a)–(c) but for the US domain.


4 Summary and conclusions

In this study, we evaluate the bias in surface ozone burdens for four global CCMs contributing to CMIP6 (EC-Earth3, CESM2-WACCM6, GFDL-ESM4 and UKESM1-0-LL) and present the first comprehensive comparison of the performance of four different statistical bias correction techniques to derive CCM-based ozone metrics with relevance for public health and policy. While all models show biases when compared to observations, the bias magnitude of the raw, uncorrected MDA8 O3 outputs differs strongly within the pool of models analyzed.

The evaluation of the four bias correction techniques for the base period (2005–2009), where techniques are tuned to observations, illustrates that all methods are capable of lowering the bias. The MB and RB methods, however, are less accurate when contrasted with the results obtained from the DC or QM approaches. Furthermore, when applying the MB and RB methods, the model output fields might even be overcorrected for individual grid cells; i.e., the resulting ozone distributions might become biased low. This is not surprising as both techniques apply a single average value for the correction of the whole distribution function, which is a disadvantage – especially when it comes to the tails of the distribution – if the bias is not constant across the ECDF.
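Why a single average correction struggles in the tails can be illustrated with a small synthetic experiment. The sketch below is not the implementation used in this study: it assumes that a mean-bias style correction amounts to an additive shift by the mean model-minus-observation difference, and it uses a generic empirical quantile mapping with a piecewise-linear transfer function. The function names and the synthetic "model" and "observation" series are hypothetical.

```python
import numpy as np


def mean_shift_correct(model, obs):
    """Additive correction: subtract the mean model-minus-observation bias."""
    return model - (model.mean() - obs.mean())


def quantile_map(model, obs, n_quantiles=101):
    """Empirical quantile mapping of model values onto the observed ECDF."""
    probs = np.linspace(0.0, 1.0, n_quantiles)
    model_q = np.quantile(model, probs)
    obs_q = np.quantile(obs, probs)
    # Piecewise-linear transfer function between matching quantiles.
    return np.interp(model, model_q, obs_q)


# Synthetic data (NOT from the study): the "model" is too high on average
# and too narrow, so its bias differs between the lower and upper tail.
rng = np.random.default_rng(1)
obs = rng.normal(45.0, 10.0, size=5000)
model = rng.normal(52.0, 6.0, size=5000)

check_probs = [0.05, 0.50, 0.95]
residuals = {
    "mean shift": np.quantile(mean_shift_correct(model, obs), check_probs)
    - np.quantile(obs, check_probs),
    "quantile mapping": np.quantile(quantile_map(model, obs), check_probs)
    - np.quantile(obs, check_probs),
}
for name, res in residuals.items():
    print(name, np.round(res, 2))
```

In this constructed case the shifted series matches the observed mean but remains too high at the 5th percentile and too low at the 95th percentile, whereas the quantile-mapped series follows the observed distribution closely over the training period.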

The independent evaluation of the four techniques over the second time period (2010–2014), focusing on the bias correction of model projections, yields less distinct results. Although the model-to-observation agreement is improved for all MDA8 O3 metrics in the corrected models compared to their raw counterparts, no single, optimal correction technique can be identified. Our results illustrate that technique performance depends strongly on the model selected and its MDA8 O3 evolution over time and thus on the response to boundary condition changes. This result, surprising at first, can be explained by examining the composition of the residual model error.

The residual error for future projections comprises two parts: (1) the residual error of the base period, EB, and (2) the error attributable to the model response to changes in boundary conditions (emissions, climate, etc.) between both time periods, EΔ. The magnitude of EΔ was found to exert a dominant influence on the overall correction performance, which raises some concerns regarding the robustness of model responses and thus the reliability of model projections (not only in the context of surface O3). In contrast to EΔ, EB depends on the quality of the initial base period bias correction. Here, our results clearly show that EB is substantially larger for the MB and RB methods than for the QM and DC methods. When applying the correction techniques, EΔ and EB might compensate each other for individual grid cells, resulting in a low residual bias. Conversely, the strong performance for the base period obtained with the QM and DC approaches is attributable to a very low EB, and this performance might deteriorate in projections if EΔ is large. Thus, we conclude that under the assumption of an adequate model response to changing boundary conditions (and thus low EΔ), the QM and DC methods outperform the MB and RB techniques. If a decision has to be made as to whether the DC or QM approach is used for bias correction, given that differences between the results obtained with both techniques are negligible, we would argue for DC correction due to its comparatively easy numerical implementation.
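The two-part structure of the residual error can be made concrete with a short numerical sketch. It rests on assumptions that go beyond the text above: the errors are evaluated per percentile, the correction is a simple additive shift fitted in the base period, and all series are synthetic. Under these assumptions the evaluation-period error equals the sum of the base-period residual and the mismatch in modeled versus observed change; the exact definitions used for the figures in this paper may differ.

```python
import numpy as np

probs = np.linspace(0.05, 0.95, 19)


def q(x):
    """Percentiles of a series at the probabilities in `probs`."""
    return np.quantile(x, probs)


# Synthetic series (NOT from the study) for base and evaluation periods.
rng = np.random.default_rng(2)
obs_base = rng.normal(45.0, 10.0, size=4000)
obs_eval = rng.normal(44.0, 9.0, size=4000)
mod_base = rng.normal(52.0, 6.0, size=4000)
mod_eval = rng.normal(52.5, 6.0, size=4000)

# Additive correction fitted in the base period, then applied to both periods.
shift = mod_base.mean() - obs_base.mean()
corr_base = mod_base - shift
corr_eval = mod_eval - shift

# E_B: residual error left after correction in the base period.
E_B = q(corr_base) - q(obs_base)
# E_delta: modeled change minus observed change between the two periods.
E_delta = (q(mod_eval) - q(mod_base)) - (q(obs_eval) - q(obs_base))
# E_F: residual error of the corrected series in the evaluation period.
E_F = q(corr_eval) - q(obs_eval)

print("identity E_F = E_B + E_delta holds:", np.allclose(E_F, E_B + E_delta))
```

The sketch also shows how compensation can arise: where E_B and E_delta have opposite signs at a given percentile, their sum E_F is smaller in magnitude than either term alone.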

To obtain further insights into the root cause(s) of the surface ozone bias in models, we explored the MDA8 O3 output of the 13-member CESM2-WACCM6 ensemble together with information on NO, NO2 and HCHO and key meteorological covariates for ozone production, i.e., daily maximum temperature, daily mean incoming shortwave radiation and daily mean wind speed. Here, our analysis showed only small variations within the CESM2-WACCM6 ensemble for core meteorological drivers (and chemical covariates) of surface ozone. This suggests, given that emissions are consistent across models, a dominant influence of the chemical mechanism on the bias in the O3 fields rather than a prominent role of model meteorology. Investigating whether this finding can be generalized to other CCMs requires future community efforts in the provision of additional ensemble simulations for individual CCMs contributing to the CCMI or CMIP frameworks.

Data availability

CMIP6 datasets are publicly available (last access: 1 March 2023). Processed data can be made available by the corresponding author upon reasonable request. The gridded MDA8 O3 datasets are available from Staehle (2024).


The supplement related to this article is available online.

Author contributions

CS: conceptualization, formal analysis, methodology, visualization and writing – original draft preparation. HER: conceptualization, methodology, resources, supervision and writing – review and editing. AMF: resources, supervision and writing – review and editing. JLS: data curation and writing – review and editing.

Competing interests

The contact author has declared that none of the authors has any competing interests.


The statements, findings, conclusions, and recommendations are those of the author(s) and do not necessarily reflect the views of NOAA or the US Department of Commerce.

Publisher’s note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors.


The authors are grateful to the EC-Earth3, GFDL-ESM4 and UKESM1-0-LL modeling teams for providing the ensemble simulations (last access: 1 March 2023). The authors are grateful to Ramiro Checa-Garcia for fruitful discussions and comments. The authors thank the two anonymous reviewers for their valuable comments on an earlier version of this paper.

Financial support

This research was supported in part by the Klima und Energiefonds under grant agreement no. ACRP11 KR18AC0K14686 to BOKU University. This research was supported in part by the NOAA cooperative agreement no. NA22OAR4320151 for the Cooperative Institute for Earth System Research and Data Science (CIESRDS). Christoph Staehle was supported in part by an OeAD Marietta Blau Fellowship grant.

Review statement

This paper was edited by Andrea Pozzer and reviewed by two anonymous referees.


Akritidis, D., Zanis, P., Pytharoulis, I., Mavrakis, A., and Karacostas, T.: A deep stratospheric intrusion event down to the earth's surface of the megacity of Athens, Meteorol. Atmos. Phys., 109, 9–18, 2010.

Archibald, A. T., Neu, J. L., Elshorbany, Y. F., Cooper, O. R., Young, P. J., Akiyoshi, H., Cox, R. A., Coyle, M., Derwent, R. G., Deushi, M., Finco, A., Frost, G. J., Galbally, I. E., Gerosa, G., Granier, C., Griffiths, P. T., Hossaini, R., Hu, L., Jöckel, P., Josse, B., Lin, M. Y., Mertens, M., Morgenstern, O., Naja, M., Naik, V., Oltmans, S., Plummer, D. A., Revell, L. E., Saiz-Lopez, A., Saxena, P., Shin, Y. M., Shahid, I., Shallcross, D., Tilmes, S., Trickl, T., Wallington, T. J., Wang, T., Worden, H. M., and Zeng, G.: Tropospheric Ozone Assessment Report: A critical review of changes in the tropospheric ozone burden and budget from 1850 to 2100, Elementa, 8, 1–53, 2020.

Chameides, W. L., Lindsay, R. W., Richardson, J., and Kiang, C. S.: The Role of Biogenic Hydrocarbons in Urban Photochemical Smog: Atlanta as a Case Study, Science, 241, 1473–1475, 1988.

Checa-Garcia, R., Hegglin, M. I., Kinnison, D., Plummer, D. A., and Shine, K. P.: Historical Tropospheric and Stratospheric Ozone Radiative Forcing Using the CMIP6 Database, Geophys. Res. Lett., 45, 3264–3273, 2018.

Da, Y., Xu, Y., and McCarl, B.: Effects of Surface Ozone and Climate on Historical (1980–2015) Crop Yields in the United States: Implication for Mid-21st Century Projection, Environ. Resour. Econ., 81, 355–378, 2022.

Danabasoglu, G., Lamarque, J. F., Bacmeister, J., Bailey, D. A., DuVivier, A. K., Edwards, J., Emmons, L. K., Fasullo, J., Garcia, R., Gettelman, A., Hannay, C., Holland, M. M., Large, W. G., Lauritzen, P. H., Lawrence, D. M., Lenaerts, J. T. M., Lindsay, K., Lipscomb, W. H., Mills, M. J., Neale, R., Oleson, K. W., Otto-Bliesner, B., Phillips, A. S., Sacks, W., Tilmes, S., van Kampenhout, L., Vertenstein, M., Bertini, A., Dennis, J., Deser, C., Fischer, C., Fox-Kemper, B., Kay, J. E., Kinnison, D., Kushner, P. J., Larson, V. E., Long, M. C., Mickelson, S., Moore, J. K., Nienhouse, E., Polvani, L., Rasch, P. J., and Strand, W. G.: The Community Earth System Model Version 2 (CESM2), J. Adv. Model. Earth Sy., 12, e2019MS001916, 2020.

Ducker, J. A., Holmes, C. D., Keenan, T. F., Fares, S., Goldstein, A. H., Mammarella, I., Munger, J. W., and Schnell, J.: Synthetic ozone deposition and stomatal uptake at flux tower sites, Biogeosciences, 15, 5395–5413, 2018.

EEA: Air quality in Europe – 2020 Report (09/2020), (last access: 9 November 2023), 2020. 

EPA: National Ambient Air Quality Standards for Ozone, (last access: 16 November 2023), 2015. 

EUR-LEX: Directive 2008/50/EC of the European parliament and of the council of 21 May 2008 on ambient air quality and cleaner air for Europe (2008/50/EC), (last access: 12 March 2023), 2008. 

EUR-LEX: Commission Implementing Decision of 12 December 2011 laying down rules for Directives 2004/107/EC and 2008/50/EC of the European Parliament and of the Council as regards the reciprocal exchange of information and reporting on ambient air quality (notified under document C(2011) 9068) (2011/850/EU), (last access: 10 September 2023), 2011. 

Fleming, Z. L., Doherty, R. M., von Schneidemesser, E., Malley, C. S., Cooper, O. R., Pinto, J. P., Colette, A., Xu, X., Simpson, D., Schultz, M. G., Lefohn, A. S., Hamad, S., Moolla, R., Solberg, S., and Feng, Z.: Tropospheric Ozone Assessment Report: Present-day ozone distribution and trends relevant to human health, Elementa, 6, 177 pp., 2018.

Garrido-Perez, J. M., Ordóñez, C., García-Herrera, R., and Schnell, J. L.: The differing impact of air stagnation on summer ozone across Europe, Atmos. Environ., 219, 117062, 2019.

Gettelman, A., Mills, M. J., Kinnison, D. E., Garcia, R. R., Smith, A. K., Marsh, D. R., Tilmes, S., Vitt, F., Bardeen, C. G., McInerny, J., Liu, H. L., Solomon, S. C., Polvani, L. M., Emmons, L. K., Lamarque, J. F., Richter, J. H., Glanville, A. S., Bacmeister, J. T., Phillips, A. S., Neale, R. B., Simpson, I. R., DuVivier, A. K., Hodzic, A., and Randel, W. J.: The Whole Atmosphere Community Climate Model Version 6 (WACCM6), J. Geophys. Res.-Atmos., 124, 12380–12403, 2019.

Griffiths, P. T., Murray, L. T., Zeng, G., Shin, Y. M., Abraham, N. L., Archibald, A. T., Deushi, M., Emmons, L. K., Galbally, I. E., Hassler, B., Horowitz, L. W., Keeble, J., Liu, J., Moeini, O., Naik, V., O'Connor, F. M., Oshima, N., Tarasick, D., Tilmes, S., Turnock, S. T., Wild, O., Young, P. J., and Zanis, P.: Tropospheric ozone in CMIP6 simulations, Atmos. Chem. Phys., 21, 4187–4218, 2021.

Gu, Y., Henze, D. K., Nawaz, M. O., and Wagner, U. J.: Response of the ozone-related health burden in Europe to changes in local anthropogenic emissions of ozone precursors, Environ. Res. Lett., 18, 114034, 2023.

Guo, J. J., Fiore, A. M., Murray, L. T., Jaffe, D. A., Schnell, J. L., Moore, C. T., and Milly, G. P.: Average versus high surface ozone levels over the continental USA: model bias, background influences, and interannual variability, Atmos. Chem. Phys., 18, 12123–12140, 2018.

Karlický, J., Rieder, H. E., Huszár, P., Peiker, J., and Sukhodolov, T.: A cautious note advocating the use of ensembles of models and driving data in modeling of regional ozone burdens, Air Quality, Air Qual. Atmos. Hlth., 10 pp., 2024.

Lehner, F., Nadeem, I., and Formayer, H.: Evaluating skills and issues of quantile-based bias adjustment for climate change scenarios, Adv. Stat. Clim. Meteorol. Oceanogr., 9, 29–44, 2023.

Lelieveld, J. and Dentener, F. J.: What controls tropospheric ozone?, J. Geophys. Res.-Atmos., 105, 3531–3551, 2000.

Lin, M., Fiore, A. M., Horowitz, L. W., Langford, A. O., Oltmans, S. J., Tarasick, D., and Rieder, H. E.: Climate variability modulates western US ozone air quality in spring via deep stratospheric intrusions, Nat. Commun., 6, 7105, 2015.

Liu, Z., Doherty, R. M., Wild, O., O'Connor, F. M., and Turnock, S. T.: Correcting ozone biases in a global chemistry–climate model: implications for future ozone, Atmos. Chem. Phys., 22, 12543–12557, 2022.

Malashock, D. A., Delang, M. N., Becker, J. S., Serre, M. L., West, J. J., Chang, K.-L., Cooper, O. R., and Anenberg, S. C.: Global trends in ozone concentration and attributable mortality for urban, peri-urban, and rural areas between 2000 and 2019: a modelling study, Lancet Planet. Hlth., 6, e958–e967, 2022.

Meehl, G. A., Tebaldi, C., Tilmes, S., Lamarque, J.-F., Bates, S., Pendergrass, A., and Lombardozzi, D.: Future heat waves and surface ozone, Environ. Res. Lett., 13, 064004, 2018.

Mills, G., Pleijel, H., Malley, C. S., Sinha, B., Cooper, O. R., Schultz, M. G., Neufeld, H. S., Simpson, D., Sharps, K., Feng, Z., Gerosa, G., Harmens, H., Kobayashi, K., Saxena, P., Paoletti, E., Sinha, V., and Xu, X.: Tropospheric Ozone Assessment Report: Present-day tropospheric ozone distribution and trends relevant to vegetation, Elementa, 6, 46 pp., 2018.

Monks, P. S., Archibald, A. T., Colette, A., Cooper, O., Coyle, M., Derwent, R., Fowler, D., Granier, C., Law, K. S., Mills, G. E., Stevenson, D. S., Tarasova, O., Thouret, V., von Schneidemesser, E., Sommariva, R., Wild, O., and Williams, M. L.: Tropospheric ozone and its precursors from the urban to the global scale from air quality to short-lived climate forcer, Atmos. Chem. Phys., 15, 8889–8973, 2015.

Murray, C. J. L., Aravkin, A. Y., Zheng, P., Abbafati, C., Abbas, K. M., Abbasi-Kangevari, M., Abd-Allah, F., Abdelalim, A., Abdollahi, M., Abdollahpour, I., Abegaz, K. H., Abolhassani, H., Aboyans, V., Abreu, L. G., Abrigo, M. R. M., Abualhasan, A., Abu-Raddad, L. J., Abushouk, A. I., Adabi, M., Adekanmbi, V., Adeoye, A. M., Adetokunboh, O. O., Adham, D., Advani, S. M., Agarwal, G., Aghamir, S. M. K., Agrawal, A., Ahmad, T., Ahmadi, K., Ahmadi, M., Ahmadieh, H., Ahmed, M. B., Akalu, T. Y., Akinyemi, R. O., Akinyemiju, T., Akombi, B., Akunna, C. J., Alahdab, F., Al-Aly, Z., Alam, K., Alam, S., Alam, T., Alanezi, F. M., Alanzi, T. M., Alemu, B. w., Alhabib, K. F., Ali, M., Ali, S., Alicandro, G., Alinia, C., Alipour, V., Alizade, H., Aljunid, S. M., Alla, F., Allebeck, P., Almasi-Hashiani, A., Al-Mekhlafi, H. M., Alonso, J., Altirkawi, K. A., Amini-Rarani, M., Amiri, F., Amugsi, D. A., Ancuceanu, R., Anderlini, D., Anderson, J. A., Andrei, C. L., Andrei, T., Angus, C., Anjomshoa, M., Ansari, F., Ansari-Moghaddam, A., Antonazzo, I. C., Antonio, C. A. T., Antony, C. M., Antriyandarti, E., Anvari, D., Anwer, R., Appiah, S. C. Y., Arabloo, J., Arab-Zozani, M., Ariani, F., Armoon, B., Ärnlöv, J., Arzani, A., Asadi-Aliabadi, M., Asadi-Pooya, A. A., Ashbaugh, C., Assmus, M., Atafar, Z., Atnafu, D. D., Atout, M. M. d. W., Ausloos, F., Ausloos, M., Ayala Quintanilla, B. P., Ayano, G., Ayanore, M. A., Azari, S., Azarian, G., Azene, Z. N., Badawi, A., Badiye, A. D., Bahrami, M. A., Bakhshaei, M. H., Bakhtiari, A., Bakkannavar, S. M., Baldasseroni, A., Ball, K., Ballew, S. H., Balzi, D., Banach, M., Banerjee, S. K., Bante, A. B., Baraki, A. G., Barker-Collo, S. L., Bärnighausen, T. W., Barrero, L. H., Barthelemy, C. M., Barua, L., Basu, S., Baune, B. T., Bayati, M., Becker, J. S., Bedi, N., Beghi, E., Béjot, Y., Bell, M. L., Bennitt, F. B., Bensenor, I. M., Berhe, K., Berman, A. E., Bhagavathula, A. S., Bhageerathy, R., Bhala, N., Bhandari, D., Bhattacharyya, K., Bhutta, Z. 
A., Bijani, A., Bikbov, B., Bin Sayeed, M. S., Biondi, A., Birihane, B. M., Bisignano, C., Biswas, R. K., Bitew, H., Bohlouli, S., Bohluli, M., Boon-Dooley, A. S., Borges, G., Borzì, A. M., Borzouei, S., Bosetti, C., Boufous, S., Braithwaite, D., Breitborde, N. J. K., Breitner, S., Brenner, H., Briant, P. S., Briko, A. N., Briko, N. I., Britton, G. B., Bryazka, D., Bumgarner, B. R., Burkart, K., Burnett, R. T., Burugina Nagaraja, S., Butt, Z. A., Caetano dos Santos, F. L., Cahill, L. E., Cámera, L. L. A. A., Campos-Nonato, I. R., Cárdenas, R., Carreras, G., Carrero, J. J., Carvalho, F., Castaldelli-Maia, J. M., Castañeda-Orjuela, C. A., Castelpietra, G., Castro, F., Causey, K., Cederroth, C. R., Cercy, K. M., Cerin, E., Chandan, J. S., Chang, K.-L., Charlson, F. J., Chattu, V. K., Chaturvedi, S., Cherbuin, N., Chimed-Ochir, O., Cho, D. Y., Choi, J.-Y. J., Christensen, H., Chu, D.-T., Chung, M. T., Chung, S.-C., Cicuttini, F. M., Ciobanu, L. G., Cirillo, M., Classen, T. K. D., Cohen, A. J., Compton, K., Cooper, O. R., Costa, V. M., Cousin, E., Cowden, R. G., Cross, D. H., Cruz, J. A., Dahlawi, S. M. A., Damasceno, A. A. M., Damiani, G., Dandona, L., Dandona, R., Dangel, W. J., Danielsson, A.-K., Dargan, P. I., Darwesh, A. M., Daryani, A., Das, J. K., Das Gupta, R., das Neves, J., Dávila-Cervantes, C. A., Davitoiu, D. V., De Leo, D., Degenhardt, L., DeLang, M., Dellavalle, R. P., Demeke, F. M., Demoz, G. T., Demsie, D. G., Denova-Gutiérrez, E., Dervenis, N., Dhungana, G. P., Dianatinasab, M., Dias da Silva, D., Diaz, D., Dibaji Forooshani, Z. S., Djalalinia, S., Do, H. T., Dokova, K., Dorostkar, F., Doshmangir, L., Driscoll, T. R., Duncan, B. B., Duraes, A. R., Eagan, A. W., Edvardsson, D., El Nahas, N., El Sayed, I., El Tantawi, M., Elbarazi, I., Elgendy, I. Y., El-Jaafary, S. I., Elyazar, I. R. F., Emmons-Bell, S., Erskine, H. E., Eskandarieh, S., Esmaeilnejad, S., Esteghamati, A., Estep, K., Etemadi, A., Etisso, A. 
E., Fanzo, J., Farahmand, M., Fareed, M., Faridnia, R., Farioli, A., Faro, A., Faruque, M., Farzadfar, F., Fattahi, N., Fazlzadeh, M., Feigin, V. L., Feldman, R., Fereshtehnejad, S.-M., Fernandes, E., Ferrara, G., Ferrari, A. J., Ferreira, M. L., Filip, I., Fischer, F., Fisher, J. L., Flor, L. S., Foigt, N. A., Folayan, M. O., Fomenkov, A. A., Force, L. M., Foroutan, M., Franklin, R. C., Freitas, M., Fu, W., Fukumoto, T., Furtado, J. M., Gad, M. M., Gakidou, E., Gallus, S., Garcia-Basteiro, A. L., Gardner, W. M., Geberemariyam, B. S., Gebreslassie, A. A. A. A., Geremew, A., Gershberg Hayoon, A., Gething, P. W., Ghadimi, M., Ghadiri, K., Ghaffarifar, F., Ghafourifard, M., Ghamari, F., Ghashghaee, A., Ghiasvand, H., Ghith, N., Gholamian, A., Ghosh, R., Gill, P. S., Ginindza, T. G. G., Giussani, G., Gnedovskaya, E. V., Goharinezhad, S., Gopalani, S. V., Gorini, G., Goudarzi, H., Goulart, A. C., Greaves, F., Grivna, M., Grosso, G., Gubari, M. I. M., Gugnani, H. C., Guimarães, R. A., Guled, R. A., Guo, G., Guo, Y., Gupta, R., Gupta, T., Haddock, B., Hafezi-Nejad, N., Hafiz, A., Haj-Mirzaian, A., Haj-Mirzaian, A., Hall, B. J., Halvaei, I., Hamadeh, R. R., Hamidi, S., Hammer, M. S., Hankey, G. J., Haririan, H., Haro, J. M., Hasaballah, A. I., Hasan, M. M., Hasanpoor, E., Hashi, A., Hassanipour, S., Hassankhani, H., Havmoeller, R. J., Hay, S. I., Hayat, K., Heidari, G., Heidari-Soureshjani, R., Henrikson, H. J., Herbert, M. E., Herteliu, C., Heydarpour, F., Hird, T. R., Hoek, H. W., Holla, R., Hoogar, P., Hosgood, H. D., Hossain, N., Hosseini, M., Hosseinzadeh, M., Hostiuc, M., Hostiuc, S., Househ, M., Hsairi, M., Hsieh, V. C.-r., Hu, G., Hu, K., Huda, T. M., Humayun, A., Huynh, C. K., Hwang, B.-F., Iannucci, V. C., Ibitoye, S. E., Ikeda, N., Ikuta, K. S., Ilesanmi, O. S., Ilic, I. M., Ilic, M. D., Inbaraj, L. R., Ippolito, H., Iqbal, U., Irvani, S. S. N., Irvine, C. M. S., Islam, M. M., Islam, S. M. S., Iso, H., Ivers, R. Q., Iwu, C. C. D., Iwu, C. J., Iyamu, I. 
O., Jaafari, J., Jacobsen, K. H., Jafari, H., Jafarinia, M., Jahani, M. A., Jakovljevic, M., Jalilian, F., James, S. L., Janjani, H., Javaheri, T., Javidnia, J., Jeemon, P., Jenabi, E., Jha, R. P., Jha, V., Ji, J. S., Johansson, L., John, O., John-Akinola, Y. O., Johnson, C. O., Jonas, J. B., Joukar, F., Jozwiak, J. J., Jürisson, M., Kabir, A., Kabir, Z., Kalani, H., Kalani, R., Kalankesh, L. R., Kalhor, R., Kanchan, T., Kapoor, N., Karami Matin, B., Karch, A., Karim, M. A., Kassa, G. M., Katikireddi, S. V., Kayode, G. A., Kazemi Karyani, A., Keiyoro, P. N., Keller, C., Kemmer, L., Kendrick, P. J., Khalid, N., Khammarnia, M., Khan, E. A., Khan, M., Khatab, K., Khater, M. M., Khatib, M. N., Khayamzadeh, M., Khazaei, S., Kieling, C., Kim, Y. J., Kimokoti, R. W., Kisa, A., Kisa, S., Kivimäki, M., Knibbs, L. D., Knudsen, A. K. S., Kocarnik, J. M., Kochhar, S., Kopec, J. A., Korshunov, V. A., Koul, P. A., Koyanagi, A., Kraemer, M. U. G., Krishan, K., Krohn, K. J., Kromhout, H., Kuate Defo, B., Kumar, G. A., Kumar, V., Kurmi, O. P., Kusuma, D., La Vecchia, C., Lacey, B., Lal, D. K., Lalloo, R., Lallukka, T., Lami, F. H., Landires, I., Lang, J. J., Langan, S. M., Larsson, A. O., Lasrado, S., Lauriola, P., Lazarus, J. V., Lee, P. H., Lee, S. W. H., LeGrand, K. E., Leigh, J., Leonardi, M., Lescinsky, H., Leung, J., Levi, M., Li, S., Lim, L.-L., Linn, S., Liu, S., Liu, S., Liu, Y., Lo, J., Lopez, A. D., Lopez, J. C. F., Lopukhov, P. D., Lorkowski, S., Lotufo, P. A., Lu, A., Lugo, A., Maddison, E. R., Mahasha, P. W., Mahdavi, M. M., Mahmoudi, M., Majeed, A., Maleki, A., Maleki, S., Malekzadeh, R., Malta, D. C., Mamun, A. A., Manda, A. L., Manguerra, H., Mansour-Ghanaei, F., Mansouri, B., Mansournia, M. A., Mantilla Herrera, A. M., Maravilla, J. C., Marks, A., Martin, R. V., Martini, S., Martins-Melo, F. R., Masaka, A., Masoumi, S. Z., Mathur, M. R., Matsushita, K., Maulik, P. K., McAlinden, C., McGrath, J. J., McKee, M., Mehndiratta, M. M., Mehri, F., Mehta, K. M., Memish, Z. 
A., Mendoza, W., Menezes, R. G., Mengesha, E. W., Mereke, A., Mereta, S. T., Meretoja, A., Meretoja, T. J., Mestrovic, T., Miazgowski, B., Miazgowski, T., Michalek, I. M., Miller, T. R., Mills, E. J., Mini, G. K., Miri, M., Mirica, A., Mirrakhimov, E. M., Mirzaei, H., Mirzaei, M., Mirzaei, R., Mirzaei-Alavijeh, M., Misganaw, A. T., Mithra, P., Moazen, B., Mohammad, D. K., Mohammad, Y., Mohammad Gholi Mezerji, N., Mohammadian-Hafshejani, A., Mohammadifard, N., Mohammadpourhodki, R., Mohammed, A. S., Mohammed, H., Mohammed, J. A., Mohammed, S., Mokdad, A. H., Molokhia, M., Monasta, L., Mooney, M. D., Moradi, G., Moradi, M., Moradi-Lakeh, M., Moradzadeh, R., Moraga, P., Morawska, L., Morgado-da-Costa, J., Morrison, S. D., Mosapour, A., Mosser, J. F., Mouodi, S., Mousavi, S. M., Mousavi Khaneghah, A., Mueller, U. O., Mukhopadhyay, S., Mullany, E. C., Musa, K. I., Muthupandian, S., Nabhan, A. F., Naderi, M., Nagarajan, A. J., Nagel, G., Naghavi, M., Naghshtabrizi, B., Naimzada, M. D., Najafi, F., Nangia, V., Nansseu, J. R., Naserbakht, M., Nayak, V. C., Negoi, I., Ngunjiri, J. W., Nguyen, C. T., Nguyen, H. L. T., Nguyen, M., Nigatu, Y. T., Nikbakhsh, R., Nixon, M. R., Nnaji, C. A., Nomura, S., Norrving, B., Noubiap, J. J., Nowak, C., Nunez-Samudio, V., Oţoiu, A., Oancea, B., Odell, C. M., Ogbo, F. A., Oh, I.-H., Okunga, E. W., Oladnabi, M., Olagunju, A. T., Olusanya, B. O., Olusanya, J. O., Omer, M. O., Ong, K. L., Onwujekwe, O. E., Orpana, H. M., Ortiz, A., Osarenotor, O., Osei, F. B., Ostroff, S. M., Otstavnov, N., Otstavnov, S. S., Øverland, S., Owolabi, M. O., P A, M., Padubidri, J. R., Palladino, R., Panda-Jonas, S., Pandey, A., Parry, C. D. H., Pasovic, M., Pasupula, D. K., Patel, S. K., Pathak, M., Patten, S. B., Patton, G. C., Pazoki Toroudi, H., Peden, A. E., Pennini, A., Pepito, V. C. F., Peprah, E. K., Pereira, D. M., Pesudovs, K., Pham, H. Q., Phillips, M. R., Piccinelli, C., Pilz, T. M., Piradov, M. 
A., Pirsaheb, M., Plass, D., Polinder, S., Polkinghorne, K. R., Pond, C. D., Postma, M. J., Pourjafar, H., Pourmalek, F., Poznańska, A., Prada, S. I., Prakash, V., Pribadi, D. R. A., Pupillo, E., Quazi Syed, Z., Rabiee, M., Rabiee, N., Radfar, A., Rafiee, A., Raggi, A., Rahman, M. A., Rajabpour-Sanati, A., Rajati, F., Rakovac, I., Ram, P., Ramezanzadeh, K., Ranabhat, C. L., Rao, P. C., Rao, S. J., Rashedi, V., Rathi, P., Rawaf, D. L., Rawaf, S., Rawal, L., Rawassizadeh, R., Rawat, R., Razo, C., Redford, S. B., Reiner, R. C., Jr., Reitsma, M. B., Remuzzi, G., Renjith, V., Renzaho, A. M. N., Resnikoff, S., Rezaei, N., Rezaei, N., Rezapour, A., Rhinehart, P.-A., Riahi, S. M., Ribeiro, D. C., Ribeiro, D., Rickard, J., Rivera, J. A., Roberts, N. L. S., Rodríguez-Ramírez, S., Roever, L., Ronfani, L., Room, R., Roshandel, G., Roth, G. A., Rothenbacher, D., Rubagotti, E., Rwegerera, G. M., Sabour, S., Sachdev, P. S., Saddik, B., Sadeghi, E., Sadeghi, M., Saeedi, R., Saeedi Moghaddam, S., Safari, Y., Safi, S., Safiri, S., Sagar, R., Sahebkar, A., Sajadi, S. M., Salam, N., Salamati, P., Salem, H., Salem, M. R. R., Salimzadeh, H., Salman, O. M., Salomon, J. A., Samad, Z., Samadi Kafil, H., Sambala, E. Z., Samy, A. M., Sanabria, J., Sánchez-Pimienta, T. G., Santomauro, D. F., Santos, I. S., Santos, J. V., Santric-Milicevic, M. M., Saraswathy, S. Y. I., Sarmiento-Suárez, R., Sarrafzadegan, N., Sartorius, B., Sarveazad, A., Sathian, B., Sathish, T., Sattin, D., Saxena, S., Schaeffer, L. E., Schiavolin, S., Schlaich, M. P., Schmidt, M. I., Schutte, A. E., Schwebel, D. C., Schwendicke, F., Senbeta, A. M., Senthilkumaran, S., Sepanlou, S. G., Serdar, B., Serre, M. L., Shadid, J., Shafaat, O., Shahabi, S., Shaheen, A. A., Shaikh, M. A., Shalash, A. S., Shams-Beyranvand, M., Shamsizadeh, M., Sharafi, K., Sheikh, A., Sheikhtaheri, A., Shibuya, K., Shield, K. D., Shigematsu, M., Shin, J. 
I., Shin, M.-J., Shiri, R., Shirkoohi, R., Shuval, K., Siabani, S., Sierpinski, R., Sigfusdottir, I. D., Sigurvinsdottir, R., Silva, J. P., Simpson, K. E., Singh, J. A., Singh, P., Skiadaresi, E., Skou, S. T., Skryabin, V. Y., Smith, E. U. R., Soheili, A., Soltani, S., Soofi, M., Sorensen, R. J. D., Soriano, J. B., Sorrie, M. B., Soshnikov, S., Soyiri, I. N., Spencer, C. N., Spotin, A., Sreeramareddy, C. T., Srinivasan, V., Stanaway, J. D., Stein, C., Stein, D. J., Steiner, C., Stockfelt, L., Stokes, M. A., Straif, K., Stubbs, J. L., Sufiyan, M. a. B., Suleria, H. A. R., Suliankatchi Abdulkader, R., Sulo, G., Sultan, I., Szumowski, Ł., Tabarés-Seisdedos, R., Tabb, K. M., Tabuchi, T., Taherkhani, A., Tajdini, M., Takahashi, K., Takala, J. S., Tamiru, A. T., Taveira, N., Tehrani-Banihashemi, A., Temsah, M.-H., Tesema, G. A., Tessema, Z. T., Thurston, G. D., Titova, M. V., Tohidinik, H. R., Tonelli, M., Topor-Madry, R., Topouzis, F., Torre, A. E., Touvier, M., Tovani-Palone, M. R. R., Tran, B. X., Travillian, R., Tsatsakis, A., Tudor Car, L., Tyrovolas, S., Uddin, R., Umeokonkwo, C. D., Unnikrishnan, B., Upadhyay, E., Vacante, M., Valdez, P. R., van Donkelaar, A., Vasankari, T. J., Vasseghian, Y., Veisani, Y., Venketasubramanian, N., Violante, F. S., Vlassov, V., Vollset, S. E., Vos, T., Vukovic, R., Waheed, Y., Wallin, M. T., Wang, Y., Wang, Y.-P., Watson, A., Wei, J., Wei, M. Y. W., Weintraub, R. G., Weiss, J., Werdecker, A., West, J. J., Westerman, R., Whisnant, J. L., Whiteford, H. A., Wiens, K. E., Wolfe, C. D. A., Wozniak, S. S., Wu, A.-M., Wu, J., Wulf Hanson, S., Xu, G., Xu, R., Yadgir, S., Yahyazadeh Jabbari, S. H., Yamagishi, K., Yaminfirooz, M., Yano, Y., Yaya, S., Yazdi-Feyzabadi, V., Yeheyis, T. Y., Yilgwan, C. S., Yilma, M. T., Yip, P., Yonemoto, N., Younis, M. Z., Younker, T. P., Yousefi, B., Yousefi, Z., Yousefinezhadi, T., Yousuf, A. Y., Yu, C., Yusefzadeh, H., Zahirian Moghadam, T., Zamani, M., Zamanian, M., Zandian, H., Zastrozhin, M. 
S., Zhang, Y., Zhang, Z.-J., Zhao, J. T., Zhao, X.-J. G., Zhao, Y., Zhou, M., Ziapour, A., Zimsen, S. R. M., Brauer, M., Afshin, A., and Lim, S. S.: Global burden of 87 risk factors in 204 countries and territories, 1990–2019: a systematic analysis for the Global Burden of Disease Study 2019, Lancet, 396, 1223–1249, 2020.

Nolte, C. G., Spero, T. L., Bowden, J. H., Mallard, M. S., and Dolwick, P. D.: The potential effects of climate change on air quality across the conterminous US at 2030 under three Representative Concentration Pathways, Atmos. Chem. Phys., 18, 15471–15489, 2018.

Pozzer, A., Anenberg, S. C., Dey, S., Haines, A., Lelieveld, J., and Chowdhury, S.: Mortality Attributable to Ambient Air Pollution: A Review of Global Estimates, GeoHealth, 7, e2022GH000711, 2023.

Rasmussen, D. J., Fiore, A. M., Naik, V., Horowitz, L. W., McGinnis, S. J., and Schultz, M. G.: Surface ozone-temperature relationships in the eastern US: A monthly climatology for evaluating chemistry-climate models, Atmos. Environ., 47, 142–153, 2012.

Rieder, H. E., Fiore, A. M., Clifton, O. E., Correa, G., Horowitz, L. W., and Naik, V.: Combining model projections with site-level observations to estimate changes in distributions and seasonality of ozone in surface air over the U.S.A., Atmos. Environ., 193, 302–315, 2018.

Rieder, H. E., Fiore, A. M., Horowitz, L. W., and Naik, V.: Projecting policy-relevant metrics for high summertime ozone pollution events over the eastern United States due to climate and emission changes during the 21st century, J. Geophys. Res.-Atmos., 120, 784–800, 2015.

Schnell, J. L., Holmes, C. D., Jangam, A., and Prather, M. J.: Skill in forecasting extreme ozone pollution episodes with a global atmospheric chemistry model, Atmos. Chem. Phys., 14, 7721–7739, 2014.

Schnell, J. L. and Prather, M. J.: Co-occurrence of extremes in surface ozone, particulate matter, and temperature over eastern North America, P. Natl. Acad. Sci. USA, 114, 2854–2859, 2017.

Schnell, J. L., Prather, M. J., Josse, B., Naik, V., Horowitz, L. W., Cameron-Smith, P., Bergmann, D., Zeng, G., Plummer, D. A., Sudo, K., Nagashima, T., Shindell, D. T., Faluvegi, G., and Strode, S. A.: Use of North American and European air quality networks to evaluate global chemistry–climate modeling of surface ozone, Atmos. Chem. Phys., 15, 10581–10596, 2015.

Shen, L., Jacob, D. J., Zhu, L., Zhang, Q., Zheng, B., Sulprizio, M. P., Li, K., De Smedt, I., González Abad, G., Cao, H., Fu, T.-M., and Liao, H.: The 2005–2016 Trends of Formaldehyde Columns Over China Observed by Satellites: Increasing Anthropogenic Emissions of Volatile Organic Compounds and Decreasing Agricultural Fire Emissions, Geophys. Res. Lett., 46, 4468–4475, 2019.

Sillman, S.: The relation between ozone, NOx and hydrocarbons in urban and polluted rural environments, Atmos. Environ., 33, 1821–1845, 1999.

Sillman, S., Logan, J. A., and Wofsy, S. C.: The sensitivity of ozone to nitrogen oxides and hydrocarbons in regional ozone episodes, J. Geophys. Res.-Atmos., 95, 1837–1851, 1990.

Staehle, C.: Gridded MDA8 surface ozone observations for the EU and US during 1993–2014, Zenodo [data set], 2024.

Stohl, A., Bonasoni, P., Cristofanelli, P., Collins, W., Feichter, J., Frank, A., Forster, C., Gerasopoulos, E., Gäggeler, H., James, P., Kentarchos, T., Kromp-Kolb, H., Krüger, B., Land, C., Meloen, J., Papayannis, A., Priller, A., Seibert, P., Sprenger, M., Roelofs, G. J., Scheel, H. E., Schnabel, C., Siegmund, P., Tobler, L., Trickl, T., Wernli, H., Wirth, V., Zanis, P., and Zerefos, C.: Stratosphere-troposphere exchange: A review, and what we have learned from STACCATO, J. Geophys. Res.-Atmos., 108, 8516, 2003.

Turnock, S. T., Allen, R. J., Andrews, M., Bauer, S. E., Deushi, M., Emmons, L., Good, P., Horowitz, L., John, J. G., Michou, M., Nabat, P., Naik, V., Neubauer, D., O'Connor, F. M., Olivié, D., Oshima, N., Schulz, M., Sellar, A., Shim, S., Takemura, T., Tilmes, S., Tsigaridis, K., Wu, T., and Zhang, J.: Historical and future changes in air pollutants from CMIP6 models, Atmos. Chem. Phys., 20, 14547–14579, 2020.

Westervelt, D. M., Ma, C. T., He, M. Z., Fiore, A. M., Kinney, P. L., Kioumourtzoglou, M. A., Wang, S., Xing, J., Ding, D., and Correa, G.: Mid-21st century ozone air quality and health burden in China under emissions scenarios and climate change, Environ. Res. Lett., 14, 074030, 2019.

Young, P. J., Naik, V., Fiore, A. M., Gaudel, A., Guo, J., Lin, M. Y., Neu, J. L., Parrish, D. D., Rieder, H. E., Schnell, J. L., Tilmes, S., Wild, O., Zhang, L., Ziemke, J., Brandt, J., Delcloo, A., Doherty, R. M., Geels, C., Hegglin, M. I., Hu, L., Im, U., Kumar, R., Luhar, A., Murray, L., Plummer, D., Rodriguez, J., Saiz-Lopez, A., Schultz, M. G., Woodhouse, M. T., and Zeng, G.: Tropospheric Ozone Assessment Report: Assessment of global-scale model performance for global and regional ozone distributions, variability, and trends, Elementa, 6, 49 pp., 2018.

Zhang, J., Wei, Y., and Fang, Z.: Ozone Pollution: A Major Health Hazard Worldwide, Frontiers in Immunology, 10, (last access: 1 March 2024), 2019. 

Zhu, L., Jacob, D. J., Keutsch, F. N., Mickley, L. J., Scheffe, R., Strum, M., González Abad, G., Chance, K., Yang, K., Rappenglück, B., Millet, D. B., Baasandorj, M., Jaeglé, L., and Shah, V.: Formaldehyde (HCHO) As a Hazardous Air Pollutant: Mapping Surface Air Concentrations from Satellite and Inferring Cancer Risks in the United States, Environ. Sci. Technol., 51, 5650–5657, 2017.

Short summary
Chemistry–climate models show biases compared to surface ozone observations and thus require bias correction for impact studies and the assessment of air quality changes. We compare the performance of commonly used correction techniques for model outputs available via CMIP6. While all methods can reduce model biases, better results are obtained from more complex approaches. Thus, our study suggests broader use of these techniques in studies seeking to inform air quality management and policy.