Applying deep learning to a chemistry-climate model for improved ozone prediction

Liu, Zhenze; Li, Ke; Wild, Oliver; Doherty, Ruth M.; O’Connor, Fiona M.; Turnock, Steven T.

doi:10.5194/acp-25-16969-2025

Articles | Volume 25, issue 22

Research article

27 Nov 2025

Abstract

Chemistry-climate models have developed significantly over the decades, yet they still exhibit substantial systematic biases in simulating atmospheric composition due to gaps in our understanding of underlying processes. Building on deep learning's success in different domains, we explore its application to correct surface ozone biases in the state-of-the-art chemistry-climate model UKESM1. Six statistical models have been developed, and the Transformer model outperforms the others due to its advanced architecture. A simple weighted ensemble approach is further shown to enhance performance by 14 % over the best single model, the Transformer, reducing RMSE to 0.69 ppb. Applied to future scenarios (SSP3-7.0 and SSP3-7.0-lowNTCF), UKESM1 shows a larger overestimation of ozone changes by up to 25 ppb compared to present-day conditions. Despite biases, UKESM1 captures the non-linear ozone sensitivity to precursors, with temperature-sensitive processes identified as a dominant contributor to biases. We highlight that simulations of future surface ozone are likely to become less accurate under a warmer climate. Therefore, the bias correction approaches introduced here have substantial potential to improve the accuracy of ozone impact assessments. These methods are also applicable to other chemistry-climate models, which is critical for informing air quality and climate policy decisions.

Received: 17 Mar 2025 – Discussion started: 10 Jun 2025 – Revised: 23 Sep 2025 – Accepted: 16 Oct 2025 – Published: 27 Nov 2025

1 Introduction

Global chemistry-climate models are vital for simulating atmospheric composition and its changes by representing the relevant physical and chemical processes in the atmosphere. However, these models face challenges in accurately reproducing observed concentrations of short-lived species, such as ozone (O₃). Global models typically have coarse spatial resolution, and this limitation hampers the representation of small-scale processes, leading to systematic biases in simulations (Stock et al., 2014; Fenech et al., 2018). There are currently no simple methods to address these issues effectively without increasing model resolution. However, higher resolution significantly increases computational demands. Besides, increasing model resolution does not consistently improve accuracy, sometimes even introducing new biases (Wild and Prather, 2006; Iles et al., 2020). Moreover, evaluating model performance is challenging due to uncertainties in comparing grid-scale outputs with localized, site-based measurements (Schultz et al., 2017).

Considering these issues, surface ozone simulations in current global chemistry-climate models exhibit notable biases, particularly at regional scales (Turnock et al., 2020). Although large-scale ozone distributions are generally well captured (Fleming et al., 2018; Griffiths et al., 2021), regional ozone concentrations remain challenging to reproduce, especially at the surface where precursor emissions and surface deposition exert strong influences. The Tropospheric Ozone Assessment Report (TOAR) also reported that global models exhibit systematic biases in their surface ozone simulations across all seasons, with a multi-model mean bias of 7.7 ppb (approximately a 20 % overestimation) in the Northern Hemisphere (Young et al., 2018). These biases may stem from inadequate representation of dynamics (e.g., meteorology and deposition) and oversimplified ozone chemistry (Archibald et al., 2020a). However, efforts to improve individual modules, such as chemistry schemes, can even result in greater biases in ozone simulation (Archer-Nicholls et al., 2021). Progress in addressing these issues has been limited over recent decades (Revell et al., 2018; Wild et al., 2020).

Deep learning, a transformative approach in fields like computer vision and natural language processing (LeCun et al., 2015), is increasingly applied in physical science (Reichstein et al., 2019). Recent studies have demonstrated its growing use in atmospheric science, where it has shown promise in weather modeling and data generation. Specific applications include mimicking atmospheric photochemical processes (Xing et al., 2022) and directly predicting future weather (Bi et al., 2023; Lam et al., 2023), often outperforming traditional numerical methods in speed and accuracy. Uncertain parameterizations in climate models, e.g., moist physics and radiation processes, can also be replaced by deep learning models (Wang et al., 2022). Another key advantage of deep learning is its ability to fuse multi-source data, enabling the creation of global datasets, such as surface ozone concentrations (Betancourt et al., 2022). However, its application to air pollution modeling, particularly for ozone, is challenging due to the localized nature of pollution and limited observational data for key variables. To address this, we adopt a hybrid approach, integrating process-based chemistry-climate models with deep learning to improve the accuracy of ozone simulations.

Bias correction, as a way to further improve model accuracy, has been developed for different goals. For instance, Vaittinada Ayar et al. (2021) aim to distinguish the impacts of different uncertainties (e.g., emissions, scenario, model designs, etc.) on model biases. Vrac and Friederichs (2015) and Nivron et al. (2024) focus on preserving temporal properties in bias correction, such as the frequency of heatwaves over long periods. Machine learning has also been applied to surface ozone bias correction (e.g., Ivatt and Evans, 2020; Miyazaki et al., 2025), achieving substantial error reduction. However, most studies have not fully explored the performance of different deep learning approaches, and their impacts on prediction remain uncertain. In our work, we therefore explore several deep learning models for ozone bias correction and propose a weighted ensemble to achieve more robust results.

In this study we investigate the potential of deep learning to correct surface ozone biases in a global chemistry-climate model. In Sect. 2, we describe the chemistry-climate model and introduce six statistical models used for bias correction. Section 3 evaluates their performance and proposes a weighting scheme to optimize results. Section 4 demonstrates the advantages of this approach for projecting future surface ozone changes. In Sect. 5, we analyze the sensitivity of ozone in both the original and bias-corrected models. Finally, Sect. 6 presents our conclusions.

2 Approach

2.1 Chemistry–climate model and experiments

We use version 1 of the United Kingdom Earth System Model (UKESM1; Sellar et al., 2019) to simulate present-day (2004–2014) and future (2045–2055) surface O₃ mixing ratios under different emission and climate scenarios. UKESM1 incorporates a physical climate model, the Hadley Centre Global Environment Model version 3 (HadGEM3), configured with the Global Atmosphere 7.1 and Global Land 7.0 (GA7.1/GL7.0; Walters et al., 2019). Chemistry is simulated using the state-of-the-art United Kingdom Chemistry and Aerosol module (UKCA; O'Connor et al., 2014), which includes a unified stratosphere–troposphere gas-phase chemistry scheme (StratTrop; Archibald et al., 2020b). In this study, an extended version of this chemistry scheme incorporating additional reactive volatile organic compounds (VOCs) is employed to improve the representation of O₃ production (Liu et al., 2021). The model resolution is N96L85 in the atmosphere, with 1.875° in longitude by 1.25° in latitude, 85 terrain-following hybrid height layers, and a model top at 85 km. The model is nudged with ERA-Interim reanalyses every 6 h for present-day simulations.

In our present-day simulations (2004–2014), we use anthropogenic (Hoesly et al., 2018) and biomass burning (van Marle et al., 2017) emissions from the Coupled Model Intercomparison Project Phase 6 (CMIP6; Eyring et al., 2016). Biogenic VOC emissions are calculated online in the Joint UK Land Environment Simulator (JULES) land-surface scheme (Eyring et al., 2016). For future simulations (2045–2055), we use the shared socio-economic pathways (SSP; O’Neill et al., 2014), which represent various trajectories for emission and climate policies, considering social, economic and environmental development (Rao et al., 2017). We select the SSP3-7.0 and SSP3-7.0-lowNTCF pathways to illustrate the effects of weaker and stronger air pollutant emission controls, respectively. Both pathways anticipate a warmer and more humid climate, although SSP3-7.0-lowNTCF includes significant reductions in anthropogenic emissions of near-term climate forcers (NTCF), such as O₃ precursors and aerosols. Details of the present-day and future emissions under SSP3-7.0 and SSP3-7.0-lowNTCF are provided in Liu et al. (2022b). Other emissions, including sea salt, dust, and lightning NO_x, are the same as those used in UKESM1 simulations for CMIP6 (Turnock et al., 2020). The atmosphere-only configuration of UKESM1 is applied with prescribed sea surface temperatures and sea ice to examine the transient impacts of emissions under present-day and future climates.

2.2 Six approaches for O₃ bias correction

Surface ozone concentrations are typically underestimated in winter and overestimated in summer when simulated with UKESM1 (Archibald et al., 2020b). Biases with consistently high values across all seasons are also observed in other chemistry-climate models used in CMIP6 (Young et al., 2018; Turnock et al., 2020). However, the underlying reasons for these biases in each model remain unclear. Reliable model outputs can still be achieved by bias correction, as long as the systematic biases are mitigated. Our goal is to correct these biases directly through different statistical and deep learning methods. We assume that these systematic errors are specific to different process-based models but can be learned from historical data using statistical approaches, which can then be used to infer how large the biases will be in future scenarios. The model is nudged for historical runs, so the systematic errors would only represent those caused by parameterizations of internal processes, rather than by external data sources such as meteorology.

As a reference dataset for correcting O₃ simulated with UKESM1, we consider surface O₃ reanalysis data from the European Centre for Medium-Range Weather Forecasts (ECMWF) Atmospheric Composition Reanalysis 4 (EAC4) under the Copernicus Atmosphere Monitoring Service (CAMS; Inness et al., 2019). One advantage of the CAMS reanalysis is its better agreement with TOAR O₃ observations, exhibiting mean seasonal biases of about 3 ppb, notably lower than the biases of up to 16 ppb in UKESM1 at locations where TOAR observations are available (Turnock et al., 2020). A comparison and evaluation of UKESM1, CAMS, and TOAR has been conducted in Liu et al. (2022a). However, we note that CAMS still has biases, especially in regions with sparse observations (e.g., East Asia, Southeast Asia; Huijnen et al., 2020). These limitations may propagate to our corrections; nevertheless, the CAMS data remain a suitable benchmark for demonstrating our methodology due to their lower biases. In addition, the spatial scale of these data closely aligns with the output of UKESM1, thereby mitigating uncertainties related to the spatial representativeness of sparse observations. We note that the large volume of the dataset, providing global coverage, is crucial for training deep learning models. In future applications of bias correction, the CAMS reference could be replaced by a measurement-based surface O₃ climatology if this becomes available.

Here we apply six approaches to calculate surface O₃ biases. Figure 1 illustrates the increasing complexity of these methods from left to right: multiple linear regression (MLR), random forest (RF), multilayer perceptron (MLP), convolutional neural network (CNN), residual network (ResNet) and Transformer. MLR is a linear method, while RF transforms linear processes into nonlinear ones through decision tree-based layers. MLP forms the basis of deep learning, incorporating a feed-forward neural network (FFN). CNN uses convolutional operators as encoders, which are particularly effective for processing two-dimensional data, such as images. ResNet is an architecture that enables the training of deep learning models with multiple layers, addressing challenges that were prevalent during the early development of deep learning (He et al., 2016). The Transformer, a more recent architecture, demonstrates strong capabilities in processing long-sequence tasks, such as natural language understanding, with its core functionality driven by the Attention mechanism (Vaswani et al., 2017).

https://acp.copernicus.org/articles/25/16969/2025/acp-25-16969-2025-f01

Figure 1. The architectures of MLR, RF, MLP, CNN, ResNet and Transformer applied in this study for calculating surface O₃ biases. Each diagram illustrates the workflow, from the input of features to the prediction of O₃ biases. MLR, RF, and MLP receive input features from a single model grid cell (1×1), whereas the remaining models process features from a 9×9 block of grid cells.

We assume that UKESM1 exhibits systematic biases that are associated with other self-generated variables. The main variables relevant to ozone production and transport are selected (Liu et al., 2022a). We use 20 physical, meteorological, and chemical variables as features, including location, season, temperature, humidity, wind speed, photolysis and deposition rates, and concentrations of key precursors. For MLR, RF, and MLP, the features and O₃ biases corresponding to the same model grid cell are used to train the different approaches. For the methods designed to process 2D data, each input consists of a 9×9 grid cell patch centered on the grid cell where O₃ biases are to be calculated. We also calculate the ensemble mean of all models to optimize predictions.
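
The 9×9 patch input described above can be sketched in a few lines of Python. This is a minimal illustration rather than the code used in this study: the function name `extract_patch`, the toy grid, and in particular the boundary handling (wrapping in longitude, repeating the edge row at the poles) are assumptions, since the treatment of grid edges is not specified here.

```python
def extract_patch(field, i, j, size=9):
    """Return a size x size patch of `field` centred on row i (latitude)
    and column j (longitude). `field` is a list of rows."""
    n_lat, n_lon = len(field), len(field[0])
    half = size // 2
    patch = []
    for di in range(-half, half + 1):
        # Assumption: clamp at the poles by repeating the edge row.
        row = min(max(i + di, 0), n_lat - 1)
        # Assumption: the grid is periodic in longitude, so columns wrap around.
        patch.append([field[row][(j + dj) % n_lon] for dj in range(-half, half + 1)])
    return patch


# Toy single-feature grid (hypothetical values): 12 latitudes x 16 longitudes.
n_lat, n_lon = 12, 16
field = [[lat * 100 + lon for lon in range(n_lon)] for lat in range(n_lat)]

patch = extract_patch(field, 6, 0)
centre = patch[4][4]  # value at the target grid cell
```

For the single-cell models (MLR, RF, MLP) the same call with `size=1` would return only the target cell.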

The feature data are obtained from UKESM1 simulations, and surface O₃ biases are derived from the differences between UKESM1 simulations and the CAMS reanalysis. Monthly mean O₃ mixing ratios from the lowest layer in UKESM1 are used. The dataset is split into 80 % for training, 10 % for validation, and 10 % for testing, with approximately 2.9 million data samples used for model training. We choose mean absolute error as the loss function and AdamW as the optimizer to minimize it. To increase model regularization, a weight decay value of 0.001 is applied to constrain the size of parameter weights. The initial learning rate is set to 0.01, with a cosine annealing schedule for dynamic adjustment of learning rates to improve training (Loshchilov and Hutter, 2016). As the complexity of models increases, so does the number of parameters; however, we limit the number of trainable parameters in our most complex model, the Transformer, to 9 million to manage computational resources. The Transformer requires approximately 8 h to converge on a single GPU (RTX 3090 Ti).
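
The learning-rate schedule mentioned above can be illustrated with the standard cosine annealing formula. The sketch below assumes a single cosine cycle without warm restarts and a minimum learning rate of zero; neither detail is stated here, and the function name `cosine_annealing_lr` and the number of steps are hypothetical. Only the initial learning rate of 0.01 is taken from the text.

```python
import math


def cosine_annealing_lr(step, total_steps, lr_max=0.01, lr_min=0.0):
    """Learning rate at `step` for one cosine cycle (no warm restarts).

    lr_max is the initial learning rate (0.01 in the text);
    lr_min = 0 is an assumption.
    """
    progress = min(max(step / total_steps, 0.0), 1.0)
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * progress))


# Hypothetical schedule over 100 optimizer steps.
schedule = [cosine_annealing_lr(s, 100) for s in range(101)]
```

The rate starts at its maximum, falls slowly at first, fastest at the midpoint, and flattens again towards the end of training.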

Table 1. Overview of the 20 input features used for model training.

3 Statistical model evaluation and the weighting scheme

The performance of all six statistical models is evaluated using testing data to give an independent assessment (Fig. 2). All models generally simulate the surface ozone biases in UKESM1 effectively, capturing both underestimations and overestimations. However, the deep learning models (Fig. 2c–f) clearly outperform the simpler linear and random forest models (Fig. 2a, b). A primary limitation of the linear and random forest models is their inability to capture extreme bias values, with many predictions clustering around 0 ppb. Overall, the systematic biases are smoothly distributed with a mean near 0, indicating that underestimations and overestimations occur with comparable frequency in UKESM1.

The ResNet and Transformer approaches perform best, with their predictions closely aligning with the 1:1 line across the full range of biases. These models yield higher correlation coefficients (up to 0.997) and lower root-mean-square errors (RMSE). From MLP to Transformer, the error is reduced by 64 %, from 2.25 to 0.80 ppb, highlighting the importance of architecture in this task. However, the improvement from convolution-based models (CNN and ResNet) to the Transformer is marginal. In the deep learning field, the optimal architecture for processing 2D data, whether convolution-based or attention-based, remains a subject of ongoing debate (Smith et al., 2023).
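
For reference, the two evaluation metrics and the relative error reduction quoted above can be written out as follows. This is a generic sketch with hypothetical function names, not the evaluation code of this study; only the two RMSE values (2.25 and 0.80 ppb) are taken from the text.

```python
import math


def rmse(pred, ref):
    """Root-mean-square error between predictions and reference values."""
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(ref))


def pearson_r(pred, ref):
    """Pearson correlation coefficient."""
    n = len(ref)
    mean_p, mean_r = sum(pred) / n, sum(ref) / n
    cov = sum((p - mean_p) * (r - mean_r) for p, r in zip(pred, ref))
    var_p = sum((p - mean_p) ** 2 for p in pred)
    var_r = sum((r - mean_r) ** 2 for r in ref)
    return cov / math.sqrt(var_p * var_r)


def relative_reduction(old, new):
    """Percentage reduction when going from `old` to `new`."""
    return 100.0 * (old - new) / old


# RMSE of MLP (2.25 ppb) versus Transformer (0.80 ppb), as quoted in the text.
reduction = relative_reduction(2.25, 0.80)  # about 64 %
```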

https://acp.copernicus.org/articles/25/16969/2025/acp-25-16969-2025-f02

Figure 2. Evaluation of the models’ performance in simulating monthly mean surface O₃ biases at each UKESM1 grid point, based on testing data. (a) Surface O₃ biases (UKESM1 minus CAMS) and biases predicted by the models. (b) Probability density function of surface O₃ biases (labelled as “Reference”) and the predicted O₃ biases. Statistics are shown in the top-right corner of each panel.

https://acp.copernicus.org/articles/25/16969/2025/acp-25-16969-2025-f03

Figure 3. RMSE of the weighted-mean model in simulating surface O₃ biases as a function of the tuned parameter σ. The best-performing single model, Transformer, is indicated for comparison. The weights assigned to each model corresponding to the optimal value of σ are provided in the text. The σ values are binned on a linear scale separately into the following ranges: 0.001–0.01, 0.01–0.1, 0.1–1, and 1–10.

Given that we employ a variety of models, it is logical to consider combining them to reduce the uncertainties inherent in each. Previous studies have demonstrated that integrating multiple models can effectively decrease both uncertainties and prediction errors (Stevenson et al., 2006). However, assigning weights to each model based on their respective performances can produce a more robust outcome compared to simple averaging (Amos et al., 2020). Therefore, we adopt a simple weighted ensemble mean scheme, following the approach outlined by Amos et al. (2020). The calculation of the weights for each model i is presented as follows:

$$ w_{i} = \frac{\exp\left(-\frac{D_{i}^{2}}{N_{i}\sigma^{2}}\right) \times 100}{\sum_{j} \exp\left(-\frac{D_{j}^{2}}{N_{j}\sigma^{2}}\right)} \qquad (1) $$

Here, $D_{i}^{2}$ represents the squared error between the predictions of an individual model and the reference data, derived from the testing data, and $N_{i}$ denotes the number of testing data points. The parameter σ is adjustable and can be optimized to determine the most effective weight values. As illustrated in Fig. 3, the error of the weighted-mean model is lower than that of any single model, including the best-performing single model, Transformer, which exhibits an error of 0.80 ppb. The optimal value of σ = 0.35 corresponds to the lowest error of the weighted-mean model (0.69 ppb), resulting in a 14 % improvement over the Transformer model. We note that the optimal value of σ may differ across various model ensembles. High-performing models, such as ResNet and Transformer, are assigned large weights, approximately 40 % each, while the CNN model has a weight of 17 %. Models with low performance receive negligible weights and are effectively excluded. This demonstrates that a simple weighting scheme can effectively integrate the outputs of all models and further improve prediction accuracy. The optimal weighted-mean predictions are used for subsequent analyses.
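
As an illustration of Eq. (1), the following Python sketch computes percentage weights from per-model mean squared errors. It assumes that $D_{i}^{2}$ is the summed squared error, so that $D_{i}^{2}/N_{i}$ is the mean squared error in ppb². The function name `ensemble_weights` is hypothetical, and the RMSE values for MLR, RF, CNN and ResNet are placeholders chosen for illustration (only the Transformer and MLP values appear in the text), so the resulting weights are not those reported above.

```python
import math


def ensemble_weights(mse, sigma):
    """Weights in percent following Eq. (1).

    mse[name] stands for D_i^2 / N_i (assumed to be the mean squared error).
    """
    exponents = {name: -m / sigma ** 2 for name, m in mse.items()}
    # Subtracting the largest exponent avoids underflow for small sigma
    # and cancels in the ratio, so the weights are unchanged.
    shift = max(exponents.values())
    unnorm = {name: math.exp(e - shift) for name, e in exponents.items()}
    total = sum(unnorm.values())
    return {name: 100.0 * u / total for name, u in unnorm.items()}


# RMSE in ppb. Transformer (0.80) and MLP (2.25) are from the text;
# all other values are hypothetical placeholders.
rmse_ppb = {"MLR": 3.5, "RF": 3.0, "MLP": 2.25,
            "CNN": 0.95, "ResNet": 0.82, "Transformer": 0.80}
mse = {name: value ** 2 for name, value in rmse_ppb.items()}

weights = ensemble_weights(mse, sigma=0.35)
```

A small σ concentrates the weight on the best models, whereas a large σ approaches a simple average, which is why σ is tuned against the testing error in Fig. 3.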

4 Improved assessment of future changes in surface O₃

Considering the expected biases in future simulations of surface O₃ using UKESM1, we employ deep learning models to predict these biases based on input variables generated from UKESM1 future simulations. Subsequently, a bias-corrected surface O₃ concentration is derived by subtracting the O₃ bias from the simulated O₃ values. Figure 4 illustrates seasonal variations in weighted-mean surface O₃ concentrations under SSP3-7.0 and SSP3-7.0-lowNTCF scenarios. Compared with bias-corrected results, UKESM1 simulations demonstrate much higher global mean O₃ concentrations in summer and similar levels in winter (Fig. 4a, d, g, j). This indicates that UKESM1 simulates a stronger seasonal O₃ change, showing a 12 ppb increase compared to the corrected 5 ppb. Higher emissions of O₃ precursors under SSP3-7.0 lead to higher surface O₃ mixing ratios compared to SSP3-7.0-lowNTCF, with differences of 4 ppb in summer and 1.5 ppb in winter (Fig. 4b, e, h, k). In addition, seasonal O₃ variation (winter to summer) becomes more pronounced under SSP3-7.0 (4.4 ppb increase; Fig. 4b, e) than under SSP3-7.0-lowNTCF (2.0 ppb increase; Fig. 4g, h), which is also observed in UKESM1 simulations. Decreased O₃ titration by NO in winter and lower photochemical O₃ production in summer in the lower-emission scenario will both contribute to a reduced seasonal variation.
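
The correction step described above amounts to subtracting the weighted-mean predicted bias (UKESM1 minus CAMS) from the simulated mixing ratio. A minimal sketch follows, with a hypothetical function name and toy numbers that are not results from this study.

```python
def corrected_o3(simulated, predicted_biases, weights_percent):
    """Subtract the weighted-mean predicted bias from simulated O3 (all in ppb).

    simulated:        list of simulated O3 values, one per grid cell
    predicted_biases: dict mapping model name to a list of predicted biases
    weights_percent:  dict mapping model name to its ensemble weight
    """
    total = sum(weights_percent.values())
    corrected = []
    for k, sim in enumerate(simulated):
        bias = sum(weights_percent[m] * predicted_biases[m][k]
                   for m in weights_percent) / total
        corrected.append(sim - bias)
    return corrected


# Toy example with two grid cells and two models (hypothetical values).
simulated = [40.0, 30.0]
predicted_biases = {"ResNet": [10.0, -2.0], "Transformer": [8.0, -4.0]}
weights_percent = {"ResNet": 50.0, "Transformer": 50.0}

result = corrected_o3(simulated, predicted_biases, weights_percent)
```

In this toy case a positive predicted bias lowers the first value and a negative one raises the second.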

https://acp.copernicus.org/articles/25/16969/2025/acp-25-16969-2025-f04

Figure 4. Comparison of UKESM1 simulated surface O₃ mixing ratios (a, d, g, j) with weighted-mean bias-corrected results (b, e, h, k), and bias-corrected O₃ changes (c, f, i, l) from present day (PD; 2004–2014) to future (2045–2055) under SSP3-7.0 and SSP3-7.0-lowNTCF scenarios. Shown for June–July–August (JJA) and December–January–February (DJF), with hatched regions denoting where the sign of bias-corrected O₃ changes differs from those simulated with UKESM1. Global area-weighted mean mixing ratios are shown in the top-right corner of each panel.

Figure 4c, f, i, l shows the changes in surface O₃ from the present day to the future, as simulated by the bias-corrected weighted-mean model. It reveals that distinct emission pathways result in divergent O₃ responses. Under SSP3-7.0, surface O₃ mixing ratios exhibit a consistent increase across both seasons, whereas under SSP3-7.0-lowNTCF, a decrease is simulated. However, the magnitude of O₃ responses is greater under SSP3-7.0-lowNTCF than under SSP3-7.0. At regional scales, substantial reductions in surface O₃ are shown in North America during summer (Fig. 4c, i), attributable to lower precursor emissions in both scenarios. In contrast, in East Asia, surface O₃ changes vary markedly between scenarios and seasons, driven primarily by differing O₃ chemical environments due to the current high local emissions. These variations pose significant challenges for addressing regional air pollution. Additionally, we compare surface O₃ changes with and without bias correction. While the direction of surface O₃ changes remains generally consistent across most continental regions, opposing signs emerge in certain oceanic areas. This discrepancy may stem from the limited availability of observational constraints in oceanic regions, which hinders both the development of process-based models and the provision of reliable reference data for bias correction. Overall, the influence of different emission pathways on future O₃ concentrations is robust at large scales, particularly over land areas.

In Fig. 5, we further show regional surface O₃ changes from the present day to the future, and compare the predictions of UKESM1 with those derived from the bias-corrected weighted-mean model. Under both future scenarios, surface O₃ changes in most geographical regions fall in quadrants where the signs of the changes are the same, indicating that the effects of emission changes on future O₃ are generally robust. However, in the wintertime, there are differences in sign, especially in high-emission regions such as Asia (Fig. 5a and b) and North America (Fig. 5b). This suggests that the response of O₃ to its precursors, particularly in high-NO_x environments in winter, is not well represented in current models. In contrast, there is broad agreement in the sign of O₃ changes in the summertime.

https://acp.copernicus.org/articles/25/16969/2025/acp-25-16969-2025-f05

Figure 5. Seasonal changes in surface O₃ mixing ratios (in ppb) under (a) SSP3-7.0 and (b) SSP3-7.0-lowNTCF scenarios in different global regions, comparing bias-corrected changes with those from UKESM1 simulations. The error bars represent one standard deviation of the surface O₃ changes in the specified region. Markers in light colors denote regions where the magnitudes of biases in UKESM1 present-day simulations rank among the top three for the respective seasons.

While the sign of O₃ changes is generally consistent between UKESM1 simulations and bias-corrected predictions, the magnitudes of these changes differ substantially. Under the SSP3-7.0 scenario (Fig. 5a), surface O₃ increases in most regions are greater in UKESM1 simulations than in bias-corrected estimates, with notably larger overestimations in regions such as North America and Europe during winter, where UKESM1-simulated increases exceed bias-corrected values by more than a factor of 2. This suggests that UKESM1 may overestimate surface O₃ increases. Similarly, under the SSP3-7.0-lowNTCF scenario (Fig. 5b), surface O₃ decreases in most regions are less pronounced in bias-corrected predictions than in UKESM1 simulations, indicating an overestimation of O₃ reductions by UKESM1. These findings imply that the impacts of emission and climate policies on surface O₃ concentrations under both scenarios may be smaller than projected by UKESM1 simulations.

We acknowledge that large uncertainties remain in these comparisons at regional scales, as the CAMS dataset exhibits substantial biases in certain regions when compared to the TOAR dataset, particularly in East Asia and Southeast Asia (Huijnen et al., 2020). We also find notable discrepancies between CAMS and UKESM1, especially in regions where observations are unavailable, such as the Middle East (shown as light markers in Fig. 5). Therefore, in these regions exhibiting large biases in UKESM1 simulations, large differences in surface O₃ predictions between UKESM1 and bias-corrected UKESM1 also tend to be observed. Bias correction in these regions may lack reliability. Nevertheless, in North America and Europe, where the CAMS data are more consistent with TOAR observations, with biases of less than 10 % (Huijnen et al., 2020), the overestimation of surface O₃ changes by UKESM1 appears more substantiated.

https://acp.copernicus.org/articles/25/16969/2025/acp-25-16969-2025-f06

Figure 6. Surface O₃ biases derived from the weighted-mean statistical models for the present day, SSP3-7.0 and SSP3-7.0-lowNTCF scenarios during (a) summer (JJA) and (b) winter (DJF). The biases are presented as a function of the corresponding surface NO_x mixing ratios (in ppb). The error bars represent one standard deviation of the O₃ biases within each NO_x bin.

At the global scale, it is evident that UKESM1 simulations consistently overestimate surface O₃ changes during summer (Fig. 6a). In summer, surface O₃ biases peak at approximately 15–30 ppb for NO_x mixing ratios of 10–15 ppb, typically corresponding to polluted urban areas with large populations (Kephart et al., 2023). The SSP3-7.0 scenario exhibits the largest biases, followed by SSP3-7.0-lowNTCF. Both future scenarios, characterized by high or low emissions, show greater biases (up to 25 ppb) than the present-day scenario, suggesting that emissions are not the primary driver of these larger biases. In contrast, during winter (Fig. 6b), O₃ biases are generally lower. The SSP3-7.0 and SSP3-7.0-lowNTCF biases appear to shift from negative values in the present day to positive or near-zero values. These findings indicate that the underlying biases in surface O₃ simulations are likely to increase under both emission pathways in the future, presenting a challenge to accurately assessing the impacts of future emissions, particularly during summer.

5 Sensitivity analysis of surface O₃ and O₃ biases

Given that the chemical environment affects both the magnitude and sign of surface O₃ changes, it is important for models to accurately represent the non-linear responses of surface O₃ to its precursors. We integrate monthly mean data from all surface grid cells in both scenarios to derive a relationship between surface O₃ mixing ratios and NO_x/VOC ratios as simulated by UKESM1 (Fig. 7). Additionally, we show the O₃ sensitivity to the NO_x/VOC ratio using bias-corrected O₃ data for comparison. The NO_x/VOC ratio is a simple but effective indicator that distinguishes high- and low-NO_x environments, which reflect different O₃ chemical regimes (Liu et al., 2022b). We calculate NO_x concentrations by aggregating NO and NO₂ values, and VOC concentrations are calculated by summing the concentrations of all primary emitted non-methane VOC species.
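
The binning behind this relationship can be sketched as follows. This is an illustrative Python example under stated assumptions: the function name `binned_mean_o3`, the logarithmically spaced bin edges, the use of the geometric bin centre, and all input values are hypothetical and not taken from this study.

```python
import math


def binned_mean_o3(no, no2, voc, o3, edges):
    """Mean O3 per NOx/VOC ratio bin.

    NOx is taken as NO + NO2. Returns (bin_centre, mean_o3) for non-empty bins,
    with the geometric mean of the bin edges used as the centre (an assumption).
    """
    n_bins = len(edges) - 1
    sums, counts = [0.0] * n_bins, [0] * n_bins
    for a, b, v, o in zip(no, no2, voc, o3):
        ratio = (a + b) / v
        for k in range(n_bins):
            if edges[k] <= ratio < edges[k + 1]:
                sums[k] += o
                counts[k] += 1
                break
    return [(math.sqrt(edges[k] * edges[k + 1]), sums[k] / counts[k])
            for k in range(n_bins) if counts[k] > 0]


# Hypothetical monthly mean values (ppb) for four grid cells.
no = [0.02, 0.2, 0.3, 2.0]
no2 = [0.03, 0.3, 0.5, 3.0]
voc = [1.0, 1.0, 1.0, 1.0]
o3 = [25.0, 40.0, 50.0, 30.0]
edges = [0.01, 0.1, 1.0, 10.0]  # hypothetical log-spaced ratio bins

curve = binned_mean_o3(no, no2, voc, o3, edges)
# Ratio at which the binned mean O3 peaks.
peak_ratio, peak_o3 = max(curve, key=lambda point: point[1])
```

Applying the same function to simulated and to bias-corrected O₃ gives two curves whose peak ratios can be compared directly.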

We find that the NO_x/VOC ratios corresponding to the peaks of surface O₃ concentrations are similar between corrected and uncorrected UKESM1 across different seasons (Fig. 7). The NO_x/VOC ratio thresholds, which indicate the transition from NO_x-limited to VOC-limited O₃ production regimes, are higher in summer (1.0–2.0) than in winter (about 0.1). This demonstrates that UKESM1 effectively captures the seasonal variation in critical NO_x/VOC ratios. The chemical mechanism of UKESM1 accurately represents this transition. In addition, we see that as the NO_x/VOC ratio increases, the differences between corrected and uncorrected surface O₃ concentrations become more pronounced in summer, but this is less apparent in winter. This suggests that biases in O₃ simulations are amplified under two specific conditions: (1) in regions with high NO_x levels, such as polluted environments, and (2) in warmer climates, such as during summer. It is noteworthy that NO_x/VOC thresholds may vary across different chemistry-climate models; however, analyzing O₃ sensitivity to these ratios provides valuable insights into model limitations.

https://acp.copernicus.org/articles/25/16969/2025/acp-25-16969-2025-f07

Figure 7. Relationship between surface O₃ mixing ratios (mean per bin) and the NO_x/VOC ratio (in ppb ppb⁻¹) in different seasons, as simulated by UKESM1 and bias-corrected UKESM1. Data are aggregated from monthly means across all global surface grid cells and binned by NO_x/VOC ratio. Vertical lines denote the NO_x/VOC ratios corresponding to the maximum surface O₃ concentrations.

We further investigate the sensitivity of surface O₃ biases to different input variables in the statistical models, usually termed the “feature importance”, see Fig. 8. This is calculated as the response of the O₃ bias to a minor perturbation (10 %) in each variable, then normalized across all variables and expressed as a percentage. Figure 8 shows the feature importance of the eight most influential variables. It reveals that temperature is the primary contributor to O₃ biases, associated with the overestimation of O₃ in summer, as demonstrated in Fig. 7. While other variables also play a role, their impacts are substantially less pronounced than that of temperature. This suggests that temperature-sensitive processes are likely the dominant source of O₃ biases in the model. Other physical variables, including photolysis rates, humidity, boundary layer height and dry deposition, are also associated with surface O₃ biases. Chemical species such as hydroxyl radicals (OH) and peroxyacetyl nitrate (PAN), which are linked to the oxidation of O₃ precursors and regional transport, play a notable role in influencing these biases. While deep learning models highlight the importance of these variables, simpler statistical models, such as MLR and RF, show little sensitivity to them. This suggests that simpler models tend to overemphasize the most dominant variables, whereas complex models may overdistribute feature importance across a broader range of variables. Furthermore, we find that the positive or negative values of feature importance are generally consistent with physical expectations. For example, an increase in the NO₂ photolysis rate, j(NO₂), enhances O₃ production and tend to result in higher O₃ biases, which is hence reflected by the positive feature importance of j(NO₂). In contrast, an increase in the O(¹D) photolysis rate, j(O₁D), promotes O₃ destruction and leads to lower O₃ biases, which is reflected by its negative feature importance. 
Although MLR and RF models fail to capture these nuanced relationships, they remain useful for identifying the most influential variables. We highlight that the underlying causes of O₃ biases are complex; however, temperature consistently emerges as the dominant factor, potentially exerting a significant influence on the accuracy of O₃ simulations under future warmer climate conditions.
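
As a rough illustration of the perturbation procedure described above, the following Python sketch perturbs each input by 10 %, records the mean response of the predicted bias, and normalizes the signed responses to percentages. The function name `perturbation_importance`, the `predict` callable and the toy linear model are assumptions made for illustration and are not taken from the study's code; normalizing by the sum of absolute responses is only one plausible reading of the normalization step.

```python
import numpy as np

def perturbation_importance(predict, X, rel_perturbation=0.10):
    """Signed sensitivity of the predicted bias to each input column, in percent.

    predict: hypothetical callable mapping an (n_samples, n_features) array
             to predicted O3 biases of shape (n_samples,).
    """
    X = np.asarray(X, dtype=float)
    baseline = predict(X)
    responses = np.empty(X.shape[1])
    for j in range(X.shape[1]):
        X_pert = X.copy()
        X_pert[:, j] *= 1.0 + rel_perturbation  # perturb one variable at a time
        responses[j] = np.mean(predict(X_pert) - baseline)
    total = np.sum(np.abs(responses))
    if total == 0.0:
        return np.zeros_like(responses)
    # Assumed normalization: absolute values sum to 100 %, signs are kept.
    return 100.0 * responses / total

# Toy stand-in for a trained bias-correction model (illustrative only).
rng = np.random.default_rng(0)
X_demo = rng.uniform(1.0, 2.0, size=(500, 3))
coefs = np.array([3.0, -1.0, 0.0])

def toy_predict(X):
    return X @ coefs

importance = perturbation_importance(toy_predict, X_demo)
print(importance)
```

Keeping the sign of each response is what allows a positive importance (as for j(NO₂)) to be distinguished from a negative one (as for j(O¹D)).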

Figure 8. The importance of different input features to surface O₃ biases in each statistical model.

6 Conclusions

We have successfully applied a range of statistical approaches to correct surface O₃ biases in UKESM1, a state-of-the-art chemistry-climate model. This model typically overestimates surface O₃ concentrations in summer and underestimates them in winter. While these model biases can be corrected using any of the statistical approaches, deep learning models significantly outperform traditional approaches such as multiple linear regression (MLR) and random forest (RF). Among the deep learning architectures, the residual network (ResNet) and Transformer models yield consistent results, with small differences between them. The convolutional neural network (CNN) also produces predictions comparable to those of ResNet and Transformer. We note that while complex models generally achieve higher prediction accuracy, the full potential of the Transformer architecture may not be realized here, owing to the specific nature of this bias-correction task.

A simple weighted ensemble mean scheme is proposed, demonstrating an additional 14 % improvement in performance compared to the best individual approach, the Transformer model. To assess future changes in surface O₃, we apply bias correction to simulations generated by UKESM1. The signs of surface O₃ changes are generally consistent between corrected and uncorrected UKESM1. However, the magnitudes of these changes differ. Surface O₃ changes simulated by UKESM1 are typically overestimated in both seasons compared to the bias-corrected changes. Under the SSP3-7.0 scenario, the corrected global summer mean O₃ mixing ratios are projected to increase by 1.2 ppb, whereas under the SSP3-7.0-lowNTCF scenario, they are expected to decrease by 2.8 ppb. In winter, the corrected surface O₃ mixing ratios are projected to increase by 0.5 ppb under SSP3-7.0 and to decrease by 1.1 ppb under SSP3-7.0-lowNTCF.
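
To make the idea of a weighted ensemble mean concrete, the sketch below combines several model predictions using weights inversely proportional to each member's RMSE. This weighting rule, the function names and the synthetic data are assumptions for illustration only; the weighting actually used in the study may differ, and in practice the weights would be derived from held-out data rather than the data used for evaluation.

```python
import numpy as np

def rmse(pred, ref):
    """Root-mean-square error between a prediction and a reference."""
    diff = np.asarray(pred, dtype=float) - np.asarray(ref, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

def inverse_rmse_weights(preds, ref):
    """Assumed weighting rule: weight each member by 1/RMSE, normalized to sum to 1."""
    errors = np.array([rmse(p, ref) for p in preds])
    inv = 1.0 / np.maximum(errors, 1e-12)  # guard against division by zero
    return inv / inv.sum()

def weighted_ensemble(preds, weights):
    """Weighted mean over the first (member) axis."""
    return np.tensordot(weights, np.asarray(preds, dtype=float), axes=1)

# Synthetic members with independent errors of different size (illustrative only).
rng = np.random.default_rng(42)
reference = rng.normal(0.0, 5.0, size=10_000)
noise_scales = [1.0, 1.5, 2.0]
member_preds = [reference + rng.normal(0.0, s, size=reference.size)
                for s in noise_scales]

weights = inverse_rmse_weights(member_preds, reference)
ensemble_pred = weighted_ensemble(member_preds, weights)
single_rmse = [rmse(p, reference) for p in member_preds]
ensemble_rmse = rmse(ensemble_pred, reference)
print(weights, single_rmse, ensemble_rmse)
```

When member errors are largely independent, as in this toy setup, the weighted mean can achieve a lower RMSE than the best single member, which is the kind of gain reported for the ensemble here.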

The sensitivities of surface O₃ to its precursors are also investigated for both UKESM1 and the bias-corrected UKESM1. This analysis reveals that UKESM1 effectively captures the seasonal differences in O₃ sensitivities, as represented by NOₓ/VOC ratios in different seasons. However, under high NOₓ/VOC conditions, UKESM1 notably overestimates O₃ concentrations, particularly during summer. This suggests that under warmer conditions in the future, UKESM1 is likely to overestimate O₃ concentrations. This is further supported by the feature importance for simulated O₃ biases, which identifies temperature as the most important variable influencing these biases. Deep learning models also highlight the importance of other variables; however, their importance is considerably smaller than that of temperature. This suggests that processes sensitive to temperature variations may have a pronounced influence on O₃ concentrations simulated by UKESM1.
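
The binned sensitivity analysis summarized above (mean O₃ per NOₓ/VOC bin, and the ratio at which mean O₃ peaks) can be sketched as follows. The logarithmic bin edges, the helper names and the synthetic data are assumptions for illustration and do not reproduce the study's exact binning.

```python
import numpy as np

def binned_mean_o3(ratio, o3, bin_edges):
    """Mean O3 in each NOx/VOC ratio bin; empty bins are returned as NaN."""
    ratio = np.asarray(ratio, dtype=float)
    o3 = np.asarray(o3, dtype=float)
    bin_edges = np.asarray(bin_edges, dtype=float)
    idx = np.digitize(ratio, bin_edges) - 1  # bin index for each sample
    n_bins = len(bin_edges) - 1
    means = np.full(n_bins, np.nan)
    for b in range(n_bins):
        in_bin = idx == b
        if in_bin.any():
            means[b] = o3[in_bin].mean()
    # Geometric bin centers, appropriate for logarithmically spaced edges.
    centers = np.sqrt(bin_edges[:-1] * bin_edges[1:])
    return centers, means

def ratio_at_peak(centers, means):
    """Ratio (bin center) at which the binned mean O3 is largest."""
    return float(centers[np.nanargmax(means)])

# Synthetic data with a known O3 maximum at log10(ratio) = -0.9 (illustrative only).
rng = np.random.default_rng(1)
log_ratio = rng.uniform(-3.0, 1.0, size=20_000)
ratio_demo = 10.0 ** log_ratio
o3_demo = (40.0 - 10.0 * (log_ratio + 0.9) ** 2
           + rng.normal(0.0, 0.5, size=log_ratio.size))

edges = np.logspace(-3, 1, 21)  # 20 assumed log-spaced bins
centers, means = binned_mean_o3(ratio_demo, o3_demo, edges)
peak_ratio = ratio_at_peak(centers, means)
print(peak_ratio)
```

Applying the same binning to the raw and bias-corrected fields, season by season, gives the pairs of curves and peak positions that the comparison relies on.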

Despite the demonstrated capabilities of deep learning models in capturing surface O₃ biases, we acknowledge that uncertainties remain, particularly regarding the use of CAMS data as a reference for model training. Nevertheless, this exploratory study tests the methodology’s feasibility and provides insights into mitigating uncertainties associated with approach selection. It establishes a robust foundation for the broader application of bias correction techniques, particularly through the integration of deep learning with chemistry-climate models. This integration presents a promising pathway for addressing systematic errors in chemistry-climate models, while also facilitating the diagnosis of the underlying causes of model biases. Bias correction techniques stand to gain from the increasing availability of high-quality observational data, with applications extending beyond O₃ to other atmospheric components. This will strengthen the robustness of assessments in regions where observations are currently lacking, ultimately producing more reliable projections of O₃ changes across different climate scenarios.

Data availability

The data generated in this study are available upon request.

Author contributions

All authors participated in designing the study. ZL conducted the UKESM1 simulations and deep learning analysis with KL. OW, RMD, FMO, and ST provided scientific guidance and interpretation of results. ZL drafted the manuscript, with contributions and revisions from all co-authors.

Competing interests

The contact author has declared that none of the authors has any competing interests.

Disclaimer

Publisher’s note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Financial support

Zhenze Liu thanks the National Natural Science Foundation of China (NSFC), the Natural Science Foundation of Jiangsu Province and the China Postdoctoral Science Foundation for funding under grants 42307140, SBK2023043946 and 2023M731749. Ke Li thanks the National Natural Science Foundation of China for funding under grant 42293323. Oliver Wild and Ruth M. Doherty thank the Natural Environment Research Council (NERC) for funding under grants NE/N006925/1, NE/N006976/1 and NE/N006941/1. Fiona M. O'Connor was supported by the Met Office Hadley Centre Climate Programme funded by BEIS and also acknowledges support from the EU Horizon 2020 Research Programme CRESCENDO (grant agreement number 641816). Steven Turnock would like to acknowledge support from the UK–China Research and Innovation Partnership Fund through the Met Office Climate Science for Service Partnership (CSSP) China as part of the Newton Fund.

Review statement

This paper was edited by Pedro Jimenez-Guerrero and reviewed by two anonymous referees.

References

Amos, M., Young, P. J., Hosking, J. S., Lamarque, J.-F., Abraham, N. L., Akiyoshi, H., Archibald, A. T., Bekki, S., Deushi, M., Jöckel, P., Kinnison, D., Kirner, O., Kunze, M., Marchand, M., Plummer, D. A., Saint-Martin, D., Sudo, K., Tilmes, S., and Yamashita, Y.: Projecting ozone hole recovery using an ensemble of chemistry–climate models weighted by model performance and independence, Atmos. Chem. Phys., 20, 9961–9977, https://doi.org/10.5194/acp-20-9961-2020, 2020. a, b

Archer-Nicholls, S., Abraham, N. L., Shin, Y. M., Weber, J., Russo, M. R., Lowe, D., Utembe, S. R., O'Connor, F. M., Kerridge, B., Latter, B., Siddans, R., Jenkin, M., Wild, O., and Archibald, A. T.: The Common Representative Intermediates Mechanism version 2 in the United Kingdom Chemistry and Aerosols Model, Journal of Advances in Modeling Earth Systems, 13, e2020MS002420, https://doi.org/10.1029/2020MS002420, 2021. a

Archibald, A. T., Neu, J. L., Elshorbany, Y. F., Cooper, O. R., Young, P. J., Akiyoshi, H., Cox, R. A., Coyle, M., Derwent, R. G., Deushi, M., Finco, A., Frost, G. J., Galbally, I. E., Gerosa, G., Granier, C., Griffiths, P. T., Hossaini, R., Hu, L., Jöckel, P., Josse, B., Lin, M. Y., Mertens, M., Morgenstern, O., Naja, M., Naik, V., Oltmans, S., Plummer, D. A., Revell, L. E., Saiz-Lopez, A., Saxena, P., Shin, Y. M., Shahid, I., Shallcross, D., Tilmes, S., Trickl, T., Wallington, T. J., Wang, T., Worden, H. M., and Zeng, G.: Tropospheric Ozone Assessment Report: A critical review of changes in the tropospheric ozone burden and budget from 1850 to 2100, Elementa: Science of the Anthropocene, 8, 34, https://doi.org/10.1525/elementa.2020.034, 2020a. a

Archibald, A. T., O'Connor, F. M., Abraham, N. L., Archer-Nicholls, S., Chipperfield, M. P., Dalvi, M., Folberth, G. A., Dennison, F., Dhomse, S. S., Griffiths, P. T., Hardacre, C., Hewitt, A. J., Hill, R. S., Johnson, C. E., Keeble, J., Köhler, M. O., Morgenstern, O., Mulcahy, J. P., Ordóñez, C., Pope, R. J., Rumbold, S. T., Russo, M. R., Savage, N. H., Sellar, A., Stringer, M., Turnock, S. T., Wild, O., and Zeng, G.: Description and evaluation of the UKCA stratosphere–troposphere chemistry scheme (StratTrop vn 1.0) implemented in UKESM1, Geosci. Model Dev., 13, 1223–1266, https://doi.org/10.5194/gmd-13-1223-2020, 2020b. a, b

Betancourt, C., Stomberg, T. T., Edrich, A.-K., Patnala, A., Schultz, M. G., Roscher, R., Kowalski, J., and Stadtler, S.: Global, high-resolution mapping of tropospheric ozone – explainable machine learning and impact of uncertainties, Geosci. Model Dev., 15, 4331–4354, https://doi.org/10.5194/gmd-15-4331-2022, 2022. a

Bi, K., Xie, L., Zhang, H., Chen, X., Gu, X., and Tian, Q.: Accurate medium‐range global weather forecasting with 3D neural networks, Nature, 619, 533–538, https://doi.org/10.1038/s41586-023-06185-3, 2023. a

Eyring, V., Bony, S., Meehl, G. A., Senior, C. A., Stevens, B., Stouffer, R. J., and Taylor, K. E.: Overview of the Coupled Model Intercomparison Project Phase 6 (CMIP6) experimental design and organization, Geosci. Model Dev., 9, 1937–1958, https://doi.org/10.5194/gmd-9-1937-2016, 2016. a, b

Fenech, S., Doherty, R. M., Heaviside, C., Vardoulakis, S., Macintyre, H. L., and O'Connor, F. M.: The influence of model spatial resolution on simulated ozone and fine particulate matter for Europe: implications for health impact assessments, Atmos. Chem. Phys., 18, 5765–5784, https://doi.org/10.5194/acp-18-5765-2018, 2018. a

Fleming, Z. L., Doherty, R. M., von Schneidemesser, E., Malley, C. S., Cooper, O. R., Pinto, J. P., Colette, A., Xu, X., Simpson, D., Schultz, M. G., Lefohn, A. S., Hamad, S., Moolla, R., Solberg, S., and Feng, Z.: Tropospheric Ozone Assessment Report: Present-day ozone distribution and trends relevant to human health, Elementa: Science of the Anthropocene, 6, 12, https://doi.org/10.1525/elementa.273, 2018. a

Griffiths, P. T., Murray, L. T., Zeng, G., Shin, Y. M., Abraham, N. L., Archibald, A. T., Deushi, M., Emmons, L. K., Galbally, I. E., Hassler, B., Horowitz, L. W., Keeble, J., Liu, J., Moeini, O., Naik, V., O'Connor, F. M., Oshima, N., Tarasick, D., Tilmes, S., Turnock, S. T., Wild, O., Young, P. J., and Zanis, P.: Tropospheric ozone in CMIP6 simulations, Atmos. Chem. Phys., 21, 4187–4218, https://doi.org/10.5194/acp-21-4187-2021, 2021. a

He, K., Zhang, X., Ren, S., and Sun, J.: Deep residual learning for image recognition, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, 770–778, https://doi.org/10.1109/CVPR.2016.90, 2016. a

Hoesly, R. M., Smith, S. J., Feng, L., Klimont, Z., Janssens-Maenhout, G., Pitkanen, T., Seibert, J. J., Vu, L., Andres, R. J., Bolt, R. M., Bond, T. C., Dawidowski, L., Kholod, N., Kurokawa, J.-I., Li, M., Liu, L., Lu, Z., Moura, M. C. P., O'Rourke, P. R., and Zhang, Q.: Historical (1750–2014) anthropogenic emissions of reactive gases and aerosols from the Community Emissions Data System (CEDS), Geosci. Model Dev., 11, 369–408, https://doi.org/10.5194/gmd-11-369-2018, 2018. a

Huijnen, V., Miyazaki, K., Flemming, J., Inness, A., Sekiya, T., and Schultz, M. G.: An intercomparison of tropospheric ozone reanalysis products from CAMS, CAMS interim, TCR-1, and TCR-2, Geosci. Model Dev., 13, 1513–1544, https://doi.org/10.5194/gmd-13-1513-2020, 2020. a, b

Iles, C. E., Vautard, R., Strachan, J., Joussaume, S., Eggen, B. R., and Hewitt, C. D.: The benefits of increasing resolution in global and regional climate simulations for European climate extremes, Geosci. Model Dev., 13, 5583–5607, https://doi.org/10.5194/gmd-13-5583-2020, 2020. a

Inness, A., Ades, M., Agustí-Panareda, A., Barré, J., Benedictow, A., Blechschmidt, A.-M., Dominguez, J. J., Engelen, R., Eskes, H., Flemming, J., Huijnen, V., Jones, L., Kipling, Z., Massart, S., Parrington, M., Peuch, V.-H., Razinger, M., Remy, S., Schulz, M., and Suttie, M.: The CAMS reanalysis of atmospheric composition, Atmos. Chem. Phys., 19, 3515–3556, https://doi.org/10.5194/acp-19-3515-2019, 2019. a

Ivatt, P. D. and Evans, M. J.: Improving the prediction of an atmospheric chemistry transport model using gradient-boosted regression trees, Atmos. Chem. Phys., 20, 8063–8082, https://doi.org/10.5194/acp-20-8063-2020, 2020. a

Kephart, J. L., Gouveia, N., Rodriguez, D. A., Indvik, K., Alfaro, T., Texcalac-Sangrador, J. L., Miranda, J. J., Bilal, U., and Diez-Roux, A. V.: Ambient nitrogen dioxide in 47 187 neighbourhoods across 326 cities in eight Latin American countries: population exposures and associations with urban features, The Lancet Planetary Health, 7, e976–e984, https://doi.org/10.1016/S2542-5196(23)00237-1, 2023. a

Lam, R., Sanchez-Gonzalez, A., Willson, M., Wirnsberger, P., Fortunato, M., Alet, F., Ravuri, S., Ewalds, T., Eaton-Rosen, Z., Hu, W., Merose, A., Hoyer, S., Holland, G., Vinyals, O., Stott, J., Pritzel, A., Mohamed, S., and Battaglia, P.: Learning skillful medium-range global weather forecasting, Science, 382, 1416–1421, https://doi.org/10.1126/science.adi2336, 2023. a

LeCun, Y., Bengio, Y., and Hinton, G.: Deep learning, Nature, 521, 436–444, https://doi.org/10.1038/nature14539, 2015. a

Liu, Z., Doherty, R. M., Wild, O., Hollaway, M., and O’Connor, F. M.: Contrasting chemical environments in summertime for atmospheric ozone across major Chinese industrial regions: the effectiveness of emission control strategies, Atmos. Chem. Phys., 21, 10689–10706, https://doi.org/10.5194/acp-21-10689-2021, 2021. a

Liu, Z., Doherty, R. M., Wild, O., O'Connor, F. M., and Turnock, S. T.: Correcting ozone biases in a global chemistry–climate model: implications for future ozone, Atmos. Chem. Phys., 22, 12543–12557, https://doi.org/10.5194/acp-22-12543-2022, 2022a. a, b

Liu, Z., Doherty, R. M., Wild, O., O'Connor, F. M., and Turnock, S. T.: Tropospheric ozone changes and ozone sensitivity from the present day to the future under shared socio-economic pathways, Atmos. Chem. Phys., 22, 1209–1227, https://doi.org/10.5194/acp-22-1209-2022, 2022b. a, b

Loshchilov, I. and Hutter, F.: SGDR: Stochastic gradient descent with warm restarts, arXiv [preprint], https://doi.org/10.48550/arXiv.1608.03983, 2016. a

Miyazaki, K., Marchetti, Y., Montgomery, J., Lu, S., and Bowman, K.: Identifying drivers of surface ozone bias in global chemical reanalysis with explainable machine learning, Atmos. Chem. Phys., 25, 8507–8532, https://doi.org/10.5194/acp-25-8507-2025, 2025. a

Nivron, O., Wischik, D. J., Vrac, M., Shuckburgh, E., and Archibald, A. T.: A Temporal Stochastic Bias Correction using a Machine Learning Attention model, Environmental Data Science, 3, e36, https://doi.org/10.1017/eds.2024.42, 2024. a

O'Connor, F. M., Johnson, C. E., Morgenstern, O., Abraham, N. L., Braesicke, P., Dalvi, M., Folberth, G. A., Sanderson, M. G., Telford, P. J., Voulgarakis, A., Young, P. J., Zeng, G., Collins, W. J., and Pyle, J. A.: Evaluation of the new UKCA climate-composition model – Part 2: The Troposphere, Geosci. Model Dev., 7, 41–91, https://doi.org/10.5194/gmd-7-41-2014, 2014. a

O’Neill, B. C., Kriegler, E., Riahi, K., Ebi, K. L., Hallegatte, S., Carter, T. R., Mathur, R., and van Vuuren, D. P.: A new scenario framework for climate change research: the concept of shared socioeconomic pathways, Climatic Change, 122, 387–400, https://doi.org/10.1007/s10584-013-0905-2, 2014. a

Rao, S., Klimont, Z., Smith, S. J., van Dingenen, R., Dentener, F., Bouwman, L., Riahi, K., Amann, M., Bodirsky, B. L., van Vuuren, D. P., Aleluia Reis, L., Calvin, K., Drouet, L., Fricko, O., Fujimori, S., Gernaat, D., Havlík, P., Harmsen, M., Hasegawa, T., Heyes, C., Hilaire, J., Luderer, G., Stehfest, E., Strefler, J., van der Sluis, S., and Tavoni, M.: Future air pollution in the Shared Socio-economic Pathways, Global Environmental Change, 42, 346–358, https://doi.org/10.1016/j.gloenvcha.2016.05.012, 2017. a

Reichstein, M., Camps-Valls, G., Stevens, B., Jung, M., Denzler, J., Carvalhais, N., and Prabhat, A.: Deep learning and process understanding for data-driven Earth system science, Nature, 566, 195–204, https://doi.org/10.1038/s41586-019-0912-1, 2019. a

Revell, L. E., Stenke, A., Tummon, F., Feinberg, A., Rozanov, E., Peter, T., Abraham, N. L., Akiyoshi, H., Archibald, A. T., Butchart, N., Deushi, M., Jöckel, P., Kinnison, D., Michou, M., Morgenstern, O., O'Connor, F. M., Oman, L. D., Pitari, G., Plummer, D. A., Schofield, R., Stone, K., Tilmes, S., Visioni, D., Yamashita, Y., and Zeng, G.: Tropospheric ozone in CCMI models and Gaussian process emulation to understand biases in the SOCOLv3 chemistry–climate model, Atmos. Chem. Phys., 18, 16155–16172, https://doi.org/10.5194/acp-18-16155-2018, 2018. a

Schultz, M. G., Schröder, S., Lyapina, O., Cooper, O. R., Galbally, I. E., Petropavlovskikh, I., von Schneidemesser, E., Tanimoto, H., Elshorbany, Y. F., Naja, M., Seguel, R. J., Dauert, U., Eckhardt, P., Feigenspan, S., Fiebig, M., Hjellbrekke, A., Hong, Y., Kjeld, P. C., Koide, H., Lear, G., Tarasick, D., Ueno, M., Wallasch, M., Baumgardner, D., Chuang, M., Gillett, R., Lee, M., Molloy, S., Moolla, R., Wang, T., Sharps, K., Adame, J. A., Ancellet, G., Apadula, F., Artaxo, P., Barlasina, M. E., Bogucka, M., Bonasoni, P., Chang, L., Colomb, A., Cuevas-Agulló, E., Cupeiro, M., Degórska, A., Ding, A., Fröhlich, M., Frolova, M., Gadhavi, H., Gheusi, F., Gilge, S., González, M. Y., Gros, V., Hamad, S. H., Helmig, D., Henriques, D., Hermansen, O., Holla, R., Hueber, J., Im, U., Jaffe, D. A., Komala, N., Kubistin, D., Lam, K., Laurila, T., Lee, H., Levy, I., Mazzoleni, C., Mazzoleni, L. R., McClure-Begley, A., Mohamad, M., Murovec, M., Navarro-Comas, M., Nicodim, F., Parrish, D., Read, K. A., Reid, N., Ries, L., Saxena, P., Schwab, J. J., Scorgie, Y., Senik, I., Simmonds, P., Sinha, V., Skorokhod, A. I., Spain, G., Spangl, W., Spoor, R., Springston, S. R., Steer, K., Steinbacher, M., Suharguniyawan, E., Torre, P., Trickl, T., Lin, W., Weller, R., Xiaobin, X., Xue, L., and Zhiqiang, M.: Tropospheric Ozone Assessment Report: Database and metrics data of global surface ozone observations, Elementa: Science of the Anthropocene, 5, 58, https://doi.org/10.1525/elementa.244, 2017. a

Sellar, A. A., Jones, C. G., Mulcahy, J. P., Tang, Y., Yool, A., Wiltshire, A., O’Connor, F. M., Stringer, M., Hill, R., Palmieri, J., Woodward, S., de Mora, L., Kuhlbrodt, T., Rumbold, S. T., Kelley, D. I., Ellis, R., Johnson, C. E., Walton, J., Abraham, N. L., Andrews, M. B., Andrews, T., Archibald, A. T., Berthou, S., Burke, E., Blockley, E., Carslaw, K., Dalvi, M., Edwards, J., Folberth, G. A., Gedney, N., Griffiths, P. T., Harper, A. B., Hendry, M. A., Hewitt, A. J., Johnson, B., Jones, A., Jones, C. D., Keeble, J., Liddicoat, S., Morgenstern, O., Parker, R. J., Predoi, V., Robertson, E., Siahaan, A., Smith, R. S., Swaminathan, R., Woodhouse, M. T., Zeng, G., and Zerroukat, M.: UKESM1: Description and evaluation of the U.K. Earth System Model, Journal of Advances in Modeling Earth Systems, 11, 4513–4558, https://doi.org/10.1029/2019MS001739, 2019. a

Smith, S. L., Brock, A., Berrada, L., and De, S.: ConvNets match vision transformers at scale, arXiv [preprint], https://doi.org/10.48550/arXiv.2310.16764, 2023. a

Stevenson, D. S., Dentener, F. J., Schultz, M. G., Ellingsen, K., van Noije, T. P. C., Wild, O., Zeng, G., Amann, M., Atherton, C. S., Bell, N., Bergmann, D. J., Bey, I., Butler, T., Cofala, J., Collins, W. J., Derwent, R. G., Doherty, R., Drevet, J., Eskes, H. J., Fiore, A. M., Gauss, M., Hauglustaine, D. A., Horowitz, L. W., Isaksen, I. S. A., Krol, M. C., Lamarque, J. F., Lawrence, M. G., Montanaro, V., Müller, J. F., Pitari, G., Prather, M. J., Pyle, J. A., Rast, S., Rodriguez, J. M., Sanderson, M. G., Savage, N. H., Shindell, D. T., Strahan, S. E., Sudo, K., and Szopa, S.: Multimodel ensemble simulations of present-day and near-future tropospheric ozone, Journal of Geophysical Research: Atmospheres, 111, D08301, https://doi.org/10.1029/2005JD006338, 2006. a

Stock, Z. S., Russo, M. R., and Pyle, J. A.: Representing ozone extremes in European megacities: the importance of resolution in a global chemistry climate model, Atmos. Chem. Phys., 14, 3899–3912, https://doi.org/10.5194/acp-14-3899-2014, 2014. a

Turnock, S. T., Allen, R. J., Andrews, M., Bauer, S. E., Deushi, M., Emmons, L., Good, P., Horowitz, L., John, J. G., Michou, M., Nabat, P., Naik, V., Neubauer, D., O'Connor, F. M., Olivié, D., Oshima, N., Schulz, M., Sellar, A., Shim, S., Takemura, T., Tilmes, S., Tsigaridis, K., Wu, T., and Zhang, J.: Historical and future changes in air pollutants from CMIP6 models, Atmos. Chem. Phys., 20, 14547–14579, https://doi.org/10.5194/acp-20-14547-2020, 2020. a, b, c, d

Vaittinada Ayar, P., Garnero, R. J. O., Gutiérrez, L., Donnelly, M. G., Beltran, A. C. M., Martín, M. C., and Jones, P. D.: Ensemble bias correction of climate simulations: preserving internal variability, Scientific Reports, 11, 3098, https://doi.org/10.1038/s41598-021-82715-1, 2021. a

van Marle, M. J. E., Kloster, S., Magi, B. I., Marlon, J. R., Daniau, A.-L., Field, R. D., Arneth, A., Forrest, M., Hantson, S., Kehrwald, N. M., Knorr, W., Lasslop, G., Li, F., Mangeon, S., Yue, C., Kaiser, J. W., and van der Werf, G. R.: Historic global biomass burning emissions for CMIP6 (BB4CMIP) based on merging satellite observations with proxies and fire models (1750–2015), Geosci. Model Dev., 10, 3329–3357, https://doi.org/10.5194/gmd-10-3329-2017, 2017. a

Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., and Polosukhin, I.: Attention Is All You Need, arXiv [preprint], https://doi.org/10.48550/arXiv.1706.03762, 2017. a

Vrac, M. and Friederichs, P.: Multivariate–Intervariable, Spatial, and Temporal–Bias Correction, Journal of Climate, 28, 218–237, https://doi.org/10.1175/JCLI-D-14-00059.1, 2015. a

Walters, D., Baran, A. J., Boutle, I., Brooks, M., Earnshaw, P., Edwards, J., Furtado, K., Hill, P., Lock, A., Manners, J., Morcrette, C., Mulcahy, J., Sanchez, C., Smith, C., Stratton, R., Tennant, W., Tomassini, L., Van Weverberg, K., Vosper, S., Willett, M., Browse, J., Bushell, A., Carslaw, K., Dalvi, M., Essery, R., Gedney, N., Hardiman, S., Johnson, B., Johnson, C., Jones, A., Jones, C., Mann, G., Milton, S., Rumbold, H., Sellar, A., Ujiie, M., Whitall, M., Williams, K., and Zerroukat, M.: The Met Office Unified Model Global Atmosphere 7.0/7.1 and JULES Global Land 7.0 configurations, Geosci. Model Dev., 12, 1909–1963, https://doi.org/10.5194/gmd-12-1909-2019, 2019. a

Wang, X., Han, Y., Xue, W., Yang, G., and Zhang, G. J.: Stable climate simulations using a realistic general circulation model with neural network parameterizations for atmospheric moist physics and radiation processes, Geosci. Model Dev., 15, 3923–3940, https://doi.org/10.5194/gmd-15-3923-2022, 2022. a

Wild, O. and Prather, M. J.: Global tropospheric ozone modeling: Quantifying errors due to grid resolution, Journal of Geophysical Research: Atmospheres, 111, D11305, https://doi.org/10.1029/2005JD006605, 2006. a

Wild, O., Voulgarakis, A., O'Connor, F., Lamarque, J.-F., Ryan, E. M., and Lee, L.: Global sensitivity analysis of chemistry–climate model budgets of tropospheric ozone and OH: exploring model diversity, Atmos. Chem. Phys., 20, 4047–4058, https://doi.org/10.5194/acp-20-4047-2020, 2020. a

Xing, J., Zheng, S., Li, S., Huang, L., Wang, X., Kelly, J. T., Wang, S., Liu, C., Jang, C., Zhu, Y., Zhang, J., Bian, J., Liu, T., and Hao, J.: Mimicking atmospheric photochemical modeling with a deep neural network, Atmospheric Research, 265, 105919, https://doi.org/10.1016/j.atmosres.2021.105919, 2022. a

Young, P. J., Naik, V., Fiore, A. M., Gaudel, A., Guo, J., Lin, M. Y., Neu, J. L., Parrish, D. D., Rieder, H. E., Schnell, J. L., Tilmes, S., Wild, O., Zhang, L., Ziemke, J. R., Brandt, J., Delcloo, A., Doherty, R. M., Geels, C., Hegglin, M. I., Hu, L., Im, U., Kumar, R., Luhar, A., Murray, L., Plummer, D., Rodriguez, J., Saiz-Lopez, A., Schultz, M. G., Woodhouse, M. T., and Zeng, G.: Tropospheric Ozone Assessment Report: Assessment of global‐scale model performance for global and regional ozone distributions, variability, and trends, Elementa: Science of the Anthropocene, 6, 10, https://doi.org/10.1525/elementa.265, 2018. a, b

Short summary

Chemistry-climate models have advanced substantially over the decades, yet they still exhibit substantial systematic biases in simulating atmospheric composition due to gaps in our understanding of underlying processes. We improve the predictions of an Earth system model using deep learning, and evaluate the performance of different types of statistical models. We find that simulations of future surface ozone are likely to become less accurate under a warmer climate.
