Volume 24, issue 12 | Opinion | Highlight paper | 19 Jun 2024

Opinion: Optimizing climate models with process knowledge, resolution, and artificial intelligence

Tapio Schneider, L. Ruby Leung, and Robert C. J. Wills

Accelerated progress in climate modeling is urgently needed for proactive and effective climate change adaptation. The central challenge lies in accurately representing processes that are small in scale yet climatically important, such as turbulence and cloud formation. These processes will not be explicitly resolvable for the foreseeable future, necessitating the use of parameterizations. We propose a balanced approach that leverages the strengths of traditional process-based parameterizations and contemporary artificial intelligence (AI)-based methods to model subgrid-scale processes. This strategy employs AI to derive data-driven closure functions from both observational and simulated data, integrated within parameterizations that encode system knowledge and conservation laws. In addition, increasing the resolution to resolve a larger fraction of small-scale processes can aid progress toward improved and interpretable climate predictions outside the observed climate distribution. However, currently feasible horizontal resolutions are limited to O(10 km) because higher resolutions would impede the creation of the ensembles that are needed for model calibration and uncertainty quantification, for sampling atmospheric and oceanic internal variability, and for broadly exploring and quantifying climate risks. By synergizing decades of scientific development with advanced AI techniques, our approach aims to significantly boost the accuracy, interpretability, and trustworthiness of climate predictions.

1 Introduction

Climate models serve two distinct purposes. First, they encode our collective knowledge about the climate system. They instantiate theories and provide a quantitative account of climate processes – the complex interplay of causes and effects that governs how the climate system operates. In this role, they belong to the realm of episteme, or explanatory science (Russo2000; Parry2021). Second, climate models function as practical tools that allow us to calculate how the climate system might behave under different circumstances that have not yet been directly observed. In this role, they fall under the realm of techne, or goal-oriented applied science (Russo2000; Parry2021). The requirements for climate models differ depending on their primary role as episteme or techne. As encodings of our understanding (episteme), climate models should strive for explainability and simplicity, even if it means sacrificing a certain level of accuracy. An understanding of the climate system at different levels of description emerges through a hierarchy of models, ranging from simpler ones such as one-dimensional radiative–convective equilibrium models to more complex ones such as atmospheric general circulation models with simplified parameterizations of subgrid-scale processes (Held2005; Jeevanjee et al.2017; Mansfield et al.2023). On the other hand, as calculation tools (techne), climate models should aim to simulate the climate system as accurately as possible under unobserved circumstances.

Over the past 6 decades, climate modeling has operated under the tacit assumption that these two roles of climate models align, implying that the most complex models reflecting our understanding of the system are also the most accurate tools for predicting its behavior in unobserved conditions. This is a desirable goal, but it may not always be attainable in systems as complex as the climate system.

In this essay, we focus on the use of climate models as techne, emphasizing their role as tools for accurately calculating the behavior of the climate system in unobserved circumstances, although, as we will see, this role cannot entirely be decoupled from episteme.

The goal of calculating the behavior of the climate system is to obtain its statistics, including average temperatures at specific locations and seasons, the probability that daily precipitation in a given region exceeds some threshold, or the covariance between temperature and humidity, which can lead to potentially dangerous humid heat extremes. These calculations correspond to what Lorenz (1975) defined as predictions of the second kind, where future climate statistics are estimated given evolving boundary conditions, such as human-induced greenhouse gas emissions. This contrasts with predictions of the first kind, which focus on forecasting the future state of a system given its initial conditions ζ0, as seen in weather forecasting. Consequently, when used as techne, climate models should aim to minimize a loss function of the form (Schneider et al.2017a)

(1) L = ‖〈y(t)〉 − 〈𝓗(G(t; θ, λ, ν; ζ0))〉‖Γ²

Here, the angle brackets 〈⋅〉 indicate an appropriate time averaging, such as a seasonal average over multiple years. The vector y(t) represents time-varying observables of the climate system, including those whose time average 〈y(t)〉 gives rise to higher-order statistics such as the frequency of exceeding a daily precipitation threshold in a specific region. It may also include frequency-space observables, such as the amplitude and phase of the diurnal cycle of precipitation. The climate model, denoted as G(t;θ,λ,ν;ζ0), is a mapping of parameter vectors (θ,λ,ν) and an initial condition vector ζ0 (usually important only for slowly varying components of the climate system, such as oceans and ice sheets) to time-varying simulated climate states ζ(t)=G(t;θ,λ,ν;ζ0). The observation operator 𝓗 maps simulated climate states ζ(t) to the desired observables y(t). Lastly, ‖x‖Γ = ‖Γ−1/2 x‖2 represents a weighted Euclidean norm, or Mahalanobis distance. The weight is determined by the inverse of the covariance matrix Γ, which reflects model and observational errors and noise due to fluctuations from internal variability in the finite-time average 〈⋅〉. The weighted Euclidean norm is chosen because the climate statistics are aggregated over time, meaning that, due to the central limit theorem, it is reasonable to assume that these statistics exhibit Gaussian fluctuations (Iglesias et al.2013; Schneider et al.2017a; Dunbar et al.2021). However, the specific choice of norm in the loss function is not crucial for the following discussion. The essence is that the loss function penalizes mismatches between simulated and observed climate statistics, with less-noisy statistics receiving greater weight. This can be done for longer-term aggregate statistics or for shorter-term predictions (for example, those of El Niño and its impact on the climate system). 
The relatively sparse statistics available from reconstructions of past climates can additionally serve as a useful test of climate models outside the distribution of the present climate (Zhu et al.2022).
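
As a schematic illustration of Eq. (1), the following sketch evaluates the weighted mismatch for a handful of hypothetical climate statistics; the observables, model output, and diagonal covariance are invented for illustration only:

```python
import numpy as np

# Hypothetical time-averaged observed statistics <y(t)>
# (e.g., mean temperature in K, precipitation in mm/day, TOA flux in W/m^2)
y_mean = np.array([288.2, 3.1, 240.5])

# Corresponding simulated statistics <H(G(t; theta, lambda, nu; zeta_0))>
g_mean = np.array([289.0, 2.7, 243.0])

# Covariance Gamma of errors and internal variability; diagonal here for
# simplicity, so noisier statistics receive less weight in the loss
gamma = np.diag([0.5**2, 0.4**2, 2.0**2])

# Weighted squared norm || Gamma^{-1/2} (<y> - <H G>) ||_2^2 (Mahalanobis)
resid = y_mean - g_mean
loss = resid @ np.linalg.solve(gamma, resid)
```

In this toy example, the 2.5 W m−2 flux error contributes less to the loss than the smaller but less noisy precipitation error, which is the intended behavior of the weighting.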

To achieve accurate simulations of climate statistics, the objective is to minimize the loss function (Eq. 1) for unobserved climate statistics y with respect to the parameters (θ,λ,ν). Importantly, the climate statistics may fall outside the distribution of observed climate statistics, particularly in the context of global warming projections. Therefore, the ability of a model to generalize beyond the distribution of the observed data becomes essential. Merely minimizing the loss over observed climate statistics or even driving the loss to zero in an attempt to imitate observations and pass a “climate Turing test” (Palmer2016) is not sufficient. Instead, fundamental science and data science tools, such as cross-validation and Bayesian tools, need to be brought to bear to plausibly minimize the loss for unobserved statistics.

In the loss function, we distinguish three types of parameters:

  1. The parameters θ appear in process-based models of subgrid-scale processes, such as entrainment and detrainment rates in parameterizations of convection. These parameters are directly interpretable and theoretically measurable, although their practical measurement can be challenging.

  2. The parameters λ represent the characteristics of the climate model's resolution, such as the horizontal and vertical resolution in atmosphere and ocean models.

  3. The parameters ν pertain to artificial intelligence (AI)-based data-driven models that capture subgrid-scale processes or correct for structural model errors, either within process-based models of subgrid-scale processes or holistically for an entire climate model (Kennedy and O'Hagan2001; Levine and Stuart2022; Bretherton et al.2022; Wu et al.2024). These parameters are neither easily interpretable nor directly measurable but are learned from data.

This distinction among the parameters is useful as it reflects three different dimensions along which climate models can be optimized. First, optimization can be achieved by calibrating parameters and improving the structure of process-based models that represent subgrid-scale processes such as turbulence, convection, and clouds. These processes have long been identified as a dominant source of biases and uncertainties in climate simulations (Cess et al.1989; Bony and Dufresne2005; Stephens2005; Vial et al.2013; Schneider et al.2017b; Zelinka et al.2020). Second, optimization can be accomplished by increasing the resolution of the models, which reduces the need for parameterization (Bauer et al.2021; Slingo et al.2022). Finally, optimization can be pursued by integrating AI-based data-driven models. These models have the potential to replace (Gentine et al.2018; O'Gorman and Dwyer2018; Yuval and O'Gorman2020; Yuval et al.2021) or complement (Schneider et al.2017a; Lopez-Gomez et al.2022) process-based models for subgrid-scale processes. Additionally, they can serve as comprehensive error corrections for climate models (Watt-Meyer et al.2021; Bretherton et al.2022; Wu et al.2024).

In the past 2 decades, efforts to optimize climate models have often focused on individual dimensions in isolation. For example, Climate Process Teams initiated under the US Climate Variability and Predictability Program have concentrated on enhancing process-based models by incorporating knowledge from observational and process-oriented studies into climate modeling (Subramanian et al.2016). The resolution of atmosphere and ocean models has gradually increased, albeit at a pace slower than the advances in computer performance would have allowed (Schneider et al.2017b). More recently, there have been calls to prioritize increasing the resolution; the aim is to achieve kilometer-scale resolutions in the horizontal, with the expectation that this would alleviate the need for subgrid-scale process parameterizations, such as those for deep convection, and substantially increase the reliability of climate predictions (Bauer et al.2021; Slingo et al.2022). Moreover, there is rapidly growing interest in advancing climate modeling by using AI tools, broadly understood to include tools such as Bayesian learning, deep learning, and generative AI (e.g., Schneider et al.2017a; Reichstein et al.2019; Chantry et al.2021; Watson-Parris2021; Balaji et al.2022; Irrgang et al.2022; Schneider et al.2023).

Beginning with a review of recent advances in the goodness of fit between climate simulations and observed records, here we will explore the potential benefits and challenges associated with optimizing each of the three dimensions mentioned earlier. Our analysis will highlight the importance of adopting a balanced approach that encompasses progress along each dimension, as this is likely to yield the most robust and accurate climate models and the most trustworthy and usable predictions.

2 Evolution of climate models

The climate statistics y used in the loss function (Eq. 1) can vary depending on the specific application. For example, a national climate model may prioritize minimizing the loss within a particular country. However, there are several climate statistics that are generally considered important and should be included in any comprehensive loss function. Two such examples are the top-of-atmosphere (TOA) radiative energy fluxes and surface precipitation rates.

The inclusion of TOA radiative energy fluxes is crucial because accurately simulating these fluxes is a prerequisite for accurately simulating the changes in any climate statistic. After all, radiative energy is the primary driver of the climate system. Changes in radiative energy fluxes caused by changes in greenhouse gas concentrations drive global climate change; climate models must accurately simulate changes in these energy fluxes and their effect on multiple climate system components, from oceans and land surfaces to clouds. As a consequence, errors in radiative energy fluxes affect many aspects of a simulated climate, from wind to precipitation distributions. The balance of TOA radiative energy fluxes must also be closed to machine precision. A closed energy balance is necessary to achieve a steady climate in unforced centennial to millennial integrations in which tiny imbalances of the energy budget otherwise accumulate over 10⁷ discrete time steps, leading to large-scale climate drift. The conservation requirements for climate predictions – for what John von Neumann called the “infinite forecast” (Edwards2010) – are more stringent than those for the short-term integrations needed for weather forecasting. Similarly, precipitation rates are important because they help close the water balance and directly affect human activities. Achieving accurate simulations of precipitation rates relies on accurately simulating numerous subgrid-scale processes within the climate system. Therefore, precipitation is an emergent property that serves as a holistic metric to assess the goodness of fit of a climate model.
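
To see why the energy balance must be closed so tightly, consider a rough, illustrative calculation of how a small spurious TOA imbalance accumulates in an unforced millennial integration; the numbers below are round textbook values, not outputs of any model:

```python
# Rough accumulation of a small spurious TOA energy imbalance (illustrative
# round values for Earth's surface area, seconds per year, and a full-depth
# ocean heat capacity)
earth_area = 5.1e14           # m^2
seconds_per_year = 3.15e7     # s
ocean_heat_capacity = 5.6e24  # J/K

imbalance = 0.1               # W/m^2, spurious leak in the energy budget
years = 1000                  # unforced millennial integration

energy_drift = imbalance * earth_area * seconds_per_year * years  # J
temp_drift = energy_drift / ocean_heat_capacity                   # K of drift
```

Even a 0.1 W m−2 leak, far below typical rms flux errors, produces a few tenths of a kelvin of spurious ocean-mean warming per millennium, which is why energy conservation, not just flux accuracy, is a hard requirement for the infinite forecast.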

Figure 1. Normalized rms error (RMSE) in the seasonal climatology of precipitation, top-of-atmosphere (TOA) longwave (LW) radiation, and TOA shortwave (SW) radiation for different models and model intercomparison projects. The rms errors are relative to climatologies from the Global Precipitation Climatology Project (GPCP) (Adler et al.2018) and CERES-EBAF (Loeb et al.2009) datasets over the period 2001–2020. CMIP and AMIP rms errors represent median values of the RMSE computed separately for each of the included models. Climatologies were computed as follows: for CMIP3, over 2001–2020 for the B1 scenario; for CMIP5/6, over 2001–2020 for a combination of the historical and RCP4.5/SSP2-4.5 scenarios; for AMIP, over 1995–2014 for models contributing to CMIP6; for HadGEM3-GC3.1 and MPI-ESM1.2, over 1995–2014 for the historical simulations and over 4 simulated years for the kilometer-scale nextGEMS cycle 3 simulations (Hohenegger et al.2023; Rackow et al.2024), which are shown together with averages over the equivalent simulation length in CMIP6. The rms error is normalized by the median across the CMIP3 and CMIP5 models for each field and across all seasons, with normalization constants shown below the color bar. HadGEM3-GC3.1 and MPI-ESM1.2 from HighResMIP (Haarsma et al.2016) are sorted in order of increasing horizontal resolution, with the atmospheric resolution for each configuration stated over the respective column (see Table 1).


Figure 1 assesses the evolution of climate models over the past 2 decades in simulating the observed climatology of TOA radiative energy fluxes and precipitation rates, temporarily setting aside that the loss minimization should occur for unobserved records. The figure displays the median root mean square (rms) error between model seasonal climatologies and observations, with all data conservatively remapped to a common 2.5° latitude–longitude grid using Climate Data Operators (Schulzweida2023). The plot includes three generations of climate models from the Coupled Model Intercomparison Project (CMIP) as well as recent higher-resolution simulations. It is evident that, over time, there has been a gradual improvement in the fidelity of models in simulating TOA radiative energy fluxes and precipitation. For example, in CMIP6 (late 2010s), the median rms error relative to CMIP3 (mid-2000s) is reduced by 15 % for precipitation, 31 % for TOA outgoing longwave flux, and 30 % for TOA reflected shortwave flux, with all values indicating average seasonal-mean improvements (Fig. 1, upper row). Individual modeling centers have surpassed this median rate of improvement; for example, there are rms error reductions of 30 % for precipitation, 49 % for TOA outgoing longwave flux, and 36 % for TOA reflected shortwave flux in the progression from CCSM3 to CESM2 at the National Center for Atmospheric Research (NCAR) (Fig. 1, lower row). These improvements primarily stem from advances in process-based parameterizations and model tuning (e.g., Danabasoglu et al.2020). The average resolution has also increased across the model generations, shifting from around 200–400 km horizontally in the atmosphere in CMIP3 to around 100–200 km in CMIP6 (Schneider et al.2017b; Intergovernmental Panel on Climate Change2021).
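
The error metric used here can be sketched as follows; the fields below are synthetic stand-ins, and the actual analysis uses observed climatologies remapped to a common grid:

```python
import numpy as np

def rmse(model, obs, weights):
    """Area-weighted root-mean-square error between two climatology fields."""
    w = weights / weights.sum()
    return np.sqrt(np.sum(w * (model - obs) ** 2))

# Synthetic 2.5-degree latitude-longitude fields standing in for a model
# climatology and an observed one (e.g., TOA longwave flux in W/m^2)
rng = np.random.default_rng(0)
lat = np.linspace(-88.75, 88.75, 72)
obs = rng.normal(240.0, 10.0, (72, 144))
model = obs + rng.normal(0.0, 8.0, (72, 144))   # model error of ~8 W/m^2

# Area weights proportional to cos(latitude), broadcast across longitudes
w = np.cos(np.deg2rad(lat))[:, None] * np.ones((72, 144))

err = rmse(model, obs, w)
norm_err = err / 8.0   # normalize by a reference (e.g., CMIP3/5 median) RMSE
```

Normalizing by a reference-generation median, as in Fig. 1, makes errors comparable across fields with very different units and magnitudes.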

To specifically examine the impact of resolution, we consider two models from the High Resolution Model Intercomparison Project (HighResMIP; Haarsma et al.2016): HadGEM3-GC3.1 (Roberts et al.2019) and MPI-ESM1.2 (Gutjahr et al.2019). These models have conducted simulations at three different resolutions, with horizontal resolutions in the atmosphere between 25 and 200 km, without resolution-specific tuning (Table 1).

Table 1. Atmosphere and ocean model resolutions of the HighResMIP simulations included in Fig. 1.


Both models exhibit a modest but consistent reduction in error metrics as resolution increases. However, there is one exception: the doubling in atmospheric horizontal resolution from MPI-ESM1.2-HR (100 km) to MPI-ESM1.2-XR (50 km), without an increase in ocean resolution or atmospheric vertical resolution, did not result in an improvement in error metrics. This finding suggests that ocean resolution and atmospheric vertical resolution are also important factors contributing to the improvements with resolution.

Recently, there has been a push to increase the resolution of climate models even further, to kilometer scales, allowing for partial resolution of deep convection and potential improvements in simulating precipitation and its extremes (Bauer et al.2021; Slingo et al.2022). In numerical weather prediction, enhanced horizontal resolution has led to improvements, for example, in rainfall predictions on timescales from hours to days (Clark et al.2016). However, whereas the assimilation of data at the initialization of a forecast continuously pulls numerical weather predictions close to the climate attractor, long-term climate simulations require a realistically closed energy balance to remain on the climate attractor. The energy balance also depends on dynamics at scales well below 1 km (e.g., in tropical low clouds, which are crucial for climate but less important for weather prediction), making it less clear that increased resolution by itself results in better climate simulations. Figure 1 displays the rms errors of two kilometer-scale coupled models (IFS and ICON) in simulating the seasonal climatology of TOA radiative energy fluxes and precipitation. For direct comparison to these short kilometer-scale simulations, Fig. 1 includes the rms errors in CMIP6 simulations for the equivalent averaging period of 4 years. Compared to the coarser-resolution simulations, the kilometer-scale simulations show improvements in TOA shortwave fluxes and longwave fluxes but little improvement – or in some cases even increases – in the errors in precipitation. These simulations highlight that a higher resolution alone does not guarantee an improved fit in climate simulations. Many crucial climate-regulating processes, such as shallow clouds and cloud microphysics, remain unresolved at kilometer scales, requiring appropriate parameterization. 
Extensive calibration or even a re-design of the subgrid-scale parameterizations at kilometer-scale resolution is necessary to reduce large-scale biases that can otherwise exceed those seen in coarser-resolution models (Wedi et al.2020; Hohenegger et al.2023). However, the high computational cost at kilometer-scale resolutions has so far inhibited systematic model calibration or the exploration of novel parameterization approaches needed at these resolutions.

Figure 2. August precipitation in satellite observations (top), coarse-resolution AMIP simulations (middle), and kilometer-scale DYAMOND simulations (bottom). Observations are based on GPCP (Adler et al.2018). AMIP simulations are from 14 models that participated in CMIP6. Both the GPCP data and the AMIP simulations are interpolated to a common 1°×1° grid, and the August climatology is derived from 1979–2014. The DYAMOND results, shown at the models' native resolutions, are based on the average of five models with horizontal resolutions ranging from 3.3 to 7.8 km for August 2020. The figure was adapted from Zhou et al. (2022).

Figure 2 provides a more detailed illustration of how kilometer-scale models can inherit longstanding biases from coarse-resolution models. The figure compares August precipitation between observations and simulations. The simulations include coarse-resolution AMIP models and a set of kilometer-scale simulations conducted under the DYAMOND project (Stevens et al.2019). The figure reveals that the kilometer-scale simulations capture more intricate details in the precipitation patterns, such as the strong orographic precipitation in the Himalayas, New Guinea, and the Sierra Madre Occidental. However, they still exhibit similar large-scale biases to the coarse-resolution simulations, such as excessive precipitation over the tropical regions of the south Pacific and Indian Oceans, commonly referred to as the double-intertropical convergence zone (ITCZ) bias (Tian and Dong2020). The double-ITCZ bias has important implications for regional precipitation projections over land (Dong et al.2021).

Over the past 2 decades, then, climate models have shown gradual improvements in key metrics, with error reductions of 10 %–20 % per decade, as seen in Fig. 1 and in other studies (Bock et al.2020). However, there are still errors that are large compared to the climate change signals we aim to predict. For instance, the radiative forcing due to doubling CO2 concentrations is about 4 W m−2, while rms errors in TOA radiative energy fluxes are O(10 W m−2). The response of climate models to increasing greenhouse gas concentrations also varies widely across models. For example, the time when the 2 °C warming threshold of the Paris Agreement is exceeded varies by several decades among models (Schneider et al.2017b; Intergovernmental Panel on Climate Change2021). This indicates that there is significant room for further improvement.

Given the significant errors in simulating the current climate and the uncertainties in future projections, there exists a large gap between the demands placed on climate models for adaptation decisions – such as designing stormwater management systems or sea walls to handle a 100-year flood in the decades ahead – and the capabilities of models today (Fiedler et al.2021; President's Council of Advisors on Science and Technology2023). Yet, the need for such decision-making is immediate. Therefore, it is urgently necessary to accelerate the improvement of climate models beyond the gradual advances of recent decades, i.e., to achieve a step-change enhancement in both accuracy and usability for decision-making. The question is how to achieve such a step change.

3 Process-based parameterizations

The uncertainties and biases in climate simulations, as shown in Figs. 1 and 2, have their roots in the parameterization of unresolved small-scale processes. So far, these processes have been primarily parameterized based on process knowledge, in a reductionist approach. For example, the influential work of Arakawa and Schubert (1974) laid the foundation for widely used parameterizations of moist convection, employing a reductionist process model of convective plumes that are, at all times, in statistical equilibrium with their environment and incorporate environmental air by entrainment. Research over the past 2 decades has focused on refining the formulation of the entrainment rate, a key control on climate model sensitivity to greenhouse gas concentrations (Stainforth et al.2005; Knight et al.2007). Typically, this rate is represented by a constant parameter ϵ=θ or a parametric function ϵ=ϵ(z,ζ;θ) of height z and (usually local) plume and environmental properties encoded in the model state ζ (e.g., de Rooy et al.2013; Yeo and Romps2013; Anber et al.2019; Savre and Herzog2019; Cohen et al.2020). Similarly, diffusive closures of various types have been commonly employed for boundary layer turbulence in the atmosphere and oceans. These closures employ diffusivities that may depend on height, other flow variables, or a turbulence kinetic energy determined by separate equations, and they are sometimes augmented by correction terms to represent upgradient fluxes in convective boundary layers (e.g., Mellor and Yamada1982; Large et al.1994; Lock et al.2000).
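
As a concrete sketch of the two closure forms mentioned above – a constant entrainment parameter ϵ=θ versus a height-dependent parametric function ϵ(z;θ) – consider the following; the ϵ(z)=c/z form and the value of c are illustrative examples, not a specific published scheme:

```python
import numpy as np

def entrainment_constant(theta):
    """Simplest closure: a single tunable rate, epsilon = theta (1/m)."""
    return theta

def entrainment_height(z, c=0.4):
    """Illustrative parametric closure epsilon(z; c) = c/z, with a single
    calibratable parameter c; the form and constant are examples only."""
    return c / np.maximum(z, 1.0)   # guard against division by zero

z = np.linspace(10.0, 3000.0, 300)   # height grid (m)
eps = entrainment_height(z)          # entrainment rate decays with height
```

The point of such parametric functions is that θ (here, c) is interpretable and, at least in principle, constrainable by measurements or high-resolution simulations.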

The process-based approach offers the advantage that the parameters or parametric functions that require closure are interpretable and theoretically measurable. For example, Monin–Obukhov similarity theory reduced the problem of parameterizing turbulence in a thin (∼100 m) layer near the surface to finding universal functions that characterize the vertical structure of turbulent fluxes (Foken2006). Later, these functions were empirically derived based on measurements over a field of wheat stubble in Kansas during the summer of 1968 (Businger et al.1971); they have since been widely incorporated into climate models. This represents a success story for the process-based approach. It led to a parametrically sparse and interpretable representation of near-surface turbulent fluxes. It also demonstrates invariance properties that make it applicable not only to summer conditions over Kansas wheat fields but across most of the globe, with relatively few limitations, particularly in convective situations.
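
The universal functions of Monin–Obukhov similarity theory can be sketched with the widely used Businger–Dyer form for the dimensionless wind shear; the empirical constants vary slightly between references:

```python
import numpy as np

def phi_m(zeta):
    """Dimensionless wind-shear function phi_m(z/L) of Monin-Obukhov
    similarity theory, in the common Businger-Dyer form; the constants
    (16, 5) are empirical and differ slightly between references."""
    zeta = np.asarray(zeta, dtype=float)
    unstable = (1.0 - 16.0 * np.minimum(zeta, 0.0)) ** -0.25  # zeta < 0
    stable = 1.0 + 5.0 * zeta                                 # zeta >= 0
    return np.where(zeta < 0.0, unstable, stable)

# In the neutral limit z/L -> 0, phi_m -> 1, recovering the logarithmic
# wind profile near the surface.
```

The closure problem is reduced to a single function of one nondimensional group, z/L, which is what gives the theory its near-universal applicability.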

However, despite this progress, the dominant source of uncertainties and biases in climate simulations, even 50 years after the introduction of the convection parameterization by Arakawa and Schubert (1974), lies in the representation of turbulence, convection, and clouds (including their microphysics) above the near-surface layer. This indicates that the reductionist approach to developing process-based models for these components has encountered significant challenges. For example, measuring entrainment rates directly, despite being theoretically possible, remains challenging in practice, whether using observational data or high-resolution simulations (e.g., Romps2010). The search for universal functions to accurately represent entrainment has been unsuccessful thus far. Consequently, the process-based approach to modeling convection and clouds is widely perceived as being deadlocked (Randall et al.2003; Randall2013; Gentine et al.2018).

However, prematurely dismissing process-based modeling as obsolete would ignore its advantages and its potential for further development. Contrasting the achievements of Monin–Obukhov similarity theory and moist-convection parameterizations is illuminating. Monin–Obukhov similarity theory systematically coarse-grained the equations of motion, employing controlled approximations and identifying the nondimensional groups of variables that govern near-surface turbulent fluxes. The approach reduced the closure problem to finding universal functions of the identified nondimensional groups, with well-defined limits in different scenarios. This led to its near-universal applicability. In contrast, the moist-convection parameterizations in current use emerged phenomenologically, without a systematic coarse-graining of the known equations of motion through controlled approximations. Even when starting from a rigorous basis like the Arakawa and Schubert (1974) parameterization, operational parameterizations often introduced artificial scale breaks between boundary layer turbulence, shallow convection, and deep convection or even between convection over land and oceans, leading to separate parameterizations with discontinuous differences in parameters and structure. Such discontinuities do not exist in nature. As a result, these parameterizations lack well-defined limits. For example, they do not converge to a well-defined dry limit when the latent heats of fusion and vaporization of water approach zero, and they do not converge to the Navier–Stokes equation as resolution increases. This approach hindered the systematic removal of unnecessary approximations, particularly as model resolution increased and common assumptions, such as small plume area fractions relative to the host model's grid scale or statistical equilibrium between moist convection and the environment, became inadequate (Arakawa et al.2011; Arakawa and Wu2013; Randall2013).

Therefore, rather than declaring that process-based modeling for moist convection and other complex processes is at a dead end, a more nuanced perspective recognizes the need for further development with greater mathematical and physical rigor, particularly in light of the abundant data and enhanced computational capabilities available today, which surpass what the early pioneers of these approaches had at their disposal. The invariance properties, such as conservation laws and symmetries, inherited by this approach from the underlying equations of motion may well hold the key to developing universal parameterizations that enable us to minimize the loss (Eq. 1) for unobserved climate statistics and to generalize beyond the observed distribution. That is, progress on macroscopic techne here hinges on microscopic episteme.

These considerations suggest that successful process-informed parameterizations satisfy four clear requirements:

  1. Parameterizations should be grounded in the governing equations of subgrid-scale processes whenever feasible. Equations for parameterizations can be obtained by systematic coarse-graining through methods such as conditional averaging – for example, resulting in distinct equation sets for coherent structures like updrafts and their more isotropically turbulent environment – or the derivation of moment equations rooted in distribution assumptions on subgrid-scale fluctuations. Whatever approach is adopted, it is crucial that assumptions are explicitly laid out and subject to empirical validation or revision. Examples of such approaches for turbulence and convection include Lappen and Randall (2001), Golaz et al. (2002), Soares et al. (2004), Siebesma et al. (2007), Witek et al. (2011), Guo et al. (2015), Firl and Randall (2015), Tan et al. (2018), Thuburn et al. (2018), Cohen et al. (2020), and Lopez-Gomez et al. (2020).

  2. Artificial scale breaks, such as those between shallow and deep convection, should be avoided. These breaks lack correspondence to nature and introduce unphysically discontinuous dynamical transitions. They also lead to correlated parameters that are difficult to calibrate with data. For example, discontinuous transitions between shallow and deep convection impede an accurate simulation of the diurnal cycle of convection (Christopoulos and Schneider2021; Tao et al.2024) and result in correlated parameters in the shallow and deep convection schemes that are difficult to identify from data. Moreover, bridging these discontinuous scale breaks becomes problematic as resolution increases, for example, through the “gray zone”, where processes such as deep convection become partially resolved.

  3. When scale separation is absent between the parameterized subgrid-scale processes and the resolved grid scale, parameterizations must incorporate subgrid-scale memory and stochastic terms. This implies that convection and cloud parameterizations, for example, must be explicitly time dependent (i.e., have memory) and cannot be assumed to be in instantaneous equilibrium with the environment. Homogenization theories such as those of Mori and Zwanzig (Zwanzig2001), which employ averaging but also show how fluctuations about averages arise on the macroscale, support the inclusion of these features (Majda et al.2003; Wouters and Lucarini2013; Lucarini et al.2014; Wouters et al.2016; Lucarini and Chekroun2023).

  4. Parameterization schemes for different processes must be coupled such that they interact consistently (Devine et al.2006; Gross et al.2018). For example, models for subgrid-scale fluctuations of cloud dynamics must be coupled consistently with parameterizations for cloud microphysics, ensuring that nonlinear interactions between microphysical processes such as ice nucleation and the thermodynamics and velocities of updrafts are consistently modeled (Gettelman et al.2019). This is likely particularly important for processes such as the formation of supercooled liquid in strong updrafts, which occur out of thermodynamic equilibrium and hence are dependent on the history, and not just the instantaneous state, of air masses. Such processes are known to strongly affect the response of climate models to increased greenhouse gas concentrations (Zelinka et al.2020).
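The memory and stochastic terms of requirement 3 can be illustrated with a minimal sketch (our own toy model, not a scheme from the literature): a subgrid flux F relaxes toward its equilibrium value on a memory timescale τ, with an optional stochastic term, rather than being diagnosed from the instantaneous environment.

```python
import numpy as np

def step_flux(F, F_eq, tau, dt, sigma, rng):
    """One Euler-Maruyama step of dF/dt = -(F - F_eq)/tau + sigma*xi(t):
    the flux relaxes toward equilibrium on a memory timescale tau, with an
    optional stochastic term; tau -> 0 recovers instantaneous equilibrium."""
    return F + dt * (F_eq - F) / tau + sigma * np.sqrt(dt) * rng.normal()

rng = np.random.default_rng(0)
tau, dt = 3600.0, 60.0      # memory timescale and model time step [s]
F, F_eq = 0.0, 1.0          # flux starts away from its equilibrium value
for _ in range(60):         # integrate over one memory timescale
    F = step_flux(F, F_eq, tau, dt, sigma=0.0, rng=rng)  # deterministic limit
print(F)                    # lags equilibrium: roughly 1 - 1/e, not 1
```

An instantaneous-equilibrium scheme would set F = F_eq at every step; the prognostic variable carries the history dependence emphasized above.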

Developing process-based parameterizations by systematically coarse-graining equations of motion will lead to unclosed terms, similar to the universal functions in Monin–Obukhov similarity theory. These terms should be expressed in terms of nondimensional variable groups that make them as “climate invariant” as possible (Beucler et al.2024). Whether they contain parameters, parametric functions, or non-parametric functions, they then become excellent targets for AI-enabled learning from data.
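As an illustration of such nondimensional closure targets, consider a Businger-Dyer-type stability function from Monin-Obukhov similarity theory. The functional form and the coefficient γ = 15 below are commonly quoted textbook values for unstable conditions; in the program sketched here, this function (or its coefficients) would instead be learned from data.

```python
import numpy as np

def phi_m(zeta, gamma=15.0):
    """Dimensionless wind shear phi_m as a function of the nondimensional
    group zeta = z / L (height over Obukhov length); sketch valid for
    unstable conditions (zeta <= 0). Because the closure depends only on
    z/L, the same function applies in any climate that samples z/L."""
    return (1.0 - gamma * np.asarray(zeta)) ** (-0.25)

print(phi_m(0.0))    # neutral limit: phi_m -> 1
print(phi_m(-1.0))   # strongly unstable: (1 + 15)^(-1/4) = 0.5
```

The key point is that the learning target is a universal function of a nondimensional group, not a dimensional quantity tied to one climate state.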

There is accumulating evidence that a program focused on process-based parameterizations that satisfy the above requirements can achieve success. For example, Lopez-Gomez et al. (2020) and Cohen et al. (2020) have demonstrated the effectiveness of a unified parameterization approach for the spectrum of small-scale motions from boundary layer turbulence to deep moist convection. This parametrically sparse approach is based on conditionally averaged equations of motion, which leads to additional evolution equations for subgrid-scale quantities such as updraft energies and mass fluxes. The equations for the subgrid-scale quantities carry additional information, including subgrid-scale memory, augmenting the information available on the grid scale of a model. Within one continuous parameterization framework, they are able to accurately represent a wide range of cloud dynamics observed on Earth, from stable boundary layers to stratocumulus-topped boundary layers and deep convection. Furthermore, Lopez-Gomez et al. (2022) have shown that machine learning can be employed to identify closure functions in these parameterizations, such as entrainment rates that depend on climate-invariant nondimensional groups.

As climate models reach resolutions where deep convection becomes marginally resolved, using an inadequate deep-convection parameterization based on instantaneous statistical equilibrium assumptions may well be less effective than not using any parameterization at all. For this reason, in the kilometer-scale simulations shown in Figs. 1 and 2, deep-convection parameterizations are either turned off or run with reduced activity (Stevens et al.2019). However, parameterizations for boundary layer turbulence and low-cloud cover are usually kept, and sometimes those for shallow convection (e.g., in IFS in Fig. 1) are too, even though they were originally developed for resolutions in the 100 km range, where, for example, assumptions of instantaneous statistical equilibrium of subgrid scales with resolved scales are more justifiable. As seen in the above figures, this approach has not yet achieved the hoped-for success; in particular, it has not significantly improved large-scale precipitation simulations at kilometer-scale resolutions. Consequently, it is essential to advance the development of parameterizations that effectively bridge the scales between marginally resolved convection and the dynamics that remain unresolved in this resolution range, in addition to parameterizations of yet-smaller scales, such as the microphysics of cloud droplet and ice crystal formation.

Figure 3. Kinetic energy spectra and cumulative energy in the atmosphere. (a) Spectral kinetic energy density based on aircraft measurements, shown as a function of inverse horizontal wavelength λ^{-1} = k/(2π). (b) Cumulative kinetic energy from λ_max = 5000 km to λ on the upper horizontal axis, normalized by the energy for λ_min = 10 m. The lower horizontal axis expresses the wavelength λ through the required horizontal model grid spacing Δx, using λ ≈ 7Δx. Blue is for horizontal-motion kinetic energy; red is for vertical-motion kinetic energy. Data from Callies et al. (2014) (MOZAIC) and Schumann (2019) (NAWDEX). Dashed lines in (a) indicate linear extrapolations in log–log space, except for the dashed blue line where NAWDEX and MOZAIC data overlap; there, the dashed line represents the NAWDEX spectrum multiplied by a fitting constant to match the MOZAIC spectrum. Cumulative energies are obtained by numerical integration over the spectra, including the extrapolations. The dashed line in (b), for wavelengths λ ≤ 25 km, represents a power-law fit, 1 − a(λ^β − λ_min^β) with β ≈ 0.4 and a ≈ 0.5, to the cumulative vertical kinetic energy. Note that the extrapolations of vertical kinetic energy to large scales may not be very accurate due to possible deviations from a completely flat spectrum, which can slightly shift the position of the inflection point in the cumulative energy (Skamarock et al.2014; Schumann2019).


4 Resolution

Climate is regulated by turbulent motions in the atmosphere and oceans. Horizontal motions transport energy, momentum, and, in the atmosphere, water vapor, shaping surface temperatures, winds, and precipitation patterns. Vertical motions couple the atmosphere and surface, creating clouds, driving precipitation, and mixing heat and tracers such as carbon dioxide in the oceans. Representing these turbulent motions accurately is crucial for climate models, but it is challenging due to their vast range of length scales, from planetary to millimeter scales.

Figure 3a shows the kinetic energy spectra of horizontal and vertical motions in the atmosphere, measured by aircraft. The spectra are displayed as functions of the inverse horizontal wavelength λ^{-1}, which is proportional to the horizontal wavenumber k = 2π/λ. At large scales (small wavenumbers), the spectrum of horizontal kinetic energy follows a k^{-3} power law, as predicted by geostrophic turbulence theory (Vallis2006, Chap. 9). At mesoscales below approximately 500 km, the spectrum becomes shallower, resembling a k^{-5/3} power law. The reason for this change has been debated. The shallower spectrum seems to be caused by linear inertia-gravity waves, which are internal waves modified by planetary rotation that coexist with the nonlinear, primarily geostrophic, atmospheric turbulence (Dewan1979; VanZandt1982; Callies et al.2014).

At scales greater than 10–20 km, the kinetic energy of vertical motions is much weaker than that of horizontal motions, with a relatively flat spectrum. This difference is mainly due to two factors: (1) the scale of vertical motions is limited by the depth of the troposphere (about 10–20 km), which contains the most important vertical motions; and (2) the vertical velocity depends on the divergence of the horizontal velocity, which is weaker (by a factor of order Rossby number) than the dominant rotational velocity at large scales, though it becomes comparable to it on mesoscales. The divergence involves horizontal derivatives, leading to a multiplication by k^2 of the kinetic energy spectra in wavenumber space at horizontal scales above 10–20 km, where the vertical depth scale is constrained by the depth of the troposphere (see Schumann2019 for a detailed model from which these insights are drawn). This results in the relatively flat spectrum with low vertical kinetic energy at larger horizontal scales. At horizontal scales smaller than about 10–20 km, where the horizontal scale is comparable to the vertical scale and the latter is no longer constrained by the depth of the troposphere, the vertical kinetic energy spectrum starts to decay at a rate of roughly k^{-5/3}, like the horizontal kinetic energy spectrum. At yet smaller scales in the meter range, the turbulence becomes increasingly isotropic, which also results in a k^{-5/3} power law because three-dimensional turbulence follows a Kolmogorov spectrum. The figure shows an extrapolation of both the horizontal and vertical kinetic energy spectra from the smallest measured scale near 100 m down to 10 m for illustrative purposes. However, in reality, the spectra continue without a break to the Kolmogorov scale at millimeters, where kinetic energy is dissipated.

As the horizontal climate model resolution increases, the continuity of the atmospheric energy spectrum implies a gradual improvement as resolved motions replace imperfectly parameterized smaller scales. To quantitatively assess the benefits of higher resolution in climate models, we integrate the energy spectrum Ê(k) over a wavenumber interval from k_min = 2π/λ_max to k = 2π/λ:

(2) E(k_min, k) ≡ ∫_{k_min}^{k} Ê(k′) dk′.

Figure 3b illustrates the cumulative energy between λ_max = 5000 km and a given λ on the upper horizontal axis, normalized by the cumulative energy extrapolated to λ_min = 10 m. Because of the steepness of the horizontal kinetic energy spectrum at large scales, the benefits of increased resolution for horizontal kinetic energy level off at wavelengths just under 1000 km. This corresponds to a grid spacing of around Δx ≈ 150 km (lower horizontal axis in Fig. 3) because the minimum wavelength λ a model can resolve is approximately 7Δx (Skamarock2004; Wedi2014; Klaver et al.2020). Climate models reached this “geostrophic turbulence plateau” in resolution in the past decade (Schneider et al.2017b). However, the vertical kinetic energy spectrum remains relatively flat at larger scales, leading to continued benefits in resolving vertical kinetic energy as λ decreases.3 Concretely, the data in Fig. 3 indicate that resolving wavelengths of 1000, 100, and 10 km (grid spacings Δx of about 150, 15, and 1.5 km, respectively) increases the fraction of resolved vertical kinetic energy between 5000 km and 10 m from 0.6 % to 7 % and 43 %, respectively. The returns on increasing the resolution only begin to diminish for wavelengths below 1 km; that is, grid spacings Δx ≲ 150 m. The specific results depend on the data and extrapolations used (Schumann2019), but the main finding is clear: even at kilometer-scale resolution, most vertical motions require parameterization. This is especially true for the motions at horizontal scales in the meter to 100 m range, which generate the low clouds that help control Earth's energy balance (Bony and Dufresne2005; Stephens2005; Vial et al.2013; Schneider et al.2017b, 2019). Therefore, as we push the resolution frontier, it is crucial to concurrently improve the parameterization of smaller-scale turbulent motions in ways that, as discussed in Sect. 3, are commensurate with the model resolution; for example, statistical equilibrium assumptions for subgrid-scale fluctuations must be relaxed at resolutions where scale separation between resolved and parameterized processes disappears.
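The cumulative-energy argument can be reproduced numerically with an idealized spectrum (a two-segment power law of our own choosing, standing in for the observational data in the figure): flat above a transition wavelength of about 10 km, and k^{-5/3} below it.

```python
import numpy as np

# Idealized stand-in for the vertical kinetic energy spectrum in Fig. 3
# (an assumption for illustration, not the aircraft data).
lam_max, lam_t, lam_min = 5.0e6, 1.0e4, 10.0               # wavelengths [m]
k = np.geomspace(2 * np.pi / lam_max, 2 * np.pi / lam_min, 200_000)
k_t = 2 * np.pi / lam_t                                    # transition wavenumber
E_hat = np.where(k < k_t, 1.0, (k / k_t) ** (-5.0 / 3.0))  # arbitrary units

def cumulative(y, x):
    """Trapezoidal quadrature, implementing the integral in Eq. (2)."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

def resolved_fraction(lam):
    """Fraction of total energy (down to lambda_min) at wavelengths >= lam."""
    mask = k <= 2 * np.pi / lam
    return cumulative(E_hat[mask], k[mask]) / cumulative(E_hat, k)

for lam in (1.0e6, 1.0e5, 1.0e4):  # wavelengths of 1000, 100, and 10 km
    print(f"lambda = {lam / 1e3:6.0f} km: {100 * resolved_fraction(lam):4.1f} % resolved")
```

With this idealized spectrum, the resolved fractions come out at roughly 0.3 %, 4 %, and 40 %, in order-of-magnitude agreement with the data-based 0.6 %, 7 %, and 43 % quoted above; the flat large-scale spectrum is what keeps most vertical kinetic energy unresolved even at kilometer scales.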

Increasing the horizontal resolution of climate models incurs a substantial computational cost, which grows as (Δx)^{-3} for a fixed vertical resolution. This cost arises from the increasing number of horizontal grid points, (Δx)^{-2}, and the necessity for smaller time steps, (Δx)^{-1}, to maintain numerical stability. To illustrate using the previous example from Fig. 3, reducing the grid spacing from Δx ≈ 15 km to Δx ≈ 1.5 km increases the computational cost by a factor of 1000 while enhancing the cumulative vertical kinetic energy resolved by a factor of 6, from 7 % to 43 %. That is, the rate of improvement in resolved vertical kinetic energy (a low power of Δx) is significantly smaller than the additional computational expense. From this perspective, increasing the horizontal resolution is an inefficient means of improving climate models. Moreover, the vertical grid spacing Δz must also be considered and typically should scale with Δx (Lindzen and Fox-Rabinovitz1989); however, existing process-based parameterizations are often manually calibrated to a specific vertical resolution, resulting in a reluctance to increase the vertical resolution alongside the horizontal resolution in practice. Increasing the vertical resolution entails a more modest computational cost, which generally scales as (Δz)^{-1}. This is because fast vertical dynamics are generally treated implicitly in climate models to circumvent time-step limitations, ideally using implicit solvers with computational costs that are linear in (Δz)^{-1}. As suggested in Fig. 1, increasing the vertical resolution at a coarser horizontal resolution can be advantageous because it can improve the representation of parameterized subgrid-scale dynamics (Harlaß et al.2015; Kawai et al.2019; Smalley et al.2023). At higher horizontal resolutions where the resolved dynamics become more isotropic, proportionately increasing both the vertical and horizontal resolution becomes necessary, leading to a computational cost that scales even less favorably, like (Δx)^{-4}.
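The scaling arguments above amount to simple arithmetic, sketched here for concreteness:

```python
# Cost of horizontal refinement at fixed vertical resolution scales as
# (dx)^{-3}: grid points grow as (dx)^{-2} and the time step shrinks as dx.
# With proportional vertical refinement, the exponent becomes 4.
def cost_factor(dx_coarse, dx_fine, exponent=3):
    return (dx_coarse / dx_fine) ** exponent

print(cost_factor(15.0, 1.5))      # 15 km -> 1.5 km at fixed vertical grid: 1000x
print(cost_factor(15.0, 1.5, 4))   # refining the vertical grid proportionately: 10000x
```

A factor-of-1000 cost for a factor-of-6 gain in resolved vertical kinetic energy is the trade-off quantified above.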

Therefore, optimizing the parameters λ that define model resolution involves unavoidable trade-offs. Even at foreseeable future resolutions, unresolved scales of atmosphere and ocean turbulence, as well as even finer scales controlling cloud microphysics and other processes, will still require parameterization. While increasing the resolution gradually improves the representation of turbulent dynamics and enhances the resolution of the surface topography, gravity waves, and land–sea contrasts, the benefits are unlikely to justify the 1000-fold increase in computational cost from O(10 km) to O(1 km) (Wedi et al.2020). It will remain crucial to make parameterizations as resolution independent (“scale aware”) as possible and to allocate some computational resources to calibrating parameterizations with data, which requires hundreds to thousands of climate simulations. Although calibrating over shorter (weather) timescales is computationally feasible and may be beneficial, it does not guarantee improved simulations of longer-term climate statistics (Schirber et al.2013). Moreover, to quantify climate risks, it is necessary to run ensembles of climate simulations to broadly explore possible climate outcomes. Doing so requires O(10) ensemble members to sample atmospheric and oceanic internal variability (Deser et al.2020; Wills et al.2020; Bevacqua et al.2023), ideally with each of those also sampling model uncertainties by drawing from a posterior distribution over plausible models (Dunbar et al.2021; Howland et al.2022), resulting in hundreds of decades-long climate simulations. Ensemble generation lends itself well to distributed (cloud) computing as it is embarrassingly parallel. However, it also constrains the routinely achievable resolution in climate models.

Therefore, kilometer-scale resolution remains an experimental frontier. Currently, routinely achievable atmospheric resolution lies in the 10–50 km range (Schneider et al.2023), while ocean resolutions of 5–10 km are achievable (Chang et al.2020; Silvestri et al.2024). By finding the right balance between resolution and parameterization learning and calibration, we can make significant strides in improving climate simulations within realistic computational constraints.

5 AI for learning parameterizations

Even at the highest resolutions achievable in climate modeling, parameterizing small-scale processes remains essential. While it may be tempting to learn about all small-scale processes holistically from data, this approach is more likely to be successful in weather prediction, where short-term accuracy is prioritized and energy conservation is less critical because daily data assimilation prevents model drift. In contrast, climate prediction faces two primary challenges.

First, energy conservation and predicting changes in Earth's energy balance are paramount, as exemplified by Suki Manabe's Nobel-Prize-winning work. His climate modeling work began with radiative–convective equilibrium models to explore the energetic effects of changes in atmospheric composition on the atmosphere and surface (Manabe and Strickler1964; Manabe and Wetherald1967). In the same vein, climate models must accurately predict responses to changes in atmospheric composition, such as greenhouse gas concentrations and aerosol loading. Some of these responses involve rapid adjustments that are independent of surface temperature changes. For example, changes in greenhouse gas concentrations can modulate cloud cover through rapid adjustments mediated by changes in longwave radiative fluxes, in addition to the response of clouds to surface temperature changes (Gregory and Webb2008; Sherwood et al.2015; Bretherton2015). Predicting these effects separately is essential for future climate scenarios: in future climates, changes in aerosol loadings and greenhouse gas concentrations can decorrelate, and changes in longwave radiative fluxes can decouple from surface temperature changes, as is the case in solar geoengineering scenarios (Schneider et al.2020). Learning their compound effects holistically from data will not enable such predictions because the effects cannot be disentangled from data alone, where, for example, changes in greenhouse gas concentrations and surface temperature are correlated; instead, modeling the processes separately using known physics as guardrails appears essential.

Second, climate change prediction is an out-of-distribution challenge, as we lack data for future, warmer climates. While using simulated data for learning is one approach, it restricts learning to model emulation and may not capture complex processes such as aerosol effects on clouds, which currently cannot be simulated reliably, even in limited domains. Therefore, learning from observations must be informed by known physics to ensure models generalize beyond the observed climate distribution.

To make progress and generalize beyond observed climate data, we can build on process-based parameterizations, which encode known physics through conservation laws and invariance properties. AI-based methods, broadly understood to include methods ranging from Bayesian to deep learning, can aid in learning about entire parameterizations for individual processes or unclosed terms and functions within them. This can reduce inaccuracies in climate models and potentially also allow the quantification of uncertainties.

AI approaches require the specification of a loss function. The most suitable loss function (Eq. 1) penalizes differences between simulated and observed climate statistics, weighted by the inverse of a covariance matrix representing noise sources such as observational error and internal variability. The loss function should include variables such as TOA radiative energy fluxes and global precipitation fields, as shown in Fig. 1. It may also include higher-order statistics, such as the covariance between surface temperature and cloud cover (Schneider et al.2017a). This covariance represents an emergent constraint: a statistic that, across climate models, correlates with the response of cloud cover to greenhouse gas concentration increases (e.g., Klein and Hall2015; Brient and Schneider2016; Caldwell et al.2018; Hall et al.2019). Such emergent constraints can arise, for example, from fluctuation–dissipation theorems that relate fluctuations in a system to the response of the system to external perturbations (Ruelle1998; Lucarini and Chekroun2023). Emergent-constraint statistics, previously used only for retrospective model assessments, can be proactively minimized in the loss function to improve model accuracy in simulating greenhouse gas responses.4
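In code, such a loss is a noise-weighted (Mahalanobis) distance between vectors of simulated and observed statistics. The statistics and covariance values below are invented for illustration; only the structure mirrors the loss described here.

```python
import numpy as np

def climate_loss(y_sim, y_obs, Gamma):
    """Squared mismatch between simulated and observed climate statistics,
    weighted by the inverse of a noise covariance Gamma (observational
    error plus internal variability), as in the loss discussed above."""
    r = y_sim - y_obs
    return float(r @ np.linalg.solve(Gamma, r))

# Hypothetical 3-statistic example: TOA net flux [W m^-2], global-mean
# precipitation [mm/day], and a temperature-cloud covariance (arbitrary units).
y_obs = np.array([0.8, 2.7, -0.15])
y_sim = np.array([1.4, 2.9, -0.10])
Gamma = np.diag([0.5**2, 0.1**2, 0.05**2])   # assumed noise variances
print(climate_loss(y_sim, y_obs, Gamma))
```

Statistics with small assumed noise (such as an emergent-constraint covariance) receive large weight, which is how they can actively steer calibration rather than serving only for retrospective assessment.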

However, using climate statistics in a loss function challenges traditional machine-learning (ML) methods. Supervised learning (SL), the dominant ML approach, depends on labeled input–output pairs for process modeling and learns regressions of outputs onto inputs. For example, a convection parameterization requires at least temperature and humidity inputs, which must be paired with the output – the convective time tendencies of temperature and humidity – for training. Since such data are unavailable from Earth observations, SL has been limited to simulated data (e.g., O'Gorman and Dwyer2018; Rasp et al.2018; Gentine et al.2018; Yuval and O'Gorman2020; Yuval et al.2021; Yu et al.2023). Conversely, the climate statistics in the loss function (Eq. 1) provide only indirect information about processes such as convection. For example, the loss function may include fields such as precipitation and cloud cover – noisy fields with missing data that are influenced by multiple processes, including but not limited to convection (Schneider et al.2017a).

To illustrate, consider determining closures in a conservation equation:

(3) Dq/Dt = ℱ + 𝒮.

Here, q(x,t) is a tracer, such as total specific humidity, dependent on space x and time t, and D/Dt = ∂/∂t + u·∇ is the material derivative with the fluid velocity u(x,t). The quantities on the left-hand side are taken to be resolved on the model's grid. The right-hand side consists of two components: ℱ(x,t) represents unresolved subgrid-scale flux divergences in need of parameterization; 𝒮(x,t) denotes all other, separately modeled, sources and sinks.

In SL approaches, the objective is to map the model state ζ to approximate subgrid-scale flux divergences F^(ζ;ν), with parameters ν (e.g., neural-network weights and biases). To make the problem tractable, the mapping is usually considered locally in the horizontal, mapping column states ζ(zi,t) at discrete levels zi (for i = 1, …, Nz) to parameterized flux divergences F^j(ζ(zi,t);ν) at levels zj. This is achieved by using the column state ζ(zi,t) as input and the remainder Dq/Dt − 𝒮 as output to learn a regression F^ ≈ Dq/Dt − 𝒮, with residual ϵ. The material derivative Dq/Dt is rolled out over time intervals typically spanning hours to days. The aim is to minimize the residual ϵ over parameters ν, typically using methods such as backpropagation that compute the gradients of a loss function with respect to the parameters ν.
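The structure of this regression problem can be sketched with synthetic data and a linear model standing in for the deep network (all data and dimensions here are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
Nz, Nsamples = 20, 5000
true_W = rng.normal(size=(Nz, Nz)) / np.sqrt(Nz)   # hypothetical "true" operator

zeta = rng.normal(size=(Nsamples, Nz))             # column states zeta(z_i, t)
# Labeled targets Dq/Dt - S, available only from simulations, with noise:
target = zeta @ true_W.T + 0.01 * rng.normal(size=(Nsamples, Nz))

# Least-squares fit of F_hat(zeta; nu) = zeta @ W, where nu = entries of W
W, *_ = np.linalg.lstsq(zeta, target, rcond=None)
residual = float(np.mean((zeta @ W - target) ** 2))
print(residual)   # close to the injected noise variance (1e-4)
```

The catch, elaborated in the limitations below, is that this minimizes a short-term residual on labeled pairs, not the climate-statistics loss that actually matters.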

This approach leverages the expressive capabilities of deep learning and has shown some promise, as evidenced in studies demonstrating that moist convection or ocean turbulence parameterizations can be effectively learned in this manner (e.g., O'Gorman and Dwyer2018; Rasp et al.2018; Gentine et al.2018; Bolton and Zanna2019; Yuval and O'Gorman2020; Yuval et al.2021; Zanna and Bolton2020; Wang et al.2022; Sane et al.2023). Error corrections to existing parameterizations have been learned in a similar manner (Watt-Meyer et al.2021; Bretherton et al.2022). However, focusing solely on minimizing the short-term residual ϵ presents several limitations:

  1. The learned parameterization F^ may not necessarily minimize the climate-relevant loss function (Eq. 1), which is concerned with longer-term statistics, as opposed to the shorter-term trajectories of states on which SL approaches have focused. Schirber et al. (2013) provide an example of how short-term optimization over trajectories can lead to no improvement or even the degradation of the longer-term climate statistics in a model.

  2. Supervised learning of the parameterization F^ is typically restricted to data generated computationally in higher-resolution simulations, and hence to the emulation of imperfect models, because labeled parameterization output (Dq/Dt − 𝒮) is generally not available from Earth observations.

  3. The parameterization F^, typically learned for a multitude of processes jointly, usually does not generalize well out of the training distribution and is resolution dependent, necessitating training with a broad range of simulated climates (e.g., O'Gorman and Dwyer2018) and re-training whenever the resolution is changed.

  4. Climate models incorporating the learned parameterization F^ often struggle with conserving essential quantities such as energy and exhibit instabilities during extended integrations (e.g., Brenowitz and Bretherton2018) because minimizing the short-term residual ϵ does not inherently ensure conservation or stability.

Some of the challenges associated with SL approaches in climate modeling can be addressed or alleviated. For example, longer rollouts of the material derivative Dq/Dt have been shown to reduce instabilities when integrating the learned parameterization F^ into a climate model (Brenowitz et al.2020), and constraints on the loss function may be used to enforce conservation laws (Beucler et al.2021). Additionally, the issue of resolution dependence in the learned parameterization F^ can be tackled by shifting from learning a finite-dimensional discrete mapping between model grid points to learning an infinite-dimensional operator. Such operators map between function spaces; they would effectively represent the atmospheric or oceanic column state as a continuous function, rather than as a set of discrete points (e.g., Nelsen and Stuart2021; Kovachki et al.2023). This approach allows for a more flexible representation of the underlying physical processes that is potentially adaptable to different resolutions without the need for retraining.

An alternative approach that avoids the restrictions of SL views learning parameterizations F^ and ML parameters ν within them as an inverse problem (Kovachki and Stuart2019), minimizing a climate-relevant loss function (Eq. 1) that focuses on statistics. However, this loss function is based on data that are only indirectly informative about the process being modeled; that is, the parameterization F^ influences the climate model output G(t;θ,λ,ν;ζ0) in the loss function only indirectly, through the complex and nonlinear interactions of other components in the climate model (Schneider et al.2017a). In this case, calculating gradients of the loss function with respect to the parameters ν would involve differentiation through the model 𝓖, which may not be differentiable (e.g., at discontinuous phase transitions) or may be difficult to differentiate.

In this context, learning about the parameterization F^ is no longer a straightforward regression of outputs onto inputs. However, this does not preclude the inclusion of parameter-rich and expressive deep-learning models within the parameterization. The parameters ν, together with the process-model parameters θ, can be estimated by minimizing a climate-relevant loss function using derivative-free ensemble Kalman inversion techniques, which are proven to scale well to high-dimensional problems, can be used with models that are not differentiable or are difficult to differentiate, and have smoothing properties that are desirable for chaotic dynamical systems such as the climate system (Kovachki and Stuart2019). As in many inverse problems, minimizing the loss function is often an ill-posed problem with many possible solutions which may be sensitive to small changes in the data (Tarantola1987; Hansen1998; Iglesias et al.2013). This requires regularization, for example, through the use of prior information on the parameters ν to select “good” parameter sets among the many that may minimize the loss. Such prior information may be obtained, for example, by pre-training on computationally generated data, which can be more detailed than observational data (Lopez-Gomez et al.2022). This inverse problem approach, augmented with prior information, not only makes it possible to learn from heterogeneous and noisy Earth observations but also allows for the quantification of uncertainties (e.g., Cleary et al.2021; Huang et al.2022). Stochastic elements can also be incorporated into the parameterizations (e.g., Schneider et al.2021b), which, as discussed in Sect. 3, is particularly relevant in the absence of clear scale separation, offering a more principled and realistic representation of climate processes.

This expanded perspective on incorporating AI methods into climate models broadens the scope of where these methods can be effectively integrated. It moves beyond automatically calibrating scalar parameters in climate models (Zhang et al.2015; Couvreux et al.2021; Hourdin et al.2023) to encompass higher-dimensional parameter spaces, including those relevant to deep-learning approaches. Rather than focusing solely on areas where SL is feasible, the emphasis shifts to where AI can have the most significant impact. The key challenge in climate modeling and prediction is minimizing the loss function (Eq. 1) for unobserved climate statistics, especially in global-warming scenarios where these statistics may fall outside the range of observed data. While traditional methods such as withholding part of the data for cross-validation remain essential, they fall short in ensuring model generalization beyond the training dataset. This limitation becomes evident when attempting tasks such as predicting rapid adjustments in cloud cover (Gregory and Webb2008; Sherwood et al.2015; Bretherton2015) or changes in the photosynthetic productivity of the biosphere (Luo2007) in response to increased CO2 concentrations, given the limited range of CO2 concentration variations and their inextricable correlation with temperature variations in recent observations. Embedding AI-driven closures within process-based parameterizations rooted in conservation laws can help in obtaining models that generalize out of the observed climate distribution.

A valuable insight emerges from the success of similarity theories, such as the Monin–Obukhov similarity theory discussed in Sect. 3, which generalized effectively from a few specific measurements to a wide range of global conditions. Similarly, AI methods may be most effective when used to learn universal functions of relevant nondimensional variable groups: functions that likely remain invariant across different climates and are well sampled in current climate conditions, including the seasonal cycle, whose amplitude in many quantities exceeds the climate change signals we expect for the coming decades (Schneider et al.2021a). If such learning minimizes a loss (Eq. 1) in an online setting – that is, while the learned functions are integrated into the large-scale model – it is more likely to lead to long-term stable models, because the existence of the statistics in the loss function implies that the model is stable over the timescales over which the statistics are aggregated. For example, rather than learning the convective-flux divergence F^ for water or energy holistically, it is likely beneficial to focus on learning corrections to process-based parameterizations or key unknown functions such as entrainment and detrainment rates within the coarse-grained conservation laws for water and energy, embedded online in a larger forward model. Lopez-Gomez et al. (2022) demonstrated that this approach can successfully learn parameterizations that generalize well to warmer climates not encountered during training; related results with gradient-based online learning approaches are emerging for turbulence closure models (Shankar et al.2024). An ancillary benefit is that the quantification of uncertainties becomes more straightforward, and the parameterizations remain interpretable, facilitating the investigation of mechanisms, for example, of cloud feedbacks and the differential effects of changing greenhouse gas concentrations and warming in them. 
Models for structural errors can similarly be incorporated where the errors are actually made – within the parameterizations of unresolvable small-scale processes (Levine and Stuart2022; Wu et al.2024), rather than in the space of the model output (Kennedy and O'Hagan2001).
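Schematically, embedding a learnable closure inside fixed conservation laws and training it online against aggregated statistics might look as follows (a toy bulk-plume sketch under our own simplifying assumptions; the 1/z closure form, the coefficients, and the grid search are hypothetical stand-ins for real closures and for ensemble Kalman or gradient-based learning):

```python
import numpy as np

z = np.linspace(100.0, 1000.0, 64)     # height grid (m)
dz = z[1] - z[0]

def plume(c_eps, c_del):
    """Toy bulk-plume forward model: the conservation laws (mass flux M,
    plume tracer phi) are fixed; only the nondimensional entrainment and
    detrainment coefficients c_eps and c_del are learned."""
    M, phi = np.empty_like(z), np.empty_like(z)
    M[0], phi[0] = 1.0, 1.0
    for i in range(1, len(z)):
        eps, det = c_eps / z[i - 1], c_del / z[i - 1]   # rates ~ 1/z (assumed)
        M[i] = M[i - 1] * (1.0 + (eps - det) * dz)       # mass continuity
        phi[i] = phi[i - 1] - eps * phi[i - 1] * dz      # dilution toward phi_env = 0
    return np.array([M.mean(), phi.mean()])              # aggregated statistics

stats_obs = plume(0.6, 1.0)    # "observations" from a hypothetical true closure

def loss(c_eps, c_del):
    """Online loss: statistics of the embedded model vs. observed statistics."""
    return float(((plume(c_eps, c_del) - stats_obs) ** 2).sum())

# A coarse grid search stands in for ensemble Kalman or gradient-based learning.
grid = np.linspace(0.0, 2.0, 81)
best = min((loss(a, b), a, b) for a in grid for b in grid)
print(best[1:])    # recovers the coefficients (0.6, 1.0)
```

Because only the closure coefficients are free while the conservation laws are hard-wired, the learned model remains interpretable and conserves mass by construction.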

Therefore, we advocate for an approach that leverages our extensive knowledge of conservation laws, expressed as partial differential equations, and enhances it with AI methods to learn about closure functions in parameterizations where reductionist first-principle approaches fall short. The central challenge is to find a balance: using first principles to encode system knowledge and conservation laws for generalization and interpretability while avoiding overly rigid constraints that limit the model's adaptability to diverse datasets. This balance will vary across different components of the climate system, and finding it requires domain expertise. For instance, first-principle modeling has proven less effective than data-driven approaches for river flows and snowpack thickness, where systematic coarse-graining is challenging and observed spatial and temporal variations plausibly sample future scenarios (Kratzert et al.2018, 2019; Nearing et al.2021; Kratzert et al.2023; Moshe et al.2020; Charbonneau et al.2023; Nearing et al.2024). In contrast, the modeling of phenomena such as turbulence, convection, and clouds, including their microphysics, may benefit more from reductionist process-informed modeling because, among other reasons, rapid radiative adjustments impact clouds and cannot be learned solely from data; at the same time, spatial and temporal variations in data may sample well the climate-invariant closure functions appearing in such models. High-resolution simulations, for example, of the dynamics of clouds and convection (Hourdin et al.2021; Shen et al.2022) or of ocean turbulence (Wagner et al.2024), Lagrangian particle-based simulations of cloud microphysics (Shima et al.2009; Morrison et al.2020; Azimi et al.2024), and laboratory data provide increasingly rich datasets for the pre-training of individual parameterizations and closure functions within them.
Pre-training can occur in a compartmentalized manner, process by process, before jointly fine-tuning all processes with observations. Striking the right balance between first-principle and data-driven approaches is crucial for developing climate models that are physically grounded and trustworthy for predictions beyond observed climates yet flexible enough to integrate a wide range of data, leading to more accurate and reliable predictions.
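The two-stage workflow of compartmentalized pre-training followed by fine-tuning can be sketched in a few lines (again a toy example of ours; the polynomial closure, the synthetic "LES" data, and the single-amplitude fine-tuning are illustrative assumptions, not any cited method):

```python
import numpy as np

rng = np.random.default_rng(1)

# Stage 1: pre-train a closure on (synthetic) high-resolution process data,
# here a cubic-polynomial fit of a subgrid flux to a nondimensional input x.
x_hr = rng.uniform(0.0, 1.0, 500)
flux_hr = np.tanh(3.0 * x_hr) + 0.05 * rng.normal(size=500)   # stand-in for LES data
coeffs = np.polyfit(x_hr, flux_hr, deg=3)                     # pre-trained closure

# Stage 2: fine-tune against "observations"; here only a single amplitude
# is adjusted, keeping the pre-trained functional shape fixed.
x_obs = rng.uniform(0.0, 1.0, 200)
flux_obs = 1.1 * np.tanh(3.0 * x_obs)        # observations imply a stronger flux
pred = np.polyval(coeffs, x_obs)
amp = float(flux_obs @ pred / (pred @ pred)) # least-squares amplitude correction
print(amp)                                   # close to 1.1
```

In a real model, stage 2 would jointly adjust a small number of parameters in all pre-trained process models against observed climate statistics rather than a single amplitude.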

6 A balanced path forward

Climate models, as a form of techne, aim to provide the most accurate and reliable predictions of how the climate system's statistics will change under unobserved conditions, such as increased greenhouse gas concentrations or changes in aerosol loadings. Improving climate models is urgently necessary for proactive and effective adaptation to the coming climate changes. However, current models fall short in accuracy and reliability (Fiedler et al.2021; President's Council of Advisors on Science and Technology2023), as evidenced by their still significant errors in simulating observed climate statistics (Fig. 1).

Progress in climate modeling has been gradual and achieved primarily through increasing the resolution and refining process-based parameterizations for small, unresolvable scales. Yet, these approaches alone or in combination seem unlikely to produce a significant leap in model accuracy and reliability. The complexity of the climate system limits the effectiveness of reductionist approaches in developing process-based parameterizations. Additionally, while higher resolution is beneficial, it is no panacea. At any resolution reachable in the foreseeable future, many aspects, such as large portions of the atmosphere's and oceans' vertical motion and finer scales such as those controlling cloud microphysics, will remain unresolvable.

AI tools, in their broadest sense, hold promise for breakthroughs due to their capacity to learn from high-resolution simulations and from the extensive array of Earth observations available. However, they cannot operate in isolation. Climate change prediction is an archetypal out-of-distribution prediction challenge. It is difficult to envision how an unsupervised AI system could learn the effects of unseen greenhouse gas concentrations on aspects such as cloud cover or the biosphere's photosynthetic productivity using only higher-resolution simulations and current and recent past observations. The limited range of greenhouse gas variations is closely correlated with temperature changes in recent observations, making it challenging to isolate their individual effects, which need to be predicted. Unlike weather forecasting, where short-term predictions can be validated daily and long-term stability and conservation properties of simulations are less critical, climate prediction lacks the luxury of immediate validation. Conservation, long-term stability in an “infinite forecast”, and reliable generalization beyond observed climate states are essential. Trust in climate predictions and the absence of immediate validation additionally require models to be interpretable and uncertainties in predictions to be quantified.

Figure 4: A balanced approach capitalizing on all tools at our disposal is the most promising path toward accurate, interpretable, and trustworthy climate predictions and projections that can inform decision-making.


Therefore, a balanced approach that capitalizes on the strengths of all three dimensions – advancing process-based parameterizations, maximizing resolution while allowing ensembles of simulations, and harnessing AI tools to incorporate data-driven models where reductionism reaches its limits – is, in our view, the most promising path forward (Schneider et al.2021a) (Fig. 4). In situations where we have well-defined equations of motion and can systematically coarse-grain them, AI may be optimally employed to learn data-driven yet climate-invariant closure functions of nondimensional variable groups arising within coarse-grained equations. This approach is akin to how data have been used to close Monin–Obukhov similarity theory for the atmospheric surface layer. Conversely, in situations where first-principle modeling and systematic coarse-graining are less effective, but where spatial and temporal climate variations – particularly the seasonal cycle – may plausibly represent future climate states, more direct data-driven models could prove more fruitful. This may be particularly relevant for various aspects of land surface modeling, such as snow, vegetation, and river models.
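The Monin–Obukhov analogy can be made concrete with synthetic data: for stable stratification, Businger-type closures express the nondimensional shear as phi_m = 1 + beta * zeta, with zeta = z/L; "learning the closure" then reduces to a one-parameter regression on the nondimensional group. The data below are synthetic; only the functional form and the value beta ≈ 4.7 follow Businger et al. (1971):

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic surface-layer "measurements": nondimensional shear phi_m as a
# function of the stability parameter zeta = z / L (z: height, L: Obukhov length).
zeta = rng.uniform(0.0, 1.0, 300)
beta_true = 4.7                            # value fitted by Businger et al. (1971)
phi_m = 1.0 + beta_true * zeta + 0.2 * rng.normal(size=300)  # noisy observations

# Learning the closure: a one-parameter least-squares regression on the
# nondimensional group, a function expected to be climate-invariant.
beta_fit = float((phi_m - 1.0) @ zeta / (zeta @ zeta))
print(beta_fit)                            # close to 4.7
```

Because the regression acts on a nondimensional group rather than on raw dimensional fields, the fitted function can plausibly be reused far outside the conditions it was measured in, which is exactly the generalization property sought for AI-derived closures.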

To catalyze advances in climate modeling, we advocate for the establishment of well-defined competitive challenge problems, employing open benchmark data, shared code, and clear quantitative success metrics. Such challenge problems can foster innovation in climate process modeling, as they have in other areas, such as machine vision, natural-language processing, and protein folding (Donoho2023). For example, benchmark challenges for cloud parameterizations could leverage libraries of high-resolution simulations, employing current-climate simulations for training and altered climate conditions for evaluation (e.g., Hourdin et al.2021; Shen et al.2022; Lopez-Gomez et al.2022; Yu et al.2023). Other benchmark challenges may focus on the seasonal cycle of land carbon uptake, evapotranspiration, snow cover, or river discharge, using a subset of the available data for model training while reserving other regional datasets for evaluation. Benchmarking can also include retrospective analysis of emergent properties such as climate trends in historical simulations, as long as the metrics evaluated were not used in model calibration. Designing such structured challenges can drive innovation, help determine what balance between process-based and data-driven methods is most successful, and lead to more accurate and reliable climate models.
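A minimal sketch of such a benchmark protocol (entirely synthetic; the linear toy model and the temperature ranges are arbitrary choices of ours) separates a current-climate training set from an out-of-distribution evaluation set scored with a fixed, public metric:

```python
import numpy as np

def rmse(pred, truth):
    """Quantitative success metric shared by all benchmark entries."""
    return float(np.sqrt(np.mean((pred - truth) ** 2)))

rng = np.random.default_rng(3)

# Hypothetical benchmark split: calibrate on current-climate data and
# score on a held-out altered-climate dataset.
x_train = rng.uniform(280.0, 300.0, 200)    # current-climate temperatures (K)
y_train = 0.1 * (x_train - 280.0)           # toy target diagnosed from them
x_test = rng.uniform(300.0, 310.0, 100)     # warmer climate: out of distribution
y_test = 0.1 * (x_test - 280.0)

# A linear "model" fit on the training climate only.
A = np.vstack([x_train, np.ones_like(x_train)]).T
w, *_ = np.linalg.lstsq(A, y_train, rcond=None)
score = rmse(w[0] * x_test + w[1], y_test)
print(score)    # tiny here only because the toy truth happens to be linear
```

Real closures need not extrapolate this gracefully, which is precisely why the evaluation set must sample climate conditions withheld from training.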

Moreover, to engender trust in climate predictions, it is imperative to develop and maintain carefully designed open-source software, which should be accompanied by rigorous unit and integration tests. This approach ensures transparency, reproducibility, and replicability (National Academies of Sciences, Engineering, and Medicine2019), enabling the scientific community and stakeholders to scrutinize and validate the models' predictions. Trustworthy software infrastructure is a cornerstone for building confidence in climate models and their predictions, especially as we integrate more complex data-driven components into modeling frameworks.
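As an illustration of the kind of unit test we have in mind, the toy scheme below (hypothetical; any real parameterization would be far more complex) is written in flux form so that a conservation law can be asserted to near machine precision:

```python
import numpy as np

def diffuse(q, kappa, dt, dx):
    """Toy subgrid scheme: explicit diffusion in flux form with no-flux
    boundaries, so the column integral of q is conserved by construction."""
    flux = np.zeros(q.size + 1)
    flux[1:-1] = -kappa * (q[1:] - q[:-1]) / dx   # interior fluxes only
    return q - dt * (flux[1:] - flux[:-1]) / dx

def test_conserves_column_integral():
    """Unit test enforcing a conservation law to near machine precision."""
    q = np.random.default_rng(4).uniform(0.0, 1.0, 32)
    q_new = diffuse(q, kappa=0.1, dt=0.1, dx=1.0)
    assert abs(q_new.sum() - q.sum()) < 1e-12

test_conserves_column_integral()
print("conservation test passed")
```

Tests of this kind are equally applicable to data-driven components: whatever a learned closure predicts, the surrounding flux-form discretization guarantees that conserved quantities stay conserved.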

Ultimately, the utility of climate predictions for decision-making hinges on their trustworthiness and their ability to explore a broad range of possible climate outcomes through ensembles (Deser et al.2020; Bevacqua et al.2023). A balanced approach that is grounded in decades of accumulated intellectual capital, rigorous approximations, and carefully designed software is likely to foster such trust. This approach can enable a clear tracing of the causal chain leading to possible climate changes, allowing for interpretation and scrutiny in line with centuries-old scientific traditions. If successful, this strategy may eventually also narrow the gap between episteme and techne in climate modeling. It may deepen our understanding of the climate system's complexities, allowing us to investigate models that integrate data-driven components and to use them to shed light on very different past climates, such as at the Last Glacial Maximum or the Eocene hothouse climates. Such a convergence would mark a significant advance in both the science and practical application of climate modeling.

Code and data availability

The data and code needed to produce Fig. 1 are available from Wills and Schneider (2024), and those needed to produce Fig. 3 are available from Schneider (2024).

Author contributions

TS: conceptualization, formal analysis, visualization, writing (original draft); LRL: conceptualization, visualization, writing (editing and review); RCJW: formal analysis, visualization, writing (editing and review).

Competing interests

The contact author has declared that none of the authors has any competing interests.


Publisher’s note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors.

Special issue statement

This article is part of the special issue “20 years of Atmospheric Chemistry and Physics”. It is not associated with a conference.


Acknowledgements

We thank Joern Callies and Ulrich Schumann for providing the data for Fig. 3 and for their valuable discussions; Peter Caldwell, Yi-Fan Chen, Raffaele Ferrari, Nadir Jeevanjee, Thomas Müller, Fei Sha, and R. Saravanan for insightful comments on drafts; and Duan-Heng Chang for identifying a critical typo in a previous version of the analysis script for Fig. 1.

Financial support

The research on which this essay draws is supported by Schmidt Sciences, LLC, the US National Science Foundation (grant no. AGS-1835860), and the Swiss National Science Foundation (award no. PCEFP2_203376).

Review statement

This paper was edited by Ken Carslaw and Peter Haynes and reviewed by Peter Caldwell and one anonymous referee.


References

Adler, R. F., Sapiano, M. R. P., Huffman, G. J., Wang, J.-J., Gu, G., Bolvin, D., Chiu, L., Schneider, U., Becker, A., Nelkin, E., Xie, P., Ferraro, R., and Shin, D.-B.: The Global Precipitation Climatology Project (GPCP) monthly analysis (new version 2.3) and a review of 2017 global precipitation, Atmosphere, 9, 138, 2018.

Anber, U. M., Giangrande, S. E., Donner, L. J., and Jensen, M. P.: Updraft constraints on entrainment: insights from Amazonian deep convection, J. Atmos. Sci., 76, 2429–2442, 2019.

Arakawa, A. and Schubert, W. H.: Interaction of a cumulus cloud ensemble with the large-scale environment. Part I, J. Atmos. Sci., 31, 674–701, 1974.

Arakawa, A. and Wu, C.-M.: A unified representation of deep moist convection in numerical modeling of the atmosphere: Part I, J. Atmos. Sci., 70, 1977–1992, 2013.

Arakawa, A., Jung, J.-H., and Wu, C.-M.: Toward unification of the multiscale modeling of the atmosphere, Atmos. Chem. Phys., 11, 3731–3742, 2011.

Azimi, S., Jaruga, A., de Jong, E. K., Arabas, S., and Schneider, T.: Training warm-rain bulk microphysics schemes using super-droplet simulations, J. Adv. Model. Earth Sys., in review, 2024.

Balaji, V., Couvreux, F., Deshayes, J., Gautrais, J., Hourdin, F., and Rio, C.: Are general circulation models obsolete?, P. Natl. Acad. Sci. USA, 119, e2202075119, 2022.

Bauer, P., Stevens, B., and Hazeleger, W.: A digital twin of Earth for the green transition, Nat. Clim. Change, 11, 80–83, 2021.

Beucler, T., Pritchard, M., Rasp, S., Ott, J., Baldi, P., and Gentine, P.: Enforcing analytic constraints in neural networks emulating physical systems, Phys. Rev. Lett., 126, 098302, 2021.

Beucler, T., Pritchard, M., Yuval, J., Gupta, A., Peng, L., Rasp, S., Ahmed, F., O'Gorman, P. A., Neelin, J. D., Lutsko, N. J., and Pritchard, M.: Climate-invariant machine learning, Sci. Adv., 10, eadj7250, 2024.

Bevacqua, E., Suarez-Gutierrez, L., Jézéquel, A., Lehner, F., Vrac, M., Yiou, P., and Zscheischler, J.: Advancing research on compound weather and climate events via large ensemble model simulations, Nat. Commun., 14, 2145, 2023.

Bock, L., Lauer, A., Schlund, M., Barreiro, M., Bellouin, N., Jones, C., Meehl, G., Predoi, V., Roberts, M., and Eyring, V.: Quantifying progress across different CMIP phases with the ESMValTool, J. Geophys. Res.-Atmos., 125, e2019JD032321, 2020.

Bolton, T. and Zanna, L.: Applications of deep learning to ocean data inference and subgrid parameterization, J. Adv. Model. Earth Sys., 11, 376–399, 2019.

Bony, S. and Dufresne, J. L.: Marine boundary layer clouds at the heart of tropical cloud feedback uncertainties in climate models, Geophys. Res. Lett., 32, L20806, 2005.

Brenowitz, N. D. and Bretherton, C. S.: Prognostic validation of a neural network unified physics parameterization, Geophys. Res. Lett., 45, 6289–6298, 2018.

Brenowitz, N. D., Beucler, T., Pritchard, M., and Bretherton, C. S.: Interpreting and stabilizing machine-learning parametrizations of convection, J. Atmos. Sci., 77, 4357–4375, 2020.

Bretherton, C. S.: Insights into low-latitude cloud feedbacks from high-resolution models, Philos. T. R. Soc. Lond. A, 373, 20140415, 2015.

Bretherton, C. S., Henn, B., Kwa, A., Brenowitz, N. D., Watt-Meyer, O., McGibbon, J., Perkins, W. A., Clark, S. K., and Harris, L.: Correcting coarse-grid weather and climate models by machine learning from global storm-resolving simulations, J. Adv. Model. Earth Sys., 14, e2021MS002794, 2022.

Brient, F. and Schneider, T.: Constraints on climate sensitivity from space-based measurements of low-cloud reflection, J. Clim., 29, 5821–5835, 2016.

Businger, J. A., Wyngaard, J. C., Izumi, Y., and Bradley, E. F.: Flux-profile relationships in the atmospheric surface layer, J. Atmos. Sci., 28, 181–189, 1971.

Caldwell, P. M., Bretherton, C. S., Zelinka, M. D., Klein, S. A., Santer, B. D., and Sanderson, B. M.: Statistical significance of climate sensitivity predictors obtained by data mining, Geophys. Res. Lett., 41, 1803–1808, 2014.

Caldwell, P. M., Zelinka, M. D., and Klein, S. A.: Evaluating emergent constraints on equilibrium climate sensitivity, J. Clim., 31, 3921–3942, 2018.

Callies, J., Ferrari, R., and Bühler, O.: Transition from geostrophic turbulence to inertia–gravity waves in the atmospheric energy spectrum, P. Natl. Acad. Sci. USA, 111, 17033–17038, 2014.

Cess, R. D., Potter, G. L., Blanchet, J. P., Boer, G. J., Ghan, S. J., Kiehl, J. T., Le Treut, H., Li, Z.-X., Liang, X.-Z., Mitchell, J. F. B., Morcrette, J.-J., Randall, D. A., Riches, M. R., Roeckner, E., Schlese, U., Slingo, A., Taylor, K. E., Washington, W. M., Wetherald, R. T., and Yagai, I.: Interpretation of cloud-climate feedback as produced by 14 atmospheric general circulation models, Science, 245, 513–516, 1989.

Chang, P., Zhang, S., Danabasoglu, G., Yeager, S. G., Fu, H., Wang, H., Castruccio, F. S., Chen, Y., Edwards, J., Fu, D., Jia, Y., Laurindo, L. C., Liu, X., Rosenbloom, N., Small, R. J., Xu, G., Zeng, Y., Zhang, Q., Bacmeister, J., Bailey, D. A., Duan, X., DuVivier, A. K., Li, D., Li, Y., Neale, R., Stössel, A., Wang, L., Zhuang, Y., Baker, A., Bates, S., Dennis, J., Diao, X., Gan, B., Gopal, A., Jia, D., Jing, Z., Ma, X., Saravanan, R., Strand, W. G., Tao, J., Yang, H., Wang, X., Wei, Z., and Wu, L.: An unprecedented set of high-resolution Earth system simulations for understanding multiscale interactions in climate variability and change, J. Adv. Model. Earth Sys., 12, e2020MS002298, 2020.

Chantry, M., Christensen, H., Dueben, P., and Palmer, T.: Opportunities and challenges for machine learning in weather and climate modelling: hard, medium and soft AI, Philos. T. R. Soc. A, 379, 20200083, 2021.

Charbonneau, A., Deck, K., and Schneider, T.: A physics-constrained neural differential equation for data-driven seasonal snowpack forecasting, Artificial Intelligence for the Earth Systems, in review, 2023.

Christopoulos, C. and Schneider, T.: Assessing biases and climate implications of the diurnal precipitation cycle in climate models, Geophys. Res. Lett., 48, e2021GL093017, 2021.

Clark, P., Roberts, N., Lean, H., Ballard, S. P., and Charlton-Perez, C.: Convection-permitting models: a step-change in rainfall forecasting, Meteor. Appl., 23, 165–181, 2016.

Cleary, E., Garbuno-Inigo, A., Lan, S., Schneider, T., and Stuart, A. M.: Calibrate, emulate, sample, J. Comp. Phys., 424, 109716, 2021.

Cohen, Y., Lopez-Gomez, I., Jaruga, A., He, J., Kaul, C. M., and Schneider, T.: Unified entrainment and detrainment closures for extended eddy-diffusivity mass-flux schemes, J. Adv. Model. Earth Sys., 12, e2020MS002162, 2020.

Couvreux, F., Hourdin, F., Williamson, D., Roehrig, R., Volodina, V., Villefranque, N., Rio, C., Audouin, O., Salter, J., Bazile, E., Brient, F., Favot, F., Honnert, R., Lefebvre, M.-P., Madeleine, J.-B., Rodier, Q., and Xu, W.: Process-based climate model development harnessing machine learning: I. A calibration tool for parameterization improvement, J. Adv. Model. Earth Sys., 13, e2020MS002217, 2021.

Danabasoglu, G., Lamarque, J.-F., Bacmeister, J., Bailey, D. A., DuVivier, A. K., Edwards, J., Emmons, L. K., Fasullo, J., Garcia, R., Gettelman, A., Hannay, C., Holland, M. M., Large, W. G., Lauritzen, P. H., Lawrence, D. M., Lenaerts, J. T. M., Lindsay, K., Lipscomb, W. H., Mills, M. J., Neale, R., Oleson, K. W., Otto-Bliesner, B., Phillips, A. S., Sacks, W., Tilmes, S., van Kampenhout, L., Vertenstein, M., Bertini, A., Dennis, J., Deser, C., Fischer, C., Fox-Kemper, B., Kay, J. E., Kinnison, D., Kushner, P. J., Larson, V. E., Long, M. C., Mickelson, S., Moore, J. K., Nienhouse, E., Polvani, L., Rasch, P. J., and Strand, W. G.: The Community Earth System Model Version 2 (CESM2), J. Adv. Model. Earth Sys., 12, e2019MS001916, 2020.

de Rooy, W. C., Bechtold, P., Fröhlich, K., Hohenegger, C., Jonker, H., Mironov, D., Siebesma, A. P., Teixeira, J., and Yano, J.-I.: Entrainment and detrainment in cumulus convection: an overview, Q. J. Roy. Meteorol. Soc., 139, 1–19, 2013.

Deser, C., Lehner, F., Rodgers, K. B., Ault, T., Delworth, T. L., DiNezio, P. N., Fiore, A., Frankignoul, C., Fyfe, J. C., Horton, D. E., Kay, J. E., Knutti, R., Lovenduski, N. S., Marotzke, J., McKinnon, K. A., Minobe, S., Randerson, J., Screen, J. A., Simpson, I. R., and Ting, M.: Insights from Earth system model initial-condition large ensembles and future prospects, Nat. Clim. Change, 10, 277–286, 2020.

Devine, G. M., Carslaw, K. S., Parker, D. J., and Petch, J. C.: The influence of subgrid surface-layer variability on vertical transport of a chemical species in a convective environment, Geophys. Res. Lett., 33, L15807, 2006.

Dewan, E. M.: Stratospheric wave spectra resembling turbulence, Science, 204, 832–835, 1979.

Dong, L., Leung, L. R., Lu, J., and Song, F.: Double-ITCZ as an emergent constraint for future precipitation over Mediterranean climate regions in the North Hemisphere, Geophys. Res. Lett., 48, e2020GL091569, 2021.

Donoho, D.: Data science at the singularity, arXiv:2310.00865v1 (last access: 17 June 2023), 2023.

Dunbar, O. R. A., Garbuno-Inigo, A., Schneider, T., and Stuart, A. M.: Calibration and uncertainty quantification of convective parameters in an idealized GCM, J. Adv. Model. Earth Sys., 13, e2020MS002454, 2021.

Edwards, P. N.: A Vast Machine: Computer Models, Climate Data, and the Politics of Global Warming, MIT Press, ISBN: 9780262518635, 2010.

Fiedler, T., Pitman, A. J., Mackenzie, K., Wood, N., Jakob, C., and Perkins-Kirkpatrick, S. E.: Business risk and the emergence of climate analytics, Nat. Clim. Change, 11, 87–94, 2021.

Firl, G. J. and Randall, D. A.: Fitting and analyzing LES using multiple trivariate Gaussians, J. Atmos. Sci., 72, 1094–1116, 2015.

Foken, T.: 50 Years of the Monin-Obukhov similarity theory, Bound.-Lay. Meteorol., 119, 431–447, 2006.

Gentine, P., Pritchard, M., Rasp, S., Reinaudi, G., and Yacalis, G.: Could machine learning break the convection parameterization deadlock?, Geophys. Res. Lett., 45, 5742–5751, 2018.

Gettelman, A., Morrison, H., and Thompson, G.: Cloud microphysics across scales for weather and climate, in: Current Trends in the Representation of Physical Processes in Weather and Climate Models, edited by: Randall, D. A., Srinivasan, J., Nanjundiah, R. A., and Mukhopadhyay, P., 71–94, Springer, 2019.

Golaz, J.-C., Larson, V. E., and Cotton, W. R.: A PDF-based model for boundary layer clouds, Part I: Method and model description, J. Atmos. Sci., 59, 3540–3551, 2002.

Gregory, J. and Webb, M.: Tropospheric adjustment induces a cloud component in CO2 forcing, J. Clim., 21, 58–71, 2008.

Gross, M., Wan, H., Rasch, P. J., Caldwell, P. M., Williamson, D. L., Klocke, D., Jablonowski, C., Thatcher, D. R., Wood, N., Cullen, M., Beare, B., Willett, M., Lemarié, F., Blayo, E., Malardel, S., Termonia, P., Gassmann, A., Lauritzen, P. H., Johansen, H., Zarzycki, C. M., Sakaguchi, K., and Leung, R.: Physics–dynamics coupling in weather, climate, and Earth system models: Challenges and recent progress, Mon. Weather Rev., 146, 3505–3544, 2018.

Guo, H., Golaz, J.-C., Donner, L. J., Wyman, B., Zhao, M., and Ginoux, P.: CLUBB as a unified cloud parameterization: Opportunities and challenges, Geophys. Res. Lett., 42, 4540–4547, 2015.

Gutjahr, O., Putrasahan, D., Lohmann, K., Jungclaus, J. H., von Storch, J.-S., Brüggemann, N., Haak, H., and Stössel, A.: Max Planck Institute Earth System Model (MPI-ESM1.2) for the High-Resolution Model Intercomparison Project (HighResMIP), Geosci. Model Dev., 12, 3241–3281, 2019.

Haarsma, R. J., Roberts, M. J., Vidale, P. L., Senior, C. A., Bellucci, A., Bao, Q., Chang, P., Corti, S., Fučkar, N. S., Guemas, V., von Hardenberg, J., Hazeleger, W., Kodama, C., Koenigk, T., Leung, L. R., Lu, J., Luo, J.-J., Mao, J., Mizielinski, M. S., Mizuta, R., Nobre, P., Satoh, M., Scoccimarro, E., Semmler, T., Small, J., and von Storch, J.-S.: High Resolution Model Intercomparison Project (HighResMIP v1.0) for CMIP6, Geosci. Model Dev., 9, 4185–4208, 2016.

Hall, A., Cox, P., Huntingford, C., and Klein, S.: Progressing emergent constraints on future climate change, Nat. Clim. Change, 4, 269–278, 2019.

Hansen, P. C.: Rank-Deficient and Discrete Ill-Posed Problems: Numerical Aspects of Linear Inversion, SIAM Monogr. on Mathematical Modeling and Computation, Society for Industrial and Applied Mathematics, Philadelphia, PA, ISBN: 0898714036, 1998.

Harlaß, J., Latif, M., and Park, W.: Improving climate model simulation of tropical Atlantic sea surface temperature: The importance of enhanced vertical atmosphere model resolution, Geophys. Res. Lett., 42, 2401–2408, 2015.

Held, I. M.: The gap between simulation and understanding in climate modeling, Bull. Am. Meteorol. Soc., 86, 1609–1614, 2005.

Hohenegger, C., Korn, P., Linardakis, L., Redler, R., Schnur, R., Adamidis, P., Bao, J., Bastin, S., Behravesh, M., Bergemann, M., Biercamp, J., Bockelmann, H., Brokopf, R., Brüggemann, N., Casaroli, L., Chegini, F., Datseris, G., Esch, M., George, G., Giorgetta, M., Gutjahr, O., Haak, H., Hanke, M., Ilyina, T., Jahns, T., Jungclaus, J., Kern, M., Klocke, D., Kluft, L., Kölling, T., Kornblueh, L., Kosukhin, S., Kroll, C., Lee, J., Mauritsen, T., Mehlmann, C., Mieslinger, T., Naumann, A. K., Paccini, L., Peinado, A., Praturi, D. S., Putrasahan, D., Rast, S., Riddick, T., Roeber, N., Schmidt, H., Schulzweida, U., Schütte, F., Segura, H., Shevchenko, R., Singh, V., Specht, M., Stephan, C. C., von Storch, J.-S., Vogel, R., Wengel, C., Winkler, M., Ziemen, F., Marotzke, J., and Stevens, B.: ICON-Sapphire: simulating the components of the Earth system and their interactions at kilometer and subkilometer scales, Geosci. Model Dev., 16, 779–811, 2023.

Hourdin, F., Williamson, D., Rio, C., Couvreux, F., Roehrig, R., Villefranque, N., Musat, I., Diallo, F. B., Fairhead, L., and Volodina, V.: Process-based climate model development harnessing machine learning: II. Model calibration from single column to global, J. Adv. Model. Earth Sys., 13, e2020MS002225, 2021.

Hourdin, F., Ferster, B., Deshayes, J., Mignot, J., Musat, I., and Williamson, D.: Toward machine-assisted tuning avoiding the underestimation of uncertainty in climate change projections, Sci. Adv., 9, eadf2758, 2023.

Howland, M. F., Dunbar, O. R. A., and Schneider, T.: Parameter Uncertainty Quantification in an Idealized GCM With a Seasonal Cycle, J. Adv. Model. Earth Sys., 14, e2021MS002735, 2022.

Huang, D. Z., Schneider, T., and Stuart, A. M.: Iterated Kalman methodology for inverse problems, J. Comp. Phys., 463, 111262, 2022.

Iglesias, M. A., Law, K. J. H., and Stuart, A. M.: Ensemble Kalman methods for inverse problems, Inverse Problems, 29, 045001, 2013.

Intergovernmental Panel on Climate Change: Climate Change 2021: The Physical Science Basis, Contribution of Working Group I to the Sixth Assessment Report of the Intergovernmental Panel on Climate Change, Cambridge University Press, New York, NY, ISBN: 1009157884, 2021.

Irrgang, C., Boers, N., Sonnewald, M., Barnes, E. A., Kadow, C., Staneva, J., and Saynisch-Wagner, J.: Towards neural Earth system modelling by integrating artificial intelligence in Earth system science, Nat. Mach. Intell., 3, 667–674, 2022.

Jeevanjee, N., Hassanzadeh, P., Hill, S., and Sheshadri, A.: A perspective on climate model hierarchies, J. Adv. Model. Earth Sys., 9, 1760–1771, 2017.

Kawai, H., Yukimoto, S., Koshiro, T., Oshima, N., Tanaka, T., Yoshimura, H., and Nagasawa, R.: Significant improvement of cloud representation in the global climate model MRI-ESM2, Geosci. Model Dev., 12, 2875–2897, 2019.

Kennedy, M. C. and O'Hagan, A.: Bayesian calibration of computer models, J. Roy. Stat. Soc. B, 63, 425–464, 2001.

Klaver, R., Haarsma, R., Vidale, P. L., and Hazeleger, W.: Effective resolution in high resolution global atmospheric models for climate studies, Atmos. Sci. Lett., 21, e952, 2020.

Klein, S. A. and Hall, A.: Emergent constraints for cloud feedbacks, Curr. Clim. Change Rep., 1, 276–287, 2015.

Knight, C. G., Knight, S. H. E., Massey, N., and Allen, M. R.: Association of parameter, software, and hardware variation with large-scale behavior across 57,000 climate models, P. Natl. Acad. Sci. USA, 104, 12259–12264, 2007.

Koldunov, N., Kölling, T., Pedruzo-Bagazgoitia, X., Rackow, T., Redler, R., Sidorenko, D., Wieners, K.-H., and Ziemen, F. A.: nextGEMS: output of the model development cycle 3 simulations for ICON and IFS, World Data Center for Climate (WDCC) at DKRZ, 2023.

Kovachki, N., Li, Z., Liu, B., Azizzadenesheli, K., Bhattacharya, K., Stuart, A., and Anandkumar, A.: Neural operator: Learning maps between function spaces, J. Mach. Learn. Res., 24, 1–97, 2023.

Kovachki, N. B. and Stuart, A. M.: Ensemble Kalman inversion: A derivative-free technique for machine learning tasks, Inverse Problems, 35, 095005, 2019.

Kratzert, F., Klotz, D., Brenner, C., Schulz, K., and Herrnegger, M.: Rainfall–runoff modelling using Long Short-Term Memory (LSTM) networks, Hydrol. Earth Syst. Sci., 22, 6005–6022, 2018.

Kratzert, F., Klotz, D., Shalev, G., Klambauer, G., Hochreiter, S., and Nearing, G.: Towards learning universal, regional, and local hydrological behaviors via machine learning applied to large-sample datasets, Hydrol. Earth Syst. Sci., 23, 5089–5110, 2019.

Kratzert, F., Nearing, G., Addor, N., Erickson, T., Gauch, M., Gilon, O., Gudmundsson, L., Hassidim, A., Klotz, D., Nevo, S., Shalev, G., and Matias, Y.: Caravan – A global community dataset for large-sample hydrology, Nat. Sci. Data, 10, 61, 2023.

Lappen, C.-L. and Randall, D. A.: Toward a unified parameterization of the boundary layer and moist convection. Part I: A new type of mass-flux model, J. Atmos. Sci., 58, 2021–2036, 2001.

Large, W. G., McWilliams, J. C., and Doney, S. C.: Oceanic vertical mixing: A review and a model with a nonlocal boundary layer parameterization, Rev. Geophys., 32, 363–403, 1994.

Levine, M. E. and Stuart, A. M.: A framework for machine learning of model error in dynamical systems, Comm. Amer. Math. Soc., 2, 283–344, 2022.

Lindzen, R. S. and Fox-Rabinovitz, M.: Consistent vertical and horizontal resolution, Mon. Weather Rev., 117, 2575–2583, 1989.

Lock, A. P., Brown, A. R., Bush, M. R., Martin, G. M., and Smith, R. N. B.: A new boundary layer mixing scheme. Part I: Scheme description and single-column model tests, Mon. Weather Rev., 128, 3187–3199, 2000.

Loeb, N. G., Wielicki, B. A., Doelling, D. R., Smith, G. L., Keyes, D. F., Kato, S., Manalo-Smith, N., and Wong, T.: Toward optimal closure of the Earth's top-of-atmosphere radiation budget, J. Clim., 22, 748–766, 2009.

Lopez-Gomez, I., Cohen, Y., He, J., Jaruga, A., and Schneider, T.: A generalized mixing length closure for eddy-diffusivity mass-flux schemes of turbulence and convection, J. Adv. Model. Earth Sys., 12, e2020MS002161,, 2020. a, b

Lopez-Gomez, I., Christopoulos, C., Ervik, H. L., Dunbar, O. R. A., Cohen, Y., and Schneider, T.: Training physics-based machine-learning parameterizations with gradient-free ensemble Kalman methods, J. Adv. Model. Earth Sys., 14, e2022MS003105,, 2022. a, b, c, d, e

Lorenz, E. N.: Climatic predictability, in: The Physical Basis of Climate and Climate Modelling: Report of the International Study Conference in Stockholm, 29 July–10 August 1974, Vol. 16, GARP Publication Series, 132–136, World Meteorological Organization, 1975. a

Lucarini, V. and Chekroun, M. D.: Theoretical tools for understanding the climate crisis from Hasselmann's programme and beyond, Nat. Rev. Phys., 5, 744–765,, 2023. a, b

Lucarini, V., Blender, R., Herbert, C., Ragone, F., Pascale, S., and Wouters, J.: Mathematical and physical ideas for climate science, Rev. Geophys., 52, 809–859,, 2014. a

Luo, Y.: Terrestrial carbon-cycle feedback to climate warming, Annu. Rev. Ecol. Evol. Syst., 38, 683–712, 2007. a

Majda, A. J., Timofeyev, I., and Vanden-Eijnden, E.: Systematic strategies for stochastic mode reduction in climate, J. Atmos. Sci., 60, 1705–1722, 2003. a

Manabe, S. and Strickler, R. F.: Thermal equilibrium of the atmosphere with a convective adjustment, J. Atmos. Sci., 21, 361–385, 1964. a

Manabe, S. and Wetherald, R. T.: Thermal equilibrium of the atmosphere with a given distribution of relative humidity, J. Atmos. Sci., 24, 241–259, 1967. a

Mansfield, L. A., Gupta, A., Burnett, A. C., Green, B., Wilka, C., and Sheshadri, A.: Updates on model hierarchies for understanding and simulating the climate system: A focus on data-informed methods and climate change impacts, J. Adv. Model. Earth Sy., 15, e2023MS003715,, 2023. a

Mellor, G. L. and Yamada, T.: Development of a turbulence closure model for geophysical fluid problems, Rev. Geophys., 20, 851–875, 1982. a

Morrison, H., van Lier-Walqui, M., Kumjian, M. R., and Prat, O. P.: A Bayesian Approach for Statistical–Physical Bulk Parameterization of Rain Microphysics, Part I: Scheme Description, J. Atmos. Sci., 77, 1019–1041, 2020. a

Moshe, Z., Metzger, A., Elidan, G., Kratzert, F., Nevo, S., and El-Yaniv, R.: HydroNets: Leveraging river structure for hydrologic modeling, in: ICLR Workshop on AI for Earth Sciences,, 2020. a

National Academies of Sciences, Engineering, and Medicine: Reproducibility and Replicability in Science, The National Academies Press, Washington, D.C.,, 2019. a

Nearing, G., Cohen, D., Dube, V., Gauch, M., Gilon, O., Harrigan, S., Hassidim, A., Klotz, D., Kratzert, F., Metzger, A., Nevo, S., Pappenberger, F., Prudhomme, C., Shalev, G., Shenzis, S., Tekalign, T. Y., Weitzner, D., and Matias, Y.: Global prediction of extreme floods in ungauged watersheds, Nature, 627, 559–563,, 2024. a

Nearing, G. S., Kratzert, F., Sampson, A. K., Pelissier, C. S., Klotz, D., Frame, J. M., Prieto, C., and Gupta, H. V.: What role does hydrological science play in the age of machine learning?, Water Resour. Res., 57, e2020WR028091, 2021.

Nelsen, N. H. and Stuart, A. M.: The random feature model for input-output maps between Banach spaces, SIAM J. Sci. Comput., 43, A3212–A3243, 2021.

O'Gorman, P. A. and Dwyer, J. G.: Using machine learning to parameterize moist convection: Potential for modeling of climate, climate change, and extreme events, J. Adv. Model. Earth Sys., 10, 2548–2563, 2018.

Palmer, T. N.: A personal perspective on modelling the climate system, Proc. R. Soc. Lond. A, 472, 20150772, 2016.

Parry, R.: Episteme and Techne, in: The Stanford Encyclopedia of Philosophy, edited by: Zalta, E. N., Metaphysics Research Lab, Stanford University, ISSN 1095-5054, 2021.

President's Council of Advisors on Science and Technology: Extreme weather risk in a changing climate: Enhancing prediction and protecting communities, Tech. rep., Executive Office of the President of the United States, Washington, D.C. (last access: 15 March 2024), 2023.

Rackow, T., Pedruzo-Bagazgoitia, X., Becker, T., Milinski, S., Sandu, I., Aguridan, R., Bechtold, P., Beyer, S., Bidlot, J., Boussetta, S., Diamantakis, M., Dueben, P., Dutra, E., Forbes, R., Goessling, H. F., Hadade, I., Hegewald, J., Keeley, S., Kluft, L., Koldunov, N., Koldunov, A., Kölling, T., Kousal, J., Mogensen, K., Quintino, T., Polichtchouk, I., Sármány, D., Sidorenko, D., Streffing, J., Sützl, B., Takasuka, D., Tietsche, S., Valentini, M., Vannière, B., Wedi, N., Zampieri, L., and Ziemen, F.: Multi-year simulations at kilometre scale with the Integrated Forecasting System coupled to FESOM2.5/NEMOv3.4, EGUsphere, 1–59, 2024.

Randall, D. A.: Beyond deadlock, Geophys. Res. Lett., 40, 5970–5976, 2013.

Randall, D. A., Khairoutdinov, M., Arakawa, A., and Grabowski, W.: Breaking the cloud parameterization deadlock, Bull. Am. Meteorol. Soc., 84, 1547–1564, 2003.

Rasp, S., Pritchard, M. S., and Gentine, P.: Deep learning to represent subgrid processes in climate models, P. Natl. Acad. Sci. USA, 115, 9684–9689, 2018.

Reichstein, M., Camps-Valls, G., Stevens, B., Jung, M., Denzler, J., Carvalhais, N., and Prabhat: Deep learning and process understanding for data-driven Earth system science, Nature, 566, 195–204, 2019.

Roberts, M. J., Baker, A., Blockley, E. W., Calvert, D., Coward, A., Hewitt, H. T., Jackson, L. C., Kuhlbrodt, T., Mathiot, P., Roberts, C. D., Schiemann, R., Seddon, J., Vannière, B., and Vidale, P. L.: Description of the resolution hierarchy of the global coupled HadGEM3-GC3.1 model as used in CMIP6 HighResMIP experiments, Geosci. Model Dev., 12, 4999–5028, 2019.

Romps, D. M.: A direct measure of entrainment, J. Atmos. Sci., 67, 1908–1927, 2010.

Ruelle, D.: General linear response formula in statistical mechanics, and the fluctuation-dissipation theorem far from equilibrium, Phys. Lett. A, 245, 220–224, 1998.

Russo, F.: Techno-Scientific Practices: An Informational Approach, Rowman & Littlefield Publishers, ISBN: 978-1-78661-232-8, 2000.

Sane, A., Reichl, B. G., Adcroft, A., and Zanna, L.: Parameterizing vertical mixing coefficients in the ocean surface boundary layer using neural networks, J. Adv. Model. Earth Sys., 15, e2023MS003890, 2023.

Savre, J. and Herzog, M.: A general description of entrainment in buoyant cloudy plumes including the effects of mixing induced evaporation, J. Atmos. Sci., 76, 479–496, 2019.

Schirber, S., Klocke, D., Pincus, R., Quaas, J., and Anderson, J. L.: Parameter estimation using data assimilation in an atmospheric general circulation model: From a perfect toward the real world, J. Adv. Model. Earth Sys., 5, 58–70, 2013.

Schneider, T.: Atmospheric spectra plots from "Opinion: Optimizing climate models with process-knowledge, resolution, and AI", Atmos. Chem. Phys. 2024, CaltechDATA [data set], 2024.

Schneider, T., Lan, S., Stuart, A., and Teixeira, J.: Earth system modeling 2.0: A blueprint for models that learn from observations and targeted high-resolution simulations, Geophys. Res. Lett., 44, 12396–12417, 2017a.

Schneider, T., Teixeira, J., Bretherton, C. S., Brient, F., Pressel, K. G., Schär, C., and Siebesma, A. P.: Climate goals and computing the future of clouds, Nat. Clim. Change, 7, 3–5, 2017b.

Schneider, T., Kaul, C. M., and Pressel, K. G.: Possible climate transitions from breakup of stratocumulus decks under greenhouse warming, Nat. Geosci., 12, 163–167, 2019.

Schneider, T., Kaul, C. M., and Pressel, K. G.: Solar geoengineering may not prevent strong warming from direct effects of CO2 on stratocumulus cloud cover, P. Natl. Acad. Sci. USA, 117, 30179–30185, 2020.

Schneider, T., Jeevanjee, N., and Socolow, R.: Accelerating progress in climate science, Phys. Today, 74, 44–51, 2021a.

Schneider, T., Stuart, A. M., and Wu, J.: Learning stochastic closures using ensemble Kalman inversion, Trans. Math. Appl., 5, 1–31, 2021b.

Schneider, T., Behera, S., Boccaletti, G., Deser, C., Emanuel, K., Ferrari, R., Leung, L. R., Lin, N., Müller, T., Navarra, A., Ndiaye, O., Stuart, A., Tribbia, J., and Yamagata, T.: Harnessing AI and computing to advance climate modeling and prediction, Nat. Clim. Change, 13, 887–889, 2023.

Schulzweida, U.: CDO User Guide (2.3.0), 2023.

Schumann, U.: The horizontal spectrum of vertical velocities near the tropopause from global to gravity wave scales, J. Atmos. Sci., 76, 3847–3862, 2019.

Shankar, V., Chakraborty, D., Viswanathan, V., and Maulik, R.: Differentiable turbulence: Closure as a PDE-constrained optimization, arXiv:2307.03683v2, 2024.

Shen, Z., Sridhar, A., Tan, Z., Jaruga, A., and Schneider, T.: A library of large-eddy simulations forced by global climate models, J. Adv. Model. Earth Sys., 14, e2021MS002631, 2022.

Sherwood, S. C., Bony, S., Boucher, O., Bretherton, C., Forster, P. M., Gregory, J. M., and Stevens, B.: Adjustments in the forcing-feedback framework for understanding climate change, Bull. Am. Meteorol. Soc., 96, 217–228, 2015.

Shima, S., Kusano, K., Kawano, A., Sugiyama, T., and Kawahara, S.: The super-droplet method for the numerical simulation of clouds and precipitation: a particle-based and probabilistic microphysics model coupled with a non-hydrostatic model, Q. J. Roy. Meteorol. Soc., 135, 1307–1320, 2009.

Siebesma, A. P., Soares, P. M. M., and Teixeira, J.: A combined eddy-diffusivity mass-flux approach for the convective boundary layer, J. Atmos. Sci., 64, 1230–1248, 2007.

Silvestri, S., Wagner, G., Hill, C., Ardakani, M. R., Blaschke, J., Campin, J.-M., Churavy, V., Constantinou, N., Edelman, A., Marshall, J., Ramadhan, A., Souza, A., and Ferrari, R.: Oceananigans.jl: A model that achieves breakthrough resolution, memory and energy efficiency in global ocean simulations, submitted, 2024.

Skamarock, W. C.: Evaluating mesoscale NWP models using kinetic energy spectra, Mon. Weather Rev., 132, 3019–3032, 2004.

Skamarock, W. C., Park, S.-H., Klemp, J. B., and Snyder, C.: Atmospheric kinetic energy spectra from global high-resolution nonhydrostatic simulations, J. Atmos. Sci., 71, 4369–4381, 2014.

Slingo, J., Bates, P., Bauer, P., Belcher, S., Palmer, T., Stephens, G., Stevens, B., Stocker, T., and Teutsch, G.: Ambitious partnership needed for reliable climate prediction, Nat. Clim. Change, 12, 499–503, 2022.

Smalley, M. A., Lebsock, M. D., and Teixeira, J.: Quantifying the impact of vertical resolution on the representation of marine boundary layer physics for global-scale models, Mon. Weather Rev., 151, 2977–2992, 2023.

Soares, P. M. M., Miranda, P. M. A., Siebesma, A. P., and Teixeira, J.: An eddy-diffusivity/mass-flux parametrization for dry and shallow cumulus convection, Q. J. Roy. Meteorol. Soc., 130, 3365–3384, 2004.

Stainforth, D. A., Aina, T., Christensen, C., Collins, M., Faull, N., Frame, D. J., Kettleborough, J. A., Knight, S., Martin, A., Murphy, J. M., Piani, C., Sexton, D., Smith, L. A., Spicer, R. A., Thorpe, A. J., and Allen, M. R.: Uncertainty in predictions of the climate response to rising levels of greenhouse gases, Nature, 433, 403–406, 2005.

Stephens, G. L.: Cloud feedbacks in the climate system: A critical review, J. Clim., 18, 237–273, 2005.

Stevens, B., Satoh, M., Auger, L., Biercamp, J., Bretherton, C. S., Chen, X., Düben, P., Judt, F., Khairoutdinov, M., Klocke, D., Kodama, C., Kornblueh, L., Lin, S.-J., Neumann, P., Putman, W. M., Röber, N., Shibuya, R., Vanniere, B., Vidale, P. L., Wedi, N., and Zhou, L.: DYAMOND: The DYnamics of the Atmospheric general circulation Modeled On Non-hydrostatic Domains, Prog. Earth Planet. Sci., 6, 61, 2019.

Subramanian, A., Ummenhofer, C., Giannini, A., Holland, M., Legg, S., Mahadevan, A., Perovich, D., Small, J., Teixeira, J., and Thompson, L.: Translating process understanding to improve climate models. A US CLIVAR White Paper, Report 2016-3, US CLIVAR, 2016.

Tan, Z., Kaul, C. M., Pressel, K. G., Cohen, Y., Schneider, T., and Teixeira, J.: An extended eddy-diffusivity mass-flux scheme for unified representation of subgrid-scale turbulence and convection, J. Adv. Model. Earth Sys., 10, 770–800, 2018.

Tao, C., Xie, S., Ma, H.-Y., Bechtold, P., Cui, Z., Vaillancourt, P. A., Van Weverberg, K., Wang, Y.-C., Wong, M., Yang, J., Zhang, G. J., Choi, I.-J., Tang, S., Wei, J., Wu, W.-Y., Zhang, M., Neelin, J. D., and Zeng, X.: Diurnal cycle of precipitation over the tropics and central United States: intercomparison of general circulation models, Q. J. Roy. Meteorol. Soc., 150, 911–936, 2024.

Tarantola, A.: Inverse Problem Theory: Methods for Data Fitting and Model Parameter Estimation, Elsevier, New York, ISBN: 0444427651, 1987.

Thuburn, J., Weller, H., Vallis, G. K., Beare, R. J., and Whitall, M.: A framework for convection and boundary layer parameterization derived from conditional filtering, J. Atmos. Sci., 75, 965–981, 2018.

Tian, B. and Dong, X.: The double-ITCZ bias in CMIP3, CMIP5, and CMIP6 models based on annual mean precipitation, Geophys. Res. Lett., 47, e2020GL087232, 2020.

Vallis, G. K.: Atmospheric and Oceanic Fluid Dynamics: Fundamentals and Large-Scale Circulation, Cambridge University Press, Cambridge, UK, ISBN: 110706550X, 2006.

VanZandt, T. E.: A universal spectrum of buoyancy waves in the atmosphere, Geophys. Res. Lett., 9, 575–578, 1982.

Vial, J., Dufresne, J.-L., and Bony, S.: On the interpretation of inter-model spread in CMIP5 climate sensitivity estimates, Clim. Dynam., 41, 3339–3362, 2013.

Wagner, G. L., Hillier, A., Constantinou, N. C., Silvestri, S., Souza, A., Burns, K., Ramadhan, A., Hill, C., Campin, J.-M., Marshall, J., and Ferrari, R.: CATKE: a turbulent-kinetic-energy-based parameterization for ocean microturbulence with dynamic convective adjustment, arXiv:2306.13204, in review, 2024.

Wang, P., Yuval, J., and O'Gorman, P. A.: Non-local parameterization of atmospheric subgrid processes with neural networks, J. Adv. Model. Earth Sys., 14, e2022MS002984, 2022.

Watson-Parris, D.: Machine learning for weather and climate are worlds apart, Philos. T. R. Soc. A, 379, 20200098, 2021.

Watt-Meyer, O., Brenowitz, N. D., Clark, S. K., Henn, B., Kwa, A., McGibbon, J., Perkins, W. A., and Bretherton, C. S.: Correcting weather and climate models by machine learning nudged historical simulations, Geophys. Res. Lett., 48, e2021GL092555, 2021.

Wedi, N. P.: Increasing horizontal resolution in numerical weather prediction and climate simulations: illusion or panacea?, Philos. T. R. Soc. A, 372, 20130289, 2014.

Wedi, N. P., Polichtchouk, I., Dueben, P., Anantharaj, V. G., Bauer, P., Boussetta, S., Browne, P., Deconinck, W., Gaudin, W., Hadade, I., Hatfield, S., Iffrig, O., Lopez, P., Maciel, P., Mueller, A., Saarinen, S., Sandu, I., Quintino, T., and Vitart, F.: A baseline for global weather and climate simulations at 1-km resolution, J. Adv. Model. Earth Sys., 12, e2020MS00219, 2020.

Wills, R. C. J. and Schneider, T.: Error analysis of climate models from CMIP3 through CMIP6, including AMIP and higher-resolution models, CaltechDATA [data set], 2024.

Wills, R. C. J., Battisti, D. S., Armour, K. C., Schneider, T., and Deser, C.: Pattern recognition methods to separate forced responses from internal variability in climate model ensembles and observations, J. Clim., 33, 8693–8719, 2020.

Witek, M. L., Teixeira, J., and Matheou, G.: An integrated TKE-based eddy-diffusivity/mass-flux boundary layer closure for the dry convective boundary layer, J. Atmos. Sci., 68, 1526–1540, 2011.

Wouters, J. and Lucarini, V.: Multi-level dynamical systems: Connecting the Ruelle response theory and the Mori-Zwanzig approach, J. Stat. Phys., 151, 850–860, 2013.

Wouters, J., Dolaptchiev, S. I., Lucarini, V., and Achatz, U.: Parameterization of stochastic multiscale triads, Nonlin. Processes Geophys., 23, 435–445, 2016.

Wu, J.-L., Levine, M. E., Schneider, T., and Stuart, A. M.: Learning about structural errors in models of complex dynamical systems, J. Comp. Phys., 513, 113157, 2024.

Yeo, K. and Romps, D. M.: Measurement of convective entrainment using Lagrangian particles, J. Atmos. Sci., 70, 266–277, 2013.

Yu, S., Hannah, W. M., Peng, L., Bhouri, M. A., Gupta, R., Lin, J., Lütjens, B., Will, J. C., Beucler, T., Harrop, B. E., Hillman, B. R., Jenney, A., Ferretti, S., Liu, N., Anandkumar, A., Brenowitz, N. D., Eyring, V., Geneva, N., Gentine, P., Mandt, S., Pathak, J., Subramaniam, A., Vondrick, C., Yu, R., Zanna, L., Zheng, T., Abernathey, R., Ahmed, F., Bader, D. C., Baldi, P., Barnes, E., Bretherton, C., Caldwell, P., Chuang, W., Han, Y., Huang, Y., Iglesias-Suarez, F., Jantre, S., Kashinath, K., Khairoutdinov, M., Kurth, T., Lutsko, N., Ma, P.-L., Mooers, G., Neelin, J. D., Randall, D., Shamekh, S., Taylor, M. A., Urban, N., Yuval, J., Zhang, G., and Pritchard, M.: ClimSim: An open large-scale dataset for training high-resolution physics emulators in hybrid multi-scale climate simulators, arXiv:2306.08754, 2023.

Yuval, J. and O'Gorman, P. A.: Stable machine-learning parameterization of subgrid processes for climate modeling at a range of resolutions, Nat. Commun., 11, 1–10, 2020.

Yuval, J., O'Gorman, P. A., and Hill, C. N.: Use of neural networks for stable, accurate and physically consistent parameterization of subgrid atmospheric processes with good performance at reduced precision, Geophys. Res. Lett., 48, e2020GL091363, 2021.

Zanna, L. and Bolton, T.: Data-driven equation discovery of ocean mesoscale closures, Geophys. Res. Lett., 47, e2020GL088376, 2020.

Zelinka, M. D., Myers, T. A., McCoy, D. T., Po-Chedley, S., Caldwell, P. M., Ceppi, P., Klein, S. A., and Taylor, K. E.: Causes of higher climate sensitivity in CMIP6 models, Geophys. Res. Lett., 47, e2019GL085782, 2020.

Zhang, T., Li, L., Lin, Y., Xue, W., Xie, F., Xu, H., and Huang, X.: An automatic and effective parameter optimization method for model tuning, Geosci. Model Dev., 8, 3579–3591, 2015.

Zhou, W., Leung, L. R., and Lu, J.: Linking large-scale double-ITCZ bias to local-scale drizzling bias in climate models, J. Clim., 35, 7965–7979, 2022.

Zhu, J., Otto-Bliesner, B. L., Brady, E. C., Gettelman, A., Bacmeister, J. T., Neale, R. B., Poulsen, C. J., Shaw, J. K., McGraw, Z. S., and Kay, J. E.: LGM paleoclimate constraints inform cloud parameterizations and equilibrium climate sensitivity in CESM2, J. Adv. Model. Earth Sys., 14, e2021MS002776, 2022.

Zwanzig, R.: Nonequilibrium Statistical Mechanics, Oxford University Press, ISBN: 0195140184, 2001.


That is, Fig. 1 displays unweighted errors, in contrast to the loss function (Eq. 1), which downweights mismatches between simulations and observations for variables that have high error variance, e.g., because of internal variability in finite-time averages.
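The variance weighting described in this footnote can be illustrated with a minimal sketch; the function `weighted_loss` and its arguments are illustrative assumptions, not the paper's actual Eq. (1), which involves averages over model statistics:

```python
import numpy as np

def weighted_loss(sim, obs, error_var):
    """Variance-weighted mean-squared mismatch: statistics with large
    error variance (e.g., from internal variability in finite-time
    averages) contribute less to the loss."""
    sim, obs, error_var = map(np.asarray, (sim, obs, error_var))
    return float(np.mean((sim - obs) ** 2 / error_var))

# Two statistics with identical raw mismatch; the noisier one is downweighted.
sim = np.array([1.0, 1.0])
obs = np.array([0.0, 0.0])
var = np.array([1.0, 4.0])  # second statistic has 4x the error variance
print(weighted_loss(sim, obs, var))  # 0.625 = (1/1 + 1/4)/2
```

The unweighted errors shown in Fig. 1 correspond to setting `error_var` to ones.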


NextGEMS cycle 3 output (Koldunov et al., 2023) was averaged over 2021–2024 in ICON to avoid the influence of sea surface temperature (SST) nudging during the spin-up period and the subsequent ∼0.8 °C drift in global-mean SST over the first year of simulation; the output was averaged over 2020–2022 and 2024 in IFS due to missing data in 2023.


For a k^−α spectrum, the cumulative vertical kinetic energy scales as λ^β, where β = α − 1. This gives β = 2/3 for α = 5/3. The curves in Fig. 3, for wavelengths λ < 25 km, are well fitted with β ≈ 0.4. This shows that there is no qualitative change in behavior over those scales, only gradual gains from increasing the resolution.
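The scaling in this footnote can be checked numerically; the sketch below (an assumption of this note, not the authors' analysis code) integrates the tail of a model k^−α spectrum and recovers the exponent β = α − 1 by a log-log fit:

```python
import numpy as np

# For a power-law spectrum E(k) ~ k**(-alpha), the kinetic energy at
# wavelengths shorter than lambda = 2*pi/k is the tail integral of E above
# wavenumber k, which scales as lambda**beta with beta = alpha - 1.
alpha = 5.0 / 3.0
k = np.logspace(0, 6, 600)   # wavenumbers (arbitrary units)
E = k ** (-alpha)            # model spectrum

# Tail integral above each wavenumber: trapezoidal segments summed from the end.
seg = 0.5 * (E[:-1] + E[1:]) * np.diff(k)
tail = np.cumsum(seg[::-1])[::-1]
lam = 2 * np.pi / k[:-1]     # corresponding wavelengths

# Fit log(tail) against log(lambda) over scales far from the truncation.
sel = slice(0, 200)
beta = np.polyfit(np.log(lam[sel]), np.log(tail[sel]), 1)[0]
print(round(beta, 2))        # close to alpha - 1 = 2/3
```

The fitted β ≈ 0.4 in Fig. 3 thus indicates a spectrum somewhat shallower in cumulative energy than the k^−5/3 reference at those scales.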


If emergent-constraint statistics are used during loss minimization, they can no longer serve as retrospective constraints on the response of the model to perturbations. In retrospective studies, there is a risk in using emergent constraints because the correlation between emergent-constraint statistics and the climate response may be spurious (Caldwell et al.2014, 2018). When using emergent-constraint statistics during loss minimization, by contrast, the statistics may, at worst, merely be uninformative about model parameters and processes.

Executive editor
This article was solicited for the ACP 20th Anniversary collection. It received positive reviews that very nicely contributed to the ideas and to which the authors responded thoroughly. It is a stimulating read, combining 'big-picture' considerations with more detailed technical discussion of important and illuminating examples.
Short summary

Climate models are crucial for predicting climate change in detail. This paper proposes a balanced approach to improving their accuracy by combining traditional process-based methods with modern artificial intelligence (AI) techniques while maximizing the resolution to allow for ensemble simulations. The authors propose using AI to learn from both observational and simulated data while incorporating existing physical knowledge to reduce data demands and improve climate prediction reliability.
