Detection of potential structural deficiencies in a global aerosol model using a perturbed parameter ensemble

Prévost, Léa M. C.; Regayre, Leighton A.; Johnson, Jill S.; McNeall, Doug; Milton, Sean; Carslaw, Kenneth S.

doi:10.5194/acp-26-2487-2026

Articles | Volume 26, issue 4


Research article

17 Feb 2026


Abstract

Understanding and reducing uncertainty in model-based estimates of aerosol radiative forcing is crucial for improving climate projections. A key challenge is that differences between model output and observations can stem from uncertainties in input parameters (parametric uncertainty) or from deficiencies in model code and configuration (structural uncertainty), and these two causes are difficult to distinguish. Structural deficiencies limit efforts to reduce parametric uncertainty through observational constraint because they prevent models from being simultaneously consistent with multiple observations. However, no framework exists to detect structural deficiencies and assess their impact on parametric uncertainty. We propose a workflow to identify structural inconsistencies between observational constraints and diagnose potential structural deficiencies. Using a perturbed parameter ensemble, we sample uncertainty in aerosols, clouds, and radiation in the UK Earth System Model (UKESM), and evaluate model bias against in-situ observations of sulfate aerosol, sulfur dioxide, aerosol optical depth, and particle number concentration across Europe. Applying observational constraints reveals inconsistencies that no combination of the perturbed parameters can resolve. For example, sulfate concentrations in different regions cannot be matched simultaneously, and enforcing a compromise between regions reduces skill across most variables. Additional examples include an inter-region inconsistency in SO₂ and an inter-variable inconsistency between aerosol optical depth and sulfate. By examining the parameter sets retained by constraints, we trace inconsistencies to the parameterisations that may cause them and propose targeted changes to address the underlying deficiency. This approach offers a pathway for evidence-based model development that supports more robust uncertainty reduction and improves climate projection skill.


Received: 29 Sep 2025 – Discussion started: 07 Oct 2025 – Revised: 10 Jan 2026 – Accepted: 04 Feb 2026 – Published: 17 Feb 2026

1 Introduction

Earth System Models are essential tools for understanding and projecting climate change. However, these models cannot directly resolve many complex or small-scale processes, such as cloud formation or aerosol–cloud interactions, due to computational restrictions. Instead, unresolved processes are represented using parameterisations: mathematical equations with adjustable input parameters that approximate physical behaviour. Different choices of parameter values lead to different model outputs, so the use of parameterisations inevitably introduces parametric uncertainty for quantities that cannot be observed such as aerosol radiative forcing, which contributes to the spread in climate projections (Peace et al., 2020; Watson-Parris and Smith, 2022).

Modelling centres often adjust parameter values to improve agreement with observations through tuning, which involves expert-informed adjustments to a small number of key parameters to produce a single “best” parameter set for each model. Tuning, however, relies on subjective decisions; modelling teams determine which simulated variables to prioritise, which observations to use, and how to weigh them to best optimise their model (Hourdin et al., 2017). The result across multiple models is a “collection of carefully configured best estimates” (Knutti et al., 2010) that reflect expert judgement and available data, but not necessarily the full range of plausible outcomes. Although tuning is often necessary to produce stable and physically realistic simulations (Schmidt et al., 2017), it obscures other causes of error in the model (Rostron et al., 2025).

These additional errors arise from the model's inherent structural limitations. All models depend on choices about which physical processes to include, how they are formulated, the spatial resolution used, and how the code is implemented. Since no model is perfectly structured to represent the real world, all models carry some degree of structural uncertainty. Structural uncertainty leads to model discrepancy, or systematic error relative to observations, that cannot be resolved by adjusting parameters (Goldstein and Rougier, 2004; McNeall et al., 2016; Sexton et al., 2012). As a result, there is a risk that model tuning, when selecting parameter values that best match observations, will overcompensate for deficiencies in the model's structure. The chosen parameter combinations may reproduce observations for the wrong reasons due to compensating model errors. Such combinations will not produce reliable output under novel conditions, such as when the model is used for climate projections that inform policies (Golaz et al., 2013).

Understanding the causes of a model's structural uncertainty is an essential part of model development. However, it is complicated by the fact that parametric and structural uncertainties are entangled, making it difficult to determine whether discrepancies between model output and observations are due to parameter choices or deficiencies in the model's structure. Historically, structural uncertainty has been explored using multi-model ensembles (MMEs, or model intercomparisons) by comparing structurally different models (Collins, 2007; Flato et al., 2013). However, each model in an MME is typically subjectively tuned so only provides a limited view of its structure, as it is already pre-conditioned to match observations as well as its structure allows. In addition, many models share common components or code, so the effective diversity within an MME is often smaller than it appears (Masson and Knutti, 2011). The range of outputs generated by varying parameters within a single model has been shown to be as large as, or even larger than, the spread across multiple models (Murphy et al., 2004; Yoshioka et al., 2019), which suggests that MMEs alone provide only a partial picture of parametric and structural uncertainty, and that a more systematic exploration of uncertainty is needed to separate these two main causes of model error.

The parametric uncertainty of a model can be sampled using a perturbed parameter ensemble (PPE). PPEs are created by running the same model with different combinations of parameter values to capture the range of possible model outputs (Lee et al., 2011, 2012; Sexton et al., 2012, 2021; Yoshioka et al., 2019; Eidhammer et al., 2024). The information derived from PPEs can be extended using statistical emulators (e.g., Gaussian Process emulators) to predict model outputs for a much larger set of parameter combinations than were simulated (O'Hagan, 2006). PPEs and emulators form a key part of the Uncertainty Quantification (UQ) framework (Kennedy and O'Hagan, 2001), which aims to assess how different causes of uncertainty (e.g., parametric, structural, and observational) affect model output.

Within this framework, history matching is a method used to reduce parametric uncertainty. Rather than identifying a single best-fitting parameter set, history matching rules out combinations of parameters that are observationally implausible, given defined thresholds of the uncertainties in the quantities being compared (Craig et al., 1997). Unlike tuning, this method avoids overfitting by retaining all parameter sets that remain observationally plausible. History matching has been applied both to full climate models (Williamson et al., 2013) and to individual components such as the NEMO ocean model (Williamson et al., 2017), land surface models (Raoult et al., 2024), as well as aerosol models (Johnson et al., 2020; Regayre et al., 2020).

History matching is designed to account for structural uncertainty. The “implausibility” of every model variant (a model run with a different combination of parameter values) is calculated and used to determine which parameter combinations are ruled out. The implausibility measure includes a structural error term as part of its definition. However, as there is no reliable way to quantify structural uncertainty, this term effectively reflects the modeller's judgement about how wrong the model might be (Williamson et al., 2015). If the term is too small, plausible parameter sets may be incorrectly ruled out; if it is too large, implausible combinations may be retained. Consequently, the uncertainty in this term adds subjectivity to the process of ruling out parameter combinations, without necessarily bringing us closer to disentangling parametric and structural uncertainty. As a result, while history matching is more transparent than tuning because assumptions about uncertainty are explicitly stated, it still carries limitations when structural uncertainty is poorly understood (Brynjarsdóttir and O'Hagan, 2014).

Unquantified structural uncertainties have limited the scientific community's ability to constrain uncertainty in predictions of aerosol radiative forcing (ΔF_aer), the change in Earth's radiative balance due to anthropogenic aerosol emissions. As the most uncertain component of anthropogenic forcing (Forster et al., 2021), ΔF_aer complicates estimates of climate sensitivity to greenhouse gases and affects projections of global temperature change (Andreae et al., 2005), limiting how confidently we can simulate future climate change and inform policy decisions. Despite extensive use of observational constraints to reduce parametric uncertainty (Johnson et al., 2020; Regayre et al., 2023), uncertainty in ΔF_aer remains high (Regayre et al., 2026). Similar limitations have been reported in other recent studies, where applying large observational datasets led to only modest reductions in uncertainty in global-mean liquid water path adjustment (Mikkelsen et al., 2025) and in effective radiative forcing from aerosol–cloud interactions (ΔF_aci; Song et al., 2024), both of which contribute directly to the overall uncertainty in ΔF_aer.

A clear illustration of the limits of observational constraints is found in Johnson et al. (2020), who used a history matching approach incorporating over 9000 aerosol observations in an effort to substantially constrain ΔF_aer. Yet, the resulting reductions in parametric uncertainty were minimal – 6 % for ΔF_aci (the component of ΔF_aer from aerosol–cloud interactions) and 34 % for ΔF_ari (the component from aerosol–radiation interactions). One reason for this limited constraint was that different observational datasets pulled model parameters towards opposite sides of their ranges, resulting in conflicting estimates of ΔF_aer. These inconsistencies reduced the effectiveness of observational constraints, despite the size and diversity of the observational dataset, and suggested that we remain far from achieving the maximum feasible reduction in aerosol radiative forcing uncertainty.

Such inconsistencies are symptomatic of structural model deficiencies, as they indicate that the model cannot reproduce all available observations simultaneously. Evidence of similar inconsistencies was found in McNeall et al. (2016), where constraining the climate model FAMOUS to match observations from the Amazon forest led to different parameter combinations being retained than when constraining the model to other forests. The model could represent features of individual forests, but its inability to represent all forests simultaneously implied that key processes are missing or overly simplified. The scale of this problem is systemic and substantial: in an attempt to reduce ΔF_aci uncertainty in the UK Earth System Model (UKESM1; Sellar et al., 2019), Regayre et al. (2023) found that only 13 out of 450 cloud and aerosol measurements could be used before structural inconsistencies started weakening the constraint, which indicates that some of the remaining parametric uncertainty might be due to unaddressed structural deficiencies. If such deficiencies were identified and addressed, more observations could be used and tighter bounds on ΔF_aci could potentially be achieved. Therefore, identifying the causes of inconsistent observational constraints and the structural deficiencies responsible for them is a necessary step towards improving model reliability and increasing model skill at simulating future climate.

There has been growing interest in using PPEs not only to quantify parametric uncertainty, but also to reveal structural deficiencies that cannot be resolved by tuning parameter values alone (Carslaw et al., 2025). For example, Furtado et al. (2023) and Rostron et al. (2023) used PPEs to explore parametric uncertainty in their models and detect discrepancies that persist across all parameter combinations. Couvreux et al. (2021) proposed a parameter calibration framework to identify parameters which limit model performance by introducing structural uncertainty, to be implemented during model development and tuning. Peatier et al. (2024) examined how variability across PPE simulations could provide information about the presence of structural error. Despite these innovations, there is currently no agreed framework to identify structural deficiencies that lead to conflicting observational constraints, and thus block progress in reducing parametric uncertainty. Moreover, little attention has been given to identifying which model developments should be prioritised to most effectively improve model skill at simulating future climate. Without such a framework, there is a risk that model developments increase model complexity without delivering clear benefits (Proske et al., 2023).

In this study, we develop an approach to (a) detect structural inconsistencies between observational constraints and (b) identify structural deficiencies that could cause them. We build on the work of Regayre et al. (2023) who identified a key structural inconsistency in observational constraints related to aerosol–cloud interactions. Our focus is on aerosol-radiation interactions in European winter, where we explore the performance of a UKESM1 PPE by examining the effect of sulfate aerosol mass concentration, sulfur dioxide concentration, aerosol optical depth, and particle number concentration as observational constraints. Specifically, we aim to answer the following questions: (1) what are the main inconsistencies between these aerosol observational constraints? (2) Can these inconsistencies help identify the structural deficiencies that limit our ability to reduce uncertainty in ΔF_aer?

The paper is organised as follows: in Sect. 2 we outline our methodologies to identify inconsistencies and infer potential structural deficiencies that may cause them. In Sects. 3.1 to 3.3, we evaluate the model's performance against in-situ observations across the parametric space. In Sects. 3.4 to 3.6, we apply observational constraints and examine the inconsistencies that arise. In Sect. 4, we identify priorities for structural model development and discuss how this approach could be used more broadly to support uncertainty reduction in Earth system modelling.

2 Methods

We use the PPE and statistical emulation methodology described in Regayre et al. (2023). In Sect. 2.1, we summarise the components of the model configuration that are relevant to the study. Section 2.2 presents the measurements used to compute model bias. In Sect. 2.3, we outline how the main causes of parametric uncertainty were identified for each model grid box, and in Sect. 2.4, how this information informed the spatial clustering of the study region. Section 2.5 then details the calculation of model bias within each cluster, while Sect. 2.6 explains our approach to applying observational constraints. Finally, Sect. 2.7 defines the types and severities of observational inconsistency considered.

2.1 Experimental design

2.1.1 Model version

The PPE used here was created using version 1 of the UKESM (UKESM1; Sellar et al., 2019), which is based on the HadGEM3-GC3.1 physical climate model (Williams et al., 2018) and includes coupling to the United Kingdom Chemistry and Aerosol (UKCA) model (Archibald et al., 2020). Simulations were run using the atmosphere-only configuration, UKESM1-A, which consists of the GA7.1 atmosphere (Walters et al., 2019) with additional updates to aerosol, cloud, and atmospheric structure as described in Mulcahy et al. (2020). The model resolution is N96 (1.875° × 1.25°, or approximately 208 km × 139 km at the Equator), with 85 vertical levels extending up to 85 km. Horizontal winds above approximately 2 km were nudged towards ERA-Interim reanalysis data for the period December 2016 to November 2017. Sea surface temperatures and sea ice were prescribed for the same period.

Each PPE member was forced using anthropogenic SO₂ emissions from the years 2014 and 1850, consistent with those used in CMIP6 (Eyring et al., 2016). Emissions of carbonaceous aerosol from residential and fossil fuel sources followed CMIP6 data for 1850, while present-day carbonaceous aerosol from biomass burning sources were prescribed using Copernicus Atmosphere Monitoring Service (CAMS) data for December 2016 to November 2017. Monthly mean output from a fully coupled UKESM simulation was used to prescribe ocean surface concentrations of dimethylsulfide (DMS) and chlorophyll, as well as atmospheric concentrations of gas-phase species, including OH and O₃. Volcanic SO₂ emissions included continuous and sporadic sources (Andres and Kasgnoc, 1998) and emissions from explosive eruptions (Halmer et al., 2002). Aerosol number concentrations were calculated prognostically using the GLOMAP-mode aerosol scheme (Mann et al., 2010, 2012), which represents five log-normal modes and includes sulfate, sea salt, black carbon, and organic carbon, internally mixed within each mode.

We use a version of UKESM1-A with structural changes described by Regayre et al. (2023). These include: a revised threshold for ice mass fraction above which nucleation scavenging is deactivated to allow aerosol transport into the Arctic (Browse et al., 2012); updated high-resolution lookup tables for aerosol optical properties (Bellouin et al., 2013), including mineral dust (Balkanski et al., 2007) and improved aerosol absorption; and the inclusion of an organically mediated aerosol nucleation parameterisation (Metzger et al., 2010), intended to improve the model's representation of remote marine and early industrial aerosol conditions, known to affect the magnitude of ΔF_aer (Carslaw et al., 2013).

2.1.2 Perturbed parameter ensemble and statistical emulation

The PPE from Regayre et al. (2023) consists of 221 model simulations, with 37 perturbed parameters related to aerosols, clouds, and the physical atmosphere (detailed in Table A1). The selection of the perturbed parameters was based on those identified in previous PPEs as large causes of uncertainty in key outputs (Regayre et al., 2015, 2018; Sexton et al., 2021; Yoshioka et al., 2019), together with parameters associated with structural model developments (Mulcahy et al., 2018, 2020; Walters et al., 2019). Their perturbation ranges were determined using formal expert elicitation using the Sheffield Elicitation Framework (SHELF) approach described in Gosling (2018). The PPE was developed in two stages. In the first stage, the most implausible parts of the parameter space were identified and removed by comparing simulated shortwave fluxes with observations using a history-matching style approach. The second stage PPE was sampled from the remaining, more plausible parameter space and forms the focus of this analysis.

Here, model output from the 221 PPE simulations, resolved at the grid-box level across Europe in January 2017, was used to train statistical emulators for four variables related to aerosol–radiation interaction forcing: sulfate aerosol mass concentration (“sulfate”), sulfur dioxide concentration (SO₂), aerosol optical depth (AOD), and particle number concentration larger than 3 nm diameter (N₃). Gaussian Process emulators (O'Hagan, 2006) were constructed to represent the monthly mean of each variable as a continuous function across the 37-dimensional input parameter space, with each parameter jointly varied over its specified range (shown in Table A1). The emulators were then used to generate output for 1 million model variants at the grid-box level, with a large reduction in computational cost compared to full climate model simulations. Emulator uncertainty was quantified and assessed against the spread of emulator output (Fig. B1). Grid boxes where emulator predictive uncertainty exceeded the spread in emulator output were excluded from the analyses to avoid relying on emulator predictions in regions of high predictive uncertainty.

2.2 Measurements

We use in-situ aerosol measurements for January 2017 in Europe, aggregated to monthly means, for the four variables: sulfate, SO₂, AOD, and N₃. Measurements for sulfate, SO₂, and AOD were obtained from the Globally Harmonised Observations in Space and Time (GHOST) dataset (Bowdalo, 2024a; Bowdalo et al., 2024b), which provides station-level monthly mean values. Sulfate measurements represent total particulate sulfate at the surface, reported in µg m⁻³. SO₂ concentrations were measured as surface-level sulfur dioxide in nmol mol⁻¹ and converted to µg m⁻³. AOD data are level 2.0 observations measured at a wavelength of 440 nm from the AERONET network (Sinyuk et al., 2020). N₃ represents the number concentration of particles larger than 3 nm, measured at the surface in particles per cm³. N₃ data were directly obtained from the European Monitoring and Evaluation Programme (EMEP, http://ebas.nilu.no/, last access: 27 January 2025; Tørseth et al., 2012).

2.3 Causes of uncertainty

The importance of each parameter as a cause of model uncertainty was estimated using Generalised Additive Models (GAMs). GAMs are flexible statistical models that represent the relationship between predictors and a response as a sum of smooth, linear or non-linear functions. We fitted non-linear GAMs to emulated model output for each variable within individual grid boxes using the pygam Python package (Servén and Brummitt, 2018). The fitted GAM functions were used to quantify the variance in model output attributable to each parameter, while allowing for non-linear effects (Strong et al., 2014), following Regayre et al. (2026).

To quantify each parameter's contribution to output variance, we varied one parameter at a time across its sampled range while fixing all others at their median values. This approach isolates the marginal effect of the target parameter by removing variability introduced by changes in other parameters. The resulting 37 variances were summed to obtain the total parametric variance, and each parameter's contribution was expressed as a proportion of this total. The resulting percentage contribution to parametric uncertainty reflects both the range over which each parameter was perturbed and the local importance of that parameter to model output.
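The one-at-a-time calculation described above can be illustrated with a minimal sketch. It is not the code used in the study: the function `oat_variance_shares`, the toy response `toy_response` (a stand-in for a fitted GAM), and the parameter names `a`, `b` and `c` are all hypothetical and chosen only for illustration.

```python
import statistics

def oat_variance_shares(predict, ranges, medians, n_steps=50):
    """Percentage contribution of each parameter to the summed
    one-at-a-time variance of the predicted output."""
    variances = {}
    for name, (low, high) in ranges.items():
        outputs = []
        for k in range(n_steps):
            point = dict(medians)  # all other parameters fixed at their medians
            point[name] = low + (high - low) * k / (n_steps - 1)
            outputs.append(predict(point))
        variances[name] = statistics.pvariance(outputs)
    total = sum(variances.values())  # total parametric variance
    return {name: 100.0 * v / total for name, v in variances.items()}

# ASSUMPTION: toy additive response standing in for a fitted GAM.
def toy_response(p):
    return 3.0 * p["a"] + p["b"] ** 2 + 0.0 * p["c"]

ranges = {"a": (0.0, 1.0), "b": (0.0, 1.0), "c": (0.0, 1.0)}
medians = {"a": 0.5, "b": 0.5, "c": 0.5}
shares = oat_variance_shares(toy_response, ranges, medians)
```

In this toy case the parameter with no effect on the output (`c`) receives a share of zero, and the shares sum to 100 % by construction. Widening the range of a parameter would increase its share, which is why the percentages reflect the perturbation ranges as well as the sensitivity of the output.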

The GAMs were trained on the “unconstrained” subset of approximately 900 000 model variants, excluding those with prim_so4_diam values below ∼ 10 nm, as defined in Regayre et al. (2026). In the original ensemble comprising 1 000 000 model variants, such low diameters led to implausibly high particle number concentrations, which were ruled out as observationally implausible by Regayre et al. (2023). Including these variants would have artificially inflated the apparent importance of prim_so4_diam, thereby masking the contributions of other parameters (Regayre et al., 2026).

2.4 Spatial clustering of causes of uncertainty

We applied k-means clustering, an unsupervised machine learning technique, to group grid boxes according to shared causes of parametric uncertainty. The clustering was implemented using the scikit-learn Python package (Pedregosa et al., 2011), and was based on the parameter percentage contributions to variance multiplied by the sign of variable dependence on parameter values from the GAM fit (Sect. 2.3). The number of clusters was chosen iteratively: we began with a high number relative to the size of the region (e.g. six clusters for Europe) and reduced it if clusters showed redundant patterns in dominant parameters and their contributions. In some instances, clusters that spanned wide regions remained undivided even as the number of clusters increased. The clustering method preferentially split regions adjacent to grid boxes excluded for high emulator uncertainty because of distinct local patterns in causes of uncertainty. In these cases, we manually divided large clusters by masking all other grid boxes and applying k-means clustering again within the selected region following the same method.
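The construction of the clustering features (percentage contribution multiplied by the sign of the dependence) and the grouping step can be sketched as follows. The study used scikit-learn; the sketch below is instead a dependency-free implementation of Lloyd's algorithm, seeded with the first k points rather than scikit-learn's default k-means++ initialisation. The function names and the toy values for four grid boxes and two parameters are hypothetical.

```python
import math

def signed_features(shares, signs):
    """Multiply each percentage contribution by the sign of the dependence."""
    return [s * g for s, g in zip(shares, signs)]

def kmeans(points, k, n_iter=20):
    """Plain Lloyd's algorithm. ASSUMPTION: the first k points seed the centroids."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Toy data: rows are grid boxes, columns are parameters (illustrative values).
shares = [[80.0, 20.0], [10.0, 90.0], [75.0, 25.0], [15.0, 85.0]]
signs = [[1, -1], [1, 1], [1, -1], [1, 1]]
features = [signed_features(s, g) for s, g in zip(shares, signs)]
labels = kmeans(features, k=2)
```

Grid boxes 0 and 2 share a dominant first parameter and a negative dependence on the second, so they fall in one cluster, while boxes 1 and 3 fall in the other. Including the sign separates regions where the same parameter matters but acts in opposite directions.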

2.5 Evaluation of model-observation bias within clusters

We evaluate model performance against observations within each cluster of shared causes of parametric uncertainty. For each PPE simulation, we compute the mean model value over the set of grid boxes containing observations within the uncertainty cluster, resulting in a cluster mean for each of the 221 PPE members. These cluster mean values are then used to train and validate the emulator for each cluster (Fig. B2). Leave-one-out cross-validation indicates that the emulators reproduce cluster-mean PPE outputs with high accuracy overall (e.g., NRMSE ≤ 0.09), although some underprediction occurs for high values in certain clusters (e.g., sulfate and N₃). These biases suggest that true values in these regions may be higher than emulated estimates; however, given the focus on relative differences across clusters, these limitations are unlikely to affect the main conclusions.

Model-observation bias is calculated for each model variant (i=1 to 1 000 000) using normalised mean bias factors following Yu et al. (2006). N denotes the number of observational sites in the cluster. For each site j, we use a single observed value (O_j) and pair it with the modelled value (M_ij) from the grid box containing that site for every model variant i. Both observations and model values are monthly averages. Thus, for a given model variant i, the cluster-mean model value is ${\overline{M}}_{i} = \frac{1}{N} \sum_{j = 1}^{N} M_{i j}$ and the cluster-mean observation is $\overline{O} = \frac{1}{N} \sum_{j = 1}^{N} O_{j}$ . The normalised mean bias factor (B_NMBF) is then calculated as follows:

$$B_{\mathrm{NMBF},i} = \begin{cases} 1 - \dfrac{\overline{O}}{{\overline{M}}_{i}}, & \text{if } {\overline{M}}_{i} < \overline{O} \\[4pt] \dfrac{{\overline{M}}_{i}}{\overline{O}} - 1, & \text{if } {\overline{M}}_{i} > \overline{O} \end{cases} \qquad (1)$$
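As an illustration of Eq. (1), the following minimal sketch computes the bias factor for one model variant from paired site values. The function name `nmbf` and the example values are hypothetical. The sketch assumes strictly positive values, and it returns zero when the two cluster means are equal, a case that Eq. (1) leaves unspecified.

```python
def nmbf(model_values, observed_values):
    """Normalised mean bias factor (Eq. 1) from paired site values."""
    if len(model_values) != len(observed_values) or not model_values:
        raise ValueError("need one model value per observation site")
    model_mean = sum(model_values) / len(model_values)       # cluster-mean model value
    obs_mean = sum(observed_values) / len(observed_values)   # cluster-mean observation
    if model_mean < obs_mean:
        return 1.0 - obs_mean / model_mean
    # ASSUMPTION: equal means fall through to this branch and give 0.
    return model_mean / obs_mean - 1.0

low_bias = nmbf([1.0, 3.0], [4.0, 4.0])   # model mean 2, observed mean 4
high_bias = nmbf([5.0, 7.0], [4.0, 4.0])  # model mean 6, observed mean 4
```

Here `low_bias` is -1.0 (the observed mean is twice the model mean) and `high_bias` is 0.5 (the model mean is 1.5 times the observed mean), so the measure is symmetric in the sense that under- and overestimates by the same factor have the same magnitude.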

2.6 Application of observational constraints

The steps in Sect. 2.5 provide the model–observation bias for each of the 1 000 000 model variants. Observational constraints are then applied by retaining only those variants with the smallest absolute B_NMBF, which correspond to those closest to the mean observed value. We apply observational constraints to the original set of 1 000 000 model variants, rather than the “unconstrained” subset of ∼ 900 000 used for clustering (Sect. 2.4). While low prim_so4_diam values are excluded from uncertainty analyses due to their unrealistic nature, including them here helps illustrate the effect of structural deficiencies on observational constraints.

Observational uncertainties are not directly incorporated into the constraint process. Instead, to prevent over-constraint given the presence of unquantified measurement errors, we retain the 5000 model variants (0.5 %) closest to observations. This threshold was also used by Regayre et al. (2023), and was chosen to approximate the proportion of model variants retained using a more rigorous history matching approach that explicitly accounts for observational uncertainty, emulation uncertainty and other model-to-observation comparison uncertainties (Johnson et al., 2020; Regayre et al., 2020). In this research, observational constraints are not used to identify a single “best” model variant or to quantify parametric uncertainty. Rather, they are used as tools to explore model responses to constraints and to identify potential structural deficiencies.
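The retention step for a single constraint amounts to ranking variants by absolute bias and keeping a fixed number of them. A minimal sketch, with a hypothetical function name and six toy variants in place of the 1 000 000 used in the study:

```python
def retain_closest(biases, n_keep):
    """Indices of the n_keep variants with the smallest absolute bias."""
    order = sorted(range(len(biases)), key=lambda i: abs(biases[i]))
    return set(order[:n_keep])

# Toy bias factors for six model variants (illustrative values only).
biases = [0.9, -0.05, 0.4, -1.2, 0.1, 0.02]
kept = retain_closest(biases, n_keep=3)
```

With these values the retained indices are 1, 4 and 5. In the study the analogous call would keep 5000 of 1 000 000 variants, that is 0.5 %.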

For joint observational constraints, we identify the set of model variants that are common to all individual constraints that form the joint constraint. In cases where no common variants are found, we define the constraints as inconsistent, using definitions that follow in Sect. 2.7. To explore the extent of the inconsistency and assess how conflicting constraints might be accommodated, we progressively relax individual constraints until around 300 or more model variants are retained in the overlapping set. We define this as a compromise between inconsistent observational constraints, following Regayre et al. (2023).
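The joint constraint and the compromise can be sketched as a set intersection followed by a relaxation loop. The text does not specify how individual constraints are relaxed, so the sketch assumes, purely for illustration, that all constraints are relaxed equally by increasing the number of retained variants in fixed steps. All function names and the toy values are hypothetical.

```python
def closest(biases, n_keep):
    """Indices of the n_keep variants with the smallest absolute bias."""
    order = sorted(range(len(biases)), key=lambda i: abs(biases[i]))
    return set(order[:n_keep])

def joint_constraint(bias_sets, n_keep):
    """Variants retained by every individual constraint."""
    return set.intersection(*(closest(b, n_keep) for b in bias_sets))

def compromise(bias_sets, n_keep, min_overlap, step):
    """Relax constraints until the overlap holds at least min_overlap variants.
    ASSUMPTION: equal relaxation of all constraints is a choice of this sketch."""
    n_variants = len(bias_sets[0])
    while True:
        overlap = joint_constraint(bias_sets, n_keep)
        if len(overlap) >= min_overlap or n_keep >= n_variants:
            return overlap, n_keep
        n_keep = min(n_keep + step, n_variants)

# Two toy constraints that favour opposite ends of a 10-variant ensemble.
bias_a = [0.1 * i for i in range(10)]
bias_b = [0.1 * (9 - i) for i in range(10)]
strict = joint_constraint([bias_a, bias_b], n_keep=2)
overlap, n_final = compromise([bias_a, bias_b], n_keep=2, min_overlap=2, step=1)
```

The strict joint constraint is empty, which is the signature of inconsistent constraints. The compromise is reached only after each constraint is relaxed from 2 to 6 retained variants, and the overlapping variants (indices 4 and 5) sit far from the best match for either constraint on its own.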

When observations are outside of the range of the model output of PPE members, they are not used in the calculation of model-observation bias (Sect. 2.5) and are therefore not included in the process of observational constraints. An observation outside the PPE range is a clear indication of the presence of a structural model deficiency, as it means that no amount of parameter retuning will bring the model into agreement with the observations, given the parameters that were included in the PPE and the wide range of values they were perturbed over. In these cases, we provide hypotheses on potential consequences for our results.

While observations outside the PPE range are excluded from the constraint process, they are retained for evaluation purposes. Because these values lie beyond the range represented by the ensemble, they cannot be meaningfully used for constraint. However, they remain important for assessing model skill and identifying potential structural limitations. To ensure a complete evaluation, we assess the impact of each observational constraint on model–observation bias across all available observations, including those outside the PPE range. For example, when constraining using SO₂, only SO₂ observations within the PPE range for all regions are used in the constraint, but model skill is evaluated using all available observations for sulfate, AOD, and particle number concentration, even those outside the PPE range. Similarly, when constraining toward AOD, we use only AOD observations within the PPE range for the constraint, but evaluate model skill against all sulfate, SO₂, and particle number observations.

2.7 Definitions of potential structural inconsistencies

In the ideal case, all observational constraints would guide the model toward the same part of parameter space. That is, each constraint would support convergence towards parameter combinations that produce simulations consistent with several observed variables. When constraints do not converge, it indicates that the model would need to be tuned differently to match each variable and that, having exhausted the parameter space, no model variant exists that is consistent with multiple observations. In history-matching terminology, this situation is referred to as the “terminal case” (Salter et al., 2019). Such lack of convergence suggests a structural deficiency rather than a problem that can be resolved through tuning alone. We therefore define this lack of convergence between constraints as a potential structural inconsistency.

The concept is related to Keith Beven's definition of a behavioural model, where a parameter set is considered “behavioural” if it cannot be rejected as observationally implausible (Beven, 2006). In our context, we identify cases where the model may be partially behavioural (i.e., satisfying individual constraints) but not universally behavioural across different aspects of the model (e.g., variables, regions).

Figure 1. Schematic showing the possible levels of inconsistency between two observational constraints. The shaded regions are the parts of parameter space that match one observation type. The diagram represents only two dimensions of what is in our case a 37-dimensional problem.
