Technical note: Hybrid machine learning model for bias correction of UTLS relative humidity against IAGOS observations in ERA5 reanalysis

Antonopoulos, Mathieu; Juvin-Quarroz, Jérémie; Boucher, Olivier

doi:10.5194/acp-26-4771-2026

Articles | Volume 26, issue 7

https://doi.org/10.5194/acp-26-4771-2026

Articles | Volume 26, issue 7

Technical note

10 Apr 2026

Technical note |

| 10 Apr 2026

Technical note: Hybrid machine learning model for bias correction of UTLS relative humidity against IAGOS observations in ERA5 reanalysis

Mathieu Antonopoulos, Jérémie Juvin-Quarroz, and Olivier Boucher

Abstract

Persistent contrail cirrus form in Ice-Supersaturated Regions (ISSRs) and are responsible for a large portion of aviation's non-CO₂ climate impact. Avoiding ISSRs through flight rerouting has been proposed as a short-term mitigation strategy. However, accurate prediction of the Relative Humidity with respect to ice, RH_i, distribution within ISSRs at cruising altitude remains difficult. Observations are problematic: Satellite-based global measurements carry large uncertainties while in-situ measurements offer a limited spatial coverage. On the contrary, reanalysis offer a global estimate of RH_i, but it suffers from a dry bias near the tropopause where ISSRs are located as well as significant random errors. In this study, we develop a hybrid machine learning model to improve RH_i estimates in the upper troposphere and lower stratosphere using ERA5 and aircraft measurements from the In-service Aircraft for a Global Observing System. The model combines an XGBoost regressor for drier conditions (RH_i < 85 %) and an Artificial Neural Network (ANN) for more humid cases (RH_i > 85 %). This hybrid approach significantly outperforms raw ERA5 data, leveraging the ANN's ability to capture non-linear relationships and the XGBoost's robustness in handling drier conditions. The mean absolute error (MAE) is reduced from 13.7 % to 11.4 % and the Equitable Threat Score (ETS) for ISSR detection is improved from 0.36 to 0.44. The greatest improvement is observed in the lower stratosphere, where the ETS increases by 0.18 and the MAE drops to 10.71 %. These improvements mark a key step toward more reliable identification of ISSRs, helping reduce the uncertainties that currently limit effective flight-rerouting strategies.

Download & links

Article (PDF, 5632 KB)

Supplement (310 KB)

Download & links

How to cite.

Received: 16 Sep 2025 – Discussion started: 10 Dec 2025 – Revised: 22 Jan 2026 – Accepted: 25 Mar 2026 – Published: 10 Apr 2026

1 Introduction

Aviation contributes to climate change through both carbon dioxide (CO₂) emission and non-CO₂ effects. The latter may have accounted for 66 % of aviation's total net Effective Radiative Forcing (ERF) in 2018 (Lee et al., 2021). The largest contributor to non-CO₂ ERF is the radiative forcing from condensation trails – contrails – which are linear ice clouds formed behind aircraft as hot exhaust gases mix with cold ambient air. The physical processes governing contrail formation are well understood (Schmidt, 1941; Appleman, 1953; Schumann, 1996). While most contrails dissipate quickly and have a negligible climatic impact, those that form within Ice Supersaturated Regions (ISSRs) can persist for hours (Kärcher, 2018; Kärcher and Corcos, 2025), eventually developing into contrail cirrus clouds with microphysical and optical properties similar to those of natural cirrus (Li et al., 2023; Wang et al., 2024).

https://acp.copernicus.org/articles/26/4771/2026/acp-26-4771-2026-f01

Figure 1Density map of IAGOS data points in the North Atlantic region for the year 2022. The color scale indicates the number of data points per 0.25° × 0.25° gridbox.

https://acp.copernicus.org/articles/26/4771/2026/acp-26-4771-2026-f02

Figure 2Illustration of the collocation process between IAGOS and ERA5 datasets. Diagram (a) presents the datasets before collocation, while diagram (b) illustrates the data after preprocessing. IAGOS measurements are averaged within each ERA5 grid box, and the ERA5 variables are interpolated to match the mean altitude of the IAGOS point. The RH_i is then recomputed based on the interpolated variables.

Download

ISSRs are regions of the atmosphere where the Relative Humidity with respect to ice is supersaturated, RH_i > 100 %, which is a meta-stable thermodynamic state. The size of ISSRs has been estimated based on the distance flown within such regions by Measurement of Ozone and Water Vapor by Airbus In-Service Aircraft (MOZAIC) aircraft, showing that they can extend over 𝒪(10–100) km (Gierens and Spichtinger, 2000). ISSRs are typically located at the tropopause level and are rather frequent across the globe (Gierens et al., 1999; Spichtinger et al., 2003 a). Their formation is facilitated by the presence of upward winds and the meeting of cold and warm air streams (Gierens and Brinkop, 2012). The dynamics of ISSRs also present complex patterns as it has been determined that ISSRs' displacement may be slower than the surrounding wind field's (Hofer and Gierens, 2025).

On average, it is estimated that aircraft spends 13.5 % of their flight time in ISSRs (Gierens et al., 1999) and that only a small proportion of flights might be responsible for the majority of persistent contrail formation. For instance, it has been estimated that only 2.7 % of total flights are responsible for 80 % of the total contrail RF in 2019 (Teoh et al., 2024). These findings have motivated flight rerouting as a short-term mitigation strategy, e.g. Rosenow et al. (2018). Being able to meteorologically forecast ISSRs with enough accuracy would allow a slight deviation of the flight route to avoid the formation of persistent contrails, at the cost of a small amount of additional CO₂ emissions.

Global ISSR forecasts are generally derived from Numerical Weather Prediction (NWP) models, informed by in-situ and satellite observations. However, detecting ISSRs remains challenging, particularly through satellite observations (Gierens et al., 2004; Gettelman et al., 2006; Lamquin et al., 2012; Hegglin et al., 2013) due to their relatively narrow vertical extent (Spichtinger et al., 2003 b; Wolf et al., 2023). The tropopause region, where ISSRs typically occur, also features steep vertical gradients in humidity (Stenke et al., 2008; Zahn et al., 2014), necessitating high vertical resolution for accurate characterization. Both remote measurements with lidar (Groß et al., 2014; Krüger et al., 2022) and in-situ observations, such as those made with balloon-borne instruments (Heymsfield et al., 1998) and with on-flight sensors (Diao et al., 2015 b; Petzold et al., 2015), provide accurate measurements of RH_i, but are spatially and temporally limited.

Hence we must rely on NWP models to obtain RH_i values across the global atmosphere. NWP models contribute in two ways: forecasts, which predict atmospheric conditions in real time and are required for operational contrail avoidance, and reanalysis, which retrospectively reconstruct past atmospheric states by assimilating all historical observations. While forecasts are necessary for live avoidance decisions, reanalysis data is well-suited for retrospective studies such as this one. We therefore use the ERA5 reanalysis dataset (Hersbach et al., 2020), which is widely regarded as a highly trusted reconstruction of the global atmosphere. However, as other NWP products, ERA5 is known to suffer from a dry bias in the Upper Troposphere (UT) and Lower Stratosphere (LS) (Teoh et al., 2022; Rolf et al., 2023; Platt et al., 2024), which means that the model often predicts lower relative humidity values compared to the observed values. This bias, as well as significant random errors, can lead to considerable errors in RH_i estimates, particularly in regions where contrails are likely to form. Previous work on the correction of humidity bias in ERA5 have shown potential for improvement of the ISSR detection in the UTLS. Wolf et al. (2025) applied a bivariate Quantile Mapping (QM) correction to ERA5 temperature and relative humidity fields, significantly reducing the dry bias. They report a reduction of the RH_i bias from approximately −4.5 % to −1.2 % on average at pressure levels of 250, 225, and 200 hPa. This bias reduction leads to an improved classification of different contrail categories. Wang et al. (2025) proposed a machine learning (ML) approach to improve ERA5's RH_i estimates in the UT using an Artificial Neural Network (ANN).

https://acp.copernicus.org/articles/26/4771/2026/acp-26-4771-2026-f03

Figure 3Correlation matrix showing the relationships between selected engineered features, the target variable RHi_IAGOS, and the initial ERA5 estimate. The engineered gradient features show a relatively strong correlations with the target RHi_IAGOS. The time embeddings (cos_day) exhibit low direct correlation with RHi_IAGOS.

Download

https://acp.copernicus.org/articles/26/4771/2026/acp-26-4771-2026-f04

Figure 4Comparison of RH_i predictions between different models with respect to the target value (RHi_IAGOS). (a) presents the baseline ERA5 reanalysis, (b) ANN model, (c) XGBoost model, and (d) the hybrid ensemble model. All predictions are made on the test dataset between 400 and 200 hPa over the North Atlantic region (cf. Fig. 1) for the year 2022.

Download

In this study, we present a hybrid ensemble machine learning approach to correct the dry bias in ERA5's RH_i in the UTLS. We combine an XGBoost regressor (Chen and Guestrin, 2016) and an ANN, leveraging the complementary strengths of both models. ANNs are well-suited for modeling complex, non-linear relationships in multivariate data (Živković et al., 2009), while tree-based models like XGBoost excel in handling structured data and often deliver state-of-the-art performance (Chen and Guestrin, 2016). We dynamically chose the model based on the initial RH_i value provided by ERA5: the XGBoost is used for prediction in drier conditions (RH_i < 85 %), while the ANN is used for more humid cases (RH_i > 85 %). ISSR formation involves more complex and non-linear atmospheric dynamics (Diao et al., 2014, 2015 a), which are better captured by ANNs due to their flexibility and non-linear representation capabilities. Effectively, the ANN captures subtle variations relevant to ISSR classification, while the XGBoost model is used to correct bias under simpler thermodynamic conditions.

This paper is organized as follows. In Sect. 2 we describe the IAGOS and ERA5 datasets, the collocation procedure, filtering, and feature engineering. Sect. 3 details the ANN and XGBoost architectures, training procedures, and the hybrid model. Sect. 4 presents our results, showing improvements in regression and classification scores under different meteorological conditions.

2 Data

In this study, we will use two complementary datasets. The ERA5 reanalysis data from the European Centre for Medium-Range Weather Forecasts (ECMWF, Copernicus Climate Change Service, 2025), and the in-situ observations from the In-service Aircraft for a Global Observing System (IAGOS, Petzold et al., 2015) program. ERA5 provides a global and consistent estimate of atmospheric conditions through a wide range of variables, while IAGOS offers direct measurements of RH_i at cruise altitudes. Combining these datasets enables a supervised learning approach where the ERA5 features serve as model inputs, while the estimation of RH_i from IAGOS from temperature and relative humidity measurements serve as the target variable. The following subsections describe each dataset, along with our preprocessing and feature engineering methodology.

Unlike Wang et al. (2025), we focus on the North Atlantic region, spanning latitudes from 40 to 70° N and longitudes from 65° W to 5° E (Fig. 1). This region shows a relatively uniform distribution of the IAGOS data and is known for frequent contrail formation due to high air traffic and favorable atmospheric conditions. For these reasons, it is thought to be a potential region for contrail avoidance trials. We selected the year 2022 as our study period as it offers recent atmospheric conditions and comes after the maximum COVID disruption of international flights in 2020 and 2021. Our data processing is implemented based on methodology from Wang et al., 2025, with some modifications to ensure proper separation of the training and test datasets to avoid temporal leakage.

2.1 ECMWF reanalysis data (ERA5)

ERA5 data are provided on a regular latitude–longitude grid with a horizontal resolution of 0.25° × 0.25°, corresponding to an approximate grid spacing of 19.5 km at 45° N latitude. Vertically, the dataset includes 37 pressure levels extending from the surface to the LS. We focus on pressure levels between 500 and 125 hPa, specifically using the following levels: 500, 450, 400, 350, 300, 250, 225, 200, 175, 150 and 125 hPa. The extracted ERA5 variables provide a comprehensive description of the atmospheric conditions (see Table 1 for a summary of all extracted variables and their units). The ERA5 data serve as input for both statistical validation and the training of machine learning models. We use all variables as features for the model, and the RHi_ERA5 measurement as a baseline to evaluate model performance.

Table 1Summary of all features used for training, grouped by category. ERA5 variables include values at current pressure level and context, ± 2 pressure levels, and 2 and 6 h prior current timestep (i.e. 7 versions for each ERA5 variable). The target IAGOS value is indicated in bold. ^∗ RHi_ERA5 is recomputed after linear interpolation to maintain consistency and physical dependencies

Download Print Version | Download XLSX

2.2 In-service Aircraft for a Global Observing System (IAGOS)

IAGOS is a European research initiative providing long-term, high-resolution in-situ atmospheric measurements to support the understanding of atmospheric composition, air quality, and climate processes. Data are collected by equipping commercial aircraft, mostly Airbus A330s, with the “P1 package” set of scientific instruments. Currently, the IAGOS fleet comprises ten aircraft operating worldwide. The RH_i is derived from measured temperature, pressure, and water vapor mixing ratio, using the saturation vapor pressure formula by Sonntag (1994). The measurements are recorded at a temporal resolution of 4 s, corresponding to a spatial resolution of approximately 1 km at cruise speeds. However the data points are averaged over a longer period of few minutes because of the time lags involved with the instrument (Borella et al., 2024).

The spatial distribution of IAGOS measurements is inherently non-uniform as commercial flights tend to follow similar routes (cf. Fig. 1). Additionally, the limited number of aircraft in the IAGOS fleet contributes to a concentration of the data in some regions. The highest data density is found along transatlantic flight corridors and Western Europe. The North Atlantic and European regions are the most represented, as most flights from IAGOS are issued by European companies, and transatlantic flights represent an important share of global aviation traffic. This area is of particular interest for contrail studies, given the frequent occurrence of ISSRs (Reutter et al., 2020). Most measurements are collected between 400 and 200 hPa, matching the standard cruise altitude.

To ensure data quality, we retain only samples flagged as “good” by the IAGOS quality control system. We also restrict the dataset to measurements between 400 and 200 hPa. According to Rolf et al. (2023) and their analysis of the airborne measurements, the P1 sensors used in IAGOS show limited accuracy in dry conditions (RH_i < 10 %). Hence, we exclude those samples from our dataset. After applying all these filtering steps for the year 2022 on the North-Atlantic region, we obtain a dataset comprising 1 535 724 data points from 678 flights.

2.3 Data collocation

The IAGOS and ERA5 data have different resolutions in both time and space, hence we must proceed to a collocation of the data. Once we collected and filtered the IAGOS data for each flight, we determine the closest point in ERA5 by matching the closest pressure level, longitude and latitude points (at a 0.25° resolution) and timestep (closest hour). Once we get these indices, we find the closest grid box in ERA5 to each IAGOS data point. The IAGOS points matching the same grid box are grouped. Given the four-second resolution of the data and the average grid box length of 19 km, each group contains up to 23 IAGOS points and 15 IAGOS points on average. Once grouped, the IAGOS data are averaged within one grid box to obtain a single IAGOS point per ERA5 grid box and timestep. Given that points from same flights are usually located at a similar altitude within a grid box, we effectively horizontally average the IAGOS data for longitudes and latitudes. We further proceed to a vertical linear interpolation of the ERA5 data to match the averaged altitude of the IAGOS point. Future work should test log-pressure linear interpolation to reflect the atmosphere's vertical structure. A visual representation of this process is given in Fig. 2. In order to maintain physical relationships between variables, we recalculate RHi_ERA5 with the interpolated variables T and q, which are, respectively the air temperature and the specific humidity at the relevant pressure value p. This is done using the Clausius-Clapeyron equation, with a Magnus form approximation for saturation vapor pressure over ice as described by Alduchov and Eskridge (1996). A comparison between observed RH_i from IAGOS and RH_i predictions from ERA5 reanalysis is shown in Fig. 4a.

The dataset is then separated into training, validation and test datasets. Careful separation is essential, as atmospheric conditions can exhibit strong spatial and temporal correlations. Without proper splitting, the model might train on samples nearly identical to those in the test dataset, potentially leading to overfitting and artificially high performance on validation and test datasets. To avoid this issue while keeping a representative distribution across seasons, days, and hours, we implemented a by-day sampling strategy. Specifically, we sample 5 d for training, 1 d gap, 1 d for validation, 1 d gap, 5 d for training, 1 d gap, 1 d for testing and 1 d gap, uniformly throughout the year. This differs from Wang et al. (2025) in two ways: (i) we use 5 d for training (4 d in Wang et al., 2025), (ii) an additional day gap is used after the validation or testing day to prevent correlations with the following training period. After collocation and split, we obtain 64 712 training samples, 7025 validation samples and 6085 test samples. To address the underrepresentation of high RH_i values and reduce potential bias, data augmentation is applied to the training dataset: samples with RH_i above 120 % are oversampled by a factor of 3, while those below 20 % RH_i are undersampled by a factor of 0.4. This approach results in a RH_i distribution that is more suitable for our needs.

Finally, we ensure that our processed data are representative of ISSRs characteristics by comparing it to the results in Gierens and Brinkop (2012). We report the distribution of vorticity and divergence in Fig. A1.

Table 2Hyperparameter grid used for tuning the ANN model. Each row lists the set of values tested for a specific hyperparameter during the grid search process. The values in bold indicate the configuration that achieved the best performance on the validation dataset and was selected for final training.

Download Print Version | Download XLSX

2.4 Feature engineering

ERA5 provides an extensive set of variables, but to further improve the input data and capture new relation (Fig. 3), additional derived variables are added as input features. Following the methodology proposed by Wang et al. (2025), we add for each data point, the selected variables two pressure levels above and below the current pressure level (totaling 5 levels) as well as data from two hours and 6 h prior to the current timestep. In addition we extracted several features from the original ERA5 and IAGOS data to enrich the model inputs and improve the correction of the RH_i estimates. These features aim to capture the vertical structure and dynamics of the atmosphere.

Table 3Performance comparison of models for RH_i correction. Improvement relative to ERA5 baseline shown in parentheses. Best values are indicated in bold.

Download Print Version | Download XLSX

To capture the vertical temperature profile, we compute temperature gradients at different scales. These include the gradient between the current level and the two levels immediately above (T_grad_up), between the current level and the two levels immediately below (T_grad_down), a centered gradient between one level above and one below (T_grad_centered), and a lower-resolution gradient between two levels above and two below (T_grad_overall), in K hPa⁻¹. Similarly, gradients of the vorticity (vo_grad_up, vo_grad_down, vo_grad_centered, vo_grad_overall), in $s^{- 1} {hPa}^{- 1}$ , and the RH_i (RHi_grad_up, RHi_grad_down, RHi_grad_centered, RHi_grad_overall), in %, are computed following the same pattern.

Table 4Top five most important features for the ANN and XGBoost models. As expected, RHi_ERA5 ranks highest in both models due to its strong correlation with the target variable. Several engineered features demonstrate strong importance, specifically the temperature gradient, the binary cloud presence indicator, and the time embeddings. Time embedding features show strong importance for the ANN model but not for XGBoost, indicating that each model captures different aspects of the underlying atmospheric patterns.

Download Print Version | Download XLSX

To account for the diurnal and seasonal variability in atmospheric conditions, we embedded time information using cosine functions. The local time at the observation point was computed from the UTC hour and longitude, then encoded using sine and cosine transforms to capture daily cycles (cos_hour, sin_hour). Similarly, the day of the year (cos_day, sin_day) is encoded to reflect seasonal effects. These four quantities are normalized by 2π.

These engineered features were incorporated into the training data to allow the model to better represent the complex atmospheric processes influencing the RH_i (cf. Table 1). Some of the engineered features proved to be statistically meaningful. In particular, the temperature gradient showed a stronger correlation with RH_i than the original temperature variable. Time embeddings exhibited almost null direct correlation with RHi. However, they were identified as important contributors in the feature importance analysis of the ANN, indicating that their effect may be captured in more complex, non-linear relationships within the model (cf. Fig. 3 and Table 4).

Data is then categorized according to distinct meteorological conditions following methodology from Wang et al. (2025). To determine the cloud presence, we used the cloud ice water content (ciwc_ERA5) variable. If ciwc equals zero at the current pressure level and two levels above and below it (± 2 pressure levels), the data point is classified as clear sky. If any of these five levels has a non-zero ciwc value, the point is classified as cloudy. Additionally, we separated the points from the UT and LS. Commercial flights cruise at an atmospheric level called the tropopause, which is the interface between the troposphere and the stratosphere. This separation helps to account for the differing thermodynamic properties and humidity distributions between the troposphere and stratosphere. The classification is based on potential vorticity (pv_ERA5). Data points are labeled as UT if potential vorticity is less than 2 PVU (Potential Vorticity Units). Conversely, points with PV > 2 PVU are classified as LS. We use the typical conversion rate of 1 PVU = 10⁶ $K {kg}^{- 1} m^{2} s^{- 1}$ . In the test dataset, 2641 points are classified as Cloudy-Sky, 3444 as Clear-Sky, 4566 as LS and 1519 as UT.

Table 5Comparison of ERA5 and hybrid model (HYB) prediction performance against IAGOS observations across different meteorological conditions on the test dataset. We show the percentage of True Positive (TP), False Negative (FN), False Positive (FP) and True Negative(TN) per condition.

Download Print Version | Download XLSX

2.5 Evaluation Metrics

The performance of the model is assessed using different metrics. Since we primarily solve a regression task, we use the Mean Absolute Error (MAE), expressed as RH_i percentage points. However, MAE alone does not fully capture the model's ability to correct the biases in ERA5. The aim of the model is to accurately identify ISSRs, which requires a classification perspective. Therefore, following common practices in ISSR prediction (such as in Wang et al., 2025 and Wolf et al., 2025), the Equitable Threat Score (ETS) is used as a complementary evaluation metric. The ETS is a skill score that quantifies the accuracy of a binary classification (e.g., detecting ISSRs where RH_i ≥ 100 %) while accounting for hits that could occur by random chance. The ETS is defined as

\begin{matrix} (1) & ETS = \frac{TP - r}{TP + FP + FN - r}, \end{matrix}

where TP denotes true positives (correctly predicted ISSRs), FP denotes false positives (non-ISSRs incorrectly predicted as ISSRs), FN denotes false negatives (ISSRs missed by the model), TN denotes true negatives (correctly predicted non-ISSRs), and r is the number of hits expected by random chance, computed as

\begin{matrix} (2) & r = \frac{(TP + FP) (TP + FN)}{TP + FP + FN + TN}, \end{matrix}

The ETS adjusts for random hits and is particularly suited for evaluating rare event prediction. It ranges from $- 1 / 3$ (worse than random chance) to 1 (perfect prediction), with 0 indicating no better results than random guessing. Our baseline for evaluating model improvement is the ERA5 RH_i prediction, which shows a 0.36 ETS and 13.7 % MAE (see Table 3).

3 Hybrid Machine Learning Model development

Correcting the dry bias in ERA5's RH_i estimates in the UTLS presents a significant challenge due to the inherently non-linear nature of atmospheric processes. To address this, we propose a hybrid machine learning approach that combines two complementary models: an ANN and an XGBoost. The ANN is well-suited for capturing complex, non-linear dependencies, particularly under high humidity conditions, while XGBoost performs robustly in drier regimes.

The hybrid strategy dynamically selects the appropriate model for prediction based on the ERA5-estimated RH_i value. Specifically, for samples where ERA5 RH_i is below 85 %, predictions are taken from the XGBoost model, which has demonstrated superior performance in lower-humidity environments. Conversely, for RH_i values exceeding 85 %, the ANN model is employed, leveraging its capacity to extract highly non-linear patterns, such as those associated with ISSRs. The 85 % threshold was chosen based on two key considerations. First, ERA5 is known to exhibit a MAE of approximately 15 % in humid conditions. By subtracting this from the 100 % threshold typically used to identify ISSRs, we ensure that the ANN model is prioritized in critical high-humidity cases. Second, validation dataset performance confirmed that this hybrid architecture consistently outperforms either model used in isolation.

https://acp.copernicus.org/articles/26/4771/2026/acp-26-4771-2026-f05

Figure 5Comparison of ISSR prediction performance between ERA5 and the hybrid model across four meteorological conditions. These are the predictions made on the test dataset between 400 and 200 hPa over the North Atlantic region (cf. Fig. 1) for the year 2022.

Download

https://acp.copernicus.org/articles/26/4771/2026/acp-26-4771-2026-f06

Figure 6Number of correctly predicted ISSRs by ERA5 and the hybrid model compared to IAGOS ground truth across different meteorological conditions on the 2022 test dataset. Blue bars indicate true positives from ERA5, red bars from the hybrid model, and the dashed curve represents the total number of actual ISSRs in each condition based on IAGOS observations. The largest improvement is observed in the LS. The far left bars show the overall comparison across all conditions.

Download

Firstly, the ANN component is trained to predict RH_i using the input features summarized in Table 1. Data are normalized using min–max scaling, with the RH_i range extended to 200 % to reduce sensitivity to outliers and mitigate dry bias. The final network architecture includes three hidden layers with 100 neurons each, employing He initialization and ReLU activation functions. Batch normalization and dropout regularization are applied between layers to stabilize training and reduce overfitting. The output layer uses a linear activation function to predict RH_i , denoted as RHi_ANN.

Training is conducted using the Adam optimizer (Kingma and Ba, 2014) with a learning rate of 0.001, decay rate of 5 × 10⁻³, and momentum of 0.98. The model is trained over 150 epochs with a batch size of 1024, and early stopping is employed to halt training if validation loss does not improve over 20 consecutive epochs. A custom loss function, based on the Mean Squared Error (MSE), is used to emphasize accuracy in high-RH_i conditions. The loss function is defined as:

\begin{matrix} (3) & L (y, \hat{y}) = \frac{1}{n} \sum_{i = 1}^{n} w_{i} \cdot (y_{i} - {\hat{y}}_{i})^{2}, with w_{i} = 1 + y_{i}^{α} . \end{matrix}

In Eq. (3), y_i is the target value at index i, ${\hat{y}}_{i}$ is the predicted value at index i, n is the number of samples, and α is a weight factor. The weight parameter (α) that defines the weight of the RH_i value is denoted WEIGHT_FACTOR. This weighted approach prioritizes performance in ISSRs, addressing the ERA5 dry bias. Hyperparameters were optimized via grid search on the train dataset as detailed in Table 2.

Secondly, the XGBoost model uses the same input feature set as the ANN, including engineered variables. XGBoost is an efficient implementation of gradient boosting known for its scalability, accuracy, and interpretability. The final model was trained with 100 estimators, a learning rate of 0.1, a maximum tree depth of 4, a subsampling ratio of 0.9, and a column sampling ratio per tree of 0.8. In Table A1, we report the performance comparison for a set of ML models.

4 Results

The hybrid model achieved an ETS of 0.44 and a MAE of 11.37 %. Compared to ERA5, it showed an improvement of +0.08 in ETS and −2.33 % in MAE. Fig. 4 compares scatter plots of our different models predictions on the test dataset to the baseline ERA5 estimates, see Fig. 4a. Figure 4c shows that the XGBoost model was able to correct low RH_i values with great accuracy, consequently showing strong MAE improvement. However, it struggles with higher RH_i values, resulting in poor ETS. The ANN, Fig. 4b, shows a strong ETS, indicating its ability to classify ISSRs, but it overestimates the lower RH_i values leading to a higher MAE. The hybrid model shown in Fig. 4d combines the strengths of both models, achieving a good balance between ETS and MAE.

The RH_i estimates are effectively improved by combining the strengths of both models, leveraging the ANN's ability to capture non-linear relationships and the XGBoost's robustness in handling drier conditions. The hybrid model achieves a higher ETS comparable to the ANN's while maintaining a lower MAE than both models. This hybrid approach demonstrates the potential of combining different machine learning techniques to improve predictions in complex atmospheric conditions. Table 3 summarizes the performance of all models in terms of ETS and MAE.

Table 4 presents the most important features used by the ANN and XGBoost models. As expected we find RHi_ERA5 as the most important feature. This table shows that the models relied on different features for predictions: the XGBoost model relies more heavily on features with stronger linear relationships to the target, such as the vertical temperature gradients, while the ANN is able to exploit more complex, non-linear patterns, like the time embeddings. This observation highlights the complementary nature of the two models, with the ANN capturing complex relationships (often associated with more humid regions) and the XGBoost focusing on more linear patterns.

The performance of the hybrid model relative to the baseline ERA5 are assessed for both UT and LS under clear- and cloudy-sky conditions. Results are shown in Fig. 5. The best improvement in MAE is observable in the LS, from 13.19 % to 10.71 %. This is explained because the model performs a better correction on lower RH_i values and the LS counts significantly less ISSRs (see Fig. 6). In Table 5, we report the predictions performance for each of the cases. In comparison, while QM successfully removes bias from the distribution, it does not improve point-wise predictive accuracy. Wolf et al. (2025) explicitly report that after applying QM, the “RMSE, MAE, and MSE increase unnoticeably”. In contrast, our model significantly reduces the MAE from 13.70 % to 11.37 %.

Finally, Fig. 6 presents the number of correctly predicted ISSRs for the various cases. We observe that the hybrid model correctly predicts significantly more ISSRs than ERA5 in the LS, 92 out of 204, compared to 54 out of 204 for ERA5. Overall, the hybrid model improves ISSR detection by approximately 10 % across all conditions. In the UT, the improvement is less significant, with only 4 additional ISSRs correctly classified.

5 Conclusions

In this study, we used a hybrid ensemble model to improve RH_i from ERA5 reanalysis in the UTLS using in-situ measurement from IAGOS. The dry bias of ERA5 is considerably reduced and the number of correctly predicted ISSRs is increased, especially in the LS where it nearly doubles.

The hybrid model is built on two complementary models: an XGBoost regressor that excels in dry conditions (RH_i < 85 %), and an ANN that performs better in more humid regions (RH_i > 85 %). The hybrid model demonstrated significant improvements compared to the initial ERA5 estimates, both in terms of MAE and ETS. It achieved an MAE of 11.37 %, compared to 13.7 % for ERA5, and improved the ETS from 0.36 to 0.44. The best improvement was observed in the LS, where the hybrid model increased ETS by +0.18 and reduced MAE from 13.19 % to 10.71 %. The hybrid model consistently outperformed ERA5 across all defined meteorological conditions, detecting approximately 10 % more ISSRs overall.

However, the model presents limitations, especially in predicting very high RH_i values (above 120 %), likely due to the very limited availability of such samples in the dataset. Although ISSR detection has improved, reanalysis products still exhibit limitations for supporting accurate contrail impact estimates, and similar challenges are expected to apply to forecast products required for real-time contrail avoidance, particularly given the absence of clearly established performance requirements. Future work could explore advanced downsampling techniques to improve collocation between IAGOS and ERA5 datasets despite their different spatial and temporal resolutions.

Appendix A

https://acp.copernicus.org/articles/26/4771/2026/acp-26-4771-2026-f07

Figure A1Histograms of vorticity (a) and divergence (b) for the North-Atlantic region in 2022 on the 250 hPa level. This pressure level corresponds to the UT, where ISSRs are frequently observed. The blue bars refer to all points region while the red ones refer to ISSRs only. The data are normalized using the standard deviation of all grid boxes. We compared these distributions with studies of dynamical characteristics of ISSRs from Gierens and Brinkop (2012). By ensuring that our processed data exhibit comparable statistical properties, we confirmed the reliability of our data processing pipeline.

Download

Table A1Model performance comparison across regression and classification metrics for a set of ML model available in Scikit-learn (Pedregosa et al., 2011). The train and test dataset are described in Sect. 2.3. Best values per metric are in bold.

Download Print Version | Download XLSX

Data availability

The IAGOS data are available on the IAGOS Data Portal at https://doi.org/10.25326/06 (Boulanger et al., 2018). The ERA5 reanalysis data are available from the Copernicus Climate Change Service (C3S) Climate Data Store (CDS) (https://doi.org/10.24381/cds.bd0915c6, Hersbach et al., 2023).

Supplement

The supplement related to this article is available online at https://doi.org/10.5194/acp-26-4771-2026-supplement.

Author contributions

OB and JJQ conceived and designed the experiments; MA conducted the experiments and analyzed the data; MA and JJQ wrote the manuscript; OB provided critical review.

Competing interests

The contact author has declared that none of the authors has any competing interests.

Disclaimer

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. The authors bear the ultimate responsibility for providing appropriate place names. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Acknowledgements

The authors thank Ziming Wang for useful discussions on her work which has initiated our study.

Review statement

This paper was edited by Franziska Aemisegger and reviewed by two anonymous referees.

References

Alduchov, O. A. and Eskridge, R. E.: Improved Magnus Form Approximation of Saturation Vapor Pressure, J. Appl. Meteorol., 35, 601–609, https://doi.org/10.1175/1520-0450(1996)035<0601:imfaos>2.0.co;2, 1996. a

Appleman, H.: The formation of exhaust condensation trails by jet aircraft, B. Am. Meteorol. Soc., 34, 14–20, https://doi.org/10.1175/1520-0477-34.1.14, 1953. a

Borella, A., Vignon, É., Boucher, O., and Rohs, S.: An Empirical Parameterization of the Subgrid-Scale Distribution of Water Vapor in the UTLS for Atmospheric General Circulation Models, J. Geophys. Res.-Atmos., 129, https://doi.org/10.1029/2024JD040981, 2024. a

Boulanger, D., Thouret, V., and Petzold, A.: IAGOS final quality controlled Observational Data L2 – Time series, Aeris [data set], https://doi.org/10.25326/06, 2018. a

Chen, T. and Guestrin, C.: XGBoost: A Scalable Tree Boosting System, in: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '16, p. 785–794, ACM, https://doi.org/10.1145/2939672.2939785, 2016. a, b

Copernicus Climate Change Service: ERA5 hourly data on pressure levels from 1940 to present, Climate Data Store [data set], https://doi.org/10.24381/cds.bd0915c6, 2025. a

Diao, M., Zondlo, M. A., Heymsfield, A. J., Avallone, L. M., Paige, M. E., Beaton, S. P., Campos, T., and Rogers, D. C.: Cloud-scale ice-supersaturated regions spatially correlate with high water vapor heterogeneities, Atmos. Chem. Phys., 14, 2639–2656, https://doi.org/10.5194/acp-14-2639-2014, 2014. a

Diao, M., Jensen, J. B., Pan, L. L., Homeyer, C. R., Honomichl, S., Bresch, J. F., and Bansemer, A.: Distributions of ice supersaturation and ice crystals from airborne observations in relation to upper tropospheric dynamical boundaries, J. Geophys. Res.-Atmos., 120, 5101–5121, https://doi.org/10.1002/2015JD023139, 2015a. a

Gettelman, A., Fetzer, E. J., Eldering, A., and Irion, F. W.: The global distribution of supersaturation in the upper troposphere from the Atmospheric Infrared Sounder, J. Climate, 19, 6089–6103, https://doi.org/10.1175/JCLI3955.1, 2006. a

Gierens, K. and Brinkop, S.: Dynamical characteristics of ice supersaturated regions, Atmos. Chem. Phys., 12, 11933–11942, https://doi.org/10.5194/acp-12-11933-2012, 2012. a, b, c

Gierens, K. and Spichtinger, P.: On the size distribution of ice-supersaturated regions in the upper troposphere and lowermost stratosphere, Ann. Geophys., 18, 499–504, https://doi.org/10.1007/s00585-000-0499-7, 2000. a

Gierens, K., Schumann, U., Helten, M., Smit, H., and Marenco, A.: A distribution law for relative humidity in the upper troposphere and lower stratosphere derived from three years of MOZAIC measurements, Ann. Geophys., 17, 1218–1226, https://doi.org/10.1007/s00585-999-1218-7, 1999. a, b

Gierens, K., Kohlhepp, R., Spichtinger, P., and Schroedter-Homscheidt, M.: Ice supersaturation as seen from TOVS, Atmos. Chem. Phys., 4, 539–547, https://doi.org/10.5194/acp-4-539-2004, 2004. a

Groß, S., Wirth, M., Schäfler, A., Fix, A., Kaufmann, S., and Voigt, C.: Potential of airborne lidar measurements for cirrus cloud studies, Atmos. Meas. Tech., 7, 2745–2755, https://doi.org/10.5194/amt-7-2745-2014, 2014. a

Hegglin, M. I., Tegtmeier, S., Anderson, J., Froidevaux, L., Fuller, R., Funke, B., Jones, A., Lingenfelser, G., Lumpe, J., Pendlebury, D., Remsberg, E., Rozanov, A., Toohey, M., Urban, J., von Clarmann, T., Walker, K. A., Wang, R., and Weigel, K.: SPARC Data Initiative: Comparison of water vapor climatologies from international satellite limb sounders, J. Geophys. Res., 118, 11824–11846, https://doi.org/10.1002/jgrd.50752, 2013. a

Hersbach, H., Bell, B., Berrisford, P., Hirahara, S., Horányi, A., Muñoz-Sabater, J., Nicolas, J., Peubey, C., Radu, R., Schepers, D., Simmons, A., Soci, C., Abdalla, S., Abellan, X., Balsamo, G., Bechtold, P., Biavati, G., Bidlot, J., Bonavita, M., De Chiara, G., Dahlgren, P., Dee, D., Diamantakis, M., Dragani, R., Flemming, J., Forbes, R., Fuentes, M., Geer, A., Haimberger, L., Healy, S., Hogan, R. J., Hólm, E., Janisková, M., Keeley, S., Laloyaux, P., Lopez, P., Lupu, C., Radnoti, G., de Rosnay, P., Rozum, I., Vamborg, F., Villaume, S., and Thépaut, J.: The ERA5 global reanalysis, Q. J. Roy. Meteor. Soc., 146, 1999–2049, https://doi.org/10.1002/qj.3803, 2020. a

Hersbach, H., Bell, B., Berrisford, P., Biavati, G., Horányi, A., Muñoz Sabater, J., Nicolas, J., Peubey, C., Radu, R., Rozum, I., Schepers, D., Simmons, A., Soci, C., Dee, D., and Thépaut, J.-N.: ERA5 hourly data on pressure levels from 1940 to present, Copernicus Climate Change Service (C3S) Climate Data Store (CDS), https://doi.org/10.24381/cds.bd0915c6, 2023. a

Heymsfield, A. J., Miloshevich, L. M., Twohy, C., Sachse, G., and Oltmans, S.: Upper-tropospheric relative humidity observations and implications for cirrus ice nucleation, Geophys. Res. Lett., 25, 1343–1346, https://doi.org/10.1029/98GL01089, 1998. a

Hofer, S. M. and Gierens, K. M.: Kinematic properties of regions that can involve persistent contrails over the North Atlantic and Europe during April and May 2024, Atmos. Chem. Phys., 25, 6843–6856, https://doi.org/10.5194/acp-25-6843-2025, 2025. a

Kärcher, B.: Formation and radiative forcing of contrail cirrus, Nat. Commun., 9, 1824, https://doi.org/10.1038/s41467-018-04068-0, 2018. a

Kärcher, B. and Corcos, M.: On the lifetimes of persistent contrails and contrail cirrus, manuscript submitted to J. Geophys. Res-Atmos., 2025. a

Kingma, D. and Ba, J.: Adam: A Method for Stochastic Optimization, International Conference on Learning Representations, http://arxiv.org/abs/1412.6980 (last access: 31 July 2025), 2014. a

Krüger, K., Schäfler, A., Wirth, M., Weissmann, M., and Craig, G. C.: Vertical structure of the lower-stratospheric moist bias in the ERA5 reanalysis and its connection to mixing processes, Atmos. Chem. Phys., 22, 15559–15577, https://doi.org/10.5194/acp-22-15559-2022, 2022. a

Lamquin, N., Stubenrauch, C. J., Gierens, K., Burkhardt, U., and Smit, H.: A global climatology of upper-tropospheric ice supersaturation occurrence inferred from the Atmospheric Infrared Sounder calibrated by MOZAIC, Atmos. Chem. Phys., 12, 381–405, https://doi.org/10.5194/acp-12-381-2012, 2012. a

Lee, D., Fahey, D., Skowron, A., Allen, M., Burkhardt, U., Chen, Q., Doherty, S., Freeman, S., Forster, P., Fuglestvedt, J., Gettelman, A., De León, R., Lim, L., Lund, M., Millar, R., Owen, B., Penner, J., Pitari, G., Prather, M., Sausen, R., and Wilcox, L.: The contribution of global aviation to anthropogenic climate forcing for 2000 to 2018, Atmos. Environ., 244, 117834, https://doi.org/10.1016/j.atmosenv.2020.117834, 2021. a

Li, Y., Mahnke, C., Rohs, S., Bundke, U., Spelten, N., Dekoutsidis, G., Groß, S., Voigt, C., Schumann, U., Petzold, A., and Krämer, M.: Upper-tropospheric slightly ice-subsaturated regions: frequency of occurrence and statistical evidence for the appearance of contrail cirrus, Atmos. Chem. Phys., 23, 2251–2271, https://doi.org/10.5194/acp-23-2251-2023, 2023. a

Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., and Duchesnay, E.: Scikit-learn: Machine Learning in Python, J. Mach. Learn. Res., 12, 2825–2830, 2011. a

Petzold, A., Thouret, V., Gerbig, C., Zahn, A., Brenninkmeijer, C. A. M., Gallagher, M., Hermann, M., Pontaud, M., Ziereis, H., Boulanger, D., Marshall, J., Nédélec, P., Smit, H. G. J., Friess, U., Flaud, J.-M., Wahner, A., Cammas, J.-P., Volz-Thomas, A., and Team, I.: Global-scale atmosphere monitoring by in-service aircraft – current achievements and future prospects of the European Research Infrastructure IAGOS, Tellus B, 67, 28452, https://doi.org/10.3402/tellusb.v67.28452, 2015. a, b

Platt, J. C., Shapiro, M. L., Engberg, Z., McCloskey, K., Geraedts, S., Sankar, T., Stettler, M. E. J., Teoh, R., Schumann, U., Rohs, S., Brand, E., and Van Arsdale, C.: The effect of uncertainty in humidity and model parameters on the prediction of contrail energy forcing, Environmental Research Communications, 6, 095015, https://doi.org/10.1088/2515-7620/ad6ee5, 2024. a

Reutter, P., Neis, P., Rohs, S., and Sauvage, B.: Ice supersaturated regions: properties and validation of ERA-Interim reanalysis with IAGOS in situ water vapour measurements, Atmos. Chem. Phys., 20, 787–804, https://doi.org/10.5194/acp-20-787-2020, 2020. a

Rolf, C., Rohs, S., Smit, H., Krämer, M., Bozóki, Z., Hofmann, S., Franke, H., Maser, R., Hoor, P., and Petzold, A.: Evaluation of compact hygrometers for continuous airborne measurements, Meteorol. Z., 33, https://doi.org/10.1127/metz/2023/1187, 2023. a, b

Rosenow, J., Fricke, H., Luchkova, O., and Schultz, M.: Minimizing contrail formation by rerouting around dynamic ice-supersaturated regions, Aeronautics and Aerospace Open Access Journal, 2, 105–111, https://doi.org/10.15406/aaoaj.2018.02.00039, 2018. a

Schmidt, E.: Die Entstehung von Eisnebel aus den Auspuffgasen von Flugmotoren, Eintrag von Ulrich Schumann, zur Sicherstellung des Zugangs zu diesem wissenschaftshistorisch wichtigen Dokument; mit Zustimmung des Rechteinhabers, Nachfolger des Oldenbourg Verlags, 1941. a

Schumann, U.: On conditions for contrail formation from aircraft exhausts, DLR Forschungsbericht DLR-FB 1996-10, Deutsches Zentrum für Luft- und Raumfahrt (DLR), https://elib.dlr.de/8302/ (last access: 31 July 2025), 1996. a

Sonntag, D.: Advancements in the field of hygrometry, Meteorol. Z., 3, 51–66, https://doi.org/10.1127/metz/3/1994/51, 1994. a

Spichtinger, P., Gierens, K., Leiterer, U., and Dier, H.: Ice supersaturation in the tropopause region over Lindenberg, Germany, Meteorol. Z., 12, 143–156, https://doi.org/10.1127/0941-2948/2003/0012-0143, 2003a. a

Spichtinger, P., Gierens, K., and Read, W.: The global distribution of ice-supersaturated regions as seen by the Microwave Limb Sounder, Q. J. Roy. Meteor. Soc., 129, 3391–3410, https://doi.org/10.1256/qj.02.141, 2003b. a

Stenke, A., Grewe, V., and Ponater, M.: Lagrangian transport of water vapor and cloud water in the ECHAM4 GCM and its impact on the cold bias, Clim. Dynam., 31, 491–506, https://doi.org/10.1007/S00382-007-0347-5, 2008. a

Teoh, R., Schumann, U., Gryspeerdt, E., Shapiro, M., Molloy, J., Koudis, G., Voigt, C., and Stettler, M. E. J.: Aviation contrail climate effects in the North Atlantic from 2016 to 2021, Atmos. Chem. Phys., 22, 10919–10935, https://doi.org/10.5194/acp-22-10919-2022, 2022. a

Teoh, R., Engberg, Z., Schumann, U., Voigt, C., Shapiro, M., Rohs, S., and Stettler, M. E. J.: Global aviation contrail climate effects from 2019 to 2021, Atmos. Chem. Phys., 24, 6071–6093, https://doi.org/10.5194/acp-24-6071-2024, 2024. a

Wang, X., Wolf, K., Boucher, O., and Bellouin, N.: Radiative Effect of Two Contrail Cirrus Outbreaks Over Western Europe Estimated Using Geostationary Satellite Observations and Radiative Transfer Calculations, Geophys. Res. Lett., 51, https://doi.org/10.1029/2024gl108452, 2024. a

Wang, Z., Bugliaro, L., Gierens, K., Hegglin, M. I., Rohs, S., Petzold, A., Kaufmann, S., and Voigt, C.: Machine learning for improvement of upper-tropospheric relative humidity in ERA5 weather model data, Atmos. Chem. Phys., 25, 2845–2861, https://doi.org/10.5194/acp-25-2845-2025, 2025. a, b, c, d, e, f, g, h

Wolf, K., Bellouin, N., and Boucher, O.: Long-term upper-troposphere climatology of potential contrail occurrence over the Paris area derived from radiosonde observations, Atmos. Chem. Phys., 23, 287–309, https://doi.org/10.5194/acp-23-287-2023, 2023. a

Wolf, K., Bellouin, N., Boucher, O., Rohs, S., and Li, Y.: Correction of ERA5 temperature and relative humidity biases by bivariate quantile mapping for contrail formation analysis, Atmos. Chem. Phys., 25, 157–181, https://doi.org/10.5194/acp-25-157-2025, 2025. a, b, c

Zahn, A., Christner, E., van Velthoven, P. F. J., Rauthe-Schöch, A., and Brenninkmeijer, C. A. M.: Processes controlling water vapor in the upper troposphere/lowermost stratosphere: An analysis of 8 years of monthly measurements by the IAGOS-CARIBIC observatory, J. Geophys. Res., 119, 11505–11525, https://doi.org/10.1002/2014JD021687, 2014. a

Živković, Ž., Mihajlović, I., and Nikolić, D.: Artificial neural network method applied on the nonlinear multivariate problems, Serbian Journal of Management, 4, 143–155, 2009. a

Articles

Download

Article (5632 KB)
Full-text XML

Short summary

Aviation impacts climate by forming contrails that trap heat and can persist for hours at cruising altitudes. Forecasting these humid regions is difficult, as satellites lack accuracy, aircraft data are limited, and ERA5 reanalysis has random errors. This study presents a hybrid machine learning method that corrects ERA5 with aircraft data, using decision trees in dry air and neural networks in humid air. It improves relative humidity predictions, especially in the lower stratosphere.