A novel method for comparing an atmospheric model with satellite probabilistic estimates of relative humidity (RH) in the tropical atmosphere is presented. The method is developed to assess the Météo-France numerical weather forecasting model ARPEGE (Action de Recherche Petite Echelle Grande Echelle) using probability density functions (PDFs) of RH estimated from the SAPHIR (Sondeur Atmosphérique du Profil d'Humidité Intertropicale par Radiométrie) microwave sounder. The satellite RH reference is derived by aggregating footprint-scale probabilistic RH estimates to match the spatial and temporal resolution of ARPEGE over the April–May–June 2018 period. The probabilistic comparison is discussed with respect to a classical deterministic comparison, which compares each model RH value with the reference average using a fixed confidence interval. This study first documents the significant spatial and temporal variability in the spread and shape of the reference distribution. We demonstrate the need for a finer assessment at the individual case level to characterize specific situations beyond the classical bulk comparison based on deterministic “best” reference estimates. The probabilistic comparison allows for a more contrasted assessment than the deterministic one. Specifically, it reveals cases where ARPEGE-simulated values falling within the deterministic confidence range actually correspond to extreme departures in the reference distribution, highlighting the shortcomings of the all-too-common Gaussian assumption for the reference, on which most current deterministic comparison methods are based.
Fundamental drivers of climate system variability, such as the atmospheric water cycle, are still not well understood. They are associated with uncertainties that hamper climate predictions, with consequences for society. As an essential ingredient of the Earth's hydrological cycle, water vapor is the principal greenhouse gas and exerts a fundamental control on the distribution of temperature (Held and Soden, 2000; Pierrehumbert, 2011; Allan, 2012; Stevens and Bony, 2013). The radiative importance of atmospheric water in maintaining the thermal energy balance of the Earth system is undisputed. The coupling between temperature, water vapor and infrared radiation creates a positive feedback that amplifies the warming induced by an external forcing (Hartmann et al., 2013). In addition, cloud–moisture interactions and their associated processes are diverse (Bony et al., 2015; Sherwood et al., 2010; Sherwood et al., 2020), and their representation in numerical models strongly constrains both local-scale weather forecasts and global climate sensitivity (Stevens and Schwartz, 2012).
The accuracy of meteorological forecasts and climate projections relies on parameterization schemes, i.e., on the model physics. This accuracy is routinely assessed by comparing simulated geophysical fields with an observed reference derived from ground-based measurements or remote sensing techniques (Randall et al., 2007). When remote sensing observations serve as the reference, the comparison with numerical simulations may be performed in either geophysical or observation space, each with its own uncertainties. In geophysical space, the model variables are evaluated directly against remote sensing estimates obtained with a retrieval scheme. This retrieval scheme can be an inversion algorithm that relies on incomplete representations of atmospheric variability (see for instance Solheim et al., 1998; Aires et al., 2002; and Roy et al., 2020). In observation (e.g., radiance) space, a forward model is used to convert the simulated atmosphere into synthetic remote sensing measurements (Morcrette, 1991; Soden and Bretherton, 1994; Brogniez et al., 2005; Chepfer et al., 2008; Bodas-Salcedo et al., 2011; Jiang et al., 2012; Tian et al., 2013; Steiner et al., 2018). This model-to-satellite approach relies on the ability of the forward model to simulate remote sensing observations for a given atmospheric state (Weng, 2007), and substantial uncertainties may remain (Geer and Baordo, 2014; Brogniez et al., 2016).
In either case, comparisons usually involve spatial and/or temporal averaging, sometimes with error bars or with averaging kernels used to smooth model or in situ profiles to the vertical resolution of the satellite measurement (Rodgers and Connor, 2003). Moreover, common assessment practices typically rely on bulk comparison metrics (e.g., correlation, bias) to assess performance over a given spatial and temporal domain.
The present work focuses on atmospheric relative humidity (RH). There is an extensive body of literature on the use of relative humidity estimated by spaceborne instruments to evaluate climate models (among others Soden and Bretherton, 1994; Brogniez et al., 2005; John and Soden, 2006; Jiang et al., 2012; Tian et al., 2013; and Steiner et al., 2018). However, such comparisons generally provide limited insight into the models' error characteristics, for several reasons:
First, an objective assessment requires an independent reference, which is
not the case when satellite remote sensing observations that are already
incorporated in the model through data assimilation are reused to assess its
accuracy. Second, metrics such as correlation and bias are often applied without
necessarily checking the relevance of such criteria. For example, the
magnitude of the bias as an additive model-to-reference difference may be
challenging to assess objectively at the primary satellite scale. The linear
correlation is generally insufficient to describe the non-linear and
heteroscedastic dependence structure between the model estimates and the
reference. Third, the model product is often assumed to be uniform and display
homogeneous properties over the spatial and temporal domain of comparison.
Bulk metrics such as correlation and bias are computed over samples that
actually gather a variety of atmospheric situations (vertical structure,
moisture, etc.) for which the model is likely to behave differently through
its assumptions. Hence bulk error metrics lack specificity and depict
averaged space and time properties, while the errors tend to be non-stationary
and sensitive to parameters not accounted for in the assessment formulation.
Therefore, the representativeness of any deterministic assessment of model
RH is confined to the time and space domain over which it is performed, with
limited extension over other regimes, regions, seasons, etc. These issues
are not confined to the study of RH but are, to an extent, common to those
of all geophysical variables (see for instance Kirstetter et al., 2020, for a
discussion on precipitation).
A probabilistic description of the reference RH is better suited to acknowledge the possible range of reference values. It also accounts explicitly for the uncertainty attached to deterministic reference estimates, making the diagnosis better documented and more precise, and ultimately contributing to the improvement of climate and weather forecasting models. This paper presents an assessment of simulated RH using such a probabilistic approach. The method is developed and tested to assess a sample of simulations from the global model ARPEGE (Action de Recherche Petite Echelle Grande Echelle), the numerical weather forecasting system developed by Météo-France (the French national weather service; Bouyssel et al., 2021). For this assessment, probability density functions of reference RH are derived from the brightness temperatures measured by SAPHIR (Sondeur Atmosphérique du Profil d'Humidité Intertropicale par Radiométrie), the microwave sounder on board the Megha-Tropiques satellite, which orbits over the tropical belt (Roca et al., 2015).
This paper is divided into five sections. The datasets and the matching procedure between SAPHIR probabilistic relative humidity (RH) estimates and ARPEGE simulations are presented in Sect. 2. The probabilistic method is introduced and compared with the deterministic comparison in Sect. 3. Section 4 discusses the results of the two comparison methods and the added value of the probabilistic method. Concluding remarks are drawn in Sect. 5.
ARPEGE 6-hourly instantaneous RH fields simulated at 6 h lead time for the months April–May–June 2018 serve as a test bed for evaluating the numerical weather forecast model.
SAPHIR is the microwave moisture sounder instrument on board the
Megha-Tropiques satellite, which has been observing the tropical (30
The measured brightness temperatures (BTs) are translated into RH profiles
for clear-sky conditions as well as for cloud-covered situations, as long as
cloud hydrometeors are small enough not to scatter the upwelling microwave
radiation significantly. Significant scattering is associated with deep
convection, with or without overshoots, which is detected from the BTs
following Hong et al. (2005) and Greenwald and Christopher (2002).
Therefore, RH profiles are estimated for every SAPHIR footprint in which no
deep convection is detected. The RH profiles consist of six relatively wide
atmospheric layers between 950 and 100 hPa (100–200, 250–350,
400–600, 650–700, 750–800, 850–950 hPa), defined from an analysis of
the channels' weighting functions (Sivira et al., 2015). The retrieval of RH
profiles is based on a multivariate regression scheme that provides the
parameters (
As detailed in Table 1 of Brogniez et al. (2016), the bulk standard errors of the dataset range from 3.6 % RH (layer 250–350 hPa) to 15.8 % RH (layer 750–800 hPa), depending on the pressure range. These errors were estimated using oceanic and continental radiosonde profiles co-located with satellite overpasses. Stevens et al. (2017) also highlighted the role of vertical inhomogeneities in the discrepancies, strong moisture gradients being the most difficult to capture for passive sensors.
The ARPEGE model is the global model that Météo-France has run operationally
since 1992 (Bouyssel et al., 2021). This model is
characterized by a stretched and tilted horizontal grid and by a hybrid-pressure terrain-following vertical coordinate system. The vertical grid is
composed of 105 levels, and the mesh of the horizontal grid has a 5 km
resolution over Europe and a 24 km resolution elsewhere. Forecasts are
initialized with a four-dimensional variational system (Courtier et al.,
1991) with 6 h windows and run up to a
For the purpose of this study, the 6-hourly forecasts of atmospheric RH have
been projected on a regular horizontal 0.25
SAPHIR footprints' probability density functions (PDF
Within each model grid box, all the footprints' PDF
The differences and similarities between the deterministic and the
probabilistic comparison approaches are illustrated in Fig. 2. At any
given pixel and time step, the ARPEGE model value is denoted RH
Single-grid-box example (grid point situated 24.75 to 25
The value
In order to compare the probabilistic method with a more classical approach, a
simple deterministic comparison is used as a benchmark. The mean reference
value
Compared to a deterministic comparison between ARPEGE's RH
The deterministic comparison and the CDF-based comparison are applied to each ARPEGE grid point. Figure 2 illustrates further the complementarity of the two approaches for a representative case.
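The two comparisons applied at each grid point can be sketched for a single grid box as follows. This is an illustrative sketch under stated assumptions, not the authors' code: the reference PDF is assumed to be discretized on 1 % RH bins, the footprint PDFs are replaced by hypothetical binned Gaussians chosen only for convenience, the grid-box aggregation is assumed to be an equal-weight mixture of the footprint PDFs, and the helper names (`aggregate`, `deterministic_comparison`, `cdf_percentile`) are hypothetical. The deterministic benchmark uses the deviation from the reference mean and a fixed ±15 % RH interval, while the probabilistic comparison evaluates the reference CDF at the model value.

```python
from statistics import NormalDist

# Hypothetical discretization: 1 % RH bins between 0 and 100 % RH.
EDGES = [float(i) for i in range(101)]
CENTERS = [e + 0.5 for e in EDGES[:-1]]


def binned_gaussian(mu, sigma):
    """Illustrative footprint PDF: a Gaussian integrated over the RH bins, renormalized to 0-100 % RH."""
    d = NormalDist(mu, sigma)
    p = [d.cdf(hi) - d.cdf(lo) for lo, hi in zip(EDGES[:-1], EDGES[1:])]
    total = sum(p)
    return [x / total for x in p]


def aggregate(footprint_pdfs):
    """Grid-box reference PDF as the equal-weight mixture of footprint PDFs (an assumption)."""
    n = len(footprint_pdfs)
    return [sum(f[k] for f in footprint_pdfs) / n for k in range(len(CENTERS))]


def deterministic_comparison(rh_model, pdf, tol=15.0):
    """Deviation from the reference mean and its category with respect to a fixed +/- tol interval."""
    mean = sum(c * p for c, p in zip(CENTERS, pdf))
    dev = rh_model - mean
    if abs(dev) <= tol:
        return dev, "within"
    return dev, "above" if dev > 0 else "below"


def cdf_percentile(rh_model, pdf):
    """Reference CDF evaluated at the model value, in percent (linear interpolation inside a bin)."""
    cdf = 0.0
    for lo, hi, p in zip(EDGES[:-1], EDGES[1:], pdf):
        if rh_model >= hi:
            cdf += p  # whole bin lies below the model value
        else:
            if rh_model > lo:
                cdf += p * (rh_model - lo) / (hi - lo)  # partial bin
            break
    return 100.0 * cdf


# Three moist, fairly homogeneous footprints in one grid box (invented values).
ref_pdf = aggregate([binned_gaussian(58, 3), binned_gaussian(60, 3), binned_gaussian(62, 3)])
dev, cat = deterministic_comparison(47.0, ref_pdf)
print(f"deviation = {dev:.1f} % RH ({cat}), percentile = {cdf_percentile(47.0, ref_pdf):.3f} %")
```

With these invented footprints, a model value of 47 % RH lies about 13 % RH below the reference mean, inside the ±15 % RH interval, while its percentile in the reference distribution is far below 1 %. This is the kind of case that the deterministic comparison accepts and the probabilistic comparison flags as an extreme departure.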
For any given ARPEGE grid box the values
In the example shown in Fig. 2, ARPEGE's RH
As underlined in Sect. 2.1, the retrieval of RH profiles from SAPHIR
measurements is performed for both clear sky and cloud-covered areas to the
extent that scattering by large hydrometeors produced by convective activity
is negligible (Greenwald and Christopher, 2002; Hong et al., 2005).
Therefore, all ARPEGE grid boxes associated with rainfall rates strictly
above 0 mm h
The comparison method is applied to a spatiotemporal domain covering the
tropical belt over 3 months (April–May–June 2018).
Diagram presenting the spatial and temporal aggregation method for
a single grid point (blue square when the grid box was passed over by SAPHIR,
red when not and/or filtered out).
In deterministic comparison settings, the uncertainty may be defined based
on a priori assumptions, instrumental biases and retrieval errors. This
uncertainty is often assumed to be multiplicative with respect to the reference
value, and the underlying density function is assumed to be unimodal, symmetric and
Gaussian, so that the uncertainty is defined as a standard
deviation. However, a Gaussian model would not have been suited to this
dataset. A Shapiro–Wilk test is run on each individual PDF of
the dataset (with
The smaller the IQR, the narrower the distribution and the smaller the uncertainty in the reference. Values of the IQR can be compared with the deterministic 15 % RH uncertainty: if the IQR is greater (respectively lower) than 15 % RH, the reference distribution is broader (respectively narrower) than assumed with a set 15 % RH error.
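The IQR of a grid-box reference PDF and its comparison with the set 15 % RH error can be sketched as follows. This minimal sketch assumes, hypothetically, that each reference PDF is stored as probabilities on 1 % RH bins and obtains the quartiles by inverting the piecewise-linear CDF; the two example PDFs (a narrow unimodal one and a broad bimodal one) are invented Gaussian mixtures, and the helper names are hypothetical.

```python
from statistics import NormalDist

EDGES = [float(i) for i in range(101)]  # hypothetical 1 % RH bins, 0-100 % RH
FIXED_ERROR = 15.0                      # % RH, the set deterministic uncertainty


def binned_mixture(components):
    """Illustrative PDF on the RH bins from (weight, mean, std) Gaussian components."""
    p = [0.0] * (len(EDGES) - 1)
    for w, mu, sigma in components:
        d = NormalDist(mu, sigma)
        for k, (lo, hi) in enumerate(zip(EDGES[:-1], EDGES[1:])):
            p[k] += w * (d.cdf(hi) - d.cdf(lo))
    total = sum(p)
    return [x / total for x in p]


def quantile(pdf, q):
    """Invert the piecewise-linear CDF of a binned PDF at probability q (0 < q < 1)."""
    cdf = 0.0
    for lo, hi, p in zip(EDGES[:-1], EDGES[1:], pdf):
        if p > 0.0 and cdf + p >= q:
            return lo + (q - cdf) / p * (hi - lo)
        cdf += p
    return EDGES[-1]


def iqr(pdf):
    """Inter-quartile range (Q3 - Q1) of a binned PDF, in % RH."""
    return quantile(pdf, 0.75) - quantile(pdf, 0.25)


for name, pdf in [("narrow", binned_mixture([(1.0, 30, 4)])),
                  ("bimodal", binned_mixture([(0.5, 20, 5), (0.5, 70, 5)]))]:
    spread = iqr(pdf)
    verdict = "broader" if spread > FIXED_ERROR else "narrower"  # ties counted as narrower
    print(f"{name}: IQR = {spread:.1f} % RH -> {verdict} than a set 15 % RH error")
```

The narrow PDF has an IQR of about 5.4 % RH, well inside the set error, whereas the bimodal PDF has an IQR of about 50 % RH. A mean plus or minus 15 % RH would describe neither of them faithfully.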
Figure 4a and b show an example of reference
The spatial distribution of the IQR (Fig. 4a) shows contrasted areas that match
the patterns of
The comparison of the two uncertainty approaches over the whole period (Fig. 4c) confirms the spatial correlation between the IQR and the classical patterns of the humidity field. This is particularly visible around the South Pacific and South Atlantic highs, where the proportion of IQR values under the 15 % RH threshold reaches 70 % and even 100 %. In these subsiding areas, atmospheric RH is governed by large-scale processes that exhibit little to no instantaneous variability at the scale of our grid. This results in more homogeneous conditions within each grid box, which explains the narrower distributions of the retrievals (i.e., smaller IQR). The Intertropical Convergence Zone (ITCZ) appears as areas where only a small proportion (0 % to 30 %) of the retrieved distributions have an IQR under 15 % RH. The strong dynamics characterizing this zone drive smaller-scale processes that affect the RH field, resulting in heterogeneous conditions within each grid box and larger IQR. Most importantly, the IQR varies across space and time, and this variability can be partly linked to the RH field and explained by large- and fine-scale processes. This highlights the need for a comparison method that takes the variability of the dataset into account and adapts the comparison to each situation.
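The proportion mapped for each grid point is a simple ratio over the valid time steps. A minimal sketch, assuming hypothetically that the IQR values of one grid point are stored as a time series in which `None` marks time steps without a SAPHIR overpass or filtered out; the function name and the two invented series are hypothetical.

```python
def proportion_below(iqr_series, threshold=15.0):
    """Percentage of valid time steps whose reference IQR is below the threshold (% RH).

    None marks time steps without a SAPHIR overpass or filtered out.
    Returns None when the grid point has no valid time step.
    """
    valid = [v for v in iqr_series if v is not None]
    if not valid:
        return None
    return 100.0 * sum(v < threshold for v in valid) / len(valid)


# Invented IQR time series (% RH) for two hypothetical grid points.
subsidence_box = [6.2, 8.9, None, 11.0, 7.4, None, 9.8]
itcz_box = [22.5, None, 31.0, 12.1, 27.3, 19.8, None]
print(proportion_below(subsidence_box))  # all 5 valid steps below 15 % RH -> 100.0
print(proportion_below(itcz_box))        # 1 of 5 valid steps below 15 % RH -> 20.0
```

Excluding missing time steps from the denominator keeps the proportion comparable between grid points that are overpassed a different number of times.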
One can note that, while these results vary significantly depending on the atmospheric layer, they are consistent with the expected RH field patterns. For example, in the upper two atmospheric layers (100–200 and 250–350 hPa), the homogeneous, drier conditions result in almost all retrieved distributions having an IQR under 15 % RH. The two lowest layers (750–800 and 850–950 hPa), closest to the ground, show strong ocean–continent contrasts. These contrasts reflect surface-dependent processes, with extremely low frequencies of IQR under 15 % RH over the continents. This suggests that the lower the layer, the wider the distribution of retrieved RH.
In all cases where the IQR is above 15 % RH, flattened and possibly
non-Gaussian RH distributions may result in non-representative
Figure 5 shows the comparison results between the reference RH
Comparisons between SAPHIR's
The comparisons are performed using wide discrete color bars in order to
discuss the complementarity of the methods and not specific issues of the
model. The deterministic approach shows that a majority of
RH
In short, the probabilistic method is consistent with the deterministic
comparison on extreme biases and adds more information for cases where
RH
The two methods are applied to the entire dataset over the period from
April to June 2018. The distributions of each method's results are
represented as a histogram (Fig. 6a) and as a rank histogram, also known as a Talagrand diagram
(Hamill, 2001; Wilks, 2011; Kirstetter et al., 2015; Fig. 6b). Note that
while the Talagrand diagram is often used to assess ensemble forecasts by
comparing a single reference value to a distribution of ensemble forecasts, its
interpretation is similar when comparing a single simulated value to a
reference distribution. This graphical method illustrates where ARPEGE's
RH
Distribution of the results from each comparison method applied
to all layers over the period April to June 2018:
As seen in Fig. 6a, the deterministic difference
Figure 7 shows the comparisons performed independently for each atmospheric
layer. The distributions are represented as boxplots, with the box bounded by
the first and third quartiles and the whiskers extending to the most extreme values, their length being limited to 1.5
Distribution of the results of each comparison method applied over
the period April to June 2018 represented as boxplots for each layer:
The distributions of the deviations from the mean
The distributions of the associated percentiles are wider (Fig. 7b) and
provide more detail. The comparison results are divided into three
categories defined by the deviation-from-the-mean intervals, which are drawn
separately to highlight the consistency of the extreme results of the two methods.
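Splitting the probabilistic results by deterministic category, and summarizing each category with a rank (Talagrand) histogram, can be sketched as follows. This is a hypothetical sketch: each comparison is assumed to yield a (deviation from the reference mean, reference-CDF percentile) pair, the ±15 % RH bounds define the three categories, and the function names and the five invented pairs are hypothetical. If the model values behaved like draws from the reference distributions, the percentiles would be uniformly distributed and the rank histogram flat; accumulation in the outer bins indicates values lying in the tails of the reference.

```python
def category(dev, tol=15.0):
    """Deterministic category of a model-minus-reference-mean deviation (% RH)."""
    if dev < -tol:
        return "below"
    if dev > tol:
        return "above"
    return "within"


def rank_histogram(percentiles, n_bins=10):
    """Count reference-CDF percentiles (0-100) in n_bins equal-width bins."""
    counts = [0] * n_bins
    for p in percentiles:
        k = min(int(p / 100.0 * n_bins), n_bins - 1)  # p == 100 falls in the last bin
        counts[k] += 1
    return counts


def summarise(results, tol=15.0):
    """Share of each deterministic category (%) and the rank histogram of each category."""
    groups = {"below": [], "within": [], "above": []}
    for dev, pct in results:
        groups[category(dev, tol)].append(pct)
    n = len(results)
    shares = {k: 100.0 * len(v) / n for k, v in groups.items()}
    return shares, {k: rank_histogram(v) for k, v in groups.items()}


# Invented (deviation in % RH, percentile in %) pairs for five grid-box comparisons.
results = [(-20.0, 0.5), (-3.0, 2.0), (1.0, 48.0), (4.0, 97.0), (25.0, 99.9)]
shares, hists = summarise(results)
print(shares)           # {'below': 20.0, 'within': 60.0, 'above': 20.0}
print(hists["within"])  # [1, 0, 0, 0, 1, 0, 0, 0, 0, 1]
```

In this toy sample, two of the three cases accepted by the deterministic interval still fall in the outer deciles of their reference distributions, which is the additional contrast the probabilistic categories are meant to expose.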
The ARPEGE's RH
Part (in percent) of the distributions within the three categories of
differences
An added value of the probabilistic approach resides in the contrast and
variability in the results within the
The 400–600 hPa layer has the narrowest distribution of deviations from the
mean within the
Deterministic comparisons for layer 400–600 hPa over the period from April to
June 2018.
The two maps in Fig. 8 show the results of the deterministic approach in
terms of the average deviation from
The majority of the average deviations
The results of this method reveal a slight moist bias in the convective zones but mostly validate ARPEGE simulations everywhere else.
Maps of the probabilistic method for the layer 400–600 hPa applied
over the period from April to June 2018.
Figure 9 represents maps of the probabilistic comparison method in terms of
spatial distribution of the mode
The probabilistic comparison method highlights a majority of contrasted
extreme values, which indicates a high probability of ARPEGE's RH
The red patch located south of the African continent (Fig. 9a) indicates a
recurring underestimation of the model, with RH
These problematic areas do not stand out when a deterministic comparison approach is used alone. The probabilistic method allows for a more contrasted and detailed assessment. Note that the analysis of the results with regard to the model specificities, such as its parameterization of convection, is outside the scope of this paper.
This paper showcases the importance of exploiting the full information content of the reference through a probabilistic approach that considers the reference distribution to assess ARPEGE model simulations. The probabilistic reference is derived from finer-scale RH estimates aggregated into a probability density function at the ARPEGE spatial resolution. In widely used deterministic comparison approaches, the reference distribution is considered only through its first moment (and sometimes its second moment). Many satellite products now also provide a second moment, which enables intercomparison studies. However, the propagation of uncertainties then assumes a Gaussian distribution, which is not the case here. We developed a probabilistic approach based on the retrieved RH distributions that dispenses with such assumptions.
The improved assessment with the probabilistic approach is demonstrated by
comparing the insights obtained on ARPEGE with those from a deterministic
method involving the difference
Initial results highlight the inherent inaccuracy of relying solely on averaged references, owing to the large variability in the spread and shape of the reference estimates. By computing the inter-quartile range (IQR) for the whole reference dataset, we found that the spread of the PDFs varies significantly and is linked to the RH magnitude, with wider distributions in moist areas and narrower distributions in drier conditions. A fixed confidence interval captures this variability in spread only to a limited extent. This calls for a comparison method that quantifies the deviation of the simulated value more precisely, whatever the variability, spread and shape of the reference distribution.
Both deterministic and probabilistic methods were compared for a single
time step and over the 3-month period. Most RH values simulated by ARPEGE
fit within the
Overall, the probabilistic comparison allows a more contrasted and complete assessment. The bias structures that are revealed fit known humidity patterns. A more complete analysis with regard to the model's specificities could help highlight areas of improvement. The method presented here can be generalized to different models, variables and observations.
The underlying codes were developed for research purposes and are not suited for direct implementation. Their details may be shared and/or discussed upon request to the authors and the acknowledged colleagues.
SAPHIR data are available through the AERIS/ICARE ground segment of Megha-Tropiques (
The supplement related to this article is available online at:
The present work is the result of an original idea of HB and PEK, developed by CR under the supervision of HB, PEK and PC. HB contributed her expertise on the SAPHIR dataset, PC on the ARPEGE dataset and PEK on the statistical method itself. All authors discussed the results and contributed to the final paper.
The contact author has declared that neither they nor their co-authors have any competing interests.
Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This article is part of the special issue “Analysis of atmospheric water vapour observations and their uncertainties for climate applications (ACP/AMT/ESSD/HESS inter-journal SI)”. It is not associated with a conference.
We thank the CNES for its financial support through the Megha-Tropiques project and the national AERIS data center, which hosts the satellite data. Christophe Dufour (LATMOS/IPSL) contributed to the data processing. Computing resources of the ESPRI IPSL mesocenter were greatly appreciated. Pierre Kirstetter acknowledges support from the NASA Global Precipitation Measurement Ground Validation program under grant NNX16AL23G. We would also like to thank Joern Ungermann and the anonymous referee, whose insightful comments helped us convey our work in the clearest way.
This research was partly funded by the CNES French Space Agency through the Megha-Tropiques project.
This paper was edited by Martina Krämer and reviewed by Joern Ungermann and one anonymous referee.