the Creative Commons Attribution 4.0 License.
Technical note: Applicability of physics-based and machine-learning-based algorithms of a geostationary satellite in retrieving the diurnal cycle of cloud base height
Mengyuan Wang
Han Lin
Yongen Liang
Binlong Chen
Zhigang Yao
Na Xu
Miao Zhang
Two groups of retrieval algorithms, physics based and machine learning (ML) based, each consisting of two independent approaches, have been developed to retrieve cloud base height (CBH) and its diurnal cycle from Himawari-8 geostationary satellite observations. Validations have been conducted using the joint CloudSat/Cloud-Aerosol Lidar with Orthogonal Polarization (CALIOP) CBH products in 2017, ensuring independent assessments. Results show that the two ML-based algorithms exhibit markedly superior performance (the best method achieves a correlation coefficient of R > 0.91 and an absolute bias of approximately 0.8 km) compared to the two physics-based algorithms. However, validations based on CBH data from the ground-based lidar at the Lijiang station in Yunnan Province and the cloud radar at the Nanjiao station in Beijing, China, present contradictory outcomes (R < 0.60). Both ML-based algorithms significantly underestimate CBH, which prevents them from capturing the diurnal cycle characteristics of CBH. The strong consistency observed between CBH derived from ML-based algorithms and the spaceborne active sensors of CloudSat/CALIOP may be attributed to utilizing the same dataset for training and validation, sourced from the CloudSat/CALIOP products. In contrast, the CBH derived from the optimal physics-based algorithm demonstrates good agreement in diurnal variations in CBH with ground-based lidar/cloud radar observations during the daytime (with an R value of approximately 0.7). Therefore, the findings from ground-based observations in this investigation point to the more reliable and adaptable nature of physics-based algorithms in retrieving CBH from geostationary satellite measurements. 
Nevertheless, under ideal conditions, with an ample dataset of spaceborne cloud profiling radar observations encompassing the entire day for training purposes, the ML-based algorithms may hold promise for still delivering accurate CBH outputs.
Clouds, comprising visible aggregates like atmospheric water droplets, supercooled water droplets, ice crystals, etc., cover roughly 70 % of the Earth's surface (Stubenrauch et al., 2013). They play a pivotal role in global climate change, the hydrometeor cycle, and aviation safety and serve as a primary focus in weather forecasting and climate research, particularly storm clouds (Hansen, 2007; Hartmann and Larson, 2002). From advanced geostationary (GEO) and polar-orbiting (low-Earth orbit, LEO) satellite imagers, various measurable cloud properties, such as cloud fraction, cloud phase, cloud top height (CTH), and cloud optical thickness (DCOT), are routinely retrieved. However, high-quality cloud geometric height (CGH) and cloud base height (CBH), a fundamental macrophysical parameter delineating the vertical distribution of clouds, remain relatively understudied and underreported. Nonetheless, for boundary-layer clouds, the cloud base height stands as a critical parameter depending on other cloud-controlling variables. These variables encompass the cloud base temperature (Zhu et al., 2014), cloud base vertical velocity (Zheng et al., 2020), activation of cloud condensation nuclei (CCN) at the cloud base (Rosenfeld et al., 2016; Miller et al., 2023), and cloud–surface decoupling state (Su et al., 2022). These factors significantly impact convective cloud development and ultimately the climate.
There are distinct diurnal cycle characteristics of clouds in different regions across the globe (Li et al., 2022). These diurnal cycle characteristics primarily stem from the daily solar energy cycle absorbed by both the atmosphere and Earth's surface. Moreover, vertical atmospheric motions are shaped by imbalances in atmospheric heating and surface configurations, also leading to a range of cloud movements and structures (Miller et al., 2018). Cloud base plays a pivotal role in weather and climate processes. It is critical for predicting fog and cloud-related visibility issues important in aviation and weather forecasting. For instance, lower cloud bases often lead to more intense rainfall. In climate modeling, CBH is integral for accurate long-term weather predictions and understanding the radiative balance of the Earth, which influences global temperatures (Zheng and Rosenfeld, 2015). Hence, the accurate determination of CBH and its diurnal cycle with high spatiotemporal resolution becomes very important, necessitating comprehensive investigations (Viúdez-Mora et al., 2015; Wang et al., 2020). Such efforts can provide deeper insights into the potential ramifications of clouds for radiation equilibrium and global climate systems.
However, as one of the most crucial cloud physical parameters in atmospheric physics, CBH poses challenges in terms of measurement or estimation from space. Presently, the primary methods for measuring CBH rely on ground-based observations, utilizing tools such as sounding balloons, Mie-scattering lidars, stereo-imaging cloud height detection technologies, and cloud probe sensors (Forsythe et al., 2000; Hirsch et al., 2011; Seaman et al., 2017; Zhang et al., 2018; Zhou et al., 2019, 2024). While in situ ground-based observation methods offer highly accurate, reliable, and timely continuous CBH results, they are constrained by localized observation coverage and the sparse distribution of observation sites (Aydin and Singh, 2004). In recent decades, with the rapid advancement of meteorological satellite observation technology, spaceborne observing methods that provide global cloud observations with high spatiotemporal resolution compared to conventional ground-based remote sensing methods have emerged. In this realm, satellite remote sensing techniques for measuring CBH fall primarily into two categories: active and passive methods. Advanced active remote sensing technologies like CloudSat (Stephens et al., 2002) and the Cloud-Aerosol Lidar and Infrared Pathfinder Satellite Observation (CALIPSO) (Winker et al., 2009) in the National Aeronautics and Space Administration (NASA) A-Train (Afternoon Train) series (Stephens et al., 2002) can capture global cloud profiles, including CBH, with high quality by detecting unique return signals from cloud layers using onboard active millimeter-wave radar or lidar. However, their viewing footprints are limited along the nadir of the orbit, implying that observation coverage remains confined primarily to a horizontal scale (Min et al., 2022; Lu et al., 2021).
In addition to active remote sensing methods, satellite-based passive remote sensing technologies can also play an important role in estimating CBH (Meerkötter and Bugliaro, 2009; Lu et al., 2021). The physics-based principles and retrieval methods for CTH have reached maturity and are now widely employed in the satellite passive remote sensing field (Heidinger and Pavolonis, 2009; Wang et al., 2022). However, the corresponding physical principles or methods for measuring CBH using satellite passive imager measurements are still not entirely clear and unified (Heidinger et al., 2019; Min et al., 2020). A recent study by Yang et al. (2021) utilized oxygen A-band data observed by the Orbiting Carbon Observatory-2 (OCO-2) to retrieve single-layer marine liquid CBH. These abovementioned passive space-based remote sensing methods, such as satellite imagery, play a key role in retrieving CBH. In terms of detection principles, the first method involves the extrapolation technique for retrieving CBH for clouds of the same type. For instance, Wang et al. (2012) proposed a method to extrapolate CBH from CloudSat using spatiotemporally matched Moderate Resolution Imaging Spectroradiometer (MODIS) cloud classification data (Baum et al., 2012; Platnick et al., 2017). The second physics-based retrieval method first approximates the cloud geometric thickness using its optical thickness. It then employs the previously derived CTH product to compute the corresponding CBH using the respective National Oceanic and Atmospheric Administration (NOAA) Suomi National Polar-orbiting Partnership/Visible Infrared Imaging Radiometer Suite (SNPP/VIIRS) products (Noh et al., 2017). Hutchison et al. (2006) and Hutchison (2002) also formulated an empirical algorithm that estimates both cloud geometric thickness (CGT) and CBH. This algorithm relies on statistical analyses derived from MODIS DCOT and cloud liquid water path products (Hutchison et al., 2006; Hutchison, 2002).
Machine learning (ML) has proven to be highly effective in addressing nonlinear problems within remote sensing and meteorology fields, such as precipitation estimation and CTH retrieval (Min et al., 2020; Håkansson et al., 2018; Kühnlein et al., 2014). In recent years, several studies have leveraged ML-based algorithms to retrieve CBH, establishing nonlinear connections between CBH and GEO satellite observations. For instance, Tan et al. (2020) integrated CTH and cloud optical property products from the Fengyun-4A (FY-4A) GEO satellite with spatiotemporally matched CBH data from CALIPSO/CloudSat. They developed a random forest (RF) model for CBH retrieval. Similarly, Lin et al. (2022) constructed a gradient boosted regression tree (GBRT) model using US new-generation Geostationary Operational Environmental Satellites - R Series (GOES-R) Advanced Baseline Imager (ABI) Level-1B radiance data and the ERA5 (the fifth-generation ECMWF) reanalysis dataset (Lin et al., 2022; Hersbach et al., 2020) (https://www.ecmwf.int/en/forecasts/dataset/ecmwf-reanalysis-v5, last access: 14 December 2024). They employed CALIPSO CBH data as labels to achieve single-layer CBH retrievals. Notably, the CBH quality of ML-based algorithms was found to surpass that of physics-based algorithms (Lin et al., 2022). Moreover, Tan et al. (2020) utilized Himawari-8 data and the RF algorithm to develop a novel CBH algorithm, achieving a similarly high correlation coefficient (R) of 0.92 and a low root mean square error (RMSE) of 1.17 km compared with CloudSat/CALIPSO data.
However, these former studies did not discuss whether both physics-based and ML-based algorithms of the GEO satellite could retrieve the diurnal cycle of CBH well. This gap in research could mainly be attributed to potential influences from the fixed LEO satellite's (with active radar or lidar) passing time in the previous CBH retrieval model (Lin et al., 2022). The diurnal cycles of CBH have not been well investigated in both GEO and LEO remote sensing research. Hence, it is crucial to thoroughly investigate the diurnal cycle features of CBH derived from GEO satellite measurements by comparing them with ground-based radar and lidar observations (Min and Zhang, 2014; Warren and Eastman, 2014). In this study, we aim to assess the applicability and feasibility of both physics-based and ML-based algorithms of GEO satellites in capturing the diurnal cycle characteristics of CBH.
The subsequent sections of this paper are structured as follows. Section 2 provides a concise overview of the data employed in this study. Following this, Sect. 3 introduces the four distinct physics- and ML-based CBH retrieval algorithms. In Sect. 4, the CBH results obtained from these four algorithms are analyzed, and comparisons are drawn with spatiotemporally matched CBHs from ground-based cloud radar and lidar. Finally, Sect. 5 encapsulates the primary conclusions and new findings derived from this study.
In this study, observations from the Himawari-8 (H8) Advanced Himawari Imager (AHI) are utilized for the retrieval of high-spatiotemporal-resolution CBH. Launched successfully by the Japan Meteorological Agency on 7 October 2014, the H8 geostationary satellite is positioned at 140.7° E. The AHI on board H8 encompasses 16 spectral bands ranging from 0.47 to 13.3 µm, featuring spatial resolutions of 0.5–2 km. This includes 3 visible (VIS) bands at 0.5–1 km, 3 near-infrared (NIR) bands at 1–2 km, and 10 infrared (IR) bands at 2 km. The H8/AHI can scan a full disk area within 10 min, two specific areas within 2.5 min, a designated area within 2.5 min, and two landmark areas within 0.5 min (Iwabuchi et al., 2018). Its enhanced temporal resolution and observation frequency facilitate the tracking of rapidly changing weather systems, enabling the accurate determination of quantitative atmospheric parameters (Bessho et al., 2016).
Operational H8/AHI Level-1B data, accessible from 7 July 2015, are freely available on the satellite product home page of the Japan Aerospace Exploration Agency (Letu et al., 2019). The Level-2 cloud products utilized in this study, including the cloud mask (CLM), CTH, the cloud effective particle radius (CER or Reff), and DCOT, are generated by the Fengyun geostationary satellite algorithm test bed (FYGAT) science product (Wang et al., 2019; Min et al., 2017) of the China Meteorological Administration (CMA) for various applications. According to previous CALIPSO validations (Min et al., 2020), the absolute bias of cloud top height retrieved by the H8 satellite is approximately 3 km, with an absolute bias of 1 to 2 km for samples below 5 km. The accuracy of CTH is crucial for estimating CBH in the subsequent algorithm. It is important to note that certain crucial preliminary cloud products, such as CLM, have been validated in prior studies (Wang et al., 2019; Liang et al., 2023). Nevertheless, before initiating CBH retrieval, it is imperative to validate the H8/AHI cloud optical and microphysical products from the FYGAT retrieval system. This validation has been carried out by using analogous MODIS Level-2 cloud products as a reference. Additional details regarding the validation of cloud products are provided in Appendix A.
In addition to the H8/AHI Level-1 and Level-2 data, Global Forecast System (GFS) numerical weather prediction (NWP) data are employed for CBH retrieval in this study. The variables include land/sea surface temperature and the vertical profiles of temperature, humidity, and pressure. Operated by the US NOAA (Kalnay et al., 1996), the GFS serves as a global and advanced NWP system. The operational GFS system routinely delivers global high-quality and gridded NWP data at 3 h intervals, with four different initial forecast times per day (00:00, 06:00, 12:00, and 18:00 UTC). The three-dimensional NWP data cover the Earth in a 0.5° × 0.5° grid interval and resolve the atmosphere with 26 vertical levels from the surface (1000 hPa) up to the top of the atmosphere (10 hPa).
As previously mentioned, the official MODIS Collection 6.1 Level-2 cloud product climate data records (Platnick et al., 2017) are utilized in this study to validate the H8/AHI cloud products (CTH, CER, and DCOT) generated by the FYGAT system. High-quality, long-term MODIS data are often used as a validation reference to evaluate the products of new satellites. MODIS sensors are on board NASA's Terra and Aqua polar-orbiting satellites. Terra functions as the morning satellite, passing through the Equator from north to south at approximately 10:30 local time (LT), while Aqua serves as the afternoon satellite, traversing the Equator from south to north at around 13:30 LT. As a successor to the NOAA Advanced Very High Resolution Radiometer (AVHRR), MODIS features 36 independent spectral bands and a broad spectral range from 0.4 µm (VIS) to 14.4 µm (IR), with a scanning width of 2330 km and spatial resolutions ranging from 0.25 to 1.0 km. Recent studies (Baum et al., 2012; Platnick et al., 2017) have highlighted significant improvements and collective changes in cloud top, optical, and microphysical properties from Collection 5 to Collection 6.
In addition to the passive spaceborne imaging sensors mentioned above, the CloudSat satellite, equipped with a 94 GHz active cloud profiling radar (CPR), holds the distinction of being the first sun-synchronous orbit satellite specifically designed to observe global cloud vertical structures and properties. It is part of the A-Train series of satellites, akin to the Aqua satellite, launched and operated by NASA (Heymsfield et al., 2008). CALIPSO is another polar-orbiting satellite within the A-Train constellation, sharing an orbit with CloudSat and trailing it by a mere 10–15 s. CALIPSO is the first satellite equipped with an active dual-channel Cloud-Aerosol Lidar with Orthogonal Polarization (CALIOP) at 532 and 1064 nm bands (Hunt et al., 2009). Both CloudSat and CALIPSO possess notable advantages over passive spaceborne sensors due to the 94 GHz radar of CloudSat and the joint return signals of lidar and radar on CALIPSO. These features enhance their sensitivity to optically thin cloud layers and ensure strong penetration capability, resulting in more accurate CTH and CBH detections compared to passive spaceborne sensors (CAL_LID_L2_05kmCLay-Standard-V4-10). The joint cloud type products of 2B-CLDCLASS-LIDAR, derived from both CloudSat and CALIPSO measurements, offer a comprehensive description of cloud vertical structure characteristics, cloud type, CTH, CBH, etc. The time interval between each profile in this product is approximately 3.1 s, and the horizontal resolution is 2.5 km (along track) × 1.4 km (cross-track). Each profile is divided into 125 layers with a 240 m vertical interval. For more details on 2B-CLDCLASS-LIDAR products, refer to the CloudSat official product manual (Sassen and Wang, 2008). In this study, we consider the lowest effective cloud base height from the joint CloudSat/CALIOP data as the true values for training and validation. 
Note that for this study, we utilized 1-year H8/AHI data and matched them with the joint CloudSat/CALIOP data from 1 January to 31 December 2017.
3.1 GEO cloud base height retrieval algorithm from the interface data processing segment of the Visible Infrared Imaging Radiometer Suite
The Joint Polar Satellite System (JPSS) program is a collaborative effort between NASA and NOAA. The operational CBH retrieval algorithm, part of the 30 environmental data records (EDRs) of JPSS, can be implemented operationally through the Interface Data Processing Segment (IDPS) (Baker, 2011). In this study, our geostationary satellite CBH retrieval algorithm aligns with the IDPS CBH algorithm developed by Baker (2011). Utilizing the geostationary H8/AHI cloud products discussed earlier, this new GEO CBH retrieval algorithm is succinctly outlined below. It is important to note that multilayer cloud scenes remain a challenge for retrieving both CTH and CBH, especially when considering the column-integrated cloud water path (CWP) used in physics-based algorithms (Noh et al., 2017). In this study, we simplify the scenario by assuming a single-layer cloud for all algorithms.
The new GEO IDPS CBH algorithm initiates the process by first retrieving the CGT from the bottom to the top. Subsequently, CGT is subtracted from the corresponding CTH to calculate CBH (CBH = CTH − CGT). The algorithm is divided into two independent executable modules based on cloud phase, distinguishing between liquid water and ice clouds. The CBH of water cloud retrieval requires DCOT and CER as inputs. For ice clouds, an empirical equation is employed for CBH retrieval. However, the standard deviations of error in IDPS CBH for individual granules often exceed the JPSS VIIRS minimum uncertainty requirement of ±2 km (Noh et al., 2017). For a more comprehensive understanding of this CBH algorithm, refer to the IDPS algorithm documentation (Baker, 2011). Note that, similar to previous studies on cloud retrieval (Noh et al., 2017; Platnick et al., 2017), this investigation also assumes a single-layer cloud for all CBH algorithms due to the challenges associated with determining multilayer cloud structures.
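The subtraction step for liquid water clouds can be sketched as follows. This is a minimal illustration under stated assumptions, not the IDPS implementation: the relation LWP = (2/3) ρ_w · DCOT · CER for a vertically homogeneous cloud, the fixed liquid water content `lwc_g_m3`, and all function names are introduced here for illustration only.

```python
def liquid_water_path(dcot, cer_um):
    """Approximate liquid water path [g m^-2] from optical thickness and
    effective radius [um], assuming a vertically homogeneous cloud:
    LWP = (2/3) * rho_w * tau * r_e (an assumption of this sketch)."""
    rho_w = 1.0e6  # density of liquid water [g m^-3]
    return (2.0 / 3.0) * rho_w * dcot * cer_um * 1.0e-6


def cloud_base_height(cth_km, dcot, cer_um, lwc_g_m3=0.3):
    """Return CBH [km] as CTH - CGT, with CGT = LWP / LWC.

    lwc_g_m3 is a hypothetical constant liquid water content; an operational
    algorithm would choose it per cloud type.
    """
    cgt_km = liquid_water_path(dcot, cer_um) / lwc_g_m3 / 1000.0
    # The cloud base cannot lie below the surface, so clamp at 0 km.
    return max(cth_km - cgt_km, 0.0)
```

Because CGT is subtracted directly from CTH, any error in the input CTH propagates one-to-one into the retrieved CBH, which is why the CTH accuracy discussed in Sect. 2 matters here.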
3.2 GEO cloud base height retrieval algorithm implemented in the Clouds from Advanced Very High Resolution Radiometer Extended system
As mentioned above, the accuracy of the GEO IDPS algorithm is highly dependent on the initial input parameters, such as the cloud phase, DCOT, and Reff, which may introduce some uncertainties in the final retrieval results. In contrast, another statistically based algorithm is implemented here. It is named the GEO CLAVR-x CBH algorithm after Clouds from AVHRR Extended (CLAVR-x), NOAA's operational cloud processing system for the AVHRR (Noh et al., 2017), and it mainly follows the NOAA algorithm working group (AWG) CBH algorithm (ACBA) (Noh et al., 2022). Previous studies have also demonstrated an R of 0.569 and an RMSE of 2.3 km for the JPSS VIIRS CLAVR-x CBH algorithm. It is anticipated that this algorithm will also be employed for the NOAA GOES-R geostationary satellite imager (Noh et al., 2017; Seaman et al., 2017).
Similar to the GEO IDPS CBH retrieval algorithm mentioned earlier, the GEO CLAVR-x CBH retrieval algorithm also initially obtains CGT and CTH, subsequently calculating CBH by subtracting CGT from CTH (CTH − CGT). However, the specific calculation method for the CGT value differs. This algorithm is suitable for single-layer clouds and the topmost layer of multilayer clouds, computing CBH using the CTH at the top layer of the cloud. In comparison with the former GEO IDPS CBH algorithm, the GEO CLAVR-x CBH algorithm considers two additional cloud types: deep convection clouds and thin cirrus clouds (Baker, 2011). For more details on this CLAVR-x CBH algorithm, refer to the original algorithm documentation (Noh et al., 2017).
3.3 Random-forest-based cloud base height estimation algorithm
RF, one of the most significant ML algorithms, was initially proposed and developed by Breiman (2001). It is widely employed to address classification and regression problems based on the law of large numbers. The RF method is well suited for capturing complex or nonlinear relationships between predictors and predictands.
In this study, two distinct ML-based GEO CBH algorithms, namely VIS+IR and IR single (which only uses observations of H8/AHI IR channels), are devised to retrieve or predict the CBH using different sets of predictors. The RF training of the chosen predictors is formulated as follows:
$$\mathrm{CBH} = \mathrm{RF}_{\mathrm{reg}}\left(x_1, x_2, \ldots, x_n\right),$$
where RFreg denotes the regression RF model, and xi represents the ith predictor. The selected predictors from H8/AHI for both the VIS+IR and the IR RF model training and prediction are detailed in Table 1, mainly referencing Min et al. (2020) and Tan et al. (2020). The VIS+IR algorithm retrieves CBH using NWP data (atmospheric temperature and altitude profiles, total precipitable water (TPW), surface temperature), surface elevation, air mass 1 (air mass 1 = 1/cos(view zenith angle)), and air mass 2 (air mass 2 = 1/cos(solar zenith angle)). The rationale for choosing air mass and TPW is their ability to account for the potential absorption effect of water vapor along the satellite viewing angle. The predictors in CBH retrieval also include the IR band brightness temperature (BT) and VIS band reflectance. The IR-single algorithm selects the same GFS NWP data as the VIS+IR algorithm but employs only view zenith angles and azimuth angles.
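The two air mass predictors can be illustrated with a short sketch. It assumes the plane-parallel approximation, air mass = 1/cos(zenith angle), which is the usual definition but is an assumption here; the function names are hypothetical.

```python
import math


def air_mass(zenith_deg):
    """Plane-parallel air mass factor, 1 / cos(zenith angle).

    Only meaningful for zenith angles below 90 degrees; the approximation
    degrades near the horizon, where Earth curvature matters.
    """
    if not 0.0 <= zenith_deg < 90.0:
        raise ValueError("zenith angle must be in [0, 90) degrees")
    return 1.0 / math.cos(math.radians(zenith_deg))


def air_mass_predictors(view_zenith_deg, solar_zenith_deg):
    """Return (air mass 1, air mass 2) for one pixel.

    Air mass 1 follows the satellite viewing path and air mass 2 the solar
    path, so air mass 2 is only defined in daytime.
    """
    return air_mass(view_zenith_deg), air_mass(solar_zenith_deg)
```

A longer slant path through the atmosphere means more water vapor absorption along the line of sight, which is the effect these predictors, together with TPW, are meant to capture.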
To optimize the RF prediction model, the hyperparameters of the RF model are tuned individually. The parameters involved in tuning the RF prediction models include the number of trees, the maximum depth of trees, the minimum number of samples required to split an internal node, and the minimum number of samples required to be at a leaf node. In this study, we set the smallest number of trees in the forest to 100 and the maximum depth of the tree to 40.
3.4 Evaluation method
The performance of RF models and physics-based methods is assessed using mean absolute error (MAE), mean bias error (MBE), RMSE, R, and standard deviation (SD) scores using the testing dataset. These scores are used to understand different aspects of the predictive performance of the model: MAE and RMSE provide insights into the average error magnitude, MBE indicates bias in the predictions, R evaluates the linear association between observed and predicted values, and SD assesses the variability of the predictions. In the RF IR-single algorithm, 581 783 matching points are selected from H8/AHI and CloudSat data for 2017; 70 % of these points are randomly assigned to the training dataset, and the remainder serves as the testing dataset. For the RF VIS+IR algorithm, a total of 418 241 matching points are chosen, with 70 % randomly allocated to the training set. Note that the reduced data amount is because only daytime data can be used for the VIS+IR method training. It is important to note that the two training datasets in CloudSat are also used to verify the CBHs obtained by cloud radar and lidar. The statistical formulas for evaluation are as follows:
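$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - x_i\right|,$$
$$\mathrm{MBE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - x_i\right),$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - x_i\right)^2},$$
$$R = \frac{\sum_{i=1}^{n}\left(x_i - \bar{x}\right)\left(y_i - \bar{y}\right)}{\sqrt{\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2 \sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}},$$
$$\mathrm{SD} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2},$$
where n is the sample number, yi is the ith CBH retrieval result, xi is the ith joint CloudSat/CALIOP CBH product, and $\bar{x}$ and $\bar{y}$ are the corresponding sample means.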
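The five scores can be computed with a few lines of code. The sketch below assumes the standard definitions of each metric, with SD taken as the population standard deviation of the retrieved values; the function name and dictionary keys are illustrative.

```python
import math


def evaluation_scores(retrieved, reference):
    """Compute MAE, MBE, RMSE, R, and SD for paired CBH samples.

    retrieved: CBH retrievals y_i; reference: reference CBHs x_i (same units).
    SD is taken here as the population standard deviation of the retrievals,
    which is an assumption of this sketch.
    """
    n = len(retrieved)
    if n == 0 or n != len(reference):
        raise ValueError("inputs must be non-empty and of equal length")
    diffs = [y - x for y, x in zip(retrieved, reference)]
    y_mean = sum(retrieved) / n
    x_mean = sum(reference) / n
    ss_y = sum((y - y_mean) ** 2 for y in retrieved)
    ss_x = sum((x - x_mean) ** 2 for x in reference)
    cov = sum((y - y_mean) * (x - x_mean) for y, x in zip(retrieved, reference))
    return {
        "MAE": sum(abs(d) for d in diffs) / n,
        "MBE": sum(diffs) / n,  # positive means the retrieval is too high
        "RMSE": math.sqrt(sum(d * d for d in diffs) / n),
        "R": cov / math.sqrt(ss_x * ss_y),  # undefined if either series is constant
        "SD": math.sqrt(ss_y / n),
    }
```

Note that a near-zero MBE can hide large errors of opposite sign, so MBE should be read together with MAE or RMSE.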
Since the two RF models (VIS+IR and IR single) select 230 typical variables to fit CBHs, the importance scores of these predictors in the two ML-based algorithms are ranked for better optimization. In an RF model, feature importance indicates how much each input variable contributes to the model's predictive accuracy by measuring the decrease in impurity or error when the feature is used to split data (Gregorutti et al., 2017). In the VIS+IR model, the top-ranked predictors are CTH and cloud top temperature (CTT) from the H8/AHI Level-2 product (see Fig. B1 in Appendix B). It is important to note that DCOT is a crucial and sensitive factor for these ML-based algorithms. Retrieving CBH samples with relatively low DCOT remains challenging due to the low signal-to-noise ratio when DCOT is low (Lin et al., 2022). To address this issue, samples with DCOT less than 1.6 are filtered in the VIS+IR model, and samples with relatively large BTs at channel 14 are filtered in the IR-single model. This filtering process significantly improves the R value from 0.869 to 0.922 in the VIS+IR model and from 0.868 to 0.911 in the IR-single model. For more details on the algorithm optimization, refer to Appendix B.
In this study, the H8/AHI satellite CBH data retrieved by the four algorithms mentioned before are matched spatiotemporally with the 2B-CLDCLASS-LIDAR cloud product from joint CloudSat/CALIPSO observations in 2017. In this process, the nearest-distance matching method is employed: each CloudSat/CALIPSO observation point is collocated with the closest Himawari-8 pixel, and the observation time difference between the two is required to be less than 5 min (Noh et al., 2017). As in an earlier study (Min et al., 2020), we also used 70 % of the matched data for training and 30 % of an independent sample for validation. Figure 1 displays a comparison of CBH results over the full disk at 02:00 UTC on 1 January 2017, retrieved by the GEO IDPS algorithm, the GEO CLAVR-x algorithm, the RF VIS+IR algorithm, and the RF IR-single algorithm for all cloud conditions including single and multilayer cloud scenes. A similar distribution pattern and magnitude of CBHs retrieved by these four independent algorithms can be observed in Fig. 1. However, notable differences exist between physics-based and ML-based algorithms. Further comparisons are conducted and analyzed with spaceborne and ground-based lidar and radar observations in the subsequent sections of this study.
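The collocation and the 70/30 split can be sketched as follows. This is a brute-force illustration, not the processing chain used in the study: the tuple layouts, function names, and random seed are hypothetical, and no maximum matching distance is applied because none is stated in the text.

```python
import math
import random


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance [km] on a sphere of radius 6371 km."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlat, dlon = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dlat / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2
    return 2.0 * 6371.0 * math.asin(math.sqrt(a))


def match_nearest(track_points, geo_pixels, max_dt_s=300.0):
    """Pair each active-sensor point with the nearest GEO pixel in time range.

    track_points: (time_s, lat, lon, cbh_km); geo_pixels: (time_s, lat, lon, features).
    Pixels observed 5 min (300 s) or more from the track point are excluded.
    """
    pairs = []
    for t, lat, lon, cbh in track_points:
        candidates = [p for p in geo_pixels if abs(p[0] - t) < max_dt_s]
        if candidates:
            nearest = min(candidates, key=lambda p: haversine_km(lat, lon, p[1], p[2]))
            pairs.append((nearest, cbh))
    return pairs


def split_train_test(pairs, train_fraction=0.7, seed=0):
    """Randomly assign matched pairs to training and testing subsets."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]
```

A purely random split draws training and testing samples from the same overpass times, so it cannot by itself reveal how a model behaves at hours the active sensors never sample.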
4.1 Comparisons with the joint CloudSat/CALIPSO cloud base height product
4.1.1 Joint scatter plots
Figure 2 presents the density scatter plot of the CBHs retrieved from the GEO IDPS and GEO CLAVR-x algorithms compared with the CBHs from the joint CloudSat/CALIPSO product, along with the related scores of MAE, MBE, RMSE, and R calculated and labeled in each panel. The calculated R exceeds the 95 % significance level (p < 0.05). For the GEO IDPS algorithm, the R is 0.62, the MAE is 1.83 km, and the MBE and RMSE are −0.23 and 2.64 km (Fig. 2a). In comparison, Seaman et al. (2017) compared the operational VIIRS CBH product retrieved by the similar SNPP/VIIRS IDPS algorithm with the CloudSat CBH results. In their results, the R is 0.57, and the RMSE is 2.3 km. For the new GEO CLAVR-x algorithm (Fig. 2b), the R is 0.645, and the RMSE is 2.91 km. The larger RMSEs from two independent physics-based CBH algorithms demonstrate a slightly poorer performance and precision of these retrieval algorithms for GEO satellites. Particularly, the larger RMSEs (2.64 and 2.91 km) indicate weaker stabilities of the GEO IDPS and CLAVR-x CBH algorithms compared with the VIIRS CBH product (Seaman et al., 2017). In this figure, more samples can be found near the 1:1 line, implying good quality of the retrieved CBHs. However, in stark contrast, quite a few CBH samples retrieved by both the GEO IDPS and the GEO CLAVR-x algorithms (compared with the official VIIRS CBH product) fall below 1.0 km, indicating relatively large errors when compared with the joint CloudSat/CALIPSO CBH product. Moreover, Fig. 2 reveals that relatively large errors are also found in the CBHs lower than 2 km for the four independent algorithms, primarily caused by the weak penetration ability of VIS or IR bands on thick and low clouds.
Referring to the joint CloudSat/CALIPSO CBH product, Fig. 2c and d present the validations of the CBH results retrieved from two ML-based algorithms using the VIS+IR (only retrieving the CBH during the daytime) and IR-single models. Figure 2c demonstrates better consistency of CBH between the VIS+IR model and the joint CloudSat/CALIPSO product with R = 0.91, MAE = 0.82 km, MBE = 0.43 km, and RMSE = 1.71 km. Figure 2d also displays a relatively high R of 0.876 when validating the IR-single model, with MAE = 0.88 km, MBE = −0.45 km, and RMSE = 2.00 km. Therefore, both VIS+IR and IR-single models can obtain high-quality CBH retrieval results from geostationary imager measurements. In comparison, previous studies have also proposed similar ML-based algorithms for estimating CBH using FY-4A satellite imager data. For example, Tan et al. (2020) used the variables of CTH, DCOT, Reff, cloud water path, and longitude/latitude from FY-4A imager data to build the training and prediction model and obtained CBH with MAE = 1.29 km and R = 0.80. In this study, except for CTH, the other Level-2 products and geolocation data (longitude/latitude) used in Tan et al. (2020) are omitted, while the matched atmospheric profile products (such as temperature and relative humidity) from NWP data are added. These changes in ML-based model training and prediction lead to more accurate CBH retrieval results. Note that, in accordance with the previous study conducted by Noh et al. (2017), we excluded CBH samples obtained from CloudSat/CALIPSO that were smaller than 1 km in our comparisons. This exclusion was primarily due to the presence of ground clutter contamination in the CloudSat CPR data (Noh et al., 2017).
4.1.2 Test case
Figure 3 displays two cross-sections of CBH from various sources overlaid with CloudSat radar reflectivity [dBZ] for spatiotemporally matched cases. The periods covered are from 03:16 to 04:55 UTC on 13 January 2017 (40.56–53.39° S, 154.0–160.0° E) and from 05:38 to 07:17 UTC on 14 January 2017 (8.35–11.57° N, 107.1–107.8° E). The CloudSat radar reflectivity and joint CloudSat/CALIPSO product provide insights into the vertical structure or distribution of clouds and their corresponding CBHs. The results from the four GEO CBH retrieval algorithms (GEO IDPS, GEO CLAVR-x, RF VIS+IR model, and RF IR-single model) mentioned earlier are individually marked with different markers in each panel. According to Fig. 3a, the GEO IDPS algorithm faces challenges in accurately retrieving CBHs for geometrically thicker cloud samples near 157° E. Optically thick mid- and upper-level cloud layers may obscure lower-level cloud layers. However, the CBH results retrieved by the GEO IDPS algorithm near 155° E (in Fig. 3a) and 107.4° E (in Fig. 3b) align with the joint CloudSat/CALIPSO CBH product. It is worth noting that the inconsistency observed between 107.2 and 107.3° E in Fig. 3b, specifically regarding the CBHs around 1 km obtained from CloudSat/CALIPSO, can likely be attributed to ground clutter contamination in the CloudSat CPR data (Noh et al., 2017). The GEO CLAVR-x algorithm achieves improved CBH results compared to the GEO IDPS algorithm. It can even retrieve CBHs for some thick cloud samples that are invalid when using the GEO IDPS algorithm. However, the CBHs from the GEO CLAVR-x algorithm are noticeably higher than those from the joint CloudSat/CALIPSO product. In contrast, the CBHs from the two ML-based algorithms show substantially better results than those from the other two physics-based algorithms. Particularly, the ML-based VIS+IR model algorithm yields the best CBH results. 
However, compared with those from the two physics-based algorithms, the CBHs from the two ML-based algorithms still exhibit a significant error around 5 km.
4.2 Comparisons with the ground-based lidar and cloud radar measurements
Lidar actively emits laser pulses in different spectral bands into the air. When the laser signal encounters cloud particles during transmission, a highly noticeable backscattered signal is generated and received (Omar et al., 2009). The lidar return signal of cloud droplets is markedly distinct from atmospheric aerosol scattering signals and noise, making CBH easily obtainable from the signal difference or mutation (Sharma et al., 2016). In this study, continuous ground-based lidar data from the Twin Astronomy Manor in Lijiang, Yunnan Province, China (26.454° N, 100.0233° E; 3175 m altitude), are used to evaluate the diurnal cycle characteristics of CBHs retrieved using GEO satellite algorithms (Young and Vaughan, 2009). The geographical location and photo of this station are shown in Fig. 4.
4.2.1 Comparison of CBH retrievals from ground and satellite data
The ground-based lidar data at Lijiang station on 6 December 2018 and 8 January 2019 are selected for validation. This lidar was primarily used for the calibration of ground-based lunar radiation instruments. During the 2-month observation period (from December 2018 to January 2019), it was operated almost exclusively under clear-sky conditions, so cloud data were captured on just 2 d. Both days were cloudy, with stratiform clouds at an altitude of around 5 km and no precipitation. The number of available and spatiotemporally matched CBH sample points from ground-based lidar is 78 and 64 on 6 December 2018 and 8 January 2019, respectively. Figure 5a and b show the point-to-point CBH comparisons between ground-based lidar and four GEO satellite CBH algorithms on 6 December 2018 and 8 January 2019. It is worth noting that the retrieved CBHs of the two physics-based algorithms on 6 December 2018 are in good agreement with the reference values from the lidar measurements, and, in particular, the GEO CLAVR-x algorithm obtains better results. From the results on 8 January 2019, more accurate diurnal cycle characteristics of CBHs are revealed by the GEO CLAVR-x algorithm than by the GEO IDPS algorithm.
Compared with the CBHs measured by ground-based lidar, the statistics of the results retrieved from the GEO IDPS algorithm are R = 0.67, MAE = 3.09 km, MBE = 0.86 km, and RMSE = 3.61 km (Fig. 5c). However, for cloud samples with CBH below 7.5 km, the GEO IDPS algorithm shows an obvious underestimation of CBH in Fig. 5c. For the GEO CLAVR-x algorithm, it can also be seen that the matched samples mostly lie near the 1:1 line, with R = 0.77 (the optimal CBH algorithm), MAE = 1.32 km, MBE = 0.22 km, and RMSE = 1.60 km. In addition, this figure also shows the CBH comparisons between the ML-based VIS+IR model/IR-single model algorithms and the lidar measurements, revealing that the retrieved CBH results from the ML-based VIS+IR model are better than those from the ML-based IR-single model algorithm. The comparison results between the CBHs of the ML-based VIS+IR model algorithm and the lidar measurements are around the 1:1 line, with smaller errors and R = 0.60. In contrast, the R between the CBHs of the ML-based IR-single model algorithm and the lidar measurements is only 0.50, with a relatively large error. By comparing the retrieved CBHs with the lidar measurements at Lijiang station, it is indicated that CBH results from the two physics-based algorithms are remarkably more accurate and that the GEO CLAVR-x algorithm in particular can capture diurnal variation in CBH well.
To further assess the accuracy and quality of the diurnal cycle of CBHs retrieved with these algorithms, CBHs from another ground-based cloud radar dataset covering the entire year of 2017 are also collected and used in this study. The observational instrument is a Ka-band (35 GHz) Doppler millimeter-wave cloud radar (MMCR) located at the Beijing Nanjiao Weather Observatory (a typical urban observation site) (39.81° N, 116.47° E; 32 m altitude; see Fig. 4), performing continuous and routine observations. The MMCR provides a vertical resolution of 30 m and a temporal resolution of 1 min for single profile detection, based on the radar reflectivity factor. In a previous study (Zhou et al., 2019), products retrieved by this MMCR were utilized to investigate the diurnal variations in CTH and CBH, and comparisons were made between MMCR-derived CBHs and those derived from a Vaisala CL51 ceilometer. That study also found that the average R of CBHs from different instruments reached up to 0.65. It is worth noting that the basic physical principle for detecting cloud base height from spaceborne cloud profiling radar and from ground-based cloud radar and lidar measurements is the same. All these algorithms used to detect CBH are based on the manifest change in return signals between the cloud base and the clear-sky atmosphere in the vertical direction (Huo et al., 2019; Ceccaldi et al., 2013). The diurnal variation in cloud base height over land is primarily influenced by solar heating, causing the cloud base to rise in the morning and reach its peak by midday. As the surface cools in the afternoon and evening, the cloud base lowers, playing a crucial role in weather patterns and forecasting (Zheng et al., 2020). Due to the density of points in the 1-year time series, the point-to-point CBH comparison results for the entire year are not displayed here (monthly results are shown in the Supplement); we only show 4 d results in Fig. 6. 
Therefore, it is essential to rigorously compare the ML-based algorithm with ground-based observations to determine its ability to adapt to the daily variations in cloud base height caused by natural factors. The joint spaceborne CloudSat/CALIPSO detection might face limitations in penetrating extremely dense, optically thick clouds or areas with heavy precipitation clouds. Hence, in comparison, the CBH values gathered from ground-based lidar and cloud radar measurements are expected to be more accurate than the data derived from spaceborne CloudSat/CALIPSO detection.
Similar to Fig. 5, Fig. 6 presents two sample groups of CBH results from the cloud radar at Beijing Nanjiao station relative to the matched CBHs from the four retrieval algorithms (GEO IDPS, GEO CLAVR-x, ML-based IR single, ML-based VIS+IR) on 9–10 April and 26–28 July 2017. As with the results at Lijiang station discussed in Fig. 5, we observe better and more robust performances in retrieving the diurnal cycle characteristics of CBH from the two physics-based CBH retrieval algorithms. In contrast, more underestimated CBH samples are retrieved by the two ML-based algorithms.
4.2.2 Diurnal cycle analysis of CBH retrieval accuracy
To further investigate the diurnal cycle characteristics of retrieved CBH from GEO satellite imager measurements, Fig. 7 presents box plots of the hourly CBH errors (relative to the results of cloud radar at Beijing Nanjiao station) in 2017 from the four different CBH retrieval algorithms. Notably, the CBHs retrieved from the two ML-based algorithms are significantly underestimated. The ML-based VIS+IR method achieves relatively better results than the ML-based IR-single method during the daytime. Comparing the two ML-based algorithms, the errors in the IR-single model algorithm have a similar standard deviation (2.80 km) to those of the VIS+IR model algorithm (2.69 km) during the daytime. The IR-single model algorithm can be applied during both daytime and nighttime; its nighttime performance degrades slightly, with an averaged RMSE (3.88 km) higher than that of the daytime performance (3.56 km). At night, the CBH from the IR-single model algorithm is the only available option, and it should be used with caution.
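The hourly error statistics behind such a box-plot analysis can be sketched as follows. This is only an illustration under stated assumptions: the function name `hourly_error_stats`, the tuple layout (local hour, retrieved CBH, reference CBH), and the sample values are hypothetical, and errors are taken as retrieved minus reference.

```python
import math
from collections import defaultdict

def hourly_error_stats(samples):
    """Group (local_hour, retrieved_km, reference_km) tuples by hour and
    return {hour: {"n", "MBE", "STD", "RMSE"}} of the CBH errors."""
    by_hour = defaultdict(list)
    for hour, retrieved, reference in samples:
        by_hour[hour].append(retrieved - reference)
    stats = {}
    for hour, errors in sorted(by_hour.items()):
        n = len(errors)
        mbe = sum(errors) / n
        # Population standard deviation of the errors about their mean.
        std = math.sqrt(sum((e - mbe) ** 2 for e in errors) / n)
        rmse = math.sqrt(sum(e * e for e in errors) / n)
        stats[hour] = {"n": n, "MBE": mbe, "STD": std, "RMSE": rmse}
    return stats

# Hypothetical matched samples (hour in local time, CBH in km).
samples = [
    (9, 2.0, 3.0),
    (9, 2.5, 4.5),
    (14, 3.0, 3.5),
    (14, 4.0, 3.5),
]
hourly = hourly_error_stats(samples)
```

Because RMSE² = MBE² + STD² for each hour, two algorithms with similar error standard deviations can still differ in RMSE when one of them carries a larger systematic bias.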
Figure 8 compares the hourly MAE, MBE, RMSE, and R of the four retrieval algorithms relative to the CBHs from the cloud radar at Beijing Nanjiao station during daytime in 2017. The RMSE of the two ML-based algorithms shows stable diurnal variation. All algorithms have lower R at sunrise, around 07:00 LT, which improves as the day progresses. However, the GEO CLAVR-x algorithm stands out for its relatively higher R and for its more stable R and RMSE during daytime.
Figure 9a displays scatter plots and relevant statistics of the CBHs retrieved from the GEO IDPS algorithm against the CBHs from cloud radar. The CBHs from the GEO IDPS algorithm agree moderately with the matched CBHs from cloud radar at Beijing Nanjiao station, with R = 0.52, MAE = 2.08 km, MBE = 1.17 km, and RMSE = 2.67 km. In Fig. 9b, the GEO CLAVR-x algorithm shows better results with R = 0.57, MAE = 2.06 km, MBE = −0.20 km, and RMSE = 2.60 km. It is not surprising that Fig. 9c and d reveal obviously underestimated CBH results from the two ML-based CBH algorithms. In particular, the CBH results from the ML-based VIS+IR model algorithm concentrate in the range of 2.5 to 5 km. Therefore, Figs. 5 to 9 further substantiate the weak diurnal variations captured by ML-based techniques, primarily attributed to the scarcity of comprehensive CBH training samples throughout the entire day. Moreover, although the two robust physics-based algorithms of GEO IDPS and GEO CLAVR-x (the optimal one) can retrieve high-quality CBHs from H8/AHI data, especially the diurnal cycle of CBH during the daytime, they still struggle to retrieve CBHs below 1 km.
To explore and identify the optimal and most robust CBH retrieval algorithm from geostationary satellite imager measurements, particularly focusing on capturing the typical diurnal cycle characteristics of CBH over land, this study employs four different retrieval algorithms (two physics-based and two ML-based algorithms). High-spatiotemporal-resolution CBHs are retrieved using the H8/AHI data from 2017 to 2019. To assess the accuracies of the retrieved CBHs, point-to-point validations are conducted using spatiotemporally matched CBHs from the joint CloudSat/CALIOP product, ground-based lidar, and cloud radar observations in China. The main findings and conclusions are outlined below.
Four independent CBH retrieval algorithms, namely physics-based GEO IDPS, physics-based GEO CLAVR-x, ML-based VIS+IR, and ML-based IR single, have been developed and utilized to retrieve CBHs from GEO H8/AHI data under the assumption of single-layer clouds. The two physics-based algorithms utilize cloud top and optical property products from AHI as input parameters to retrieve high-spatiotemporal-resolution CBHs, with operations limited to daytime. In contrast, the ML-based VIS+IR model and IR-single model algorithms use the matched joint CloudSat/CALIOP CBH product as true values for building RF prediction models. Notably, the ML-based IR-single algorithm, which relies solely on infrared band measurements, can retrieve CBH during both daytime and nighttime.
The accuracy of CBHs retrieved from the four independent algorithms is verified using the joint CloudSat/CALIOP CBH products for the year 2017. The GEO IDPS algorithm shows an R of 0.62 and an RMSE of 2.64 km. The GEO CLAVR-x algorithm yields a slightly higher R of 0.65 but a larger RMSE of 2.91 km. After filtering samples with optical thickness less than 1.6 and brightness temperature (at the 11 µm band) greater than 281 K, the ML-based VIS+IR and ML-based IR-single algorithms achieve higher accuracy, with an R (RMSE) of 0.92 (1.21 km) and 0.91 (1.42 km), respectively. This indicates strong agreement between the two ML-based CBH algorithms and the CloudSat/CALIOP CBH product.
However, in stark contrast, the results from the physics-based algorithms (with an R and RMSE of 0.59 and 2.86 km) are superior to those from the ML-based algorithms (with an R and RMSE of 0.39 and 3.88 km) when compared with ground-based CBH observations such as lidar and cloud radar. In the comparison with the cloud radar at Beijing Nanjiao station in 2017, the R of the GEO CLAVR-x algorithm is 0.57, while the R of the GEO IDPS algorithm is 0.52. Meanwhile, notable differences are observed in the CBHs between both ML-based algorithms. Similar conclusions are also evident in the 2 d comparisons at Yunnan Lijiang station.
The high accuracy of the CBH results from the two ML-based algorithms (R > 0.91) against the joint CloudSat/CALIOP product can likely be attributed to the fact that the training and validation datasets come from the same source, namely this joint product. However, this dataset has limited spatial coverage and small temporal variation, potentially limiting the representativeness of the training data. In contrast, the GEO CLAVR-x algorithm demonstrates the best performance and highest accuracy in retrieving CBH from geostationary satellite data when compared with ground-based observations. Notably, its results align well with those from ground-based lidar and cloud radar during the daytime. However, both physics-based methods, which utilize CloudSat CPR data for regression, struggle to accurately retrieve CBHs below 1 km, as the lowest 1 km above ground level in these data is affected by ground clutter. In general, the physics-based algorithms, such as GEO CLAVR-x and GEO IDPS, demonstrate notable advantages in capturing the diurnal cycle of CBH. Unlike ML-based methods, they offer more stable error metrics, especially with higher correlation and lower RMSE during the daytime. Additionally, they are more effective at capturing significant and natural variations in CBH, providing generally higher-quality retrievals from H8/AHI data, even though challenges remain in accurately retrieving CBHs below 1 km.
Additionally, despite utilizing the same physics principles in spaceborne and ground-based lidar/radar CBH algorithms, the study by Thorsen et al. (2011) has highlighted differences in profiles between them. Therefore, this factor induced by the detection principle could contribute to the relatively poorer results in CBH retrieval by ML-based algorithms compared to ground-based lidar and radar. The analysis and discussion above suggest that ML-based algorithms are constrained by the size and representativeness of their datasets.
We speculate that including more spaceborne cloud profiling radars with varying overpass times (covering the entire day) in the training dataset could improve the machine learning technique, potentially leading to a higher-quality CBH product with more comprehensive observations. The CBH product using ML-based algorithms should continue to be improved in future work. In particular, exploring a joint ML- and physics-based method presents a promising direction, which can address the complexities and challenges in retrieving cloud properties. By integrating established physical relationships into ML models, we can potentially enhance the accuracy and reliability of predictions. This approach not only leverages the strengths of both physics-based models and data-driven techniques but also offers a pathway to more robust and interpretable solutions in atmospheric sciences. At present, we focus on developing physics-based algorithms for cloud base height for the next generation of geostationary meteorological satellites to support the application of these products in weather and climate domains.
Moreover, at night, current GEO satellite imaging instruments encounter challenges in accurately determining CBH due to limited or absent solar illumination. Because cloud optical depth cannot be retrieved from the visible band at night, the current physics-based methods are limited to daytime. However, there is potential for enhanced accuracy in deriving cloud optical and microphysical properties, as well as CBH, by incorporating day–night band (DNB) observations during nighttime in the future (Heidinger et al., 2012).
As described earlier, both physics-based cloud base height (CBH) retrieval algorithms (GEO IDPS and GEO CLAVR-x) use cloud products such as cloud top height (CTH), effective particle radius (Reff), and cloud optical thickness (DCOT) as inputs. To validate the reliability of these cloud products derived from the Advanced Himawari Imager (AHI) aboard Himawari-8 (H8), a pixel-by-pixel comparison is conducted with analogous MODIS Collection 6.1 Level-2 cloud products. Both Terra and Aqua MODIS Level-2 cloud products (MOD06 and MYD06, respectively) are freely accessible from the MODIS official website. For verification purposes, the corresponding Level-2 cloud products from January, April, July, and October 2018 are chosen to assess CTH, DCOT, and Reff retrieved by H8/AHI.
Figure S2 (in the Supplement) shows the spatiotemporally matched case comparisons of CTH, DCOT, and Reff from H8/AHI and Terra/MODIS (MOD06) at 03:30 UTC on 15 January 2018. It can be seen that the CTH, DCOT, and Reff from H8/AHI are in good agreement with the matched MODIS cloud products. However, there are still some differences in Reff in the regions near 35° N, 110° E in Fig. S2c and d. The underestimated Reff values from H8/AHI relative to MODIS have been reported in previous studies. Letu et al. (2019) compared the ice cloud products retrieved from AHI and MODIS and concluded that Reff from both products differs remarkably in the ice cloud region, and DCOT is roughly similar. However, DCOT from AHI data is higher in some areas. Looking again at the cloud optical thickness, a slight underestimation of H8/AHI DCOT can be found in Fig. S2e and f. Figure S3 shows another case at 02:10 UTC on 15 January 2018. Despite the good consistency between the H8/AHI and MODIS cloud products, there are slight differences in CTH in the area around 40–40.5° S, 100–110° E in Fig. S3a and b. Moreover, as in Fig. S2, there are still underestimations in Reff of H8/AHI.
To further compare and validate these three H8/AHI cloud products, the spatiotemporally matched samples from H8/AHI and Aqua/Terra MODIS in 4 months of 2018 are counted within intervals of 0.1 km (CTH), 1.0 µm (Reff), and 1 (DCOT) in Fig. S4. The corresponding mean absolute error, mean bias error, RMSE, and R values are also calculated and marked in each subfigure. As can be seen, the R of CTH is around 0.75 in all 4 months and is close to 0.8 in August. The results of DCOT show the highest R, reaching above 0.8. In contrast, the underestimation trend in Reff is also shown in this figure. These differing levels of consistency between the two satellite-retrieved cloud products may be attributed to (1) the different spatiotemporal resolutions between H8/AHI and MODIS; (2) the different wavelength bands, bulk scattering models, and specific algorithms used for retrieving cloud products; and (3) the different view zenith angles between GEO and low-Earth-orbit satellite platforms (Letu et al., 2019). In addition, other external factors such as surface type can also affect the retrieval of cloud products. However, according to Fig. S4, the bulk of the analyzed samples still lies around the 1:1 line, indicating good quality of H8/AHI cloud products.
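Counting matched samples within fixed intervals amounts to building a two-dimensional histogram of the paired values. The sketch below illustrates this for a bin width of 1.0 µm (Reff); the function name `bin_matched_pairs` and the sample values are hypothetical and serve only as an illustration.

```python
import math
from collections import Counter

def bin_matched_pairs(ahi_values, modis_values, interval):
    """Count matched pairs per (AHI bin index, MODIS bin index), where
    each bin has width `interval`. Values lying exactly on a bin edge
    are subject to floating-point rounding in the division."""
    counts = Counter()
    for a, m in zip(ahi_values, modis_values):
        key = (math.floor(a / interval), math.floor(m / interval))
        counts[key] += 1
    return counts

# Hypothetical matched Reff samples in micrometres (not study data).
ahi_reff = [10.2, 10.7, 12.4, 12.9]
modis_reff = [11.5, 11.1, 12.3, 14.0]
density = bin_matched_pairs(ahi_reff, modis_reff, interval=1.0)
```

Cells with equal AHI and MODIS bin indices lie on the 1:1 line, so the share of samples in or near those cells gives a quick impression of the agreement between the two products.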
The ML-based visible (VIS) and infrared (IR) model algorithm uses 230 typical variables (see Table 1) as model predictors, and the importance scores of the top-30 predictors are ranked in Fig. S5. It can be seen that the most important variables are CTH and CTT, and DCOT is an important or sensitive factor affecting these two quantities. A sensitivity test is also performed to further investigate the potential influence of DCOT on the CBH retrieval by the VIS+IR model (see Table S1 in the Supplement). From Fig. S7a, we find that the samples with DCOT lower than 5 cause relatively large CBH errors compared with the matched CBHs from the joint Cloud-Aerosol Lidar and Infrared Pathfinder Satellite Observation (CALIPSO)/CloudSat product.
According to the results in Fig. S7b, we may filter out the samples with relatively small DCOT to further improve the accuracy of CBH retrieval by the VIS+IR model (see Table S1). Figure S7b shows that after filtering out the samples with DCOT less than 1.6, the R increases from 0.895 to 0.922, implying a better performance of CBH retrieval. According to the ranking of predictor importance (see Fig. S6), we also conduct another sensitivity test on the BT observed by H8/AHI IR Channel 14 (Cha14) at 11 µm, which plays an important role in the IR-single model. Figure S7c shows that the BT values of H8/AHI Channel 14 range from 160 to 316 K, and the samples with BT higher than 300 K show large CBH errors. Similarly, by filtering out the samples with BT higher than 281 K, we obtain a better IR-single model algorithm for retrieving high-quality CBH (see Table S2). Figure S7d also shows that the R value increases from 0.868 to 0.911.
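The screening step in these sensitivity tests can be sketched as follows, assuming each matched sample carries its DCOT, 11 µm BT, retrieved CBH, and reference CBH. The dictionary keys, the helper names `filter_samples` and `pearson_r`, and the sample values are hypothetical; only the thresholds (DCOT of 1.6 and BT of 281 K) are taken from the text.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equally sized sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def filter_samples(samples, min_dcot=None, max_bt=None):
    """Drop samples with DCOT below `min_dcot` or 11 um BT above `max_bt`.
    Either screen is skipped when its threshold is None."""
    kept = []
    for s in samples:
        if min_dcot is not None and s["dcot"] < min_dcot:
            continue
        if max_bt is not None and s["bt11"] > max_bt:
            continue
        kept.append(s)
    return kept

def correlation(samples):
    return pearson_r([s["retrieved"] for s in samples],
                     [s["reference"] for s in samples])

# Hypothetical samples: the first one is an optically thin, warm outlier.
samples = [
    {"dcot": 0.8, "bt11": 285.0, "retrieved": 6.0, "reference": 1.5},
    {"dcot": 5.0, "bt11": 270.0, "retrieved": 2.0, "reference": 2.2},
    {"dcot": 12.0, "bt11": 255.0, "retrieved": 4.1, "reference": 4.0},
    {"dcot": 20.0, "bt11": 240.0, "retrieved": 6.2, "reference": 6.5},
    {"dcot": 8.0, "bt11": 262.0, "retrieved": 3.0, "reference": 3.1},
]
r_before = correlation(samples)
screened = filter_samples(samples, min_dcot=1.6)   # VIS+IR style screen
r_after = correlation(screened)
```

In this toy dataset the screen removes a single thin-cloud outlier and R rises accordingly; calling `filter_samples(samples, max_bt=281.0)` applies the analogous BT screen described for the IR-single model.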
The MODIS Collection 6.1 Level-2 cloud product from the National Aeronautics and Space Administration (NASA) is available at https://doi.org/10.5067/MODIS/MOD06_L2.061 (Platnick et al., 2015). The CloudSat datasets from the CloudSat Data Processing Center of the Cooperative Institute for Research in the Atmosphere are available at http://www.cloudsat.cira.colostate.edu/ (CloudSat DPC, 2024). The Himawari-8 data utilized for the CBH retrieval from the Japan Aerospace Exploration Agency (JAXA) P-Tree system are available at https://www.eorc.jaxa.jp/ptree/ (JAXA, 2024; users need to register first). The GFS NWP data from US NOAA are available at https://www.nco.ncep.noaa.gov/pmb/products/gfs/ (US NOAA, 2024).
The supplement related to this article is available online at: https://doi.org/10.5194/acp-24-14239-2024-supplement.
MM proposed the essential research idea. MW, MM, JL, HL, BC, and YL performed the analysis and drafted the paper. ZY and NX provided useful comments. All the authors contributed to the interpretation and discussion of the results and the revision of the paper.
The contact author has declared that none of the authors has any competing interests.
Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors.
The authors would like to acknowledge NASA, JMA, the University of Colorado, and NOAA for freely providing satellite data online. The authors thank NOAA, NASA, and their VIIRS algorithm working groups (AWGs) for freely providing the VIIRS cloud base height algorithm theoretical basis documents (ATBDs). In addition, the authors appreciate the powerful computing tools developed by the Python and scikit-learn groups (https://scikit-learn.org/stable/, last access: 14 December 2024). The authors also thank Rundong Zhou and Pan Xia for drawing some pictures for this paper. The authors sincerely thank Yong Zhang and Jianping Guo for freely providing cloud base height results retrieved by ground-based cloud radar at Beijing Nanjiao station. We also acknowledge the high‐performance computing support from the School of Atmospheric Science of Sun Yat‐sen University. Last but not least, the authors would like to thank the editor and anonymous reviewers for their thoughtful suggestions and comments.
This work has been supported partly by the Guangdong Major Project of Basic and Applied Basic Research (grant no. 2020B0301030004), the National Natural Science Foundation of China under grant nos. 42175086 and U2142201, the FengYun Meteorological Satellite Innovation Foundation under grant no. FY-APP-ZX-2022.0207, the Innovation Group Project of Southern Marine Science and Engineering Guangdong Laboratory (Zhuhai) (no. SML2023SP208), and the Science and Technology Planning Project of Guangdong Province (2023B1212060019).
This paper was edited by Raphaela Vogel and reviewed by two anonymous referees.
Aydin, K. and Singh, J.: Cloud Ice Crystal Classification Using a 95-GHz Polarimetric Radar, J. Atmos. Ocean. Tech., 21, 1679–1688, https://doi.org/10.1175/JTECH1671.1, 2004.
Baker, N.: Joint Polar Satellite System (JPSS) VIIRS Cloud Base Height Algorithm Theoretical Basis Document (ATBD), 2011.
Baum, B., Menzel, W. P., Frey, R., Tobin, D., Holz, R., and Ackerman, S.: MODIS cloud top property refinements for Collection 6, J. Appl. Meteorol. Clim., 51, 1145–1163, https://doi.org/10.1175/JAMC-D-11-0203.1, 2012.
Bessho, K., Date, K., Hayashi, M., Ikeda, A., Imai, T., Inoue, H., Kumagai, Y., Miyakawa, T., Murata, H., Ohno, T., Okuyama, A., Oyama, R., Sasaki, Y., Shimazu, Y., Shimoji, K., Sumida, Y., Suzuki, M., Taniguchi, H., Tsuchiyama, H., Uesawa, D., Yokota, H., and Yoshida, R.: An introduction to Himawari-8/9—Japan's new-generation geostationary meteorological satellites, J. Meteorol. Soc. Jpn., Ser. II, 94, 151–183, https://doi.org/10.2151/jmsj.2016-009, 2016.
Breiman, L.: Random forests, Mach. Learn., 45, 5–32, 2001.
Ceccaldi, M., Delanoë, J., Hogan, R. J., Pounder, N. L., Protat, A., and Pelon, J.: From CloudSat-CALIPSO to EarthCare: Evolution of the DARDAR cloud classification and its comparison to airborne radar-lidar observations, J. Geophys. Res.-Atmos., 118, 7962–7981, https://doi.org/10.1002/jgrd.50579, 2013.
CloudSat DPC (Data Processing Center): http://www.cloudsat.cira.colostate.edu/, last access: 17 December 2024.
Forsythe, J. M., Haar, T. H. V., and Reinke, D. L.: Cloud-Base height estimates using a combination of Meteorological Satellite Imagery and Surface Reports, J. Appl. Meteorol. Clim., 39, 2336–2347, https://doi.org/10.1175/1520-0450(2000)039<2336:CBHEUA>2.0.CO;2, 2000.
Gregorutti, B., Michel, B., and Saint-Pierre, P.: Correlation and variable importance in random forests, Stat. Comput., 27, 659–678, https://doi.org/10.1007/s11222-016-9646-1, 2017.
Håkansson, N., Adok, C., Thoss, A., Scheirer, R., and Hörnquist, S.: Neural network cloud top pressure and height for MODIS, Atmos. Meas. Tech., 11, 3177–3196, https://doi.org/10.5194/amt-11-3177-2018, 2018.
Hansen, B.: A Fuzzy Logic–Based Analog Forecasting System for Ceiling and Visibility, Weather Forecast., 22, 1319–1330, https://doi.org/10.1175/2007waf2006017.1, 2007.
Hartmann, D. L. and Larson, K.: An important constraint on tropical cloud–climate feedback, Geophys. Res. Lett., 29, 12-11–12-14, https://doi.org/10.1029/2002gl015835, 2002.
Heidinger, A. and Pavolonis, M.: Gazing at cirrus clouds for 25 years through a split window, part 1: Methodology, J. Appl. Meteorol. Clim., 48, 1110–1116, https://doi.org/10.1175/2008JAMC1882.1, 2009.
Heidinger, A. K.: GOES-R Advanced Baseline Imager (ABI) Algorithm Theoretical Basis Document for Cloud Height, Version 3.0, https://www.star.nesdis.noaa.gov/goesr/documents/ATBDs/Baseline/ATBD_GOES-R_Cloud_Height_v3.0_Jul2012.pdf (last access: 18 December 2024), 2012.
Heidinger, A. K., Bearson, N., Foster, M. J., Li, Y., Wanzong, S., Ackerman, S., Holz, R. E., Platnick, S., and Meyer, K.: Using sounder data to improve cirrus cloud height estimation from satellite imagers, J. Atmos. Ocean. Tech., 36, 1331–1342, https://doi.org/10.1175/jtech-d-18-0079.1, 2019.
Hersbach, H., Bell, B., Berrisford, P., Hirahara, S., Horányi, A., Muñoz-Sabater, J., Nicolas, J., Peubey, C., Radu, R., Schepers, D., Simmons, A., Soci, C., Abdalla, S., Abellan, X., Balsamo, G., Bechtold, P., Biavati, G., Bidlot, J., Bonavita, M., De Chiara, G., Dahlgren, P., Dee, D., Diamantakis, M., Dragani, R., Flemming, J., Forbes, R., Fuentes, M., Geer, A., Haimberger, L., Healy, S., Hogan, R. J., Hólm, E., Janisková, M., Keeley, S., Laloyaux, P., Lopez, P., Lupu, C., Radnoti, G., de Rosnay, P., Rozum, I., Vamborg, F., Villaume, S., and Thépaut, J.-N.: The ERA5 global reanalysis, Q. J. Roy. Meteor. Soc., 146, 1999–2049, https://doi.org/10.1002/qj.3803, 2020.
Heymsfield, A. J., Bansemer, A., Matrosov, S., and Tian, L.: The 94-GHz radar dim band: Relevance to ice cloud properties and CloudSat, Geophys. Res. Lett., 35, L03802, https://doi.org/10.1029/2007GL031361, 2008.
Hirsch, E., Agassi, E., and Koren, I.: A novel technique for extracting clouds base height using ground based imaging, Atmos. Meas. Tech., 4, 117–130, https://doi.org/10.5194/amt-4-117-2011, 2011.
Hunt, W. H., Winker, D. M., Vaughan, M. A., Powell, K. A., Lucker, P. L., and Weimer, C.: CALIPSO lidar description and performance assessment, J. Atmos. Ocean. Tech., 26, 1214–1228, https://doi.org/10.1175/2009JTECHA1223.1, 2009.
Huo, J., Bi, Y., Lü, D., and Duan, S.: Cloud Classification and Distribution of Cloud Types in Beijing Using Ka-Band Radar Data, Adv. Atmos. Sci., 36, 793–803, https://doi.org/10.1007/s00376-019-8272-1, 2019.
Hutchison, K., Wong, E., and Ou, S. C.: Cloud base heights retrieved during night-time conditions with MODIS data, Int. J. Remote Sens., 27, 2847–2862, https://doi.org/10.1080/01431160500296800, 2006.
Hutchison, K. D.: The retrieval of cloud base heights from MODIS and three-dimensional cloud fields from NASA's EOS Aqua mission, Int. J. Remote Sens., 23, 5249–5265, https://doi.org/10.1080/01431160110117391, 2002.
Iwabuchi, H., Putri, N. S., Saito, M., Tokoro, Y., Sekiguchi, M., Yang, P., and Baum, B. A.: Cloud Property Retrieval from Multiband Infrared Measurements by Himawari-8, J. Meteorol. Soc. Jpn., Ser. II, 96B, 27–42, https://doi.org/10.2151/jmsj.2018-001, 2018.
JAXA: Himawari-8 data, https://www.eorc.jaxa.jp/ptree/, last access: 17 December 2024.
Kalnay, E., Kanamitsu, M., Kistler, R., Collins, W., Deaven, D., Gandin, L., Iredell, M., Saha, S., White, G., Woollen, J., Zhu, Y., Leetmaa, A., Reynolds, R., Chelliah, M., Ebisuzaki, W., Higgins, W., Janowiak, J., Mo, K. C., Ropelewski, C., and Wang, J.: The NCEP/NCAR 40-Year Reanalysis Project, B. Am. Meteorol. Soc., 77, 437–472, https://doi.org/10.1175/1520-0477(1996)077<0437:TNYRP>2.0.CO;2, 1996.
Kühnlein, M., Appelhans, T., Thies, B., and Nauß, T.: Precipitation Estimates from MSG SEVIRI Daytime, Nighttime, and Twilight Data with Random Forests, J. Appl. Meteorol. Clim., 53, 2457–2480, https://doi.org/10.1175/jamc-d-14-0082.1, 2014.
Letu, H., Nagao, T. M., Nakajima, T. Y., Riedi, J., Ishimoto, H., Baran, A. J., Shang, H., Sekiguchi, M., and Kikuchi, M.: Ice cloud properties from Himawari-8/AHI next-generation geostationary satellite: Capability of the AHI to monitor the DC cloud generation process, IEEE T. Geosci. Remote, 57, 3229–3239, https://doi.org/10.1109/tgrs.2018.2882803, 2019.
Li, Y., Yi, B., and Min, M.: Diurnal variations of cloud optical properties during day-time over China based on Himawari-8 satellite retrievals, Atmos. Environ., 277, 119065, https://doi.org/10.1016/j.atmosenv.2022.119065, 2022.
Liang, Y., Min, M., Yu, Y., Wang, X., and Xia, P.: Assessing diurnal cycle of cloud covers of Fengyun-4A geostationary satellite based on the manual observation data in China, IEEE T. Geosci. Remote, 61, 4101518, https://doi.org/10.1109/TGRS.2023.3256365, 2023.
Lin, H., Li, Z., Li, J., Zhang, F., Min, M., and Menzel, W. P.: Estimate of daytime single-layer cloud base height from Advanced Baseline Imager measurements, Remote Sens. Environ., 274, 112970, https://doi.org/10.1016/j.rse.2022.112970, 2022.
Lu, X., Mao, F., Rosenfeld, D., Zhu, Y., Pan, Z., and Gong, W.: Satellite retrieval of cloud base height and geometric thickness of low-level cloud based on CALIPSO, Atmos. Chem. Phys., 21, 11979–12003, https://doi.org/10.5194/acp-21-11979-2021, 2021.
Meerkötter, R. and Bugliaro, L.: Diurnal evolution of cloud base heights in convective cloud fields from MSG/SEVIRI data, Atmos. Chem. Phys., 9, 1767–1778, https://doi.org/10.5194/acp-9-1767-2009, 2009.
Miller, R. M., Rauber, R. M., Di Girolamo, L., Rilloraza, M., Fu, D., McFarquhar, G. M., Nesbitt, S. W., Ziemba, L. D., Woods, S., and Thornhill, K. L.: Influence of natural and anthropogenic aerosols on cloud base droplet size distributions in clouds over the South China Sea and West Pacific, Atmos. Chem. Phys., 23, 8959–8977, https://doi.org/10.5194/acp-23-8959-2023, 2023.
Miller, S. D., Rogers, M. A., Haynes, J. M., Sengupta, M., and Heidinger, A. K.: Short-term solar irradiance forecasting via satellite/model coupling, Sol. Energy, 168, 102–117, https://doi.org/10.1016/j.solener.2017.11.049, 2018.
Min, M. and Zhang, Z.: On the influence of cloud fraction diurnal cycle and sub-grid cloud optical thickness variability on all-sky direct aerosol radiative forcing, J. Quant. Spectrosc. Ra., 142, 25–36, https://doi.org/10.1016/j.jqsrt.2014.03.014, 2014.
Min, M., Wu, C., Li, C., Liu, H., Xu, N., Wu, X., Chen, L., Wang, F., Sun, F., Qin, D., Wang, X., Li, B., Zheng, Z., Cao, G., and Dong, L.: Developing the science product algorithm testbed for Chinese next-generation geostationary meteorological satellites: FengYun-4 series, J. Meteorol. Res.-PRC, 31, 708–719, https://doi.org/10.1007/s13351-017-6161-z, 2017.
Min, M., Li, J., Wang, F., Liu, Z., and Menzel, W. P.: Retrieval of cloud top properties from advanced geostationary satellite imager measurements based on machine learning algorithms, Remote Sens. Environ., 239, 111616, https://doi.org/10.1016/j.rse.2019.111616, 2020.
Min, M., Chen, B., Xu, N., He, X., Wei, X., and Wang, M.: Nonnegligible diurnal and long-term variation characteristics of the calibration biases in Fengyun-4A/AGRI infrared channels based on the oceanic drifter data, IEEE T. Geosci. Remote, 60, 1–15, https://doi.org/10.1109/TGRS.2022.3160450, 2022.
Noh, Y.-J., Forsythe, J. M., Miller, S. D., Seaman, C. J., Li, Y., Heidinger, A. K., Lindsey, D. T., Rogers, M. A., and Partain, P. T.: Cloud-base height estimation from VIIRS. Part II: A statistical algorithm based on A-Train satellite data, J. Atmos. Ocean. Tech., 34, 585–598, https://doi.org/10.1175/JTECH-D-16-0110.1, 2017.
Noh, Y.-J., Miller, S. D., Seaman, C. J., Haynes, J. M., Li, Y., Heidinger, A. K., and Kulie, M. S.: Enterprise AWG Cloud Base Algorithm (ACBA), NOAA NESDIS Center for Satellite Applications and Research, Algorithm Theoretical Basis Document (ATBD), 2022.
Omar, A., Winker, D., Kittaka, C., Vaughan, M., Liu, Z., Hu, Y., Trepte, C., Rogers, R., Ferrare, R., Kuehn, R., and Hostetler, C.: The CALIPSO automated aerosol classification and lidar ratio selection algorithm, J. Atmos. Ocean. Tech., 26, 1994–2014, https://doi.org/10.1175/2009JTECHA1231.1, 2009.
Platnick, S., Ackerman, S., King, M., et al.: MODIS Atmosphere L2 Cloud Product (06_L2), NASA MODIS Adaptive Processing System, Goddard Space Flight Center [data set], USA, https://doi.org/10.5067/MODIS/MOD06_L2.061, 2015.
Platnick, S., Meyer, K. G., King, M. D., Wind, G., Amarasinghe, N., Marchant, B., Arnold, G. T., Zhang, Z., Hubanks, P. A., Holz, R. E., Yang, P., Ridgway, W. L., and Riedi, J.: The MODIS cloud optical and microphysical products: Collection 6 updates and examples from Terra and Aqua, IEEE T. Geosci. Remote, 55, 502–525, https://doi.org/10.1109/TGRS.2016.2610522, 2017.
Rosenfeld, D., Zheng, Y., Hashimshoni, E., Pöhlker, M. L., Jefferson, A., Pöhlker, C., Yu, X., Zhu, Y., Liu, G., Yue, Z., Fischman, B., Li, Z., Giguzin, D., Goren, T., Artaxo, P., Barbosa, H. M., Pöschl, U., and Andreae, M. O.: Satellite retrieval of cloud condensation nuclei concentrations by using clouds as CCN chambers, P. Natl. Acad. Sci. USA, 113, 5828–5834, https://doi.org/10.1073/pnas.1514044113, 2016.
Sassen, K. and Wang, Z.: Classifying clouds around the globe with the CloudSat radar: 1-year of results, Geophys. Res. Lett., 35, L04805, https://doi.org/10.1029/2007GL032591, 2008.
Seaman, C. J., Noh, Y.-J., Miller, S. D., Heidinger, A. K., and Lindsey, D. T.: Cloud-base height estimation from VIIRS. Part I: Operational algorithm validation against CloudSat, J. Atmos. Ocean. Tech., 34, 567–583, https://doi.org/10.1175/jtech-d-16-0109.1, 2017.
Sharma, S., Vaishnav, R., Shukla, M. V., Kumar, P., Kumar, P., Thapliyal, P. K., Lal, S., and Acharya, Y. B.: Evaluation of cloud base height measurements from Ceilometer CL31 and MODIS satellite over Ahmedabad, India, Atmos. Meas. Tech., 9, 711–719, https://doi.org/10.5194/amt-9-711-2016, 2016.
Stephens, G. L., Vane, D. G., Boain, R. J., Mace, G. G., and Sassen, K.: The CloudSat mission and the A-Train: A new dimension of space-based observations of clouds and precipitation, B. Am. Meteorol. Soc., 83, 1771–1790, 2002.
Stubenrauch, C. J., Rossow, W. B., Kinne, S., Ackerman, S., Cesana, G., Chepfer, H., Di Girolamo, L., Getzewich, B., Guignard, A., Heidinger, A., Maddux, B. C., Menzel, W. P., Minnis, P., Pearl, C., Platnick, S., Poulsen, C., Riedi, J., Sun-Mack, S., Walther, A., Winker, D., Zeng, S., and Zhao, G.: Assessment of global cloud datasets from satellites: project and database initiated by the GEWEX radiation panel, B. Am. Meteorol. Soc., 94, 1031–1049, https://doi.org/10.1175/bams-d-12-00117.1, 2013.
Su, T., Zheng, Y., and Li, Z.: Methodology to determine the coupling of continental clouds with surface and boundary layer height under cloudy conditions from lidar and meteorological data, Atmos. Chem. Phys., 22, 1453–1466, https://doi.org/10.5194/acp-22-1453-2022, 2022.
Tan, Z., Huo, J., Ma, S., Han, D., Wang, X., Hu, S., and Yan, W.: Estimating cloud base height from Himawari-8 based on a random forest algorithm, Int. J. Remote Sens., 42, 2485–2501, https://doi.org/10.1080/01431161.2020.1854891, 2020.
Thorsen, T. J., Fu, Q., and Comstock, J.: Comparison of the CALIPSO satellite and ground-based observations of cirrus clouds at the ARM TWP sites, J. Geophys. Res.-Atmos., 116, D21203, https://doi.org/10.1029/2011jd015970, 2011.
US NOAA: NCEP Products Inventory: Global Products, https://www.nco.ncep.noaa.gov/pmb/products/gfs/, last access: 17 December 2024.
Viúdez-Mora, A., Costa-Surós, M., Calbó, J., and González, J. A.: Modeling atmospheric longwave radiation at the surface during overcast skies: The role of cloud base height, J. Geophys. Res.-Atmos., 120, 199–214, https://doi.org/10.1002/2014jd022310, 2015.
Wang, F., Min, M., Xu, N., Liu, C., Wang, Z., and Zhu, L.: Effects of linear calibration errors at low temperature end of thermal infrared band: Lesson from failures in cloud top property retrieval of FengYun-4A geostationary satellite, IEEE T. Geosci. Remote, 60, 5001511, https://doi.org/10.1109/TGRS.2022.3140348, 2022.
Wang, T., Shi, J., Ma, Y., Letu, H., and Li, X.: All-sky longwave downward radiation from satellite measurements: General parameterizations based on LST, column water vapor and cloud top temperature, ISPRS J. Photogramm., 161, 52–60, https://doi.org/10.1016/j.isprsjprs.2020.01.011, 2020.
Wang, X., Min, M., Wang, F., Guo, J., Li, B., and Tang, S.: Intercomparisons of cloud mask product among Fengyun-4A, Himawari-8 and MODIS, IEEE T. Geosci. Remote, 57, 8827–8839, https://doi.org/10.1109/TGRS.2019.2923247, 2019.
Wang, Z., Vane, D., Stephens, G., and Reinke, D.: Level 2 combined radar and lidar cloud scenario classification product process description and interface control document, JPL Document, CloudSat Project, A NASA Earth System Science Pathfinder Mission, 2012.
Warren, S. G. and Eastman, R.: Diurnal Cycles of Cumulus, Cumulonimbus, Stratus, Stratocumulus, and Fog from Surface Observations over Land and Ocean, J. Climate, 27, 2386–2404, https://doi.org/10.1175/jcli-d-13-00352.1, 2014.
Winker, D. M., Vaughan, M. A., Omar, A., Hu, Y., Powell, K. A., Liu, Z., Hunt, W. H., and Young, S. A.: Overview of the CALIPSO mission and CALIOP data processing algorithms, J. Atmos. Ocean. Tech., 26, 2310–2323, https://doi.org/10.1175/2009JTECHA1281.1, 2009.
Yang, J., Li, S., Gong, W., Min, Q., Mao, F., and Pan, Z.: A fast cloud geometrical thickness retrieval algorithm for single-layer marine liquid clouds using OCO-2 oxygen A-band measurements, Remote Sens. Environ., 256, 112305, https://doi.org/10.1016/j.rse.2021.112305, 2021.
Young, S. A. and Vaughan, M. A.: The retrieval of profiles of particulate extinction from Cloud Aerosol Lidar Infrared Pathfinder Satellite Observations (CALIPSO) data: Algorithm description, J. Atmos. Ocean. Tech., 26, 1105–1119, https://doi.org/10.1175/2008JTECHA1221.1, 2009.
Zhang, Y., Zhang, L., Guo, J., Feng, J., Cao, L., Wang, Y., Zhou, Q., Li, L., Li, B., Xu, H., Liu, L., An, N., and Liu, H.: Climatology of cloud-base height from long-term radiosonde measurements in China, Adv. Atmos. Sci., 35, 158–168, https://doi.org/10.1007/s00376-017-7096-0, 2018.
Zheng, Y. and Rosenfeld, D.: Linear relation between convective cloud base height and updrafts and application to satellite retrievals, Geophys. Res. Lett., 42, 6485–6491, https://doi.org/10.1002/2015gl064809, 2015.
Zheng, Y., Sakradzija, M., Lee, S.-S., and Li, Z.: Theoretical Understanding of the Linear Relationship between Convective Updrafts and Cloud-Base Height for Shallow Cumulus Clouds. Part II: Continental Conditions, J. Atmos. Sci., 77, 1313–1328, https://doi.org/10.1175/jas-d-19-0301.1, 2020.
Zhou, Q., Zhang, Y., Li, B., Li, L., Feng, J., Jia, S., Lv, S., Tao, F., and Guo, J.: Cloud-base and cloud-top heights determined from a ground-based cloud radar in Beijing, China, Atmos. Environ., 201, 381–390, https://doi.org/10.1016/j.atmosenv.2019.01.012, 2019.
Zhou, R., Pan, X., Xiaohu, Z., Na, X., and Min, M.: Research progress and prospects of atmospheric motion vector based on meteorological satellite images, Reviews of Geophysics and Planetary Physics, 55, 184–194, https://doi.org/10.19975/j.dqyxx.2022-077, 2024 (in Chinese with English abstract).
Zhu, Y., Rosenfeld, D., Yu, X., Liu, G., Dai, J., and Xu, X.: Satellite retrieval of convective cloud base temperature based on the NPP/VIIRS Imager, Geophys. Res. Lett., 41, 1308–1313, https://doi.org/10.1002/2013gl058970, 2014.
- Abstract
- Introduction
- Data
- Physics- and machine-learning-based cloud base height algorithms
- Results and discussions
- Conclusions and discussion
- Appendix A
- Appendix B
- Data availability
- Author contributions
- Competing interests
- Disclaimer
- Acknowledgements
- Financial support
- Review statement
- References
- Supplement