Investigation of near-global daytime boundary layer height using high-resolution radiosondes: first results and comparison with ERA5, MERRA-2, JRA-55, and NCEP-2 reanalyses

The planetary boundary layer (PBL) governs the vertical transport of mass, momentum, and moisture between the surface and the free atmosphere, and thus the determination of PBL height (BLH) is recognized as crucial for air quality, weather, and climate analysis. Although reanalysis products can provide important insight into the global view of BLH in a seamless way, the BLH observed in situ on a global scale remains poorly understood due to the lack of high-resolution (1 or 2 s) radiosonde measurements. The present study attempts to establish a near-global BLH climatology at synoptic times (00:00 and 12:00 UTC) and in the daytime using high-resolution radiosonde measurements over 300 radiosonde sites worldwide for the period 2012 to 2019, which is then compared against the BLHs obtained from four reanalysis datasets, including ERA5, MERRA-2, JRA-55, and NCEP-2. The variations in daytime BLH exhibit large spatial and temporal dependence, and as a result the BLH maxima are generally discerned over the regions such as the western United States and western China, in which the balloon launch times mostly correspond to the afternoon. The diurnal variations in BLH are revealed with a peak at 17:00 local solar time (LST). The most promising reanalysis product is ERA5, which underestimates BLH by around 130 m as compared to radiosondes released during daytime. In addition, MERRA-2 is a well-established product and has an underestimation of around 160 m. JRA-55 and NCEP-2 might produce considerable additional uncertainties, with a much larger underestimation of up to 400 m. The largest bias in the reanalysis data appears over the western United States and western China, and it might be attributed to the maximal BLH in the afternoon when the PBL has risen. Statistical analyses further indicate that the biases of reanalysis BLH products are positively associated with orographic complexity, as well as the occurrence of static instability. To our best knowledge, this study presents the first near-global view of high-resolution radiosonde-derived boundary layer Published by Copernicus Publications on behalf of the European Geosciences Union. 17080 J. Guo et al.: Investigation of near-global daytime boundary layer height height and provides a quantitative assessment of the four frequently used reanalysis products.

Abstract. The planetary boundary layer (PBL) governs the vertical transport of mass, momentum, and moisture between the surface and the free atmosphere, and thus the determination of PBL height (BLH) is recognized as crucial for air quality, weather, and climate analysis. Although reanalysis products can provide important insight into the global view of BLH in a seamless way, the BLH observed in situ on a global scale remains poorly understood due to the lack of high-resolution (1 or 2 s) radiosonde measurements. The present study attempts to establish a near-global BLH climatology at synoptic times (00:00 and 12:00 UTC) and in the daytime using high-resolution radiosonde measurements over 300 radiosonde sites worldwide for the period 2012 to 2019, which is then compared against the BLHs obtained from four reanalysis datasets, including ERA5, MERRA-2, JRA-55, and NCEP-2. The variations in daytime BLH exhibit large spatial and temporal dependence, and as a result the BLH maxima are generally discerned over the re-gions such as the western United States and western China, in which the balloon launch times mostly correspond to the afternoon. The diurnal variations in BLH are revealed with a peak at 17:00 local solar time (LST). The most promising reanalysis product is ERA5, which underestimates BLH by around 130 m as compared to radiosondes released during daytime. In addition, MERRA-2 is a well-established product and has an underestimation of around 160 m. JRA-55 and NCEP-2 might produce considerable additional uncertainties, with a much larger underestimation of up to 400 m. The largest bias in the reanalysis data appears over the western United States and western China, and it might be attributed to the maximal BLH in the afternoon when the PBL has risen. Statistical analyses further indicate that the biases of reanalysis BLH products are positively associated with orographic complexity, as well as the occurrence of static instability. To our best knowledge, this study presents the first near-global view of high-resolution radiosonde-derived boundary layer

Introduction
The planetary boundary layer (PBL) is where most exchanges of heat, moisture, momentum, and mass take place between the free atmosphere and ground surface (Stull, 1988;Liu and Liang, 2010). The spatial and temporal variability in the PBL, through a variety of physical processes, has a profound influence on research fields such as air quality (Stull, 1988;Li et al., 2017), convective storm (Oliveira et al., 2020), and global warming (Davy and Esau, 2016), among others. It is well known to be influenced by radiative cooling at night and by downward solar radiation reaching the ground surface at daytime, respectively, forming a stable boundary layer (SBL) and convective boundary layer (CBL), with a typical PBL depth (BLH) of less than 500 m and 1-3 km (Zhang et al., 2020a), respectively. For climate models, most of the PBL processes occur at sub-grid scales and thus are either underrepresented or not fully represented (von Engeln and Teixeira, 2013). Meanwhile, there are many problems in elucidating the PBL processes using numerical model simulations (Martins et al., 2010), even over the relatively homogeneous ocean (Belmonte Rivas and Stoffelen, 2019), which is likely due to the scarcity of fine-scale vertical observations of the atmosphere.
Over the oceans Belmonte Rivas and Stoffelen (2019) performed a climatological comparison between state-of-the-art reanalysis and scatterometer surface winds in the PBL, revealing mean and transient PBL model errors. Houchi et al. (2010), based on high-resolution radiosondes, verified the climatological wind profiles and found in particular wind shear a factor of 2-3 lower simulated by the European Centre for Medium-Range Weather Forecasts (ECMWF) model. Wind shear is recognized to be able to significantly modulate turbulent mixing of atmospheric pollutants (Zhang et al., 2020b), and thus the inabilities of the model in this regard may have repercussions for air quality prediction.
The critical interaction between PBL turbulence and vertical structures of thermodynamic variables, as the heart of PBL physics, makes the determination of BLH a big challenge due largely to the difficulty for those instruments with coarse vertical resolution in resolving the sharp gradients of temperature and water vapor at the top of the PBL, as well as estimating PBL-top entrainment and lateral entrainment (Teixeira et al., 2021). Thus, this highlights the importance of high-resolution vertical measurements of thermodynamic variables. The temporal and spatial variations in BLH have been extensively assessed in previous studies at a regional or national scale, such as the contiguous United States (Seidel et al., 2012;Zhang et al., 2020a), Europe (Palarz et al., 2018), and the Arctic and Antarctic (Zhang et al., 2011), which are mainly implemented by low-resolution ra-diosonde measurements, reanalysis, or both. Fortunately, a few pioneering studies in characterizing BLH have adopted high-resolution measurements at a national scale over China Zhang et al., 2018;Su et al., 2018) and the United States (Seidel et al., 2010). Notable diurnal and seasonal cycles have been revealed (e.g., Short et al., 2019). Besides the regional results, several attempts have been made to provide global-scale retrievals of BLH using the Global Positioning System radio occultation (GPS RO) and Integrated Global Radiosonde Archive (IGRA) version 2 (Seidel et al., 2010;Gu et al., 2020;Ratnam and Basha, 2010), in which seasonal variations in and maritime-continental contrasts of BLHs have been achieved. The measurements of GPS RO, at a vertical resolution of 100 m around the PBL top, are typically used to determine BLH by searching for the altitude with a sharp gradient in the refractivity profile (Basha et al., 2018). However, such a sharp gradient of refractivity might overestimate BLH compared to other methods that the community usually used, such as the parcel method (Seidel et al., 2010). Compared with high-resolution soundings, IGRA is sparsely sampled in the vertical (about 10-30 layers below 500 hPa), which could result in large uncertainties in estimating BLH. Likewise, additional errors could be introduced in reanalysis products for their sparse vertical resolutions (about 6-42 layers below 500 hPa), which are equivalent to or bigger than IGRA. A large spread emerges in the explicit determination of BLH from a variety of instruments; in spite of that the BLH detection based on radiosonde is the most accepted methodology for deriving the CBL and SBL (Seidel et al., 2012;de Arruda Moreira et al., 2018).
A wide range of reanalysis products, such as those from the fifth generation ECMWF atmospheric reanalysis of the global climate (ERA5), the National Aeronautics and Space Administration (NASA) Modern-Era Retrospectiveanalysis for Research and Applications version 2 (MERRA-2), Japanese 55-year Reanalysis (JRA-55), and the National Centers for Environmental Prediction (NCEP) climate forecast system version 2 (NCEP-2), provide a rich ensemble of climate data products (Saha et al., 2014;Hersbach et al., 2020;Kobayashi et al., 2015;Gelaro et al., 2017) but are sensitive to both empirical parameterizations and the diagnostic method chosen, while verification by direct observations of BLH are sparse (Seibert et al., 2000). Some intercomparisons between instruments or model data, such as radiosonde, CALIOP, and ERA-interim reanalysis, have been previously conducted and yielded good consistency in seasonal and spatial variation (e.g., Zhang et al., 2016). However, Basha et al. (2018) demonstrate that ERA-interim can underestimate BLH by around 900 m compared to GPS RO. This underestimation may be caused by different kinetic or thermodynamic assumptions used. For instance, ERA-interim is implemented with a bulk Richardson number (BRN) method (Palm et al., 2005), which is believed to be suitable for all atmospheric conditions (Ander-son, 2009). It is worth highlighting that the state-of-the-art reanalysis could be one of the most promising data sources for obtaining the synoptic or climatological features of BLH.
Despite much progress made in developing the BLH products, there are still some unresolved issues in quantifying the variability in BLH from a global perspective. These issues include the worldwide variation in BLH by high-resolution vertical soundings, the intercomparisons among reanalysis datasets, and further evaluations with radiosonde observations, especially in the daytime based on the same retrieval algorithm. To this end, this study seeks to address the following scientific questions: (1) a climatological distribution of near-global BLH by using high-resolution radiosonde measurements; (2) intercomparisons of ERA5, MERRA-2, JRA-55, and NCEP-2 with additional evaluation with radiosondes; and (3) the investigation of potential sources for the biases of BLH between observation and reanalysis. The rest of the paper is organized as follows. The descriptions of highresolution radiosonde data, reanalysis products, and the bulk Richardson number method are given in Sect. 2. Section 3 presents the spatial distributions of BLH by radiosonde and reanalyses and their intercomparisons. A brief conclusion and remarks are finally outlined in Sect. 4.
2 Data descriptions and BLH retrieval method

High-resolution radiosonde measurements
In 2018, IGRA provided atmospheric soundings at around 445 radiosonde sites across the globe, including pressure, temperature, humidity, and wind vector. The number of pressure levels below 500 hPa is around 10-30. By comparison, for high-resolution radiosondes, the sampling rate is 1 or 2 s, corresponding to a vertical resolution of approximately 5-10 m throughout the atmosphere. The high-resolution radiosonde measurements used in the present study are obtained from 342 sites around the world, which are provided by several organizations, including the China Meteorological Administration (CMA), the National Oceanic and Atmospheric Administration (NOAA) of the United States, the German Deutscher Wetterdienst (Climate Data Center), the Centre for Environmental Data Analysis (CEDA) of the United Kingdom, the Global Climate Observing System (GCOS) Reference Upper-Air Network (GRUAN), and the University of Wyoming.
The CMA has maintained the China Radiosonde Network (CRN), which contains 120 operational stations homogeneously distributed across mainland China with a vertical sampling rate of 1 s (5-8 m resolution), since 2011 (Guo et al., 2016(Guo et al., , 2019Zhang et al., 2016Zhang et al., , 2018Su et al., 2020). The NOAA started the Radiosonde Replacement System (RRS) program in 2005, which involved 89 sites with a vertical resolution of 5 m (Zhang et al., 2019). The German Deutscher Wetterdienst (Climate Data Center) has been shar-ing the radiosonde measurements at 14 sites with a sampling rate of 2 s since 2010. Moreover, the 10 m resolution soundings at 12 sites were provided by the CEDA, which began to share soundings in 1990, and eight radiosonde sites were shared by GRUAN with a vertical resolution smaller than 10 m. An additional 93 sites came from the University of Wyoming, which started in 2017, with a sampling rate of 2 or 1 s. In total, over 678 000 soundings at 342 stations are used here for the period of January 2012 to December 2019, in total 8 years, including 633 000 soundings at the regular release times of 00:00 and 12:00 UTC and 43 000 more irregular observations during intensive observation periods (IOPs).
Radiosonde measurements are taken twice per day following the World Meteorological Organisation (WMO) protocol for synoptic times at 00:00 and 12:00 UTC (Seibert et al., 2000) except for special field campaign observations at specified stations or time ranges during IOPs. The protocol implies that stations at different longitudes sample the diurnal cycle differently. For instance, stations near 0 • E (London) and 180 • E (Samoa) sample at midnight and midday, while stations near 90 • E (Bangladesh) and 90 • W (Chicago) sample at dawn and dusk, with intermediate longitudes at linearly varying intermediate local solar times (LSTs) of day. For wintertime regions near 90 • W and 90 • E, the release times are insufficient for evaluating the BLH during daytime. Hence, the BLH estimates from regular radiosondes will vary with longitude and season (McGrath-Spangler and Denning, 2012). Generally, the principal PBL mechanism at night is associated with an SBL, which gradually transitions into the CBL in the morning (Stull, 1988;Zhang et al., 2018). The transition from the SBL to CBL is generally quick and occurs swiftly after sunrise, but the reverse process can be slow in the late evening (Taylor et al., 2014). Despite the dominance of the CBL during the daytime, an SBL still occurs, especially in the event of overcast sky (Zhang et al., 2018(Zhang et al., , 2020b and near strong divergences in moist convective downbursts (King et al., 2017). To illustrate the daytime variation in BLH, we only selected the soundings that are launched 2 h after sunrise and 2 h before sunset. The sunrise and sunset times are gauged in a longitude bin size of 15 • and are based on the latitude of station and the calendar day of the release. Using this definition, a total of 190 013 profiles, including soundings launched at both synoptic times and during IOPs, spanning January 2012 to December 2019 are used to obtain the BLHs in the daytime. The spatial distribution of file number for each site is displayed in Fig. S1 in the Supplement, in which the sites with less than 10 matches are excluded.
2.2 ERA5, MERRA-2, JRA-55, and NCEP-2 reanalysis datasets ERA5 is the successor of ERA-interim and has undergo a variety of improvements, including more recent parameterization schemes and data assimilation system, better spatial res-olution, both horizontally and vertically (137 levels), and improved representation of evaporation balance, cyclones, soil moisture, and global precipitation (Hersbach et al., 2020). The BLH is composited in the ERA5 product on a 1440×721 grid with 0.25 • latitude and 0.25 • longitude resolution. It is computed by the bulk Richardson number method with a temporal resolution of 1 h. MERRA-2 is the latest atmospheric reanalysis of the modern satellite era produced by NASA's Global Modeling and Assimilation Office (GMAO). It includes aerosol data assimilation, improvements on ozone, and cryospheric processes (Gelaro et al., 2017). In this product, the BLH is packaged and defined by identifying the lowest level at which the heat diffusivity drops below a threshold value (McGrath-Spangler and Denning, 2012). The formula for calculating BLH is as follows: where BLH(MERRA2_packaged) is in units of meters, P PBLtop is the BLH (packaged parameter in MERRA-2; in units of Pa), and P Surface is the surface pressure (in units of Pa). However, to preclude the uncertainty raised by different methods adopted, the BLH by MERRA-2 is extracted by the bulk Richardson number method by utilizing the parameters of horizontal wind, temperature, geopotential height, relative humidity (RH), and surface pressure as inputs. These input data are provided on a grid of 576 × 361 points with 0.5 • latitude and 0.625 • longitude resolution, and they have 42 pressure levels (about 16 layers below 500 hPa) with a temporal resolution of 3 h. JRA-55 is the second Japanese global atmospheric reanalysis commissioned by the Japan Meteorological Agency (JMA) (Kobayashi et al., 2015). Data contain 37 pressure levels between 1 and 1000 hPa (16 layers below 500 hPa), provided on a grid of 288 × 145 points with a horizontal spacing of 1.25 • ×1.25 • and a temporal resolution of 6 h. The parameters, including geopotential height, temperature, horizontal wind, surface pressure, and RH, are used to assess BLH as before.
NCEP-2 has a coarser model resolution than ERA5 (Rinke et al., 2019) with a spatial resolution of 2.5 • latitude and 2.5 • longitude. The total level is 17 (6 layers below 500 hPa), which is substantially less than MERRA-2, JRA-55, or ERA5, and the temporal resolution is 6 h. Similar parameters to JRA-55 are preserved to compute BLH. It is noteworthy that all model times include 00:00 and 12:00 UTC and hence collocate well with the synoptic radiosonde times.

Bulk Richardson number method
In the spirit of a like-for-like comparison, the BLHs derived from radiosonde and reanalysis data (MERRA-2, JRA-55, and NCEP-2) are calculated using the bulk Richardson number (BRN), which also serves as the built-in algorithm in Figure 1. Profiles of basic atmospheric parameters from the ground up to 2.5 km a.g.l., including wind speed (orange), bulk Ri (black), temperature (blue), and RH (green) at 06:00 UTC (14:00 LST) on 6 June 2016 at Chongqing (29.6 • N, 106.4 • E; 541 m) from radiosonde (a), MERRA-2 (b), NCEP-2 (c), and JRA-55 (d) reanalysis datasets. The boundary layer height (BLH) in each subplot is marked as dashed red lines and red text, and the BLH for ERA5 is 1265 m in this case (dashed black lines).
ERA5 for BLH products. The BRN, an algorithm used to reflect how strongly buoyancy is coupled to the vertical momentum (Scotti, 2015), has been widely used for the climatological study of BLH from radiosonde measurements thanks to its applicability and reliability for all PBL regimes (Anderson 2009;Seidel et al., 2012;Guo et al., 2019). It determines the BLH by identifying the level at which the bulk Richardson number, represented by Ri (z), reaches its critical value The dots with gray marginal lines in each map denote the mean BLH derived by sondes at 00:00 UTC, and the dotted red lines present the mean BLH derived by radiosonde on a grid with 5 • longitude. Stations with less than 10 profiles are not included in the analysis. The 2D scatter plot in the bottom left corner of each panel illustrates the correlations between reanalysis-derived and sonde-derived BLHs at 00:00 UTC, for which the asterisk (*) superscripts indicate that the correlation coefficients are statistically significant (p<0.05), and the red lines denote the least-squares regression line. (Palm et al., 2005) and is formulated as follows: where g is the gravitational acceleration, z AG the height above ground level (a.g.l.), θ v the virtual potential temperature, u * the surface friction velocity, u and v the horizontal wind components, and b a constant, which is usually set to zero due to the fact that friction velocity is much weaker compared with the horizontal wind (Seidel et al., 2012). The subscripts of z and s denote the parameters at z height above ground and ground level, respectively. It is known that Ri(z) increases with increasing free flow stability (Zilitinkevich and Baklanov, 2002). Below a critical value of 0.25, the flow is dynamically unstable and likely causes turbulent motion. Nevertheless, since turbu- lence can also occur away from this critical value (Haack et al., 2014), care must be taken in that the critical value might not be well defined, leading to uncertainty in estimating BLH. Meanwhile, the BLH estimates were found not to change very much by differing the input of critical values (Ri = 0.2; 0.25; 0.3) . Therefore, for a given discrete Ri profile, here we identify the BLH as the interpolated height at which the Ri(z) firstly crosses the critical value of 0.25 starting upward from the ground surface. Moreover, it is well recognized that the vertical resolution of radiosonde measurements has a large impact on the BLH estimated. For instance, BLHs are usually lower for a sparser vertical resolution (Seidel et al., 2012). Therefore, factors that cause uncertainty in estimating BLH by using the bulk Richardson method include, but are not limited to, meteorological parameters, the surface friction, vertical resolution of data, and the critical value of Ri.

Collocation procedure and a case study
In contrast to the reanalysis data, the longitude and latitude distributions of high-resolution radiosonde are irregular. A precise comparison between reanalysis data and sounding is required for consistency in time, latitude, and longitude. The matching procedures implemented in this present study go as follows. (1) A latitudinal and longitudinal matching procedure is carried out by finding the geographical grid cell of the reanalysis product that contains the radiosonde station.
(2) Time matching for ERA5 is to find the exact UTC time (hour) of the weather balloon launch. (3) For MERRA-2, NCEP-2, and JRA-55 datasets, the requirement is to limit the time difference with the weather balloon launch time to 1 h.
A case at 06:00 UTC on 6 June 2016 in Chongqing (29.6 • N, 106.4 • E; 541 m) is shown in Fig. 1. In this case, BLH obtained by sounding is 1337 m and is closest to that by ERA5, which underestimates the height by 72 m. Compared with the radiosonde profile, MERRA-2 can capture the main vertical structures and the magnitude of wind speed (WS), RH, and temperature but not the fine-scale vertical variations (Fig. 1b). It also slightly undervalues the BLH by 125 m. The basic parameters outlined by NCEP-2, for instance, RH (5 % larger than sounding), temperature (3 • C less than sounding), and wind speed (4.5 m s −1 larger than sounding), all have notable differences from the sounding (Fig. 1c). Eventually, the NCEP-2-derived BLH is considerably underestimated by 729 m. By and large, the profiles from JRA-55 are not as accurate as those from MERRA-2. More specifically, the wind speed at some heights, prominently above 2 km, is underestimated (Fig. 1d); the mean RH is 4 % less than that from the sounding. As a result, JRA-55 substantially underestimates BLH by 399 m. Based on this case, we can note that the performances of ERA5 and MERRA-2 are obviously better than those from JRA-55 and NCEP-2 in terms of the BLH. The remarkable underestimation by NCEP-2 can be attributed to the underestimations in near-surface virtual potential temperature (roughly 2.46 K less than sounding) and temperature. By comparison, the smaller BLH in JRA-55 could be attributed to the underestimated RH.

Normalized sensible and latent heat flux in the daytime
The sensible heat flux represents the level of energy that induces CBL growth (Wei et al., 2017), whereas the latent heat fluxes characterize the evaporation of moisture from the soil to the CBL, which feeds back on the development of the CBL and the formation of PBL cloud (Pal and Haeffelin, 2015). For a given amount of heat flux, small latent heat fluxes usually mean more energy being available for PBL growth (Chen et al., 2016). When less energy is constrained by the moist ground, more energy is available to heat the air. Moreover, the surface heat flux is closely associated with near-surface meteorological variables. For instance, a lower RH usually indicates a larger sensible heat flux and lower latent heat flux (Guo et al., 2019;Zhang et al., 2013). Suppose that the heat supplied to the air at the radiosonde balloon launch time is the area shaded under the heat flux curve (Fig. 11.12 in Stull, 1988); the normalized sensible heat flux in the daytime is defined by where T sunrise and T launch are the sunrise time and radiosonde balloon launch time, Q H is the sensible heat flux, and ρ is the near-surface density; c p equals 1004 J C •−1 kg −1 . A similar principle is applied to the calculation of normalized latent heat flux as well.
3 Results and discussion

Overview of BLHs at two synoptic times and over the day
The near-global mean BLHs at 00:00 UTC from 2012 to 2019 by four reanalysis products are shown in Fig. 2, in which the results obtained from radiosonde are overlaid by colored circles. The stations with sounding covering at least 2 continuous years are kept. The four reanalysis products yield an analogous result with respect to the spatial variation in BLHs, which are positively correlated with the soundingderived BLH, with correlation coefficients of 0.90, 0.81, 0.47, and 0.46 for ERA5, MERRA-2, NCEP-2, and JRA-55, respectively. It is evident that the BLHs from NCEP-2 over the continents of Africa, Asia, and South America are 300 m thicker than those of the other three products (Fig. 2b). Furthermore, the BLH in Antarctic by ERA5 is notably 500 m lower than that by NCEP-2 and MERRA-2 (Fig. 2a). Most of the mean BLHs by radiosonde are consistent with the reanalysis products, except that the values from all four reanalysis products over the Pacific Ocean and the contiguous United States are underestimated by about 300 m. Moreover, it is worth noting here that the BLHs by JRA-55 are considerably underestimated by around 1 km over these regimes.
For 00:00 UTC, the regions nearly from the east coast to the west coast of the Pacific Ocean (UTC + 8 to UTC + 12 and UTC − 12 to UTC − 8) are covered by sunshine and thus are filled with a deeper PBL. Comparable results at 12:00 UTC are presented in Fig. S2. Africa, the Middle East, and the west of India and China, corresponding to local noon and afternoon, have maximal BLHs of around 1.8 km. Moreover, it is noteworthy that the values from NCEP-2 and JRA-55 over these areas are visibly lower than those from ERA5 and MERRA-2, particularly over Africa and the Middle East, whereas these low values can barely be validated with soundings due to their sparse distribution. Over these areas, the BLHs are underestimated by reanalysis by about 200 m relative to the sounding results. Notably, BLHs from NCEP-2 over the continents of Africa are 1 km lower than those from ERA5 and MERRA-2. According to the results at 00:00 and 12:00 UTC, the comparisons between reanalysis products and soundings demonstrate that the BLHs are well resolved in the nighttime but are underestimated at daytime by reanalysis datasets.
For the near-global variation in BLH at a certain synoptic time, daytime and nighttime appear on the map simultaneously but as a function of longitude, which is displayed in Fig. 2. Thus, the variations at a fixed synoptic time on the map create a picture of the diurnal BLH variation. Given the dominance of the CBL in the daytime, investigating the BLHs in the daytime is thus favorable for unraveling the underlying causes for the discrepancies existing in the BLHs from both radiosonde and reanalysis. Therefore, the following results show the variations in daytime BLH only, unless otherwise noted.
The climatological mean variations in the daytime BLH from the soundings and four reanalysis products are drawn in Fig. 3. The period spans from January 2012 to December 2019 for most of the stations provided by China, the United States, Germany, and the United Kingdom. As implied by the results from soundings (Fig. 3e), the deepest PBL is observed over the Tibetan Plateau (TP) and the northwest of China, the south of Africa, and the west of the United States, with values as high as 1.7 km. The possible reason for this phenomenon is that the weather balloons over these regions are basically launched in the early afternoon of boreal summer (June-July-August) when the maximal BLH is usually observed (Collaud Coen et al., 2014;. The BLHs over the Pacific Ocean are noticeably large, with values of 1.3 km. The longitudinal variation in BLH is evident likely due to LST variations in the soundings. Additionally, BLHs at middle and low latitudes are larger than at high latitudes, which is consistent with the findings in Gu et al. (2020).
By and large, the climatological results of BLH by radiosonde and four model products are comparable, indicating that both capture the spatial variations implied by the sounding LSTs sampled. Among the model products, ERA5 shows the best prediction of BLH contrasted with radiosonde, with a correlation coefficient of 0.88 (Fig. 3a). Furthermore, the results from MERRA-2 are positively correlated with those from the soundings, with a correlation coefficient of 0.66 (Fig. 3b). The performances of JRA-55 and NCEP-2 are significantly poorer than those of ERA5 and MERRA-2, with correlation coefficients of 0.4 and 0.41, respectively (Fig. 3c,  d). The values of BLH over the west of the United States and the west of China are seriously underestimated by NCEP-2 and JRA-55 by around 800 m. Thus, we note that ERA5 and MERRA-2 are more robust in deriving the BLH, purely based on the climatological distribution of BLHs. Figure 4 illustrates the diurnal variations in BLH at 00:00 and 12:00 UTC and during daytime. A notable diurnal variation can be noticed, with a minimum of 343 m at 04:00 LST and a maximum of 1224 m at 17:00 LST (Fig. 4a). The magnitude in BLH during daytime is essentially larger than that at 00:00 and 12:00 UTC and has a maximal value of 1926 m at 17:00 LST (Fig. 4b). It follows that most of the soundings (about 78 %) that are released at 00:00 and 12:00 UTC are excluded by the collocation procedure designed for collecting samples in the daytime. Note that the result during daytime will not significant change with or without IOP data.

Correlations with near-surface meteorological variables and surface heat flux
The PBL is the lowest part of the troposphere and evolves diurnally due to near-surface thermodynamic variables through turbulent exchanges of momentum, heat, and moisture (Pithan et al., 2015). Thus, the surface meteorological variables depend on the underlying land surface and its coupling with the PBL, and they could act as a good proxy for BLH under some specific circumstances (Zhang et al., 2013(Zhang et al., , 2018). An analysis of the correlation between the BLHs by radiosondes and near-surface meteorological variables is presented in Fig. 5. The variables include near-surface air temperature at 2 m a.g.l. (T 2 m ), pressure (Ps), RH, and WS, which are extracted from the first level in sounding. The first level is assumed to be associated with the near-surface variables (Serreze et al., 1992;Wang and Wang 2016). We note that BLH, T 2 m , RH, and WS all have substantial diurnal and seasonal variability, as partly expressed in Eq. (2). Moderate positive (negative) correlation coefficients can be noticed between BLH and T 2 m (RH), with mean values of 0.39 (-0.51) (Fig. 5a, c), implying that both T 2 m and RH could be an adequate indicator for the temporal variation in BLH. Moreover, the correlations between BLH and WS are also positive, with a mean value of 0.24 (Fig. 5d). By contrast, the correlation between Ps and BLH is negatively significant above most of the regions (Fig. 5b).
The correlation analyses between BLH and normalized heat fluxes, which are assessed by ERA5 reanalysis products, are displayed in Fig. 6. It is notable that positive and negative correlation coefficients usually exist in normalized sensible and latent heat flux, with a global mean of 0.29 and −0.31. This correlation is not high because BLH also depends on the radiative heating and cooling and the temperature profile in different stations (Yang et al., 2004).
For the climatological variation in BLH, the near-surface variables, such as T 2 m , RH, and WS, and the normalized sensible and latent heat flux could be a good indicator. Conversely, the development of BLH could also limit the magnitude of RH (McGrath-Spangler, 2016).

Comparisons with reanalysis products
The radiosonde stations are mainly dispersed over the United States, China, Australia, Europe, the Pacific Ocean, and the polar regions, and only a few stations contribute over the rest of the world. The polar regions contain a station with a latitude higher or lower than 67.7 • N and S. Therefore, six regions are specifically examined in terms of the bias between radiosonde and model product.
The BLH differences between ERA5 and radiosonde are shown in Fig. 7, in which we specify the differences over the six above-mentioned regions. As observed in Fig. 7e, the BLH over most of the stations is underestimated to a slight extent, with a near-global mean of 131.96 m. As expected, the most underestimated regions cover the west of the United States and southern China (Fig. 7e), with a difference of around 200 m. In addition, it is worth mentioning that the BLHs over the Pacific Ocean are overestimated in four seasons, with a bias of around 400 m (Fig. 7h). Among the six classified regions, BLHs in Europe, East Asia, and the polar regions are reliably determined by ERA5, with an average bias of around 50 m (Fig. 7b, c, i). The bias seems to exhibit a seasonal dependence, and it is around 62 m larger in the warm seasons compared to cool seasons in both hemispheres. Regardless of the small bias, the newest model product, ERA5, properly estimates the BLH, especially above Europe, the eastern United States, East Asia, and the polar regions. Similarly, the BLHs by MERRA-2 are underestimated, with a near-global mean bias of 166.35 m (Fig. 8), which is slightly larger than that of ERA5 (131.96 m). This could indicate that the MERRA-2-derived BLH is more dispersed than ERA5. The spatial distribution of the bias value is broadly identical to that of ERA5 except that the BLHs over Europe, Australia, and the polar regions are well estimated by MERRA-2 due to much smaller mean biases at 42. 78,52.98,and 66.20 m,respectively (Fig. 8b,g,i).
In addition, the packaged BLH in MERRA-2 is also evaluated with radiosonde. BLH is as high as 3 km over the TP region at 06:00 UTC (Fig. S3), corresponding to an overestimation of 0.8 km over this region (Fig. S4). Over the rest of the regions, BLH is slightly or moderately overestimated by around 50 m. However, The BLH difference among various methods could reach up to a kilometer or even more (Seidel et al., 2010), which is probably owing to the variety of kinetic or thermodynamic theories applied in different algorithms.
By comparison, the mean bias produced by JRA-55 is larger than those from ERA5 and MERRA-2, with a mean value of 351.49 m, as shown in Fig. 9. The BLHs above most stations are underestimated by JRA-55, particularly for the sites over western China, the western United States, and the Pacific Ocean, with an underestimation of about 800 m. The most underestimated stations cluster at the latitude range of 40-45 • N, with a mean difference of around 1 km (Fig. 9f).
Although the near-global mean bias is significantly larger than ERA5 and MERRA-2, the estimations over Europe and the polar regions seem to be more in line with the observations, with mean values of 174.99 and 93.84 m, respectively (Fig. 9b, i).
The mean bias of NCEP-2 is larger than that of JRA-55, with a mean value of 420.86 m, as illustrated by Fig. 10. The distribution results are similar to JRA-55, except for Europe and Australia, where the bias is about twice that of JRA-55.
In general, the comparison analysis of the daytime BLH results between soundings and four reanalysis datasets indicates that ERA5 reanalysis produces the BLH that is closest to the high-resolution soundings. Interestingly, MERRA-2 can provide a good spatial distribution of BLH. JRA-55 and NCEP-2 can only give a good prediction over some regions, most of which tend to produce much larger BLH estimates compared to those from ERA5 and MERRA-2.

Potential sources for the bias between reanalysis products and radiosonde
The possible sources for the difference between radiosonde and reanalysis could be rather complicated. From the spatial pattern of BLH discrepancy results between radiosonde and reanalysis (Figs. 7-10), we can notice that the regions with large differences tend to be observed over regions with high elevation, such as the TP in China and Rocky Mountains in the United States. These regions generally have much more complex orography. Coincidently, the soundings over the above-mentioned two regions are all obtained from the afternoon, in which the PBL develops to the maximum (Fig. 4). As expected, the highest biases generally are accompanied by peak BLHs, which has also been confirmed in our previous studies (cf. Fig. 2c in Li et al., 2017). Therefore, the biases depend on the LST when the weather balloon is launched, which at least could not be ruled out.
In addition, the large differences primarily appear at low and middle latitudes where thermal convection frequently occurs. Therefore, it is reasonable to infer that static stability could exert an influence on the comparison results. Then, we will analyze the probable influences from terrain and static stability on BLH differences.
We evaluate the influence from the orographic complexity around the sounding station and calculate the standard derivation (SD) of elevation within a 1 • × 1 • grid, with the help of a 30 arcsec digital elevation model (DEM) dataset. Terrain is complex over western China and the western United States where most of the soundings are released in the afternoon, and large BLH biases are usually found. Therefore, for all soundings that are launched during the time period spanning from 13:00 LST to 18:00 LST, we analyze the relationship between BLH biases and the SD of the DEM (Fig. 11). It follows that the influence from the orography appears instrumental given the correlation coefficient varying from −0.84 to −0.95. Furthermore, the errors or uncer-  tainties in ERA5 are less easily impacted by the orographic complexity given a much flatter fitted line (Fig. 11a).
Based on the correlation between orographic complexity (manifested by the SD of the DEM) and the bias of a reanalysis relative to radiosonde measurements, it is likely that the performances of MERRA-2, JRA-55, and NCEP-2 might be restricted by the complex underlying terrains. One of the reasons could be because global reanalysis with coarse resolution cannot resolve the sub-grid processes due to topography. However, ERA5 appears to be less dependent on terrain. In other words, the models used in ERA5 show sufficient ca- pability and excellent performance in reproducing the atmospheres, particularly in the PBL over complex terrains.
Lower tropospheric stability (LTS) is an indicator to describe the thermodynamic state of the lower atmosphere and is defined by the differences in potential temperature at 700 and 1000 hPa . Typically, the smaller the LTS is, the more unstable the low troposphere will be. The mean LTS over each station is defined by the ensemble mean by four reanalysis datasets, and its spatial distribution is depicted in Fig. 12. The lower troposphere over the western United States and western China is more unstable compared to the rest of the world, with LTS of around 6 K (Fig. 11a), Figure 10. Similar to Fig. 7 but for the differences between NCEP-2-derived BLHs and radiosonde-determined BLHs.
which is likely associated with the afternoon launch time of weather balloons. According to the correlation between the bias of BLH and the mean LTS, it is clear that the underestimation in BLH by JRA-55 and NCEP-2 products is negatively correlated with LTS, with correlation coefficients of 0.32 and 0.36 (Fig. 12b).
Besides the LTS, the role of lifted index could be another influential factor. The lifted index is a predictor of latent instability (Galway, 1956), and it is defined as the temperature difference between the environment temperature and an air parcel lifted adiabatically at 500 hPa. The index is computed by the air temperature, RH, and pressure profiles from ra- Figure 11. Density plots of the differences of BLHs between radiosonde and ERA5 (a), MERRA-2 (b), JRA-55 (c), and NCEP-2 (d) as a function of the SD of the DEM, for which the black lines denote the least-squares regression line. The box-and-whisker plots of the anomalies in BLH in five evenly intervals are overlaid in each panel, and the correlation coefficients are marked in the upper right corner of each panel. Note that all samples are collected from soundings that are launched in the afternoon, spanning from 13:00 to 18:00 LST. diosondes. We calculate the percentage of the negative lifted index above each station, which represents the occurrence rate of latent instability that exists in the daytime (Fig. 12c). The stations with high probability of strong instability, by the occurrence rate of negative lifted index (P (lifted index)<0), are predominantly dispersed over the western United States, the west and south of China, and the Pacific Ocean, reaching a percentage as high as around 70 %. These stations are regularly overlapped with great biases in the reanalysis products as shown in Figs. 7-10. According to the analysis, it is clear that all four reanalysis products are positively associated with P (lifted index<0), with correlation coefficients ranging from −0.34 to −0.47 (Fig. 12d). The positive (negative) correlation coefficients in lifted index suggests that the underestimation by reanalysis might be associated with the instable activity in the lower troposphere that has not been adequately represented or simulated by the models used in reanalyses. In light of the surface heating during the day and the growth of the PBL due to air ascent, it is also inferred that afternoon BLHs suffer the greatest errors if this is caused by inadequate air mixing within the free troposphere in models.

Conclusions and summary
A climatology of near-global BLH from high-resolution radiosonde measurements has been described for the daytime BLH. The high-resolution radiosonde data have a much finer spatial resolution of 5 or 10 m, compared to that by IGRA, and can establish a finer and more precise structure of the PBL. In addition, direct comparisons among four wellestablished reanalysis model products have been conducted. The present study adopts over 300 sounding stations with a high resolution, spanning from 2012 to 2019, to investigate the climatological variation in near-global BLH in the daytime and evaluates four model products at the radiosonde sampling.
Notable spatial variation can be observed in the climatological mean of BLH at 00:00 and 12:00 UTC. In the afternoon, the regions over the western United States and western China have the largest BLHs with values as high as 1.7 km, whereas 00:00 and 12:00 UTC compare generally well to earlier times of day in the rest of the world with thus a lower BLH. In addition, BLHs at middle and low latitudes are larger than those at high latitudes. The T 2 m , RH, and the normalized sensible and latent heat flux are good predictors for the spatiotemporal evolution of BLH. The most important result is that we found that all the four reanalysis products generally underestimate the daytime BLH, with a near-global mean varying from around 132 m to 420 m. The largest bias in reanalysis appears over the western United States and western China, where the boundary layers grow vigorously in the afternoon. ERA5 and MERRA-2 definitely have better performance than JRA-55 and NCEP-2 in terms of the magnitude of BLH and a higher correlation coefficient with the soundings. The newest version of reanalysis, ERA5, has the smallest bias and the highest positive correlation relative to radiosondes. The underestimation by NCEP-2 and JRA-55 is robust over some regions, for instance, western China and the western United States, with differences even exceeding 800 m. However, all products can obtain a precise estimate over some regions, for instance Europe, the eastern United States, and the polar regions, probably due to morning LST soundings and smaller daytime PBL development. The BLH over the Pacific Ocean is underestimated in all seasons and by all products. The underestimation tends to have a seasonal dependence; i.e., the warm season has a larger underestimation. However, BLH is moderately overestimated by the packaged BLH parameter in MERRA-2 possibly due to different BLH-deriving methods used.
We investigated two possible sources contributing to the biases, including topography and static stability. The analysis shows that the DEM spread does have a negative correlation with the bias, suggesting that the reanalysis data cannot provide a reliable simulation result under complex terrain conditions. In addition, reanalysis BLH errors tend to be negatively correlated with the occurrence rate of unstable air, suggesting that the reanalyses do not accurately determine BLH when the lower troposphere is unstable.
Although this study suffers from the inhomogeneous distribution of the radiosonde sites, the climatological BLHs at the near-global scale can help us understand the variation characteristics of BLH in different regions and for different LSTs. For the first time, we present near-global BLH estimates from high-resolution radiosondes and further conduct a comprehensive comparison of BLH products for four widely used reanalysis datasets using the BLHs derived from the soundings. The findings provide insights into the limitations of reanalysis data and, more importantly, are expected to greatly benefit future research works related to applications of different kinds of reanalysis data in the future.
Author contributions. JG conceptualized this study. JG and JZ carried out the analysis with comments from other co-authors. JG and JZ wrote the paper. KY, AS, HL, SY, and TS provided useful suggestions for the study. JS contributed to the big data analysis. All authors contributed to the improvement of paper.
Competing interests. The contact author has declared that neither they nor their co-authors have any competing interests.
Disclaimer. Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.