Accurate estimation of wind speed at wind turbine hub
height is of significance for wind energy assessment and exploitation.
Nevertheless, the traditional power law method (PLM) generally estimates the
hub-height wind speed by assuming a constant exponent between surface and
hub-height wind speed. This inevitably leads to significant uncertainties in
estimating the wind speed profile especially under unstable conditions. To
minimize the uncertainties, we here use a machine learning algorithm known
as random forest (RF) to estimate the wind speed at hub heights such as at
120 m (WS
With the rapid economic development of the world, the massive consumption of fossil fuels produces an increasing emission of carbon dioxide, sulfur dioxide, and other pollutants (Yuan, 2016; Magazzino et al., 2021). To tackle this problem, it is increasingly becoming imperative to develop renewable clean energy (Hong et al., 2012; Luo et al., 2022). Among the myriad renewable energy resources, wind energy has gained more and more favor because of its abundant availability, good sustainability, and high cost-effectiveness (Li et al., 2018; Leung et al., 2012). As one of the largest energy consuming countries in the world, China is currently facing an increasingly serious energy and climate situation (Khatib et al., 2012). The Chinese government proposes to peak its carbon dioxide emissions before 2030 and achieve carbon neutrality before 2060 (Shi et al., 2023; Su et al., 2022a, b). With the stimulus of policies and the favor of investors, the wind power industry in China is flourishing. Therefore, the scientific assessment of wind energy resources in China is of great importance for the healthy development of the wind energy industry in the decades to come.
Characterizing the wind speed at wind turbine hub height is key for wind energy assessment (Yu and Vautard, 2022). The wind turbine is usually installed at the top of the wind mast at a height of 100–120 m above ground level (a.g.l.), which roughly corresponds to the surface layer (Veers et al., 2019). The wind speed data widely used for wind energy assessment are mainly obtained from wind masts, Doppler lidar, or reanalysis data (Debnath et al., 2021; Lolli et al., 2011; Lolli, 2021). The 10 m wind data measured by ground meteorological stations can be used for wind energy assessment (Oh et al., 2012; Liu et al., 2019). The wind tower or mast can also provide wind speed observations below 100 m a.g.l. (Durisic et al., 2012; J. Liu et al., 2018). Moreover, reanalysis data, such as the fifth generation European Centre for Medium-Range Weather Forecasts atmospheric reanalysis system (ERA5), can provide hourly wind speed at a height of 10 or 100 m a.g.l. for wind energy assessment (Laurila et al., 2021; Gualtieri, 2021). However, wind turbines are increasing in height and rotor diameter as technology develops, so they now extend beyond the surface layer into the Ekman layer. For example, at some offshore wind power plants, the blade tips of the largest wind turbines can reach heights of 250 m a.g.l. (Gaertner et al., 2020). In addition, increasing the wind turbine hub height reduces the impact of surface friction, enabling wind turbines to operate in high-quality wind resource environments (Veers et al., 2019). Therefore, the wind profile is important for the selection of wind turbine hub height and the assessment of wind energy.
It is widely recognized that the wind profile is mainly obtained by
empirical formulae (Li et al., 2018), such as the power law method (PLM).
The PLM generally assumes that the wind speed below 150 m in the
planetary boundary layer (PBL) varies with height according to a power law
(Hellman et al., 1914). This means that the wind speed at the wind turbine hub height can
be calculated from the surface wind speed based on a constant power law
exponent (
With the development of machine learning (ML) technology, ML algorithms have been widely used in the field of wind speed and wind power prediction (Magazzino et al., 2021). Chi et al. (2015) compared two wind speed forecasting approaches in China, based on linear regression and support vector machine algorithms, and found that ML algorithms achieve better accuracy on the nonlinear problem. Lahouar and Slama (2017) used several meteorological factors to forecast wind power with a random forest (RF) model. Their results indicate that, compared with physical and statistical approaches, the ML model achieves better accuracy on problems that cannot be analytically defined. Therefore, it is worth trying to use ML algorithms to retrieve the wind speed at wind turbine hub height from available observations.
Given the abovementioned problems, we attempt to use an ML algorithm known as RF to retrieve wind speed at wind turbine hub height from a radar wind profiler (RWP) and surface synoptic observations. An RF model has been trained on the surface in situ wind speed, upper-height RWP wind speed, and corresponding surface meteorological data from May 2018 to August 2020. The performances of the classical PLM model and the RF model are then compared. Next, the wind speeds from the RF model are used to evaluate the wind power. The results of our study can provide useful information for the development of the wind energy industry in coastal China. The observational data are introduced in Sect. 2. The RF model construction and wind energy evaluation method are described in Sect. 3. Section 4 discusses the accuracy of the RF model and the variation in wind energy resources. A summary of results is presented in Sect. 5.
The RWP is a ground-based remote sensing device that is used to measure the
atmospheric wind profiles from the surface to 5–8 km a.g.l. (B. Liu et al., 2019;
Guo et al., 2021a). It has high and low detection modes in the vertical
direction, and their corresponding vertical resolutions are 120 and 60 m,
respectively (Liu et al., 2020; Chen et al., 2023). Nevertheless, the wind
profiles near the ground surface, especially those below 300 m a.g.l., are
usually highly uncertain due to the influence of the ground and intermittent
clutter (May and Strauch, 1998; Allabakash et al., 2019). Therefore, there
exists a large data gap between ground surface and the lowest measurement
height provided by the RWP. Here, the RWP data are obtained at Qingdao
(36.33
The wind cup anemometer can measure the instantaneous wind speed and is
installed at 10 m a.g.l. (Mo et al., 2015). The sensing part of the wind cup
anemometer is composed of three or four conical or hemispherical empty cups.
It can provide surface wind data with an error of less than 10 % (Zhang et
al., 2020). This device is also installed at Qingdao station. Here, the 10 m wind speed (WS
The radiosonde (RS) provides the vertical profiles of wind speed and wind
direction at 5–8 m intervals (Guo et al., 2020). The accuracy of RS wind
speed is within 0.1 m s
ERA5 is a reanalysis dataset that combines model data with observations and
provides global hourly estimates of atmospheric variables (Hoffmann et al.,
2019). The horizontal resolution can reach
The schematic diagram of surface-layer wind profile observations is shown in
Fig. 2. The wind mast or tower can provide wind speed data below 100 m a.g.l.
(Durisic et al., 2012; J. Liu et al., 2018). The RWP can measure the wind
profiles from 300 m to a height of 5–8 km a.g.l. (B. Liu et al., 2019). This
leaves a gap (100 to 300 m) in the observations of the wind profile. At
present, the PLM is most often applied to extrapolate the surface
wind speed to the wind turbine hub height, such as wind speed at 120 m (WS
The schematic diagram of surface-layer wind profile observations. The photos are provided by Baidu (© Baidu).
The PLM was proposed by Hellman et al. (1914). It assumes that the
wind speed below 150 m in the PBL varies with height according to a power law. As a
result, the wind speed at wind turbine hub height is typically estimated
using the following formula (Abbes et al., 2012):
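As a rough illustration, the power law is commonly written as WS(z) = WS(z_ref) · (z / z_ref)^α, where α is the power law exponent. The sketch below assumes this standard form. The function name, the 10 m reference height (the anemometer height given above), and the exponent of 0.14 (the value quoted from Jung et al., 2021, in the discussion of wind power density) are illustrative assumptions, not necessarily the settings used in this study.

```python
def power_law_wind_speed(ws_ref, z_ref, z_target, alpha=0.14):
    """Extrapolate wind speed from height z_ref to z_target (both in m a.g.l.)
    using a constant power law exponent alpha (assumed value, see text)."""
    if z_ref <= 0 or z_target <= 0:
        raise ValueError("heights must be positive")
    return ws_ref * (z_target / z_ref) ** alpha


# Hypothetical 10 m wind speed of 5 m/s extrapolated to three hub heights.
ws_10 = 5.0
for z in (120.0, 160.0, 200.0):
    ws_z = power_law_wind_speed(ws_10, 10.0, z)
    print(f"{z:.0f} m a.g.l.: {ws_z:.2f} m/s")
```

Because α is held constant, the extrapolated profile has the same shape regardless of time of day or season, which is the limitation that the comparison with radiosonde observations examines.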
RF is an ensemble ML method that has been widely used for regression
(Breiman, 2001). It combines many decision trees into a forest and
aggregates their predictions. A schematic diagram of RF is shown in
Fig. S1. The decision trees that make up the forest are built independently
of one another, and the performance of RF is determined by the aggregation of the
results of all the trees (Ma et al., 2021). For the RF model, the number of
trees (
In constructing the RF model, it is necessary to identify the
variables that may affect the surface wind profile, based on physical
mechanisms and previous research. The PLM is often used to
calculate the wind speed at hub height, which implies that the wind speed at
hub height is related to the wind speed at other heights (Durisic et al.,
2012; Li et al., 2018). Therefore, WS
To estimate WS
Importance analysis of inputs for the RF model at
The RF algorithm requires the
The accuracy and generalization of the RF model depend on the training and testing samples (Ma et al., 2021). However, the training and testing samples are obtained only at 08:00 and 20:00 LST, so whether the RF model also applies at other times needs to be examined. This depends on whether the RF model generalizes well beyond individual training samples and whether the inputs at other times fall within the range covered by the training samples. Figures S3–S5 show the differences between the estimated and observed wind speeds of the three RF models as a function of the inputs. For the three RF models, the deviations are relatively stable and do not change as the input values increase. This indicates that the three RF models generalize well across the training and testing samples. The reason is that RF introduces random perturbations in the sample space, parameter space, and model space, thereby reducing the impact of individual “cases” and improving the generalization ability (Breiman, 2001). Moreover, Fig. S6 shows the distribution of inputs at different times. The dashed red lines represent the maximum and minimum values of each variable in the training samples. Within the range bounded by the red lines, the three RF models can provide stable output owing to their good generalization ability. Almost all the inputs at other times fall within the range of the training samples. Therefore, the three RF models generalize sufficiently and can be used at other times.
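The range check described above can be sketched as follows: compute the minimum and maximum of each input over the training samples, then count how many samples from other times fall inside that envelope for every input. This is a minimal sketch, not the authors' code; the input names (`ws_10`, `ws_300`) and all numbers are hypothetical.

```python
def training_envelope(samples):
    """Return {input_name: (min, max)} over a list of training samples,
    where each sample is a dict mapping input names to values."""
    if not samples:
        raise ValueError("at least one training sample is required")
    names = samples[0].keys()
    return {
        name: (min(s[name] for s in samples), max(s[name] for s in samples))
        for name in names
    }


def fraction_inside(envelope, samples):
    """Fraction of samples whose every input lies within the envelope."""
    if not samples:
        raise ValueError("at least one sample is required")
    inside = sum(
        all(lo <= s[name] <= hi for name, (lo, hi) in envelope.items())
        for s in samples
    )
    return inside / len(samples)


# Hypothetical inputs: 10 m wind speed and 300 m RWP wind speed (m/s).
train = [
    {"ws_10": 2.0, "ws_300": 6.0},
    {"ws_10": 8.0, "ws_300": 15.0},
    {"ws_10": 4.5, "ws_300": 9.0},
]
other_times = [
    {"ws_10": 3.0, "ws_300": 7.0},   # inside the training range
    {"ws_10": 9.0, "ws_300": 10.0},  # ws_10 exceeds the training maximum
]
envelope = training_envelope(train)
print(envelope)
print(fraction_inside(envelope, other_times))
```

A fraction close to 1 corresponds to the situation described for Fig. S6, where almost all inputs at other times lie between the dashed lines.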
For the wind speed at hub height, a series of indicators have been used to evaluate wind energy, such as the Weibull distribution and wind power density (WPD) (Pishgar-Komleh et al., 2015). These indicators are commonly used to evaluate the wind energy at a given station (Fagbenle et al., 2011; J. Liu et al., 2018).
The Weibull distribution can calculate the cumulative probability
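For reference, the two-parameter Weibull distribution is commonly written with a shape parameter k and a scale parameter c, giving the cumulative probability F(v) = 1 − exp(−(v/c)^k) and the probability density f(v) = (k/c)(v/c)^(k−1) exp(−(v/c)^k). The sketch below assumes this standard form; the symbols, function names, and parameter values are illustrative assumptions rather than the fitted values of this study.

```python
import math


def weibull_cdf(v, k, c):
    """Cumulative probability F(v) = 1 - exp(-(v / c) ** k) for v > 0."""
    if k <= 0 or c <= 0:
        raise ValueError("shape k and scale c must be positive")
    if v <= 0:
        return 0.0
    return 1.0 - math.exp(-((v / c) ** k))


def weibull_pdf(v, k, c):
    """Probability density f(v) of the two-parameter Weibull distribution.
    v <= 0 is treated as zero density, which is exact for k > 1."""
    if k <= 0 or c <= 0:
        raise ValueError("shape k and scale c must be positive")
    if v <= 0:
        return 0.0
    return (k / c) * (v / c) ** (k - 1) * math.exp(-((v / c) ** k))


# Hypothetical shape and scale parameters (scale in m/s).
k, c = 2.0, 8.0
for v in (4.0, 8.0, 12.0):
    print(f"v = {v:4.1f} m/s  F = {weibull_cdf(v, k, c):.3f}  f = {weibull_pdf(v, k, c):.4f}")
```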
WPD is the wind energy passing per unit time through a unit area perpendicular
to the airflow and generally takes the following form (Akpinar et al., 2005):
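A commonly used expression for the instantaneous wind power density is 0.5 · ρ · v³, where ρ is the air density and v the wind speed, so the mean WPD over a sample is 0.5 · ρ times the mean of v³. The sketch below assumes this general form and a standard sea-level air density of 1.225 kg m⁻³; both the density value and the function name are assumptions, and the exact formulation used in this study may differ (for example, a Weibull-based form).

```python
def mean_wind_power_density(wind_speeds, air_density=1.225):
    """Mean wind power density (W m-2) from wind speed samples (m/s).

    Uses the mean of the cubed speeds, not the cube of the mean speed,
    because power scales with v ** 3. air_density is an assumed constant.
    """
    if not wind_speeds:
        raise ValueError("at least one wind speed sample is required")
    if any(v < 0 for v in wind_speeds):
        raise ValueError("wind speeds must be non-negative")
    return 0.5 * air_density * sum(v ** 3 for v in wind_speeds) / len(wind_speeds)


# Hypothetical hub-height wind speeds (m/s).
samples = [4.0, 6.5, 9.0, 7.2]
print(f"{mean_wind_power_density(samples):.1f} W m-2")
```

Because of the cubic dependence, a modest bias in the estimated wind speed translates into a much larger relative error in WPD, which is why the choice of wind profile method matters for wind energy assessment.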
Figure 4 shows the wind profile from different methods at different times.
The red, black, and blue lines represent the mean wind speed from RS, the PLM, and
RF, respectively. For the PLM, the retrieved results below 80 m a.g.l. are
consistent with the RS observations. Gryning et al. (2007) also pointed out
that the wind profile based on surface-layer theory is valid up to a height
of 50–80 m. Above 80 m a.g.l., the wind speeds retrieved by the PLM deviate from
the RS observations. This deviation increases with height. The
comparison between the PLM and RS at 120, 160, and 200 m a.g.l. (Fig. 5)
confirms this. This is because above the surface layer, the
Coriolis force, baroclinity, and wind shear increase the complexity of the
wind profile (Brümmer, 1991). Moreover, the PLM underestimates most wind
speeds when the observed wind speed is high, especially at
200 m a.g.l. The reason is that the surface wind profile is affected by
turbulence, surface friction, and other factors (Tieleman, 2021; Solanki et
al., 2022). The turbulence caused by an inhomogeneous underlying surface can
change the wind direction and reduce the horizontal wind speed (Coleman et
al., 2021). Especially in coastal areas, the sea–land interaction and
complex surface types make the variations in near-surface wind profiles more
complex. A simple power law relationship is unable to reproduce the surface
wind profile with high accuracy, especially under high-wind-speed conditions. By
comparison, WS
Vertical profiles of the wind speed from different
methods at
In addition, for both the PLM and RF, the retrieved wind profile at 20:00 LST is
closer to the RS observations than that at 08:00 LST. The comparisons between the observed wind
speed and the estimated wind speed for the PLM and RF at different times are
shown in Fig. S7. The fits for the PLM and RF at 20:00 LST are slightly
better than those at 08:00 LST. This indicates that the performance of the PLM and
RF varies with the hour of the day. This is because the wind profile depends not
only on the surface friction but also on the atmospheric stratification
(Gryning et al., 2007). The surface layer is unstably stratified
during daytime because of heat transfer driven by solar radiation, while the
surface layer tends toward stable stratification because of surface radiative cooling
during nighttime (Yu et al., 2022; Solanki et al., 2022). WS
Comparisons between observed wind speed and estimated
wind speed for
Figure 6 shows the comparisons between the observed results and the
estimated results for the PLM and RF in different seasons. Red, green,
blue, and black represent spring, summer, autumn, and winter,
respectively. At all three heights, the PLM performs best in winter
and worst in summer. This shows that the performance of the PLM is affected by
seasonal factors, because wind shear varies dramatically with
the season (Banuelos-Ruedas et al., 2010). Pérez et al. (2005) indicate
that the surface-layer wind speed profile is mainly affected by the
convection produced by surface heating in summer. WS
Comparisons between observed wind speed and estimated
wind speed for
Figure 7 shows the diurnal and seasonal variations in WS
Monthly and diurnal cycles of
The histograms of WS
Statistics for the Weibull distribution of WS
Probability distribution and Weibull distribution of
Figure 9 shows the diurnal variations in WPD from the PLM and RF at 120, 160, and 200 m a.g.l. The solid and dotted red lines represent the variation in WPD from RF and the PLM, respectively. The gray bar represents the APE of WPD between RF and the PLM. The diurnal pattern of WPD from RF resembles that from the PLM. At all three heights, the hourly mean WPD is larger during daytime from 09:00 to 16:00 LST, with a peak at 14:00 LST, and is lower at nighttime from 00:00 to 04:00 LST. In contrast, the APE is lower during daytime (08:00 to 18:00 LST) and larger at nighttime (20:00 to 06:00 LST). At 120 m, the mean APEs during daytime and nighttime are 14.09 % and 35.80 %, respectively. Because RF underestimates wind speed under high-wind-speed conditions, the APE of WPD between the PLM and actual observations during daytime is likely slightly greater than 14.09 %. Moreover, the diurnal variations in APE at 160 and 200 m a.g.l. generally resemble those at 120 m a.g.l., but the APE of WPD between RF and the PLM increases with height. These results indicate that the PLM is more suitable for wind energy assessment in the daytime and that the error in wind energy assessment based on the PLM increases with height.
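The comparison above can be illustrated with a small calculation of the absolute percentage error between two WPD series. This is a minimal sketch that assumes the APE is taken relative to the RF value; that choice of denominator, the function names, and the numbers are assumptions for illustration and are not taken from the study.

```python
def absolute_percentage_error(reference, estimate):
    """APE (%) of estimate relative to reference: |ref - est| / |ref| * 100."""
    if reference == 0:
        raise ValueError("reference value must be non-zero")
    return abs(reference - estimate) / abs(reference) * 100.0


def mean_ape(reference_series, estimate_series):
    """Mean APE (%) over paired values, e.g. hourly mean WPD from RF and PLM."""
    if not reference_series or len(reference_series) != len(estimate_series):
        raise ValueError("series must be non-empty and of equal length")
    errors = [
        absolute_percentage_error(ref, est)
        for ref, est in zip(reference_series, estimate_series)
    ]
    return sum(errors) / len(errors)


# Hypothetical hourly mean WPD values (W m-2) from RF and the PLM.
wpd_rf = [200.0, 260.0, 310.0]
wpd_plm = [150.0, 230.0, 300.0]
print(f"mean APE: {mean_ape(wpd_rf, wpd_plm):.2f} %")
```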
Diurnal variation in the wind power density (WPD) at
Figure 10 shows the monthly variations in WPD from the PLM and RF at 120, 160, and 200 m a.g.l. The monthly variation in WPD from RF is also similar to that from the PLM. The monthly WPD is relatively high from March to May and lower from June to October. At 120 m, the APE is largest in summer and lowest in winter. The seasonal APEs during spring, summer, autumn, and winter are 23.65 %, 40.83 %, 19.67 %, and 12.62 %, respectively. The monthly variations in APE at 160 and 200 m are consistent with those at 120 m. This indicates that the PLM is more suitable for wind energy assessment in autumn and winter. In addition, the APEs during spring at 120, 160, and 200 m are 23.65 %, 28.12 %, and 34.22 %, respectively. Because the RF model performs worst in spring, the APE of WPD between the PLM and the real value during spring may be larger. Jung et al. (2021) also found that the global median absolute percentage error in wind energy estimates is 36.9 % when the power law exponent is assumed to be 0.14. Overall, the PLM has some limitations in wind energy assessment above 100 m. When the PLM is used to evaluate wind energy at greater heights, its errors need to be taken into account. Moreover, we suggest evaluating wind energy with an RF model that accounts for factors such as surface friction, heat transfer, and upper-height wind speed constraints.
Similar to Fig. 9 but for the monthly variation.
The traditional methods such as the PLM used to estimate wind speed at hub
height generally assume a constant exponent
The comparison against observations indicates that WS
Our work provides a new pathway to fill the data gap of wind speed at hub height by exploiting the capability of a state-of-the-art ML algorithm, which lays a solid foundation for more robust wind energy assessments. However, a high-precision wind profile estimate is only one part of the efficient utilization of wind energy resources. The cost of wind turbines, topographic conditions, and other factors also need more attention and deserve further investigation in the future.
The output data and codes used in this paper can be provided for
non-commercial research purposes upon reasonable request (Jianping Guo,
email: jpguocams@gmail.com). The anemometer data can be downloaded from
The supplement related to this article is available online at:
The study was completed with cooperation between all authors. JG and BL designed the research framework. BL and JG conducted the experiment and wrote the paper. XM, HL, SJ, YM, and WG analyzed the experimental results and helped work on the manuscript.
The contact author has declared that none of the authors has any competing interests.
Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This research has been supported by the National Natural Science Foundation of China (grant no. 42001291), the Fundamental Research Funds for the Central Universities (grant no. 2042022kf1003), and the Open Grants of the State Key Laboratory of Severe Weather (grant no. 2021LASW-B09).
This paper was edited by Dantong Liu and reviewed by three anonymous referees.