Developing a novel hybrid model for the estimation of surface 8h ozone ( O 3 ) across the 1 remote Tibetan Plateau during 2005-2018

13 We developed a two-stage model named random forest-generalized additive model (RF-GAM) 14 based on satellite data, meteorological factors, and other geographical covariates to predict the 15 surface 8-h O3 concentration across the remote Tibetan Plateau. The 10-fold cross-validation result 16 suggested that RF-GAM showed the excellent performance with the highest R value (0.76) and 17 lowest root mean square error (RMSE) (14.41 μg/m) compared with other seven machine learning 18 models. The predictive performance of RF-GAM model showed significantly seasonal discrepency 19 with the highest R value observed in summer (0.74), followed by winter (0.69) and autumn (0.67), 20 and the lowest one in spring (0.64). Additionally, the unlearning ground-observed O3 data collected 21 from open websites were applied to test the transferring ability of the novel model, and confirmed 22 https://doi.org/10.5194/acp-2019-972 Preprint. Discussion started: 14 January 2020 c © Author(s) 2020. CC BY 4.0 License.

threat to vegetation growth (Emberson et al., 2018;Feng et al., 2015Feng et al., , 2019Qian et al., 2018). Moreover, the tropospheric O 3 can perturb the radiative energy budget of the earth-atmosphere system, as it is the third most important greenhouse gas next to carbon dioxide (CO 2 ) and methane (CH 4 ), thereby changing the global climate (Bornman et al., 2019;Fu et al., 2019;Wang et al., 2019). Recently, the particulate matter with a concentration of less than 2.5 µm (PM 2.5 ) showed a persistent decrease, while the O 3 issue has been increasingly prominent in China (Li et al., 2017b(Li et al., , 2019b. Therefore, it was critical to accurately reveal the spatiotemporal variation in O 3 pollution and assess its heath risk in China. A growing body of studies began to investigate the spatiotemporal variation in the O 3 level worldwide. Wang et al. (2014b) demonstrated that the 8 h O 3 concentrations in nearly all of the provincial cities experienced remarkable increases during 2013-2014. Following this work, Li et al. (2017b) reported that the annual mean O 3 concentration over China increased by 9.18 % during 2014-2016. In other Asian countries except China, Vellingiri et al. (2015) performed long-term observation and found that the O 3 concentration in Seoul, South Korea, has displayed a gradual increase in recent decades. In the southeastern United States,  observed that the surface O 3 concentration has displayed a gradual decrease in the past 10 years. Although the number of ground-level monitoring sites has been increasing globally, the limited monitoring sites still cannot accurately reflect the fine-scale O 3 pollution status because each site shows little spatial representativeness (0.25-16.25 km 2 ) (Shi et al., 2018). Furthermore, the number of monitoring sites in many countries (e.g. China and the United States) displays an uneven distribution characteristic at the spatial scale. In China, most of these sites are concentrated in the North China Plain (NCP) and Yangtze River Delta (YRD), while western China has an extreme lack of groundlevel O 3 data, which often increases the uncertainty of health assessment. Therefore, many studies used various models to estimate the O 3 concentrations without monitoring sites. Chemical transport models (CTMs) were often considered to be the typical methods to predict the surface O 3 level. Zhang et al. (2011) employed the GEOS-Chem model to simulate the surface O 3 concentration over the United States, suggesting that the model could capture the spatiotemporal variation in surface O 3 concentration at a large spatial scale. Later on, Wang et al. (2016) developed a hybrid model called land use regression (LUR) coupled with CTMs to predict the surface O 3 concentration in the Los Angeles Basin, California. In recent years, these methods were also applied to estimate the surface O 3 level over China. Liu et al. (2018) used the Community Multiscale Air Quality (CMAQ) model to simulate the nationwide O 3 concentration over China in 2015. Nonetheless, the high-resolution O 3 prediction using CTMs might have widely deviated from the measured value, owing to the imperfect knowledge about the chemical mechanism and the higher uncertainty of the emission inventory. Moreover, the continuous emission data of NO x and VOCs were not always open access, which restricted the long-term estimation of the surface O 3 concentration using CTMs.
Fortunately, the daily satellite data enable the fine-scale estimations of the O 3 level at a regional scale due to broad spatial coverage and high temporal resolution (McPeters et al., 2015). Shen et al. (2019) confirmed that the satelliteretrieved O 3 column amount could accurately reflect the spatiotemporal distribution of the surface O 3 level. Therefore, some studies tried to use traditional statistical models coupled with high-resolution satellite data to estimate the ambient O 3 level. Fioletov et al. (2002) used the satellite measurement to investigate the global distribution of O 3 concentrations based on a simple linear model. Recently, Kim et al. (2018) employed the integrated empirical geographic regression method to predict the long-term  variation in ambient O 3 concentration over the United States based on O 3 column amount data. Although the statistical modelling of ambient O 3 concentration is widespread around the world, most of the traditional statistical modelling only utilised the linear model to predict the ambient O 3 concentration, which generally decreased the prediction performance because the nonlinearity and high-order interactions between O 3 and predictors cannot be managed by a simple linear model.
As an extension of traditional statistical model, machinelearning methods have been widely applied to estimate the pollutant levels in recent years because of their excellent predictive performances. Among these machine-learning algorithms, decision tree models such as random forest (RF) and extreme gradient boosting (XGBoost) generally showed fast training speed and excellent prediction accuracy (Li et al., 2020;Zhan et al., 2018). Furthermore, decision tree models can obtain the contribution of each predictor to air pollutants, which was beneficial to the parameter adaption and model optimisation. Chen et al. (2018b) has firstly employed the RF model to simulate the PM 2.5 level in China since 2005. Following this work, we recently used the XGBoost model to estimate the 8 h O 3 concentration on the island of Hainan for the first time and captured the moderate predictive performance (R 2 = 0.59) (Li et al., 2020). While the decision tree model showed many advantages in predicting the pollutant level, the spatiotemporal autocorrelation of pollutant concentration was not a concern in these studies. Li et al. (2019a) confirmed that the prediction error by the decision tree model varied greatly with space and time. Thus, it is imperative to incorporate the spatiotemporal variables into the original model to further improve the performance. To resolve the defects of decision tree models, Zhan et al. (2018) developed a hybrid model called RF-spatiotemporal Kriging (STK) to predict the O 3 concentration over China and achieved better performance (overall -R 2 = 0.69; southwestern China -R 2 = 0.66). Unfortunately, the RF-STK model still showed some weaknesses in predicting O 3 concentration. First of all, the predictive performance of the STK model was strongly dependent on the number of monitoring sites and their spatial densities. The model often showed worse predictive performance in regions with few monitoring sites (Gao et al., 2016). Moreover, the ensemble model cannot simulate the O 3 level during the periods without ground-level-measured data. In contrast, the generalised additive model (GAM) not only considers the time autocorrelation of O 3 concentration but also shows better extrapolation ability (Chen et al., 2018a;Ma et al., 2015). Thus, the ensemble model of RF and GAM was proposed to predict the spatiotemporal variation in the surface 8 h O 3 concentration.
The Tibetan Plateau, the highest plateau in the world, shows higher surface solar radiation compared with the regions outside the plateau. It was well documented that high solar radiation tended to generate a large amount of the OH radical, resulting in the O 3 formation via the reaction of VOC and the OH radical (Ou et al., 2015). While the total O 3 column amount on the Tibetan Plateau has displayed a slight decrease since the 1990s, the convergent airflow formed by subtropical anticyclones could bring ozone-rich air surrounding the plateau to the low atmosphere (Lin et al., 2008), thereby leading to a higher surface O 3 concentration over the plateau. Most studies focused on the stratosphere-troposphere transport of O 3 on the Tibetan Plateau, whereas limited effort was given to investigating the ground-level O 3 level over this region. To date, only several studies were concerned with the spatiotemporal variation in the surface O 3 concentration in this region based on field-observation data (Chen et al., 2019;Shen et al., 2014;Yin et al., 2017b). Unfortunately, the few monitoring sites on the Tibetan Plateau cannot capture the real O 3 pollution status, especially in remote areas (e.g. the northern part of the Tibetan Plateau), because each site only possessed limited spatial representativeness. Apart from these field measurements, Liu et al. (2018) (R = 0.60) and Zhan et al. (2018) (R 2 = 0.66) used CTMs and the machinelearning model to simulate the surface O 3 concentration over China in 2015, respectively. Both of these studies included the predicted O 3 level on the Tibetan Plateau. Although they finished the pioneering work, the predictive performances of both studies were not excellent. Therefore, it was imperative to develop a higher-quality model to enhance the modelling accuracy.
Here, we developed a new hybrid-method (RF-GAM) model integrating satellite data, meteorological factors, and geographical variables to simulate the gridded 8 h O 3 concentrations over the Tibetan Plateau for the first time. Based on the estimated surface O 3 concentration, we clarified the long-term variation (2005)(2006)(2007)(2008)(2009)(2010)(2011)(2012)(2013)(2014)(2015)(2016)(2017)(2018) of the surface O 3 concentration and quantified the key factors for the annual trend. Filling the gap of statistical estimation of the 8 h O 3 level in a remote region, this study provides useful datasets for epidemiological studies and air quality management.

Study area
The Tibetan Plateau is located in southwestern China, which ranges from 26.00 to 39.58 • N and from 73.33 to 104.78 • E. The Tibetan Plateau is surrounded by the Taklamakan Desert to the north and Sichuan Basin to the southeast. The land area of the Tibetan Plateau reaches 2.50×10 6 km 2 (Chan et al., 2006). Based on the air circulation pattern, the Tibetan Plateau can be roughly classified into the monsooninfluenced region and the westerly-wind-influenced region (Wang et al., 2014a). The annual mean air temperature in most regions is below 0 • C. The annual mean rainfall amount on the Tibetan Plateau ranges from 50 to 2000 mm. The terrain conditions are complex, and higher altitudes are concentrated in the central region. The Tibetan Plateau is generally treated as a remote region lacking in anthropogenic activity, and most of the residents are concentrated in the southeastern and southern parts of the Tibetan Plateau. The Tibetan Plateau consists of 19 prefecture-level cities, and their names and corresponding geographical locations are shown in Figs. 1 and S1.

Ground-level 8 h O 3 concentration
The daily 8 h O 3 data in 37 monitoring sites over the Tibetan Plateau from 13 May 2014 to 31 December 2018 were collected from the national air quality monitoring network. The O 3 levels in all of these sites were determined using an ultraviolet-spectrophotometry method. The highest 8 h moving average O 3 concentration each day was calculated as the daily 8 h O 3 level after data quality assurance. The data quality of all the monitoring sites was assured on the basis of the HJ 630-2011 specifications. The data with no more than two consecutive hourly measurements missing in all the days were treated as the valid data.

Satellite-retrieved O 3 column amount
The O 3 column amounts (DU: total molecules cm −2 ) during 2005-2018 were downloaded from the Ozone Monitoring Instrument O 3 (OMI O 3 ) level-3 data with a 0.25 • spatial resolution from the website of the National Aeronautics and Space Administration (NASA) (https://acdisc.gsfc.nasa. gov/data/Aura_OMI_Level3/OMDOAO3e.003/, last access: 19 May 2020). The OMI O 3 product shows global coverage and traverses the earth once a day. The O 3 column amount with a cloud radiance fraction >0.5, terrain reflectivity >30 %, and solar zenith angles >85 • should be removed. In addition, the cross-track pixels significantly influenced by the row anomaly should be deleted.

Meteorological data and geographical covariates
The daily meteorological data were obtained from ERA-Interim datasets with 0.125 • resolution. These meteorological data consisted of the 2 m dew-point temperature (d2m), 2 m temperature (t2m), 10 m U wind component (u10), 10 m V wind component (v10), boundary layer height (blh), sunshine duration (sund), surface pressure (sp), and total precipitation (tp). The 30 m resolution elevation data (DEM) were downloaded from the China Resource and Environmental Science Data Center (CRESDC). The data of the gross domestic product (GDP) and population density with 1 km resolution were also extracted from CRESDC. Population density and GDP in 2005, 2010, and 2015 were integrated into the model to predict the surface 8 h O 3 concentration over the Tibetan Plateau because these data were available every 5 years. Additionally, the land use data of 30 m resolution (e.g. water, grassland, urban, forest) were also extracted from CRESDC. Lastly, the latitude, longitude, and time were also incorporated into the model.
All of the explanatory variables collected were resampled to 0.25 • × 0.25 • grids to predict the O 3 level. The original meteorological data with 0.125 • resolution were resampled to the 0.25 • grid. The land use area, elevation, GDP, and population density in each grid were calculated using spatial clipping. Lastly, all of the predictors were integrated into an intact table to train the model.

Model development and assessment
The RF-GAM model was regarded as the hybrid model of RF and GAM. The RF-GAM model was a two-stage model in which the prediction error estimated by the RF model was then simulated by GAM. The prediction results of RF and GAM were summed as the final result of the RF-GAM model (Fig. 2). The detailed equation is as follows: where Z(s, t) is the estimated 8 h O 3 level at the location s and time t, P (s, t) represents the 8 h O 3 concentration predicted by the RF model, and E(s, t) denotes the prediction error by GAM. In the RF model, a large number of decision trees were planted based on the bootstrap sampling method. At each node of the decision tree, the random samples of all predictors were applied to determine the best split among them. Following the procedure, a simple majority vote was employed to predict the 8 h O 3 level. The RF model avoided a priori linear assumption of O 3 concentration and predictors, which was often not in good agreement with the actual state. The RF model has two key parameters, including n tree (the number of trees grown) and m try (the number of explanatory variables sampled for splitting at each node). The prediction performance of the RF model was strongly dependent on the two parameters. The optimal n tree and m try were determined based on the least out-of-bag (OOB) errors. Based on the iteration result, the optimal n tree and m try reached 500 and 5, respectively. Besides this, the backward variable selection method was performed on the RF submodel to achieve better performance. At each step of the predictor selection, the variable with the least important value was excluded from the next step. This one-variable-at-a-time exclusion method was repeated until only two explanatory variables remained in the submodel. Finally, all of the selected variables except the area of water were integrated into the model to achieve the best prediction performance. The detailed RF model is as where O 3 denotes the observed 8 h O 3 level in the monitoring site; the O 3 column represents the O 3 column amount in the corresponding grid; Elevation denotes the corresponding elevation of the site; and Agr, Urban, Forest, and Grassland are the agricultural land, urban land, forest land, and the grassland, respectively. Population represents the population density in the corresponding site. Prec, T , WS, P , t sun , and RH are precipitation, air temperature, wind speed, air pressure, sunshine duration, and relative humidity, respectively. Additionally, another five models, including the RF, generalised regression neutral network (GRNN), backward-propagation neural network (BPNN), Elman neural network (ElmanNN), and extreme learning machine (ELM), also used the backward variable selection method. The R 2 value was treated as an important parameter for adding or reducing the variable. The variable should be removed when the R 2 value of the submodel showed a remarkable decrease with the integration of this variable. Lastly, the optimal variable group was applied to establish the submodel. Following the RF submodel, the prediction error estimated by the RF submodel was further modelled by the GAM.
GAM could reflect the time autocorrelation of the predictive error of RF model, and thus the ensemble model of RF and GAM might decrease the modelling error of the one-stage model. All of the variables were incorporated into the models to establish the second-stage model, and the backward variable selection was also used to determine the optimal variable group.
The 10-fold cross-validation (CV) technique was employed to evaluate the predictive performances for all of the machine-learning models. All of the training datasets were randomly classified into 10 subsets uniformly. In each round of validation, nine subsets were used to train, and the remaining subset was applied to test the model performance. The process was repeated 10 times until every subset has been tested. Some statistical indicators, including the R 2 , rootmean-square error (RMSE), mean prediction error (MPE), relative percentage error (RPE), and the slope, were calculated to assess the model performance. The optimal model with the best performance was used to estimate the 8 h O 3 concentration in recent decades.     ). Besides this, the slope of the RF-GAM model was closer to 1 compared with other models. It was well documented that the RF model generally showed better performance than other models because this method did not need to define complex relationships between the explanatory variables and the O 3 concentration (e.g. linear or nonlinear). Furthermore, the variable importance indicators calculated by the RF model can help the user to distinguish the key variables from the noise ones and make full use of the strength of each predictor to assure the model robustness. Although BPNN, GRNN, XGBoost, ElmanNN, and ELM have been widely applied to estimate the air pollutant concentrations (Chen et al., 2018c;Zhu et al., 2019), these methods suffered from some weaknesses in predicting the pollutant level. For instance, both the BPNN and ElmanNN models could capture the locally optimal solution when the training subsets were integrated into the final model, which decreased the predictive performance of the model . Moreover, BPNN generally showed slow training speed, especially with the huge training subsets (Li and Park, 2009;. ELM often consumed more computing resources and experienced the overfitting issue due to the increase in sampling size Shao et al., 2015). The GRNN method advanced the training speed compared with the BPNN model and avoided the locally optimal solution during the modelling process (Zang et al., 2019), whereas the predictive performance is still worse than that of the RF model. XGBoost was often considered to be robust in predicting the air pollutant level (Li et al., 2020), while the model did not display excellent performance in the present study. This might be attributable to the sampling size in the present study not being big enough because the model generally showed better performance with big samples.  (Gao et al., 2016;Li et al., 2017a). Overall, the ensemble RF-GAM model showed significant improvement in predictive performance. The performances of RF-GAM displayed slight differences for each year during 2014-2018. As shown in Table 1, the R 2 value showed the highest value (0.76) in 2016, followed by that in 2018 (0.75), 2017 (0.73), and 2015 (0.72), and showed the lowest one in 2014 (0.69). Both the RMSE and MPE exhibited the lowest values in 2014, while these parameters did not show significant variation during 2015-2018. The lowest R 2 value and the highest RPE were found in 2014 due to having the lowest sample size, while the highest R 2 value and lowest RPE in 2016 were due to the maximum sample size. Geng et al. (2018) found that the predictive performance of the machine-learning model was strongly dependent on the number of training samples and sampling frequency. The lower RMSE and MPE in 2014 might be attributable to the lack of measured O 3 data in spring, which decreased the higher value of O 3 concentration. The performances of the RF-GAM model in four seasons were also assessed by 10-fold cross validation ( Table 2). The predictive performance of the RF- GAM model showed significant seasonal differences, with the highest R 2 value observed in summer (0.74), followed by winter (0.69) and autumn (0.67), and the lowest one in spring (0.64). However, both the RMSE and MPE displayed different seasonal characteristics from the R 2 value. Both the RMSE and MPE for RF-GAM followed the order of spring (15.32 and 11.94 µg m −3 ) > summer (15.13 and 11.75 µg m −3 ) > winter (14.58 and 11.44 µg m −3 ) > autumn (13.23 and 10.52 µg m −3 ). The lowest R 2 value in spring might be caused by multiple O 3 sources and complicate O 3 formation mechanisms. On the one hand, the O 3 in spring might be generated from the local anthropogenic emission or long-range transport (Li et al., 2017b(Li et al., , 2019b. On the other hand, a strong stratosphere-troposphere exchange process due to the lower height of the troposphere on the Tibetan Plateau might lead to the higher O 3 concentration in spring (Skerlak et al., 2014). Unfortunately, both the longrange transport and stratosphere-troposphere exchange process were missing in the RF-GAM model, which restricted the accuracy of O 3 estimation in spring. The large estimation errors (e.g. RMSE, MPE, and RPE) in spring and summer were attributable to the high 8 h O 3 concentration in these seasons, while the low prediction error observed in autumn was due to the low O 3 level.
Apart from the seasonal variation, we also investigated the spatial variabilities in the predictive accuracy for the RF-GAM model. The Tibetan Plateau was classified into five provinces, and then the predictive performance of RF-GAM model in each province was calculated. Among the five provinces, Gansu displayed the highest R 2 value (0.74), followed by Sichuan Province (0.71), Qinghai Province (0.70), the autonomous region of Tibet (0.69), and Yunnan Province (0.54) ( Table 3). The results shown herein were not in agreement with the previous studies by Geng et al. (2018), who confirmed that the predictive performance of the machinelearning model was positively associated with the sampling size. It was assumed that the spatial distribution of the sampling sites in Tibet was uneven and the sampling density was low, though Tibet possessed the highest number of monitoring sites of the provinces. The prediction errors (RMSE and MPE) did not exhibit the same characteristics as the R 2 value. The higher RMSE and MPE were found in the autonomous region of Tibet (14.81 and 11.24 µg m −3 ) and Qinghai Province (14.83 and 11.33 µg m −3 ) due to the higher values of blh and sund. The lowest values of the RMSE and MPE could be observed in Yunnan Province, which was due to the higher rainfall amount. The highest RPE was found in Yunnan Province (25.85 %), followed by Tibet (22.90 %), Qinghai (22.65 %), and Sichuan (22.62 %), and the lowest one was found in Gansu Province (22.51 %), which might be linked with the sample size.
Although 10-fold cross-validation verified that the RF-GAM model showed better predictive performance in estimating the surface 8 h O 3 concentration, the test method cannot validate the transferring ability of the final model. The monitoring site on the Tibetan Plateau before May 2014 is very limited, and only the daily 8 h O 3 data in Lhasa from the open-access website (https://www.aqistudy.cn/historydata/daydata.php?city= %E6%8B%89%E8%90%A8&month=2013-12, last access: 19 May 2020) were available to compare with the simulated data. As depicted in Fig. 4, the R 2 value of the unlearning 8 h O 3 level against the predicted 8 h O 3 concentration reached 0.67, which was slightly lower than that of the 10-fold crossvalidation R 2 value. Overall, the extrapolation ability of the RF-GAM model is satisfactory, and thus it was assumed that the model could be applied to estimate the O 3 concentration in other years. Both the RMSE and MPE for the unlearning 8 h O 3 level against the predicted 8 h O 3 concentration were significantly higher than those of the 10-fold cross validation. It was assumed that Lhasa showed a higher surface 8 h O 3 concentration over the Tibetan Plateau.
To date, some previous studies also simulated the surface O 3 concentration on the Tibetan Plateau using statistical models (Zhan et al., 2018). For instance, Zhan et al. (2018) employed the RF-STK model to estimate the surface O 3 concentration over China and explained the 66 % spatial variability in the O 3 level on the Tibetan Plateau. Apart from these statistical models, some classical CTMs were also applied to estimate the O 3 concentration in remote areas. Both  and Lin et al. (2018) used CMAQ to estimate the O 3 level across China, while the R 2 values in most of cities were lower than 0.50. In terms of the predictive performance, the RF-GAM model in our study showed significant advantages compared with previous studies. It should be noted that our RF-GAM model could outperform most of current models, chiefly because of (1) accounting for the temporal autocorrelation of the surface O 3 concentration and (2)

Variable importance
The results of variable importance for key variables are depicted in Fig. 5  cannot be ignored in the present study because the predictive performance would worsen if these variables were excluded from the model. It was widely acknowledged that the anthropogenic emissions focused on developed urban areas with high population density, especially on the remote plateau (Zhang et al., 2007;Zheng et al., 2017). Compared with the urban land, the grassland played a more important role in the O 3 pollution on the Tibetan Plateau. It was thus assumed that the grassland was widely distributed on the Tibetan Plateau, which could release a large amount of biogenic volatile organic compounds (BVOCs) (Fang et al., 2015). It was well known that photochemical reactions of BVOCs and NO x in the presence of sunlight caused the O 3 formation (Calfapietra et al., 2013;Yu et al., 2006). Furthermore, Fang et al. (2015) confirmed that BVOC emission on the Tibetan Plateau displayed a remarkable increase in the wet seasons.  Fig. 6, most of the cities in Qinghai Province (e.g. Huangnan, Hainan, and Guoluo) generally showed a higher 8 h O 3 concentration over the Tibetan Plateau, which was in a good agreement with the spatial distribution of the O 3 column amount (Fig. S3). Besides this, some cities in Tibet, such as Shigatse and Lhasa, also showed higher 8 h O 3 levels. It was assumed that the precursor emissions in these regions were significantly higher than those in other cities of the Tibetan Plateau (Fig. S4). Zhang et al. (2007) used the satellite data to observe that the higher VOCs and NO x emission was concentrated in the residential areas with high population density on the remote Tibetan Plateau. Apart from the effect of anthropogenic emission, the meteorological conditions could also be important factors for the 8 h O 3 concentration. As shown in Figs. S5-S10, a higher blh and sp on the northeastern Tibetan Plateau might promote the O 3 formation through the reaction of the VOC and OH radical, leading to a higher 8 h O 3 concentration in these cities (Ou et al., 2015). In addition, a lower tp occurred on the northern Tibetan Plateau and the northeastern Tibetan Plateau, both of which were unfavourable to the ambient O 3 removal (Yoo et al., 2014). In contrast, the higher tp observed on the southeastern Tibetan Plateau resulted in slight O 3 pollution.    (Table S1), whereas it decreased from the peak to 65.87±8.52 µg m −3 during 2015-2018 (Fig. 7). Based on the Mann-Kendall method (Fig. 8a), it was found that the surface O 3 concentration exhibited a slight increase on the whole, while the degree of increase was not significant (p>0.05). Besides this, it should be noted that the O 3 concentrations in various regions showed different rates of increase. As depicted in Fig. 8b (Fig. 9 and Table 4). The 8 h O 3 concentrations in most of prefecture-level cities showed similar seasonal characteristics, with overall seasonal variation on the Tibetan Plateau. Based on the result summarised in Table S2, it was found that the key precursors of ambient O 3 generally displayed higher emissions in winter compared with other seasons. However, the seasonal distribution of ambient O 3 concentration was not in accordance with the precursor emissions, suggesting that the meteorological factors might play more important roles in ambient O 3 concentration. It was well known that the higher air temperatures in spring and summer were closely related to the low sp and high sund, both of which promoted O 3 formation (Sitnov et al., 2017). Although summer showed the highest air temperature and the longest sunshine duration, the higher rainfall amount in summer decreased the ambient O 3 concentration via wet deposition (Li et al., 2017a(Li et al., , 2019b. Moreover, the highest blh occurred in spring, which was favourable to the strong stratosphere-troposphere exchange process on the Tibetan Plateau (Skerlak et al., 2014). Therefore, the 8 h O 3 concentrations in summer and winter were lower than that in spring. Nonetheless, the 8 h O 3 levels in Diqing, Shannan, and Nyingchi displayed the highest values in spring (56.38 ± 7.87, 73.90 ± 5.97, and 73.22 ± 2.77 µg m −3 ), followed by winter (45.88 ± 7.05, 61.71 ± 4.32, and 62.24 ± 3.63 µg m −3 ) and summer (44.35 ± 5.90, 61.00 ± 5.86, and 59.60 ± 2.33 µg m −3 ), and the lowest ones in autumn (37.45 ± 5.76, 54.70 ± 3.13, and 53.84 ± 2.06 µg m −3 ). The lower O 3 level in summer than winter was mainly attributable to the higher precipitation observed in the summer of these cities (Fig. S11). In addition, it should be noted that the NO x and VOC emissions of the southern Tibetan Plateau (e.g. Shannan) exhibited higher values in winter compared with other seasons.

The nonattainment days over the Tibetan Plateau during 2005-2018
The annual mean nonattainment days in the 19 prefecturelevel cities over the Tibetan Plateau are summarised in Ta Fig. 10 and Table 5). Besides this, some cities on the southern Tibetan Plateau, such as Shigatse and Shannan, also experienced more than 40 nonattainment days each year, suggesting that the Tibetan Plateau still faced the risk of O 3 pollution. Fortunately, some remote cities, such as Ali, Ngari, and Qamdo, did not experience excessive O 3 pollution all the time, which was ascribed to low precursor emissions and appropriate meteorological conditions. It should be noted that the nonattainment days in regions with high O 3 concentration showed significant seasonal difference, whereas the seasonal difference was not remarkable in cities with low O 3 pollution. As shown in Table 2, it should be noted that nearly all of the nonattainment days could be detected in spring and summer, which was in good agreement with the O 3 levels in different seasons, indicating that the O 3 pollution issue should be given more attention in spring and summer. The determination of nonattainment days showed some uncertainties, owing to the predictive error of modelled O 3 concentration. First of all, meteorological data used in RF-GAM model were collected from reanalysis data, and these gridded data often showed some uncertainties, which could

Summary and implications
In the present study, we developed a novel hybrid model (RF-GAM) based on multiple explanatory variables to estimate the surface 8 h O 3 concentration across the remote Tibetan Plateau. The 10-fold cross-validation method demonstrated that RF-GAM achieved excellent performance, with the highest R 2 value (0.76) and lowest root-mean-square error (RMSE) (14.41 µg m −3 ), compared with other models, including the RF-STK, RF, BPNN, XGBoost, GRNN, ElmanNN, and ELM models. Moreover, the unlearning ground-level-measured O 3 data validated the fact that the RF-GAM model showed better extrapolation performance (R 2 = 0.67, RMSE = 25.68 µg m −3 ). The result of variable importance suggested that time, sund, and sp were key factors for the surface 8 h O 3 concentration over the Tibetan Plateau. Based on the RF-GAM model, we found that the estimated 8 h O 3 concentration exhibited notable spatial variation, with higher values in some cities of the northern Tibetan Plateau, such as Huangnan (73.48 ± 4.53 µg m −3 ) and Hainan (72.24 ± 5.34 µg m −3 ), and lower values in some cities of the southeastern Tibetan Plateau, such as Aba (55.17 ± 12.77 µg m −3 ). Besides this, we also found that the O 3 level displayed a slow increase, from 64.74±8.30 to 66.45±8.67 µg m −3 from 2005 to 2015, while the O 3 concentration decreased to 65.87±8.52 µg m −3 in 2018. The estimated 8 h O 3 level on the Tibetan Plateau showed significant seasonal discrepancy in the order of spring (75.00 ± 8.56 µg m −3 ) > summer (71.05 ± 11.13 µg m −3 ) > winter (56.39 ± 7.42 µg m −3 ) > autumn (56.13 ± 8.27 µg m −3 ). Based on the critical value set by the WHO, most of the cities on the Tibetan Plateau had excellent air quality, while several cities (e.g. Huangnan, Haidong, and Guoluo) still suffered from more than 40 nonattainment days each year.
The RF-GAM model for O 3 estimation has several limitations. First of all, the O 3 estimation of the northern Tibetan Plateau might show some uncertainties because there are few ground-level monitoring sites, and thus we cannot validate the reliability of predicted values in regions without a monitoring site. Secondly, our approach did not include data on the emission inventory or traffic count because the continuous emissions of NO x and VOCs were not open access. Lastly, we only focused on the temporal variation in the surface O 3 concentration in the past 10 years, and the shortterm O 3 data cannot reflect the response of O 3 pollution to climate change. In the future work, we should combine more explanatory variables such as long-term NO x and VOC emissions to retrieve the surface O 3 level over the Tibetan Plateau in recent decades.
Author contributions. This study was conceived by RL and HF. Statistical modelling was performed by RL, YZ, YM, WZ, and ZZ. RL drafted the paper.
Competing interests. The authors declare that they have no conflict of interest.
Special issue statement. This article is part of the special issue "Study of ozone, aerosols and radiation over the Tibetan Plateau (SOAR-TP) (ACP/AMT inter-journal SI)". It is not associated with a conference.