Improved 1 km resolution PM2.5 estimates across China using enhanced space–time extremely randomized trees

Fine particulate matter with aerodynamic diameters ≤ 2.5 μm (PM2.5) has adverse effects on human health and the atmospheric environment. The estimation of surface PM2.5 concentrations has made intensive use of satellite-derived aerosol products. However, it has been a great challenge to obtain high-quality and high-resolution PM2.5 data from both ground and satellite observations, which is essential to monitor air pollution over small-scale areas such as metropolitan regions. Here, the space–time extremely randomized trees (STET) model was enhanced by integrating updated spatiotemporal information and additional auxiliary data to improve the spatial resolution and overall accuracy of PM2.5 estimates across China. To this end, the newly released Moderate Resolution Imaging Spectroradiometer Multi-Angle Implementation of Atmospheric Correction AOD product, along with meteorological, topographical and land-use data and pollution emissions, was input to the STET model, and daily 1 km PM2.5 maps for 2018 covering mainland China were produced. The STET model performed well, with a high out-of-sample (out-of-station) cross-validation coefficient of determination (R2) of 0.89 (0.88), a low root-mean-square error of 10.33 (10.93) μg m−3, a small mean absolute error of 6.69 (7.15) μg m−3 and a small mean relative error of 21.28 % (23.69 %). In particular, the model captured well the PM2.5 concentrations at both regional and individual site scales. The North China Plain, the Sichuan Basin and Xinjiang Province always featured high PM2.5 pollution levels, especially in winter. The STET model outperformed most models presented in previous related studies, with a strong predictive power (e.g., monthly R2 = 0.80), which can be used to estimate historical PM2.5 records.
More importantly, this study provides a new approach for obtaining a high-resolution and high-quality PM2.5 dataset across mainland China (i.e., ChinaHighPM2.5), which is important for air pollution studies focused on urban areas.
Published by Copernicus Publications on behalf of the European Geosciences Union.
J. Wei et al.: Improved 1 km resolution PM2.5 estimates across China


Introduction
Atmospheric particulate matter is a general term describing all kinds of solid and liquid particles in the atmosphere. Fine particles are those particles in ambient air with aerodynamic diameters of no more than 2.5 µm (PM2.5). Compared to coarser particles, PM2.5 is rich in toxic and harmful substances and can directly enter the respiratory tract and alveoli of humans. Moreover, it has a long residence time and a long transmission distance in the atmosphere (Aggarwal and Jain, 2015). Numerous studies have illustrated that high PM2.5 concentrations adversely affect human health (Peng et al., 2009; Bartell et al., 2013; Chowdhury and Dey, 2016; Crippa et al., 2019; Song et al., 2019), severely impair the atmospheric environment (Z. Li et al., 2017), and significantly influence cloud and precipitation systems through aerosol radiative and microphysical effects (Koren et al., 2014; Seinfeld et al., 2016). Silva et al. (2013) showed that about 2.1 million deaths each year around the world result from increasing PM2.5 concentrations.
Nowadays, air pollution is becoming more severe due to continuously increasing anthropogenic aerosols in developing countries, especially in China (He et al., 2011; Liu et al., 2017; Zhai et al., 2019). Fine particulate matter has become the primary pollutant in urban environments, garnering much scrutiny from the public (Sun et al., 2016; Wu et al., 2018). Therefore, the China Meteorological Administration established a ground PM2.5 observation network to monitor urban air quality in 2004 (Guo et al., 2009), followed by a denser network established by the Chinese Ministry of Environmental Protection in 2013. However, station-based monitoring is largely limited by the instruments and climatic conditions and cannot completely characterize air pollution over large areas. Satellite remote sensing technology has led to a variety of operational aerosol optical depth (AOD) products (Levy et al., 2013; Lyapustin et al., 2018), which enable estimates of PM2.5 on large scales because of the positive relationship between AOD and PM2.5 concentration.
Over the years, numerous approaches have been proposed to improve the PM2.5-AOD relationship. Physical models typically construct physical relationships between surface particulate matter concentrations and satellite AOD products through altitude and humidity corrections (Zhang and Li, 2015). Statistical regression models, e.g., the multiple linear regression model, the linear mixed-effect model, the two-stage model and the geographically weighted regression (GWR) model, have been widely used due to their simplicity and versatility (Gupta and Christopher, 2009; Ma et al., 2014; Xiao et al., 2017; Yao et al., 2019). Artificial intelligence models mainly involve machine learning and deep learning models, e.g., the random forest (RF; Brokamp et al., 2018; Chen et al., 2018; Wei et al., 2019a), the extreme gradient boosting model (XGBoost; Chen et al., 2019), and the back-propagation and generalized regression neural networks (BRNN and GRNN; T. Li et al., 2017a). PM2.5 is jointly affected by numerous factors, e.g., meteorological conditions, human activities and topography, and shows great spatial and temporal heterogeneities. This makes it difficult for traditional physical and statistical regression approaches to accurately explain and construct PM2.5-AOD relationships, leading to poor PM2.5 estimates. Despite their stronger data mining ability, most artificial intelligence approaches have been applied to PM2.5 prediction without accounting for the spatiotemporal characteristics of PM2.5 (Brokamp et al., 2018; Chen et al., 2018, 2019; T. Li et al., 2017a; Xue et al., 2019). Furthermore, deep learning is highly dependent on computer performance and is less computationally efficient. In addition, most widely used aerosol products are generated at low spatial resolutions (3-50 km), a serious limitation for applications over small-scale regions such as urban areas.
To account for the spatiotemporal heterogeneity of PM2.5, the space-time extremely randomized trees (STET) model developed in our previous study for estimating PM1 (Wei et al., 2019b) is adopted here, with further refinements, to improve the estimation of PM2.5 using the high-spatial-resolution (1 km) Moderate Resolution Imaging Spectroradiometer (MODIS) Multi-Angle Implementation of Atmospheric Correction (MAIAC) AOD product. Note that PM1 and PM2.5 differ in emission sources, formation and transport mechanisms, and health impacts. Their spatial patterns and distributions also differ, and their particle ratio varies greatly, ranging from less than 0.5 to greater than 0.9 at both spatial and temporal scales, especially in highly polluted regions such as China (Wei et al., 2019b). The STET model has been improved by using corrected AODs, adding pollutant emissions, updating the feature selection and improving the determination of spatiotemporal information. Based on this, a spatially continuous, high-resolution and high-quality PM2.5 dataset across mainland China (i.e., ChinaHighPM2.5) for 2018 is generated from the MODIS MAIAC AOD product at a 1 km resolution using meteorological, land-use, topographic, population and emission parameters. Section 2 describes the data sources and integration. Section 3 introduces the enhanced STET model in detail, and Sect. 4 presents the validation and comparison of our PM2.5 estimates across China. Section 5 compares our model with those models developed in previous related studies, and Sect. 6 gives a summary and conclusions.
Data sources

PM2.5 ground measurements
Hourly in situ PM2.5 observations at 1583 monitoring stations (Fig. 1) across mainland China from 1 January 2017 to 31 December 2018 were collected and then averaged to obtain daily mean PM2.5 measurements. PM2.5 observations are measured using the tapered element oscillating microbalance approach or β-attenuation monitors that have undergone further calibration and strict quality control procedures (Guo et al., 2009).

MAIAC AOD product
The MAIAC algorithm was developed to generate MODIS aerosol products from the darkest to the brightest surfaces at a 1 km spatial resolution over land (Lyapustin et al., 2011). On 30 May 2018, official 1 km resolution MAIAC aerosol products were released and made freely available to all users. This dataset is produced using the revised MAIAC algorithm with continuous improvements in scale transition using spectral regression coefficients, cloud detection, determination of aerosol models, over-water processing and general optimization in the global aerosol retrieval process (Lyapustin et al., 2018). MAIAC daily aerosol products from the Terra and Aqua satellites were collected from 2017 to 2018 across China, and 550 nm AOD retrievals with high quality assurance (QA CloudMask = Clear and QA AdjacencyMask = Clear) were used.
Here, the MAIAC AOD retrievals were first evaluated against surface observations at 18 AERONET monitoring stations in China (Fig. 1) using the spatiotemporal matching approach (Wei et al., 2019c, d). MAIAC AOD retrievals are highly accurate, with small estimation errors across mainland China. More than 84 % of the matchups satisfy the MODIS expected error (Levy et al., 2013) at the national scale (Fig. 2a). Besides vegetated surfaces, e.g., cropland and grassland, the MAIAC algorithm shows considerable accuracy over heterogeneous urban surfaces (Fig. 2b). MAIAC AOD products are more accurate and less biased than the widely used Dark Target (DT) and Deep Blue products at coarse spatial resolutions (Wei et al., 2019e; Tao et al., 2019; Zhang et al., 2019). More importantly, the DT algorithm generates a large number of missing values over bright surfaces, and aerosol loadings are significantly overestimated over heterogeneous urban surfaces (Levy et al., 2013; Wei and Sun, 2017; Wei et al., 2018a). Therefore, the MAIAC products, which have higher data quality and higher spatial resolution and can thus generate more accurate and detailed PM2.5 estimates, are selected.

Auxiliary data
Auxiliary data include meteorological, land-cover, surface topographic and population data. The meteorological variables are collected from ERA-Interim atmospheric reanalysis products, including the boundary layer height (BLH), evaporation (ET), temperature (TEM), precipitation (PRE), relative humidity (RH), surface pressure (SP), wind speed (WS) and wind direction (WD). Observations of meteorological variables made between 10:00 and 14:00 LT (local time) are averaged to be consistent with satellite overpass times. Land-cover data include the MODIS land use cover and normalized difference vegetation index (NDVI) products. Topographic data, i.e., the surface elevation, slope, aspect and relief (Wei et al., 2019d), are calculated from the Shuttle Radar Topography Mission (SRTM) Digital Elevation Model (DEM) product, and the population data are from Visible Infrared Imaging Radiometer Suite nighttime lights (NTL) data. Unlike in our previous study (Wei et al., 2019b), pollutant emissions for different precursors (including SO2, NOx, CO and volatile organic compounds) and fine-sized dust are also employed to help explicitly explain the PM2.5 composition; they are taken from a multi-resolution emission inventory for China (MEIC; Zhang et al., 2007). Table 1 provides detailed information about the data sources.

Methodology
Here, a tree-based ensemble learning approach, called the extremely randomized trees (ERT; Geurts et al., 2006), is selected to deal with complex supervised regression issues and to construct robust PM2.5-AOD relationships. This model splits nodes by randomly selecting cut-points and uses all training samples to grow trees instead of the bootstrap approach. The model efficiently reduces variance and mines more valuable information compared to other widely used tree-based approaches, e.g., the decision tree and RF. The current algorithm for retrieving PM2.5 builds on the STET model used in our previous study for retrieving PM1 (Wei et al., 2019b) and is enhanced by a series of refinements that further optimize and strengthen the model capacity and improve the estimation accuracy, including (1) using aerosol precursor gases (SO2, CO, NOx and VOC) and fine-sized dust from pollutant emission inventories as additional input, (2) correcting satellite retrievals of AOD with reference to ground-based measurements, (3) modifying the feature-selection approach using the Gini (GI) index and (4) improving the determination of spatiotemporal information.

Data correction and integration
Although the MAIAC algorithm performs generally well in China, with a mean absolute error (MAE) of 0.06 and a root-mean-square error (RMSE) of 0.121 (Fig. 2), a systematic error in the AOD retrievals (τs) can be corrected by linear regression between in situ AOD measurements collected at all AERONET sites in China and the matched MAIAC retrievals.

Due to the difference in cloud distributions at their respective imaging times, the spatial coverages of Terra and Aqua MAIAC AOD products differ. Terra and Aqua MAIAC AOD retrievals are thus averaged for each pixel on each day to form a new dataset and enlarge the spatial coverage. By integrating the two datasets, the spatial coverage increased by more than 15 % over most areas in China, leading to PM2.5 maps with wider spatial coverages. The number of valid data samples also significantly increased by approximately 25 %-32 %, improving the model training ability. Because the auxiliary variables have different spatial resolutions, they were all resampled to a uniform 1 km spatial resolution using the bilinear interpolation approach. After removing invalid or unrealistic values, there are 167 716 matched PM2.5-AOD samples and independent variables collected for 2018 in China.
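The two preprocessing steps described above can be illustrated with a minimal Python sketch. It is not the authors' code: the function names are hypothetical, the regression is a plain ordinary least squares fit of ground-based AOD on satellite AOD, and the merge rule assumes that a pixel observed by only one satellite keeps that single value, which is one plausible reading of how averaging enlarges the coverage.

```python
def fit_linear_correction(satellite_aod, ground_aod):
    """Least squares fit of ground_aod ~ slope * satellite_aod + intercept."""
    n = len(satellite_aod)
    mean_x = sum(satellite_aod) / n
    mean_y = sum(ground_aod) / n
    sxx = sum((x - mean_x) ** 2 for x in satellite_aod)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(satellite_aod, ground_aod))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def correct_aod(satellite_aod, slope, intercept):
    """Apply the fitted linear correction to satellite retrievals."""
    return [slope * x + intercept for x in satellite_aod]


def merge_terra_aqua(terra, aqua):
    """Merge two same-day AOD grids (flattened lists; None marks a missing pixel).

    Assumption: average where both are valid, otherwise keep whichever exists.
    """
    merged = []
    for t, a in zip(terra, aqua):
        if t is not None and a is not None:
            merged.append((t + a) / 2)
        elif t is not None:
            merged.append(t)
        else:
            merged.append(a)  # still None if both retrievals are missing
    return merged
```

With this rule, a pixel is missing in the merged grid only when both overpasses failed to retrieve AOD, which is why the merged product covers a larger area than either input.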

Potential effects of variables on PM 2.5
The potential relationships between all selected independent variables and PM2.5 measurements are first investigated (Fig. 3). AOD is highly positively related to PM2.5 measurements (R = 0.54), and all pollutant emissions, nighttime lights and land use cover show positive effects on PM2.5. In contrast, all topographical variables and NDVI are negatively related to PM2.5. Moreover, except for ET (R = 0.24) and SP (R = 0.16), the other meteorological variables show negative effects on PM2.5, especially BLH (R = −0.22) and TEM (R = −0.17). In general, all of the selected variables are significantly correlated with PM2.5 measurements at the 0.01 or 0.05 significance level (two-sided), so they are used as inputs to the STET model for preliminary training.
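As a minimal sketch of this screening step, the correlation coefficient R between each candidate variable and the PM2.5 measurements can be computed as below. It assumes that R is the Pearson coefficient; the function names and the dictionary layout are hypothetical, and the significance test is not reproduced.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def screen_variables(predictors, pm25):
    """Return {variable name: R} for every candidate predictor.

    `predictors` maps a variable name (e.g. "AOD", "BLH") to its matched values.
    """
    return {name: pearson_r(values, pm25) for name, values in predictors.items()}
```

A positive R (as for AOD) indicates that PM2.5 tends to increase with the variable, and a negative R (as for BLH) indicates the opposite.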

Updated feature selection
Given the large number of independent variables considered, over-fitting may occur during the model training process. The model thus needs further adjustment, done by selecting the most important variables rather than all variables to overcome this issue and improve the model efficiency. In this study, the GI index is selected to calculate the importance score of each independent variable on PM2.5 estimates because of its higher accuracy and stability as a variable-importance measure, especially for continuous variables with low signal-to-noise ratios (Jiang et al., 2009; Calle and Urrea, 2011), expressed as

$$\mathrm{GI} = \sum_{n=1}^{N} \omega_n (1 - \omega_n) = 1 - \sum_{n=1}^{N} \omega_n^2, \quad (2)$$

where $N$ represents the number of categories ($n = 1, \ldots, N$) and $\omega_n$ represents the sample weight of each category. The importance of one feature ($X_j$) on node $m$ is the change in GI before and after node $m$ branches:

$$\mathrm{IS}_{jm} = \mathrm{GI}_m - \mathrm{GI}_l - \mathrm{GI}_r, \quad (3)$$

where $\mathrm{GI}_l$ and $\mathrm{GI}_r$ represent the GI of the two new nodes after branching. The importance score for one feature ($\mathrm{IS}_j$) in the extra trees ensemble with $k$ trees ($i = 1, \ldots, k$) is calculated as

$$\mathrm{IS}_j = \sum_{i=1}^{k} \mathrm{GI}_{ij}, \qquad \mathrm{GI}_{ij} = \sum_{m \in M} \mathrm{IS}_{jm}, \quad (4)$$

where $\mathrm{GI}_{ij}$ represents the importance of $X_j$ in the $i$th tree and $M$ is the set of nodes in that tree at which feature $X_j$ is used for branching. Finally, all obtained importance scores are normalized across the features.
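The Gini-based importance described above can be sketched in a few lines of Python. This is an illustrative reading of the text rather than the authors' implementation: the function names are hypothetical, the inputs are category labels, and the node importance is taken as the plain difference between the parent GI and the two child GIs, as stated in the text. Tree libraries usually weight the child terms by their share of samples, which this sketch does not do.

```python
from collections import Counter


def gini_index(labels):
    """GI = 1 - sum of squared category weights for the samples in a node."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def node_importance(parent, left, right):
    """Change in GI caused by branching a node into a left and a right child."""
    return gini_index(parent) - gini_index(left) - gini_index(right)


def feature_importance(trees):
    """Sum node importances over all trees for one feature.

    `trees` is a list (one entry per tree) of lists of (parent, left, right)
    label tuples, one tuple per node where the feature is used for branching.
    """
    return sum(node_importance(p, l, r) for nodes in trees for p, l, r in nodes)
```

For example, a node holding two samples of each of two categories has GI = 0.5, and a branch that separates the two categories perfectly yields an importance of 0.5 for the branching feature.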
The results suggest that AOD is the most influential variable, contributing ∼ 32.5 % to daily PM2.5 estimates (Fig. 3). Most meteorological variables also contribute noticeably to PM2.5 estimates, especially BLH, ET and TEM, with average importance scores of 9.6 %, 7.7 % and 7.3 %, respectively. The PM2.5-AOD relationship might largely depend on the aerosol composition (e.g., aerosol water; Reddington et al., 2019; Jin et al., 2020). High RH conditions and precipitation should have large influences on the production and removal of PM2.5 (Zheng et al., 2015). However, RH and PRE become less important, with overall low importance scores in the STET model, which may be attributed to the fact that aerosol retrieval algorithms only work under cloud-free conditions when RH is relatively low. More importantly, the calculated importance score only represents the importance of features in splitting during the extra-tree construction, not the contribution of features to PM2.5 in physical mechanisms. Two main land-use variables, i.e., NDVI and DEM, are also important to PM2.5 estimates, while the pollutant emissions show different effects on PM2.5, with varying importance scores, especially for NH3, CO, SO2 and fine-sized dust. The eight least important variables, with low importance scores of < 2 %, are excluded from the STET model, and the remaining 14 more important variables are selected as inputs to build the PM2.5-AOD relationship.
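The selection rule above (normalize the importance scores, then drop variables scoring below 2 %) can be sketched as follows. The function names and the toy scores in the usage note are hypothetical; only the 2 % threshold comes from the text.

```python
def normalise_scores(raw_scores):
    """Convert raw importance scores to percentages that sum to 100."""
    total = sum(raw_scores.values())
    return {name: 100.0 * score / total for name, score in raw_scores.items()}


def select_features(raw_scores, min_percent=2.0):
    """Keep variables whose normalized importance is at least `min_percent`.

    Returns the kept names (most important first) and all percentages.
    """
    percent = normalise_scores(raw_scores)
    ranked = sorted(percent.items(), key=lambda item: item[1], reverse=True)
    kept = [name for name, value in ranked if value >= min_percent]
    return kept, percent
```

For instance, with made-up raw scores `{"AOD": 65, "BLH": 19, "TEM": 15, "WD": 1}`, the variable "WD" receives 1 % and is excluded, while the other three are retained in descending order of importance.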

Improved spatiotemporal information
Spatiotemporal heterogeneities, i.e., strong spatial autocorrelations and clear temporal variations, are key characteristics of PM2.5. They present great challenges and are usually neglected in most regression and artificial intelligence models. Therefore, in this study, the STET model is further enhanced to address this problem by more accurately determining the spatial and temporal information. For this purpose, the Haversine approach is selected to calculate the great-circle distance between two points on a sphere specified by their latitudes and longitudes (Eqs. 5-7), which is used to represent the space term (P_s). Because it is formulated with sines, this approach avoids the loss of significant digits that arises when the distance between two points is short. In addition, instead of using the day of the year (DOY), the time radian difference for each point on different days in a year is calculated (Eq. 8) to minimize the impact of the seasonal cycle and is selected to represent the time term (P_T). These two improved space-time terms can account for the spatiotemporal autocorrelations of PM2.5 between different points for each day and between consecutive time series at the same place.
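$$\mathrm{hav}(\theta) = \sin^2\left(\frac{\theta}{2}\right) = \frac{1 - \cos\theta}{2}, \quad (5)$$

$$\mathrm{hav}\left(\frac{P_s}{r}\right) = \mathrm{hav}(\alpha_2 - \alpha_1) + \cos\alpha_1 \cos\alpha_2 \, \mathrm{hav}(\beta_2 - \beta_1), \quad (6)$$

$$P_s = 2r \arcsin\left(\sqrt{\sin^2\left(\frac{\alpha_2 - \alpha_1}{2}\right) + \cos\alpha_1 \cos\alpha_2 \sin^2\left(\frac{\beta_2 - \beta_1}{2}\right)}\right), \quad (7)$$

$$P_T = \cos\left(2\pi \frac{d}{T}\right), \quad (8)$$

where hav denotes the haversine function of an angle $\theta$, $\alpha_1$ and $\alpha_2$ denote the latitudes of two points, $\beta_1$ and $\beta_2$ denote the longitudes of two points in space, $r$ denotes the radius (in km) of the earth, $d$ represents the DOY, and $T$ represents the total number of days in the year in question.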
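A minimal Python sketch of the two space-time terms is given here for illustration. The Haversine distance follows the standard great-circle formula. The Earth radius of 6371 km and the function names are assumptions, and the time term is taken as the cosine of the day-of-year angle 2πd/T, which is one common cyclic encoding of the "time radian" and should be read as an assumption rather than as the authors' exact definition.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2, radius_km=EARTH_RADIUS_KM):
    """Great-circle distance in km between two points given in degrees."""
    a1 = math.radians(lat1)
    a2 = math.radians(lat2)
    d_lat = a2 - a1
    d_lon = math.radians(lon2 - lon1)
    h = (math.sin(d_lat / 2) ** 2
         + math.cos(a1) * math.cos(a2) * math.sin(d_lon / 2) ** 2)
    # Clamp to 1.0 to guard against rounding slightly above the valid range.
    return 2 * radius_km * math.asin(math.sqrt(min(1.0, h)))


def time_term(day_of_year, days_in_year):
    """Cyclic time term: days near the start and end of a year get similar values."""
    return math.cos(2 * math.pi * day_of_year / days_in_year)
```

Unlike the raw DOY, which jumps from 365 back to 1 at the turn of the year, the cyclic term varies smoothly across that boundary, so late December and early January are treated as neighbouring times.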
For the enhanced STET model, all of the selected independent variables are first input into the ERT model, and the random splits (S, a_i) are established from the whole set of training data samples; second, K distinct attributes are selected randomly from all attributes according to spatial and temporal differences; third, K random splits are generated (s_1, ..., s_K), and a split (s*) is selected by calculating the score measure function, i.e., Score(s*, S); fourth, the split node (S) is generated completely at random to establish an extra tree; last, the extra tree ensemble is built using the similarity method. Detailed information on the ERT algorithm can be found in Geurts et al. (2006). Figure 4 illustrates the schematic of the enhanced STET model.
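The core node-splitting step of extremely randomized trees can be sketched as below, following the general recipe of Geurts et al. (2006) for regression: draw K attributes at random, draw one random cut-point per attribute, and keep the candidate with the best score. This is an illustrative sketch only. The function names are hypothetical, the score is the relative variance reduction, and the space-time weighting specific to the STET model is not modelled.

```python
import random


def variance(values):
    """Population variance; 0.0 for an empty list."""
    n = len(values)
    if n == 0:
        return 0.0
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / n


def split_score(y, y_left, y_right):
    """Relative variance reduction achieved by a split (0 = none, 1 = perfect)."""
    total = variance(y)
    if total == 0.0:
        return 0.0
    within = (len(y_left) * variance(y_left)
              + len(y_right) * variance(y_right)) / len(y)
    return (total - within) / total


def pick_random_split(X, y, k, rng):
    """Return (attribute index, cut-point, score) of the best of k random splits."""
    candidates = rng.sample(range(len(X[0])), k)
    best = None
    for j in candidates:
        column = [row[j] for row in X]
        low, high = min(column), max(column)
        if low == high:
            continue  # a constant attribute cannot split the node
        cut = rng.uniform(low, high)  # cut-point drawn at random, not optimized
        left = [target for value, target in zip(column, y) if value < cut]
        right = [target for value, target in zip(column, y) if value >= cut]
        score = split_score(y, left, right)
        if best is None or score > best[2]:
            best = (j, cut, score)
    return best
```

Because the cut-points are drawn at random rather than searched exhaustively, individual trees are weaker but less correlated, and the whole training set can be used for every tree without bootstrapping.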

Model validation approach
Unlike in our previous study, three independent validation methods are performed to verify the ability of the model to estimate PM2.5 concentrations. The first independent validation method, i.e., the out-of-sample cross-validation (CV) approach, is performed on all data samples using the 10-fold CV procedure (Rodriguez et al., 2010). The data samples are randomly divided into 10 subsets; nine are used as training data, and one is used as validation data. This approach is repeated 10 times, and error rates are averaged to obtain the final result. This is a common approach to evaluate the overall accuracy of a machine learning model, widely adopted in most satellite-derived PM studies (T. Li et al., 2017a, b; Ma et al., 2014, 2019; Xiao et al., 2017; He and Huang, 2018; Chen et al., 2019; Wei et al., 2019a, b; Xue et al., 2019; Yao et al., 2019).
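The fold construction for such a cross-validation can be sketched as follows. The first function splits individual samples at random, as in the out-of-sample procedure above; the second keeps all samples of a monitoring station in the same fold, which is the kind of grouping an out-of-station validation needs. The function names are hypothetical, and the error metrics use common definitions (MRE as the mean of |estimate − observation| / observation, in percent) that are assumed rather than taken from the text.

```python
import math
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Randomly partition sample indices into k folds of near-equal size."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def station_k_fold_indices(station_ids, k=10, seed=0):
    """Partition sample indices so that each station falls in exactly one fold."""
    stations = sorted(set(station_ids))
    random.Random(seed).shuffle(stations)
    fold_of = {station: i % k for i, station in enumerate(stations)}
    folds = [[] for _ in range(k)]
    for index, station in enumerate(station_ids):
        folds[fold_of[station]].append(index)
    return folds


def error_metrics(observed, estimated):
    """Return (RMSE, MAE, MRE in percent) for paired observations and estimates."""
    n = len(observed)
    errors = [e - o for o, e in zip(observed, estimated)]
    rmse = math.sqrt(sum(err ** 2 for err in errors) / n)
    mae = sum(abs(err) for err in errors) / n
    mre = 100.0 * sum(abs(err) / o for err, o in zip(errors, observed)) / n
    return rmse, mae, mre
```

In each of the k rounds, one fold serves as validation data and the remaining folds as training data; the metrics from the rounds are then averaged.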
The second independent validation method, i.e., the out-of-station CV approach, is similar to the first one but is performed using data from the monitoring stations to evaluate the spatial performance of the model. Data samples collected from different spatial points make up the training and testing data, and the relationship between spatial predictors and PM2.5 built from the training dataset is then evaluated on each testing dataset. The third independent validation approach tests the predictive power of the model. It is performed by applying the model built for a specific year to predict the PM2.5 concentrations for other years, then validating the results against the corresponding ground measurements. This approach ensures that the data samples for model training and validation are completely independent on both spatial and temporal scales. Several traditional statistical metrics are selected to describe the model performance, including the correlation coefficient (R), R2, RMSE, MAE and the mean relative error (MRE).

Figure 5 shows the out-of-sample and out-of-station 10-fold CV results of daily PM2.5 estimates for the traditional ERT model and our enhanced STET model at the national scale in 2018. The original ERT model works well in estimating PM2.5 concentrations, with an average out-of-sample CV R2 of 0.84 and overall small estimation uncertainties. However, when spatiotemporal information is considered, the model performance significantly improves, with a sample-based CV R2 of 0.89, a stronger regression line, and lower RMSE (10.33 µg m−3), MAE (6.69 µg m−3) and MRE (21.28 %) values. Regarding the spatial performance, compared to the original ERT model, the enhanced STET model shows a stronger spatial predictive power, with a higher out-of-station CV R2 of 0.88 and lower RMSE (10.93 µg m−3), MAE (7.15 µg m−3) and MRE (23.69 %) values.
In addition, compared to the sample-based validation, the out-of-station accuracy changes little, suggesting that the enhanced STET model can estimate well the daily PM2.5 concentrations. Moreover, these results illustrate that spatiotemporal information is crucial in improving PM2.5-AOD relationships and should be carefully considered when introducing regression models using remote sensing techniques.

Figure 6 shows the sample-based 10-fold CV results of the enhanced STET model in PM2.5 daily estimates over eastern and western China (according to the widely used Heihe-Tengchong line) and four typical regions (Fig. 1). The enhanced STET model performs differently over eastern and western China, mainly due to significant differences in land cover and climate conditions. There are 1289 uniformly distributed PM2.5 stations in eastern China, and 127 241 daily samples were collected. The model performs well in eastern China, with a high sample-based CV R2 equal to 0.90 and low estimation uncertainties, i.e., RMSE = 9.72 µg m−3, MAE = 6.41 µg m−3 and MRE = 19.16 %. In contrast, there are 294 unevenly and sparsely distributed PM2.5 stations in western China, with about 3 times fewer daily PM2.5 estimates collected. The model performance is overall poorer (e.g., CV R2 = 0.85, RMSE = 12.04 µg m−3, MAE = 7.56 µg m−3) than over eastern China. This is mainly attributed to brighter surfaces (e.g., desert and bare land) with little vegetation and harsh meteorological conditions over western China.

Regional-scale validation
There were 33 733, 15 199, 6209 and 6470 daily samples collected from 233, 184, 95 and 107 uniformly distributed PM2.5 monitoring stations in the North China Plain (NCP), the Yangtze River Delta (YRD), the Pearl River Delta (PRD) and the Sichuan Basin (SCB), respectively. Estimated PM2.5 concentrations in the typical urban agglomerations of the NCP, YRD and PRD are highly consistent with surface measurements (CV R2 = 0.86-0.92), with overall low estimation uncertainties (i.e., RMSE = 8-12 µg m−3, MAE = 5-8 µg m−3 and MRE = 15 %-19 %). The new model also performs well over the Sichuan Basin, with an average CV R2 value equal to 0.87 and estimation uncertainties comparable to those from the NCP. Overall, despite some differences in model performance, the enhanced STET model shows a good ability in estimating PM2.5 concentrations at the regional scale.

Site-scale validation
National- and regional-scale aggregated evaluations mainly illustrate the overall performance of the model in estimating PM2.5 concentrations. However, due to the inhomogeneity of PM2.5 monitoring stations, an additional validation for each monitoring station in China is performed (Fig. 7). For statistical significance, only those monitoring stations with more than 10 data samples are plotted. Daily PM2.5 estimates relate well to surface measurements at most individual stations across China. The average sample-based CV R2 is 0.84, and CV R2 values are greater than 0.8 at more than 73 % of the monitoring stations, especially in eastern China. However, relatively poorer performances (CV R2 < 0.6) are observed at some scattered sites located in southwest and southeast China. In general, the new model shows overall low estimation uncertainties at most sites, with average RMSE and MAE values of 9.2 and 6.5 µg m−3, respectively, especially in southern China. Moreover, ∼ 94 % of the monitoring stations in China have mean RMSE and MAE values of less than 15 and 10 µg m−3, respectively. Note that stations in central China have larger RMSE values (> 10 µg m−3), mainly due to the high pollution levels. The average MRE value in China is 20.8 %, and most stations (> 86 % of them) have MRE values of less than 30 %, especially at sites located in eastern and southern China.

On the daily scale, the model performs well (average CV R2 = 0.77) on most days in the year, and more than 77 % of these days have CV R2 values greater than 0.7. Two main uncertainty metrics, i.e., RMSE and MAE, show similar temporal variations during the year, first decreasing until around day 250, then gradually increasing. Approximately 91 % and 92 % of the days have low RMSE and MAE values of less than 15 and 10 µg m−3, respectively, over the year. MRE is relatively stable, ranging from 13 % to 49 % with an average value of 23.2 %, and more than 87 % of the days have MRE values of less than 30 % in China.
In general, high R2 values with overall large RMSE but small MRE values are observed at the beginning and end of the year (in winter). This is because PM2.5 concentrations vary more and are always high due to the greater amount of pollutant emissions caused by heating or frequent dust storms. In contrast, lower R2 values with overall small RMSE and large MRE values are observed in the middle of the year (in summer) because air pollution levels are lower. Nevertheless, these results illustrate that the enhanced STET model captures well the PM2.5 concentrations on most days of the year.

Air quality is about 2 or 3 times worse in spring and winter, with wider PM2.5 ranges and larger standard deviations. The model performance in these seasons is similar, with almost equal CV R2 and slope values and close estimation uncertainties. The differences in model performance among the seasons are mainly attributed to seasonal variations in natural conditions and human activities. Meteorological conditions in summer favor the diffusion of pollutants but complicate the PM2.5-AOD relationship (Su et al., 2018, 2020), whereas direct emissions of pollutants are greater in winter, resulting in severe air pollution.

Predicted PM 2.5 maps across China
Monthly PM2.5 maps are synthesized by averaging the daily PM2.5 estimates for each grid cell in a month, provided that at least 20 % of the daily estimates are available, and annual PM2.5 maps are generated from monthly PM2.5 maps if there are more than eight available values for each grid cell across China (Hsu et al., 2012; Wei et al., 2019f). The spatial coverage of monthly PM2.5 maps varies from 73 % to 92 %, with an average of 83 % across mainland China. The maximum coverage occurs in April, and the minimum coverage occurs in January. The monthly mean PM2.5 values vary from 24.4 to 42.9 µg m−3, where the highest (lowest) PM2.5 concentration is observed in December (August).

The satellite-derived 1 km resolution PM2.5 map in 2018 covers almost the full area (spatial coverage = 99 %) across mainland China (Fig. 11a) and is highly consistent in spatial pattern with the corresponding in situ measurements (Fig. 11b). The average PM2.5 concentration is 32.7 ± 13.6 µg m−3 in 2018 across mainland China. In general, the most severe PM2.5 pollution occurs in the Taklamakan Desert, where most areas are exposed to high PM2.5 concentrations of > 80 µg m−3. There are also high pollution levels over the NCP, the SCB and the YRD, with annual mean PM2.5 values of 46.7 ± 10.5, 39.8 ± 9.9 and 38.4 ± 8.3 µg m−3, respectively, arising from intensive human activities and special topographic and meteorological conditions. In contrast, the annual mean PM2.5 loading is overall low over the rest of China, e.g., the PRD (33.4 ± 3.9 µg m−3). However, there may be poor representativeness for areas in western China with few ground monitoring stations. More than 34 % of mainland China experienced high PM2.5 levels in 2018, exceeding the international and national recommended air quality level (PM2.5 > 35 µg m−3).

Figure 12 shows seasonal mean PM2.5 maps, averaged from available monthly values for each grid cell, in 2018 across China.
The average PM2.5 concentration (spatial coverage) is 37.2 ± 20.7 µg m−3 (∼ 96 %), 25.5 ± 12.1 µg m−3 (∼ 92 %), 29.5 ± 11.5 µg m−3 (∼ 97 %) and 41.3 ± 15.4 µg m−3 (∼ 88 %) for spring, summer, autumn and winter, respectively. There are noticeable spatial differences in PM2.5 distributions on the seasonal scale. In winter and spring, more than 49 % and 42 % of mainland China, respectively, were exposed to high PM2.5 levels of > 30 µg m−3, resulting in poor air quality. In contrast, PM2.5 pollution is lower in summer and autumn, with more than 90 % and 74 % of mainland China, respectively, experiencing PM2.5 levels below the acceptable air quality level. Note that in spring, PM2.5 concentrations are particularly high in Xinjiang Province due to frequent sand and dust episodes in 2018.
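The compositing rules used for the monthly and annual maps can be sketched for a single grid cell as follows. The function names are hypothetical, missing values are represented by None, and the thresholds (at least 20 % of daily estimates in a month, more than eight monthly values in a year) are those stated in the text.

```python
def monthly_mean(daily_values, min_fraction=0.20):
    """Mean of the valid daily values, or None if too few days are available."""
    valid = [v for v in daily_values if v is not None]
    if not daily_values or len(valid) < min_fraction * len(daily_values):
        return None
    return sum(valid) / len(valid)


def annual_mean(monthly_values, min_months_exclusive=8):
    """Mean of the valid monthly values if more than eight months are available."""
    valid = [v for v in monthly_values if v is not None]
    if len(valid) <= min_months_exclusive:
        return None
    return sum(valid) / len(valid)
```

For a 31-day month, this rule requires at least seven valid daily estimates in a grid cell; a cell with fewer is left empty in the monthly map, which explains why the monthly spatial coverage is lower than the annual one.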

Model accuracy
There is an increasing number of studies on estimating PM2.5 using satellite AOD products from local to national scales across China. However, limited by the operational satellite aerosol products, PM2.5 can only be estimated at coarse spatial resolutions of approximately 6-10 km (Fang et al., 2016; T. Li et al., 2017b; Yu et al., 2017; Chen et al., 2018; Ma et al., 2019; Yao et al., 2019). Recently, with the release of MODIS 3 km DT aerosol products, PM2.5 estimates can be improved to a 3 km spatial resolution across China (You et al., 2016; T. Li et al., 2017a; He and Huang, 2018; Xue et al., 2019). This study improves the spatial resolution of PM2.5 estimates across mainland China to 1 km based on the newly released high-quality MAIAC products.
Regarding model performance, our newly developed STET model is more accurate, with higher CV R2 values and smaller RMSE and MAE values, than statistical regression models (Table 2), e.g., the timely structure adaptive model (TSAM; Fang et al., 2016), the generalized additive model (GAM; Chen et al., 2018), the GWR model (Ma et al., 2014; You et al., 2016), and the geographically and temporally weighted regression model (GTWR; He and Huang, 2018). The enhanced STET model also outperforms most machine learning (ML) and deep learning approaches, including the Gaussian model (Yu et al., 2017), the random forest model (Chen et al., 2018; Wei et al., 2019a), the XGBoost model, the GRNN and deep belief network (DBN) models (T. Li et al., 2017a, b), and some combined models, e.g., the Daily-GWR model (D-GWR; He and Huang, 2018), the two-stage model (He and Huang, 2018; Ma et al., 2019; Yao et al., 2019) and the ML + GAM model (Xue et al., 2019).
We find that all traditional statistical regression models and machine and deep learning approaches reported in previous studies underestimated PM2.5 concentrations under highly polluted conditions, with poor regressions (i.e., slope < 0.9 and intercept > 6 µg m−3) between measurements and retrievals of PM2.5 in China. This is a common problem, with the following potential causes: (1) there are large estimation errors in AOD retrievals under severe pollution conditions in China (Wei et al., 2019c), which are rooted in the fundamental limitations of satellite-based AOD retrievals, i.e., the nonlinear response of AOD to reflectance and the high sensitivity to the single-scattering albedo; (2) high AOD does not necessarily correspond to high PM2.5 concentrations because their ratio is highly variable over space and time, affected by both natural and human factors; and (3) the number of samples for high-pollution cases is small, hindering the ability to train the model. For the same reasons, our model also tends to underestimate PM2.5 concentrations on highly polluted days (PM2.5 > 150 µg m−3); however, it captures high-pollution events more accurately, with a larger slope of 0.86 and a smaller intercept of 6.16 µg m−3 than other models reported in previous studies (Table 2).
Furthermore, compared with daily PM1 estimates using the STET model in our previous study (CV R2 = 0.76 and slope = 0.70; Wei et al., 2019b), the overall accuracy of daily PM2.5 estimates using the enhanced STET model has improved significantly, with a much higher CV R2 of 0.89 and a steeper slope of 0.86, based on data from 2018 in China. Continued refinement of the model could better constrain the relationship between fine particulate matter and AOD and thereby improve model performance. More data samples may also help improve the training of the model.
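The slope and intercept discussed in this section come from a linear fit between measured and estimated PM2.5. As a minimal sketch, the snippet below fits such a line by ordinary least squares and derives the concentration at which the fitted line crosses the 1:1 line. The sample data are hypothetical and are constructed to lie exactly on a line with the reported slope (0.86) and intercept (6.16 µg m−3); the function name and the choice of ordinary least squares are assumptions for illustration, not necessarily the authors' fitting procedure.

```python
def fit_line(observed, estimated):
    """Ordinary least squares: estimated = slope * observed + intercept."""
    n = len(observed)
    mean_x = sum(observed) / n
    mean_y = sum(estimated) / n
    sxx = sum((x - mean_x) ** 2 for x in observed)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(observed, estimated))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical measurements (ug/m3); estimates are synthetic and lie
# exactly on the line with the slope and intercept reported in the text.
observed = [10.0, 50.0, 100.0, 200.0]
estimated = [0.86 * x + 6.16 for x in observed]

slope, intercept = fit_line(observed, estimated)
# Concentration where the fitted line meets the 1:1 line; above it the
# fitted line lies below 1:1, i.e., estimates are too low on average.
crossover = intercept / (1.0 - slope)
print(round(slope, 2), round(intercept, 2), round(crossover, 1))
```

With these numbers the crossover is 44 µg m−3, and the fitted line gives about 135.16 µg m−3 at a measured 150 µg m−3, which illustrates why a slope below 1 implies underestimation on highly polluted days.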

Predictive power
To test the predictive power of the enhanced STET model, the model built for the year 2018 was used to predict daily PM2.5 concentrations in 2017, which were validated against the ground measurements from 2017. Results suggest that our new model can correctly capture more than 65 % of the historical daily PM2.5 concentrations (N = 177 616). Monthly (N = 12 408), seasonal (N = 5227) and annual (N = 1461) mean PM2.5 predictions across China are highly correlated with surface observations, with R2 values of 0.80, 0.81 and 0.82, respectively, and overall small estimation uncertainties (i.e., RMSE < 12 µg m−3, MAE < 9 µg m−3 and MRE < 26 %). Only a handful of studies have examined the predictive power of models estimating PM2.5 concentrations in China. Comparisons show that the enhanced STET model is superior to those reported in previous studies, i.e., the two-stage model (Ma et al., 2019), the GTWR model (He and Huang, 2018), the ML + GAM model (Xue et al., 2019) and the space-time RF model (Wei et al., 2019a). The enhanced STET model has a strong predictive power and can be used to estimate historical PM2.5 concentrations in China.
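The validation described above compares predictions with ground measurements at daily, monthly, seasonal and annual scales. The sketch below shows one way to average daily pairs to station-level monthly means and then compute R2, RMSE, MAE and MRE. Several choices are assumptions for illustration: R2 is taken as the squared Pearson correlation, MRE as the mean of |predicted − observed| / observed expressed in percent, and monthly means are formed per station. The record layout, function names and numbers are hypothetical.

```python
import math
from collections import defaultdict

def validation_metrics(observed, predicted):
    """R2 (squared Pearson r), RMSE, MAE and MRE (%) for paired values."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    errors = [p - o for o, p in zip(observed, predicted)]
    return {
        "r2": cov ** 2 / (var_o * var_p),
        "rmse": math.sqrt(sum(e ** 2 for e in errors) / n),
        "mae": sum(abs(e) for e in errors) / n,
        # Relative error is dimensionless, so MRE is reported in percent.
        "mre_pct": 100.0 * sum(abs(e) / o for e, o in zip(errors, observed)) / n,
    }

def monthly_means(records):
    """Average (station, 'YYYY-MM-DD', observed, predicted) rows by station-month."""
    groups = defaultdict(list)
    for station, date, obs, pred in records:
        groups[(station, date[:7])].append((obs, pred))
    return {
        key: (sum(o for o, _ in pairs) / len(pairs),
              sum(p for _, p in pairs) / len(pairs))
        for key, pairs in groups.items()
    }

# Hypothetical daily records (ug/m3), for illustration only.
daily = [
    ("A", "2017-01-01", 80.0, 70.0),
    ("A", "2017-01-02", 120.0, 100.0),
    ("A", "2017-02-01", 40.0, 44.0),
    ("B", "2017-01-01", 20.0, 25.0),
    ("B", "2017-01-02", 30.0, 27.0),
]
monthly = monthly_means(daily)
obs, pred = zip(*monthly.values())
metrics = validation_metrics(obs, pred)
print(metrics)
```

Seasonal and annual means could be formed the same way by changing the grouping key; a real evaluation would also need a rule for the minimum number of valid days per period, which the sketch omits.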

Summary and conclusions
With the increase in air pollution over recent years, many studies have estimated PM2.5 using satellite remote sensing. However, most of the PM2.5 estimates are reported at spatial resolutions of 3-10 km, which is inadequate for monitoring air quality in urban areas. Traditional models also limit the accuracy of PM2.5 estimates. Here, we present a spatially continuous, high-resolution (1 km) and high-quality PM2.5 dataset across mainland China (i.e., ChinaHighPM2.5). For this, an enhanced STET model was developed to minimize spatiotemporal heterogeneities and improve the overall accuracy of estimates of ground-level PM2.5 concentrations.
Our results suggest that the enhanced STET model estimates daily PM2.5 concentrations well at the national scale, with a relatively high sample-based cross-validation coefficient of determination (R2) of 0.89, a low RMSE of 10.35 µg m−3, an MAE of 6.71 µg m−3 and an MRE of 21.37 %. Comparisons illustrate that spatiotemporal information is important and should be carefully considered during model development. The enhanced STET model estimates PM2.5 concentrations well at most monitoring stations and on most individual days of the year. The North China Plain and the Sichuan Basin regions, under the influence of intense human activities and poor dispersion conditions, have high PM2.5 loadings. The enhanced