The drivers and health risks of the unexpected surface ozone enhancements over the Sichuan basin, China in 2020


Abstract
After a continuous increase in surface ozone (O3) levels from 2013 to 2019, overall summertime O3 concentrations across China showed a significant reduction in 2020. In contrast to this overall reduction, unexpected surface O3 enhancements of 10.2 ± 0.8 ppbv (23.4%) were observed in May-June 2020 vs. 2019 over the Sichuan basin (SCB), China. In this study, we use high-resolution nested-grid GEOS-Chem simulations, the eXtreme Gradient Boosting (XGBoost) machine learning method, and the exposure-response relationship to determine the drivers and evaluate the health risks of these unexpected surface O3 enhancements. We first use the XGBoost machine learning method to correct the GEOS-Chem model-to-measurement O3 discrepancy over the SCB. The relative contributions of changes in meteorology and anthropogenic emissions to the unexpected surface O3 enhancements are then quantified with the combination of the GEOS-Chem and XGBoost models. To assess the health risks caused by the unexpected O3 enhancements over the SCB, total premature mortalities are estimated. The results show that changes in anthropogenic emissions caused an O3 reduction of 0.9 ± 0.1 ppbv and changes in meteorology caused an O3 increase of 11.1 ± 0.7 ppbv in May-June 2020 vs. 2019. The meteorology-induced surface O3 increase is mainly attributed to an increase in temperature and decreases in precipitation, specific humidity, and cloud fraction over the SCB and surrounding regions in May-June 2020 vs. 2019. These changes in meteorology, combined with the complex basin effect, enhance biogenic emissions of volatile organic compounds (VOCs) and nitrogen oxides (NOx), speed up O3 chemical production, and inhibit the ventilation of O3 and its precursors, and therefore account for the surface O3 enhancements over the SCB. The total premature mortality due to the unexpected surface O3 enhancements over the SCB increased by 89.8% in May-June 2020 vs. 2019.

Keywords: Ozone; Health risk; Emissions; Meteorology; Chemical model; Machine learning

Introduction
Surface ozone (O3) is largely generated from local anthropogenic (fossil fuel and biofuel combustion) and natural (biomass burning (BB), lightning, and biogenic emissions) precursors such as volatile organic compounds (VOCs), nitrogen oxides (NOx), and carbon monoxide (CO) via a chain of photochemical reactions (Cooper, 2019; Sun et al., 2018). An additional portion of surface O3 is transported from distant regions or from the stratosphere (Akimoto et al., 2015; Wang et al., 2020b). Surface O3 is one of the most harmful air pollutants, threatening human health and crop production (Fleming et al., 2018; Lu et al., 2020; Sun et al., 2018; Van Dingenen et al., 2009). Exposure to ambient O3 pollution poses a series of health risks including stroke, respiratory disease (RD), hypertension, cardiovascular disease (CVD), and chronic obstructive pulmonary disease (COPD) (Brauer et al., 2016; Lelieveld et al., 2013; Li et al., 2015; Liu et al., 2018; Lu et al., 2020; Wang et al., 2020c). Lu et al. (2020) estimated that the premature RD mortalities attributable to ambient O3 exposure in 69 Chinese cities in 2019 reached up to 64,370.

Surface O3 variability is sensitive to changes in both emissions and meteorology (Liu and Wang, 2020a, b; Lu et al., 2019c). Meteorological conditions affect surface O3 variability indirectly through changes in natural emissions of its precursors or directly via changes in wet and dry removal, dilution, chemical reaction rates, and transport flux (Li et al., 2019a; Lin et al., 2008; Liu and Wang, 2020a; Lu et al., 2019d). A reduction in temperature can lessen O3 production by slowing chemical reaction rates (Fu et al., 2015; Lee et al., 2014; Liu and Wang, 2020a) or by reducing biogenic VOC and NOx emissions (Guenther et al., 2006; Im et al., 2011; Tarvainen et al., 2005).
Drier meteorological conditions can increase surface O3 levels (He et al., 2017; Kalabokas et al., 2015; Liu and Wang, 2020a). Depending on which process dominates the influence of planetary boundary layer height (PBLH) on surface pollutants, a higher PBLH can either reduce surface O3 levels by diluting O3 and its precursors into a larger volume of air (Sanchez-Ccoyllo et al., 2006; Wang et al., 2020d) or increase surface O3 levels by transporting more O3 from the upper troposphere or by lessening the NO abundance available for O3 titration (He et al., 2017; Liu and Wang, 2020a; Sun et al., 2009). Precipitation has been shown to decrease surface O3 levels through the wet removal of its precursors, and clouds reduce surface O3 levels by decreasing the oxidative capacity of the atmosphere and enhancing the scavenging of atmospheric oxidants (Lelieveld and Crutzen, 1990; Liu and Wang, 2020b; Shan et al., 2008; Steinfeld, 1998). A higher wind speed can decrease surface O3 levels through faster ventilation of O3 and its precursors (Lu et al., 2019c; Sanchez-Ccoyllo et al., 2006).

Emissions of air pollutants affect surface O3 variability by perturbing the abundances of hydroperoxyl (HO2) and alkylperoxyl (RO2) radicals, which are key atmospheric constituents in the formation of O3 (Liu and Wang, 2020b). Many previous studies have verified a nonlinear relationship between O3 and its precursors (e.g., Atkinson, 2000; Liu and Wang, 2020b; Lu et al., 2019d; Sun et al., 2018; Wang et al., 2017). If the surface O3 formation regime lies within the VOC-limited region, reductions in VOC emissions will reduce surface O3 levels. Similarly, if the surface O3 formation regime lies within the NOx-limited region, reductions in NOx emissions will reduce surface O3 levels (Atkinson, 2000; Wang et al., 2017).
If the surface O3 formation regime lies within the transitional region, reductions in either VOC or NOx emissions will reduce surface O3 levels. Atmospheric aerosols can affect surface O3 levels either through heterogeneous reactions of reactive gases (Li et al., 2018; Lou et al., 2014; Lu et al., 2012; Stadtler et al., 2018) or by altering the solar radiation available for the photolysis and oxidation of gases (Li et al., 2011; Lu et al., 2019c; Lu et al., 2019d; Xing et al., 2017).

Understanding the drivers of surface O3 variability has strong implications for O3 mitigation (Chen et al., 2020; Lu et al., 2019c; Sun et al., 2018). China has experienced a continuous increase in surface O3 levels despite the implementation of NOx control measures since 2013 (Liu and Wang, 2020a, b; Lu et al., 2018; Lu et al., 2020). Many studies have attempted to determine the drivers of high-O3 events that occurred in specific regions and periods across China. Most of these studies focus on the most densely populated and highly industrialized areas in eastern China, whereas studies in the rest of China are still limited (Liu and Wang, 2020a, b; Lu et al., 2019a; Lu et al., 2019b; Lu et al., 2012; Wang et al., 2020a; Wang and Lu, 2019; Wang et al., 2017). As China has a vast territory with a wide range of emission levels and meteorological conditions, O3 variability and its drivers may vary both temporally and geographically, so results from one region are unlikely to apply nationally. In addition, previous studies typically use state-of-the-art chemical transport models (CTMs) with sensitivity simulations to quantify the drivers of O3 variability, e.g., fixing meteorology while varying emission levels to quantify the influence of emission changes, or vice versa (Liu and Wang, 2020a, b; Lu et al., 2019a).
However, uncertainties in local meteorological fields, emission estimates, and model mechanisms can lead to discrepancies in CTMs that may affect the accuracy of O3 predictions and their sensitivities to changes in emissions and meteorology (Lu et al., 2019c; Young et al., 2018). This is particularly true for the Sichuan basin (SCB), one of the most industrialized and populated city clusters in western China, where large discrepancies between measured and modelled surface O3 are found due to the complex terrain (Lu et al., 2019c; Wang et al., 2020d).

After a continuous increase in surface O3 levels from 2013 to 2019, the summertime (May-August) O3 concentration across China showed a significant reduction in 2020 (Figure 1(d)). In this study, we use high-resolution nested-grid GEOS-Chem simulations, the eXtreme Gradient Boosting (XGBoost) machine learning method, and the exposure-response relationship to determine the drivers and evaluate the health risks of the unexpected surface O3 enhancements. We first use the XGBoost machine learning method to correct the GEOS-Chem model-to-measurement O3 discrepancy over the SCB. The relative contributions of changes in meteorology and anthropogenic emissions to the unexpected surface O3 enhancements are then quantified with the combination of the GEOS-Chem and XGBoost models. To assess the health risks caused by the unexpected O3 enhancements over the SCB, total premature mortalities are also estimated.

Methods and data
2.1 Surface O3 data and auxiliary data over the SCB
China has identified nine city clusters that lead the country in population and in economic, social, and cultural development. The SCB contains the fourth-largest city cluster in China after the Yangtze River Delta (YRD), the Pearl River Delta (PRD), and the Beijing-Tianjin-Hebei (BTH) city clusters. The location of the SCB city cluster is shown in Figure S1. With Chongqing and Chengdu as the dual core cities, more than a dozen cities over the SCB, including Mianyang, Deyang, Yibin, Nanchong, Dazhou, and Luzhou, have achieved rapid economic development and industrial upgrading. As the region with the strongest economy and best economic foundation in western China, the SCB hosts many industries such as energy and chemicals, electronic information, food processing, equipment manufacturing, eco-tourism, and modern finance. As one of the most densely populated and highly industrialized regions in China, with basin terrain that favors the accumulation of air pollutants, the SCB is a newly emerging region of severe air pollution in China (Lu et al., 2019b; Lu et al., 2012).

Surface O3 measurements over the SCB are available from the China National Environmental Monitoring Center (CNEMC) network (http://www.cnemc.cn/en/, last access: 2 July 2021). The CNEMC network has routinely measured the concentrations of CO, O3, NO2, SO2, PM10, and PM2.5 at 122 sites in 22 key cities over the SCB since 2015. The mean geolocation, population, number of measurement sites, and data range of each city are summarized in Table [...] from January 2019 to June 2020.

The simulations were driven by GEOS-FP meteorological fields at the native resolution of 0.25° × 0.3125° with 47 layers from the surface to 0.01 hPa at the top.
The PBLH and surface meteorological variables are implemented at 1-hour intervals and other meteorological variables at 3-hour intervals. The model time step is 5 minutes for transport and 10 minutes for chemistry and emissions (Philip et al., 2016). The non-local scheme for boundary layer mixing is from Lin and McElroy (2010), wet deposition is from Liu et al. (2001), and dry deposition is calculated with the resistance-in-series algorithm (Wesely, 1989; Zhang et al., 2001). The photolysis rates are from the FAST-JX v7.0 photolysis scheme (Bian and Prather, 2002). The chemical mechanism follows the unified tropospheric-stratospheric chemistry extension (UCX) mechanism (Eastham et al., 2014). The GEOS-Chem simulation outputs 47 layers of O3 and other atmospheric constituents over China with a temporal resolution of 1 hour.

Correction of GEOS-Chem discrepancy with machine learning method
We used the XGBoost machine learning method to correct the GEOS-Chem model-to-measurement O3 discrepancy over the SCB following Keller et al. (2021). The same methodology has also been applied in our companion study examining ozone changes over eastern China from 2019 to 2020. XGBoost uses the Gradient Boosting Decision Tree (GBDT) framework to iteratively learn the GEOS-Chem model-to-measurement discrepancy and improve the model [...]

$$ Obj^{(t)} \approx \sum_{i=1}^{n} \left[ l\left(y_i, \hat{y}_i^{(t-1)}\right) + g_i f_t(x_i) + \frac{1}{2} h_i f_t^{2}(x_i) \right] + \Omega(f_t) \quad (2) $$

where $g_i$ and $h_i$ are the first- and second-order gradients of the loss function, respectively. $Obj^{(t)}$ represents the objective function to be optimized at the t-th iteration. $l(y_i, \hat{y}_i^{(t-1)})$ is the loss function representing the difference between the prediction for the i-th sample at the (t-1)-th iteration and the real value $y_i$. The function $f_t$ is the amount of change at the t-th iteration. Overall, the objective function consists of a second-order Taylor expansion of the loss function and the regularization term $\Omega(f_t)$, which penalizes the complexity of the model and prevents overfitting. Compared to the traditional GBDT method, XGBoost has the following advantages: (1) it handles missing values effectively; (2) it prevents overfitting; (3) it reduces computing time by using parallel and distributed computing.

Since the GEOS-Chem model-to-measurement discrepancy is usually site-specific, we train a separate XGBoost model for each site. Similar to the method of Keller et al. (2021), we use a full seasonal cycle of hourly measurements in 2019 at each site as the learning samples, and the GEOS-Chem emission and meteorological inputs, output concentrations of atmospheric constituents, and time information as training input data.
To properly incorporate the emission and meteorological factors that affect O3 production, we included the GEOS-Chem simulated concentrations of 43 atmospheric chemical constituents, emissions of 21 atmospheric chemical constituents, 10 meteorological parameters, and 4 time parameters (hour, day, month, and year) in the training. All these training input data are summarized in Table S1 and have been standardized following equation (2) in Section S2. We choose a learning rate of 0.35, a maximum tree depth of 6, L1 and L2 regularization terms of 0 and 1, a mean-squared-error loss function, and root-mean-square error (RMSE) as the evaluation metric.

We use the k-fold cross-validation method to test the performance of the XGBoost model (k = 1 - n). First, all sample data are randomly and uniformly divided into k groups, where one group is taken as the test dataset and the remaining k-1 groups are taken as the training dataset. We then train the model and use the test dataset to evaluate the performance of the trained model. We repeat this process k times, using a different group as the test data each time. The trained model is accepted if all k experiments show similar performance. This method ensures the stability and robustness of the XGBoost model and avoids overfitting. In this study, a 10-fold cross-validation is applied, i.e., we divide the O3 measurements in 2019 into 10 groups: the training dataset accounts for 90% and the test dataset for the remaining 10% of the total sample data. We also tried using 60% and 80% of the sample data as the training dataset and did not find significant influences on the results, indicating the robustness of the XGBoost training.
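The cross-validation procedure described above can be sketched in plain Python. This is a minimal illustration rather than the study's code: the helper names (`k_fold_indices`, `cross_validate`), the placeholder mean-value model, and the mapping of the quoted settings onto parameter names of the `xgboost` Python package are all assumptions made here for illustration.

```python
import random

# Settings quoted in the text, written with parameter names used by the
# xgboost Python package (this mapping is an assumption of the sketch).
XGB_PARAMS = {
    "eta": 0.35,                      # learning rate
    "max_depth": 6,                   # maximum tree depth
    "alpha": 0,                       # L1 regularization term
    "lambda": 1,                      # L2 regularization term
    "objective": "reg:squarederror",  # mean-squared-error loss
    "eval_metric": "rmse",            # root-mean-square error
}


def k_fold_indices(n_samples, k=10, seed=0):
    """Randomly split range(n_samples) into k near-equal folds.

    Returns (train_idx, test_idx) pairs; each fold is the test set once.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    splits = []
    for i in range(k):
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        splits.append((train_idx, folds[i]))
    return splits


def rmse(pred, obs):
    """Root-mean-square error between two equal-length sequences."""
    return (sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs)) ** 0.5


def cross_validate(x, y, fit, predict, k=10, seed=0):
    """Return the test RMSE of each of the k folds."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(y), k, seed):
        model = fit([x[i] for i in train_idx], [y[i] for i in train_idx])
        pred = predict(model, [x[i] for i in test_idx])
        scores.append(rmse(pred, [y[i] for i in test_idx]))
    return scores


# Placeholder "model" (the training mean); a real run would train XGBoost here.
def fit_mean(xs, ys):
    return sum(ys) / len(ys)


def predict_mean(model, xs):
    return [model] * len(xs)
```

Comparable RMSE values across the folds returned by `cross_validate` correspond to the "similar performance" criterion in the text; with k = 10, each fold trains on 90% of the samples and tests on the remaining 10%.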

Quantifying meteorological and emissions contributions
We used both GEOS-Chem alone and the combination of the GEOS-Chem and XGBoost models (hereafter GEOS-Chem-XGBoost) to quantify the contributions of meteorology and anthropogenic emissions to the unexpected surface O3 enhancements over the SCB in 2020, following Yin et al. (2021). For the GEOS-Chem method, since the anthropogenic emissions are fixed, the simulated O3 differences between 2020 and 2019 can be attributed to changes in meteorological conditions, calculated as

$$ G\_Met = G_{2020} - G_{2019} \quad (4) $$

The contribution of anthropogenic emissions changes can then be quantified as

$$ G\_Emis = Meas_{2020} - Meas_{2019} - G\_Met \quad (5) $$

where G_Met and G_Emis represent the meteorology and anthropogenic emissions contributions, respectively. Meas2019 and Meas2020 represent O3 measurements in 2019 and 2020, respectively. G2019 and G2020 represent GEOS-Chem O3 simulations in 2019 and 2020, respectively.

Since the GEOS-Chem-XGBoost model has corrected the GEOS-Chem model-to-measurement discrepancy in 2019, we assume it can accurately predict the surface O3 measurements in 2020 had the anthropogenic emissions remained unchanged. This assumption is reasonable since the probability density functions (PDFs) of key O3 precursors and meteorological parameters in the training data, which span a full seasonal cycle in 2019, cover the ranges of these factors in May-June 2020 (Figures S2 and S3). To predict O3 in 2020, all input parameters fed into each GEOS-Chem-XGBoost model except anthropogenic emissions are updated to match the measurements in 2020, while anthropogenic emissions are fixed at the 2019 levels. As a result, the differences between the 2020 measurements and the GEOS-Chem-XGBoost predictions for 2020 are attributed to the changes in anthropogenic emissions (equation (6)). The meteorology-induced contributions are then obtained from equation (7) by subtracting the anthropogenic-emissions-induced contributions.
$$ GX\_Emis = Meas_{2020} - GX_{2020} \quad (6) $$

$$ GX\_Met = Meas_{2020} - Meas_{2019} - GX\_Emis \quad (7) $$

where the symbols are analogous to those in equations (4) and (5) but for the GEOS-Chem-XGBoost method. By correcting the model-to-measurement discrepancy, the GEOS-Chem-XGBoost model is expected to provide a more accurate O3 sensitivity to changes in both meteorology and anthropogenic emissions, as will be discussed later.
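The GEOS-Chem-XGBoost attribution described above reduces to two subtractions, sketched below. The function name and the sign convention (measurement minus fixed-emissions prediction, so that an emission-driven O3 decrease is negative) are assumptions of this sketch. The 2019 and 2020 measured values are loosely based on the SCB means quoted in the text, and the 2020 fixed-emissions prediction is a hypothetical number chosen only to illustrate the arithmetic.

```python
def attribute_o3_change(meas_2019, meas_2020, pred_2020_fixed_emis):
    """Split an observed O3 change (ppbv) into emission and meteorology terms.

    pred_2020_fixed_emis is the bias-corrected model prediction for 2020 with
    all inputs updated to 2020 except anthropogenic emissions, which are held
    at 2019 levels.
    """
    # Emission term: what the measurements show beyond the fixed-emissions prediction.
    emis = meas_2020 - pred_2020_fixed_emis
    # Meteorology term: the remainder of the total observed change.
    met = (meas_2020 - meas_2019) - emis
    return emis, met


# Illustrative values in ppbv; pred_2020 is hypothetical.
meas_2019, meas_2020 = 48.1, 58.3
pred_2020 = 59.2
emis, met = attribute_o3_change(meas_2019, meas_2020, pred_2020)
print(round(emis, 1), round(met, 1))  # -0.9 11.1
```

By construction the two terms sum to the total observed change, so any error in the fixed-emissions prediction shifts O3 between the two contributions without changing their total.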

Health risks evaluation
We assessed the total premature mortalities from all non-accidental causes, hypertension, CVD, RD, COPD, and stroke attributed to ambient O3 exposure in all cities over the SCB in 2019 and 2020. We first calculated the O3-induced daily premature mortalities based on the exposure-response relationship described in Cohen et al. (2004), which has been used in many subsequent studies (Anenberg et al., 2010; Liu et al., 2018; Wang et al., 2021). We then summed the daily premature mortalities over May-June or the whole year to obtain the total O3-induced premature mortalities in the respective periods. The population data used in this study include all age groups, which may result in higher daily mortalities than expected (Liu et al., 2018; Wang et al., 2021). Following Cohen et al. (2004), the daily premature mortalities attributable to ambient O3 exposure can be estimated as

$$ \Delta M = y_0 \left[ 1 - \exp(-\beta \, \Delta X) \right] Pop \quad (9) $$

where ΔM represents the daily premature mortalities due to ambient O3 exposure. The city-representative daily mean O3 concentration Cmeas is an average of all measurements in each city. y0 is the daily baseline mortality rate for each disease averaged over all ages and genders. We follow the method of Wang et al. (2021) and use the daily y0 value for each disease from the latest China Health Statistical Yearbook in 2018. β represents the increase in daily mortality resulting from each 10 μg/m3 (~5.1 ppbv) increase in daily O3 concentration, which is often referred to as the concentration response function (CRF) in previous studies. We took the CRF values from Yin et al. (2017) and Wang et al. (2021). ΔX represents the incremental O3 concentration relative to the threshold concentration Cthres of 35.1 ppbv, which is obtained from Lim et al. (2012) and Liu et al. (2018).
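The daily mortality estimate can be illustrated with a short sketch. It assumes the log-linear exposure-response form that is widely used with this family of studies (baseline rate × [1 − exp(−CRF × concentration increment)] × population), and it treats the CRF as a fractional increase per 10 μg/m3. The function names and all numerical inputs other than the 35.1 ppbv threshold and the ~5.1 ppbv per 10 μg/m3 conversion are hypothetical.

```python
import math

PPBV_PER_10_UG_M3 = 5.1  # ~5.1 ppbv per 10 ug/m3, as quoted in the text
C_THRES_PPBV = 35.1      # threshold concentration quoted in the text


def daily_premature_mortality(c_meas_ppbv, y0, beta, pop):
    """Daily O3-attributable premature deaths for one city and one disease.

    c_meas_ppbv : city-mean daily O3 concentration (ppbv)
    y0          : daily baseline mortality rate (deaths per person per day)
    beta        : assumed fractional mortality increase per 10 ug/m3 of O3 (CRF)
    pop         : exposed population
    """
    # Increment above the threshold, expressed in units of 10 ug/m3.
    delta_x = max(0.0, c_meas_ppbv - C_THRES_PPBV) / PPBV_PER_10_UG_M3
    return y0 * (1.0 - math.exp(-beta * delta_x)) * pop


def period_total(daily_conc_ppbv, y0, beta, pop):
    """Sum daily premature mortalities over a period (e.g. May-June)."""
    return sum(daily_premature_mortality(c, y0, beta, pop) for c in daily_conc_ppbv)
```

Days with concentrations at or below the threshold contribute nothing, so the period total is driven entirely by the days above 35.1 ppbv.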
Pop represents the population exposed to ambient O3 pollution, which is available from the seventh nationwide population census in 2020 provided by the National Bureau of Statistics of China. The daily baseline mortality rates and CRF values for all non-accidental causes, hypertension, CVD, RD, COPD, and stroke are summarized in Table S2.

[...] cities in the SCB region in 2019 and 2020 are 48.1 ppbv and 58.3 ppbv, which are 11.0 ppbv lower and 1.2 ppbv higher than those averaged over all Chinese cities in the same period, respectively. As the most densely populated and highly industrialized region in western China, the SCB has seen rapid expansion of land use, industrialization, infrastructure construction, and urbanization in recent years, resulting in the highest anthropogenic emissions of O3 precursors and the highest surface O3 levels in the region (Figure S4). Although the O3 levels in the SCB city cluster are lower than those in the three most developed city clusters in eastern China, i.e., the BTH, the Fenwei Plain (FWP), and the YRD city clusters, the SCB region has been classified by the Ministry of Ecology and Environment (MEE) as a newly polluted region for O3 mitigation (Sun et al., 2021b; Wang et al., 2020a; Wang and Lu, 2019; Zou et al., 2019). Situated in a basin with stationary meteorological fields combined with high anthropogenic emissions, the SCB city cluster has the potential to become a new region with frequent high-O3 events after BTH, FWP, and YRD.

We find significant O3 enhancements of 10.2 ± 0.8 ppbv (23.4%) (mean ± 1σ standard deviation) averaged over all cities in the SCB in May-June 2020 relative to 2019 (Figure 1(c)). The largest enhancements are observed in the most densely populated areas around the megacities Chongqing and Chengdu (11.8 ± 0.6 ppbv (29.9%)).
These year-to-year O3 enhancements over the SCB are the highest on record in the 2015-2020 period, following an increase of 1.2% yr⁻¹ from 2015 to 2017 and then a decrease of −0.7% yr⁻¹ from 2017 to 2019. These surface O3 enhancements mainly reflect changes in regional emissions and meteorology in the SCB and surrounding regions, since the lifetimes of surface O3 and most of its precursors are too short for long-range transport.

The drivers [...] in a separate study [...].

Model performance assessment
We use the normalized root-mean-square error (NRMSE), normalized mean bias (NMB), and Pearson correlation coefficient (R) to assess the performance of the GEOS-Chem-XGBoost model. For each measurement site, we analysed these metrics for both the training (blue) and test (red) datasets, as shown in Figure S5. We define the NRMSE as the RMSE normalized by the difference between the 95th and 5th percentile concentrations, and the NMB as the mean bias normalized by the average concentration at the given measurement site. The formulas for these metrics are summarized in Section S2.

The GEOS-Chem-XGBoost model predictions for surface O3 over the SCB show negligible bias when evaluated against the training data, with an NMB of 0.01, NRMSEs of less than 0.1, and R between 0.93 and 1.0. Compared to the training data, performance on the test data shows higher variability, with an average NMB of -0.04, NRMSE of 0.22, and R of 0.83. We find no significant difference in prediction performance between clean (below the Cthres defined in section 2.5) and polluted measurement sites. A number of factors likely contribute to the relatively poorer statistical results at some sites such as Ganzizhou, Leshan, and Suining. First, the training data at these sites may include certain temporal events that are not easily generalizable, such as unusual emissions activity (e.g., BB, fireworks, closure of a nearby point source) or weather patterns that are not properly observed, which may make the model prone to overfitting. In addition, the differences in surface O3 variability between the training and test data at these sites are relatively larger than at other sites, which can contribute to relatively poorer performance.

We use the SHapley Additive exPlanations (SHAP) approach to examine how the GEOS-Chem-XGBoost model uses the input variables to make a prediction.
The SHAP approach is based on game-theoretic Shapley values and measures each predictor's responsibility for a change in the model prediction (Lundberg and Lee, 2017). The SHAP values are computed separately for each model prediction, which offers detailed insight into the importance of each input variable to that prediction while also considering the role of variable interactions (Keller et al., 2021; Lundberg et al., 2020). Figure 2 shows the SHAP value distribution for all the major O3 predictors averaged over all cities in the SCB. The results show that all variables expected to be associated with O3 formation can affect the model O3 prediction. Generally, the temperatures (at the surface, 2 m height, and 10 m height) are the most important predictors for the GEOS-Chem model-to-measurement discrepancy over the SCB, followed by the uncorrected GEOS-Chem simulated O3, reactive nitrogen (e.g., NO2, peroxyacetyl nitrate (PAN)), atmospheric oxidants (Ox, hydrogen peroxide (H2O2)), fine aerosols, VOCs (isoprene, C3H8), hour of the day, and meteorological variables including the eastward and northward wind components at 10 m (u10m, v10m). All of these factors are closely connected to surface O3 formation over the SCB, and it is thus not surprising that the GEOS-Chem model-to-measurement discrepancies are most sensitive to them (Seinfeld and Pandis, 2016).

We compared the performance of GEOS-Chem and GEOS-Chem-XGBoost in capturing the measured surface O3 levels. Figure 3(a) shows the time series of measured and model-predicted O3 concentrations averaged over all cities in the SCB region. Figure 3(b) shows a histogram of the differences between the GEOS-Chem-XGBoost predictions and the measurements. The GEOS-Chem simulations generally capture the daily variability of the maximum daily 8-h average (MDA8) O3 over the SCB, but they show a high bias of 7.8 ± 5.0 ppbv (17.5%) across all measurement sites within the SCB region.
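The evaluation metrics used in this section (NRMSE, NMB, and R) can be sketched as follows. The exact formulas are given in the paper's Section S2, which is not reproduced here, so this sketch makes two assumptions: the percentiles and the normalizing mean are taken from the measured concentrations, and percentiles use linear interpolation. All function names are illustrative.

```python
import math


def percentile(values, q):
    """Linear-interpolation percentile (q in [0, 100]) of a non-empty list."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def nrmse(pred, obs):
    """RMSE normalized by the 95th-minus-5th percentile range of obs."""
    rmse = math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))
    return rmse / (percentile(obs, 95) - percentile(obs, 5))


def nmb(pred, obs):
    """Mean bias (pred - obs) normalized by the mean of obs."""
    mean_bias = sum(p - o for p, o in zip(pred, obs)) / len(obs)
    return mean_bias / (sum(obs) / len(obs))


def pearson_r(pred, obs):
    """Pearson correlation coefficient between pred and obs."""
    mp, mo = sum(pred) / len(pred), sum(obs) / len(obs)
    cov = sum((p - mp) * (o - mo) for p, o in zip(pred, obs))
    var_p = sum((p - mp) ** 2 for p in pred)
    var_o = sum((o - mo) ** 2 for o in obs)
    return cov / math.sqrt(var_p * var_o)
```

Normalizing the RMSE by a percentile range rather than by the mean keeps the metric comparable between sites with very different O3 variability, and a uniformly offset prediction keeps R at 1 while producing a non-zero NMB.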
The discrepancy can be mainly attributed to uncertainties in the horizontal transport and vertical mixing schemes simulated by the GEOS-Chem model at a relatively coarse spatial resolution compared to point measurements, and is also associated with errors in emission estimates, the chemical mechanism, and sub-grid-scale local meteorological processes. In particular, errors in O3 predictors with high SHAP values are more likely to result in large model-to-measurement discrepancies. For example, the GEOS-Chem model overestimates the correlation between surface O3 concentration and temperature (Figure S7(a)), indicating that this overestimation of the O3-to-temperature sensitivity is one of the major factors contributing to the high GEOS-Chem O3 predictions.

By iteratively learning and correcting the GEOS-Chem model-to-measurement discrepancy in the O3-to-temperature sensitivity, the correlations between temperature and surface O3 concentration predicted by the GEOS-Chem-XGBoost model are in good agreement with the measurements (Figure S7(a)). The GEOS-Chem-XGBoost model also reproduces the sensitivities of O3 to other meteorological parameters such as humidity, cloud fraction, and precipitation better than GEOS-Chem (Figure S7(b)-(d)). After correcting the errors in all O3 predictors, the GEOS-Chem-XGBoost model significantly improves the prediction of surface O3 concentrations in all cities over the SCB compared to GEOS-Chem (Figure S8). It shows a bias of 0.5 ± 0.3 ppbv for all O3 measurements in 2019 over the SCB. Overall, the GEOS-Chem-XGBoost model performance is acceptable and can support further investigation of the drivers of the unexpected surface O3 enhancements over the SCB in May-June 2020.

Attribution

5.1 Separation of meteorological and anthropogenic emissions contributions
We attribute the surface O3 enhancements in May-June 2020 over the SCB to changes in anthropogenic emissions and meteorological conditions according to equations (6) and (7). Differences between the measured and GEOS-Chem-XGBoost predicted O3 in May-June 2020, as indicated by the shadings in Figure 4 [...] in June (Figure 4(a)). For the May-June mean contributions averaged over all cities in the SCB, changes in anthropogenic emissions caused an O3 reduction of 0.9 ± 0.1 ppbv and changes in meteorology caused an O3 increase of 11.1 ± 0.7 ppbv, which correspond to relative contributions of -8.0% and 108% to the total O3 enhancement (10.2 ± 0.8 ppbv) over the SCB in May-June 2020, respectively. As a result, the anthropogenic-emissions-induced O3 reductions are overwhelmed by the meteorology-induced O3 increases, leading to the unexpected O3 enhancements over the SCB in 2020.

We compare the meteorology- and anthropogenic-emissions-induced contributions to the unexpected surface O3 enhancements estimated by the GEOS-Chem-XGBoost model with those from the GEOS-Chem model alone (Figure 4(b)). Both methods show that changes in meteorology contribute significantly to the O3 enhancements, while the absolute magnitudes differ slightly. For example, the anthropogenic-emissions-induced O3 reduction calculated with the GEOS-Chem model alone is 0.94 ppbv, while the value for the GEOS-Chem-XGBoost model is 1.36 ppbv. By taking the subtraction in equation (5) and the average over all cities, the propagation of systematic model discrepancies common to all measurement sites is effectively minimized, which reduces the difference in attribution results between the GEOS-Chem and GEOS-Chem-XGBoost methods. However, as demonstrated in Figure S8, model discrepancies may differ between regions and over time.
Therefore, the GEOS-Chem-XGBoost approach is expected to provide a more accurate and consistent estimate of the O3 change attribution than the GEOS-Chem model alone. [...] Liu et al., 2003).

Figure 6 shows the May-June mean differences in vertical velocity, precipitation, temperature, specific humidity, cloud fraction, and PBLH between 2020 and 2019. In May-June 2020, northwest, central western, and southern China experienced anomalously strong droughts (https://quotsoft.net/air/), leading to a significant increase in temperature and decreases in precipitation, specific humidity, and cloud fraction compared to the 2019 levels (Figure 6). These changes in meteorological conditions could enhance the natural emissions of O3 precursors and speed up O3 chemical production. Meanwhile, the SCB basin effect inhibited the ventilation of O3 and its precursors, which further enhanced O3 accumulation over the SCB. We therefore conclude that the meteorological anomalies combined with the complex basin effect caused the surface O3 enhancements over the SCB in 2020. Although the higher PBLH over the SCB in May-June 2020 vs. 2019 may reduce surface O3 levels by diluting O3 and its precursors into a larger volume of air, this reduction effect was overwhelmed by the aforementioned enhancement effect. There is no strong evidence for a change in horizontal transport from other regions (Figure 5 [...] were not observed in northwest China, such as Xinjiang and Inner Mongolia Provinces, or in southern China, such as the Pearl River Delta (PRD) region, which is also one of the nine well-developed city clusters in China with severe air pollution.
This can be partly attributed to low anthropogenic emissions of O3 precursors in northwest China (Zheng et al., 2018), and to the strong land-sea exchange in coastal regions driven by the summer monsoon, which facilitates the ventilation of O3 and its precursors in the PRD region. Furthermore, the meteorology induced O3 enhancements are probably overwhelmed by the anthropogenic emissions induced O3 reductions in northwest and southern China.
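The relative contributions reported earlier in this section follow from dividing each attributed O3 change by the total enhancement. The short Python sketch below redoes that arithmetic using only the rounded values quoted in the text; the variable and function names are illustrative assumptions, not part of the GEOS-Chem-XGBoost workflow. Because the inputs are rounded, the computed shares agree with the reported -8.0% and 108% only to within about one percentage point.

```python
# Values quoted in the text (May-June 2020 vs. 2019, SCB city average), in ppbv.
# They are rounded, so the shares below match the reported ones only approximately.
anthropogenic_ppbv = -0.9   # O3 change attributed to anthropogenic emissions
meteorology_ppbv = 11.1     # O3 change attributed to meteorology
total_ppbv = 10.2           # total observed O3 enhancement


def relative_contribution(component: float, total: float) -> float:
    """Return a component's share of the total change, in percent."""
    return 100.0 * component / total


anthro_share = relative_contribution(anthropogenic_ppbv, total_ppbv)
met_share = relative_contribution(meteorology_ppbv, total_ppbv)

print(f"anthropogenic emissions: {anthro_share:.1f}%")
print(f"meteorology:             {met_share:.1f}%")
print(f"sum of both shares:      {anthro_share + met_share:.1f}%")
```

A share above 100% for meteorology simply reflects that the emission-driven term is negative: the two terms must still sum to the total enhancement.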

Emissions contribution
To suppress the spread of coronavirus disease 2019 (COVID-19) across China and beyond, the Chinese government sealed off several cities starting in January 2020 and implemented measures such as closing local businesses and halting public transportation at an unprecedented scale (Li et al., 2019b; Steinbrecht et al., 2021). These prevention measures quickly spread nationwide. Although the COVID-19 lockdowns in all cities had been lifted before May, restrictions on public transportation, businesses, social activities, and industrial manufacturing remained in place, which could reduce domestic anthropogenic emissions of both HCHO and NOx. Furthermore, the MEE continued the mitigation of NOx emissions following the 2018-2020 Action Plan on Defending the Blue Sky, and also implemented The 2020 Action Plan on VOCs Mitigations in 2020. This new Action Plan issues a number of control measures, including implementation of stringent VOCs emission standards, replacement of raw and auxiliary materials with low-VOCs alternatives, and mitigation of unorganized emissions. Driven by the above factors, the TROPOMI-observed tropospheric HCHO and NO2 columns over China (averaged over all Chinese cities) decreased by 2.0 ± 0.3% and 1.1 ± 0.2%, respectively, in May-June 2020 vs. 2019. Due to the relatively short lifetimes of both HCHO and NO2 in the troposphere, these reductions mostly reflect local emissions changes. These reductions in domestic anthropogenic emissions dominated the significant reduction of summertime MDA8 O3 across China in 2020 vs. 2019.

We have used the HCHO/NO2 ratios following the method of (Sun et al. [...] Figure S9). Meanwhile, the O3 chemical sensitivity in May 2020 is similar to that in June, indicating that the O3 variability in May-June 2020 is sensitive to both NOx and VOCs.
The recently available Chinese anthropogenic emissions statistical data provided by the MEE show that anthropogenic VOCs emissions over the SCB decreased by 5.0% and 3.5% in May and June 2020, respectively, relative to the 2019 levels. Anthropogenic NOx emissions in the same period increased by 1.5% in May and decreased by 1.7% in June. The increase in anthropogenic NOx in May 2020 vs. 2019 is attributed to an increase in NOx emissions from the power plant sector, which was not affected by the post-lockdown restrictions for suppressing the spread of COVID-19 (Table S3). For the May-June aggregation, anthropogenic VOCs and NOx emissions over the SCB decreased by 4.3% and 0.3%, respectively. These independent analyses of anthropogenic emissions explain the different predicted O3 changes due to anthropogenic emissions alone in May (increase) versus June (decrease) in the SCB.

In contrast to the widespread reductions in both HCHO and NO2 across the BTH, FWP, and YRD regions, we find notable increases in both HCHO and NO2 in the SCB in May-June 2020 relative to the 2019 levels. The tropospheric HCHO and NO2 columns averaged over all cities in the SCB region increased by 2.8 ± 0.3% and 5.1 ± 0.5%, respectively, in 2020 vs. 2019. Since both anthropogenic VOCs and NOx emissions in the SCB decreased in May-June 2020 vs. 2019, these regional increases in both HCHO and NO2 could thus be attributed to [...]

Finally, we conclude that natural emissions enhancements of both NOx and VOCs induced by the unexpected meteorological anomalies could account for the O3 enhancements in May-June 2020 over the SCB, and their contributions have been included in the meteorology-driven ozone enhancement as discussed in Section 5.2.
In the present work, we were not able to determine which specific VOCs species are the most effective for the O3 enhancements, and we cannot quantify the relative contributions of VOCs and NOx enhancements to the O3 enhancements in the SCB. A series of sensitivity studies might be able to address this important issue, but this is beyond the scope of the present work.

6 Health risks for the O3 enhancements over the SCB

Figure 8 presents the total premature mortalities from all non-accidental causes, hypertension, CVD, RD, COPD, and stroke attributable to ambient O3 exposure in all cities over the SCB during May-June in 2019 and 2020. The statistical results for each city in 2019 and 2020 are summarized in Tables S4 and S5, respectively. The surface O3 enhancements over the SCB in May-June 2020 vs. 2019 result in dramatically higher health risks. The estimated total premature mortalities from all non-accidental causes attributable to surface O3 exposure in May-June 2020 over the SCB are 5455, which is 89.8% higher than in the same period in 2019 (i.e., 2874). All of the above O3 induced diseases over the SCB show significant increases in total mortalities in May-June 2020 vs. 2019. The highest health risk among these diseases is from CVD, with 741 mortalities in May-June 2019, followed by RD (236), COPD (231), and hypertension (223). This ranking of O3 induced health risks over the SCB is consistent with those in the YRD, BTH, and PRD in previous studies (Liu et al., 2018; Lu et al., 2020; Yin et al., 2017; Wang et al., 2021). In May-June 2020, total mortalities from CVD, RD, COPD, hypertension, and stroke over the SCB reached 1405, 450, 439, 418, and 46, respectively, due to the significant O3 enhancements. The change rates for these diseases are 89.6, 90.7, 90.1, 87.4, and 91.7%, respectively.
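The change rates quoted above are plain relative differences between the 2019 and 2020 mortality counts. The following Python sketch recomputes them from the counts given in the text; the dictionary layout and function name are illustrative assumptions. Stroke is left out because its May-June 2019 count is not quoted in this section. The recomputed rates agree with the reported ones to within rounding (about 0.1 percentage point).

```python
# Premature mortalities attributable to O3 exposure over the SCB in May-June,
# as quoted in the text: cause -> (2019 count, 2020 count).
# Stroke is omitted because its 2019 count is not given in this section.
mortalities = {
    "all non-accidental": (2874, 5455),
    "CVD": (741, 1405),
    "RD": (236, 450),
    "COPD": (231, 439),
    "hypertension": (223, 418),
}


def change_rate(before: int, after: int) -> float:
    """Relative change from `before` to `after`, in percent."""
    return 100.0 * (after - before) / before


rates = {cause: change_rate(b, a) for cause, (b, a) in mortalities.items()}

for cause, rate in rates.items():
    print(f"{cause:>18}: {rate:5.1f}%")
```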
Over the whole year, the estimated total premature mortalities from all non-accidental causes due to surface O3 exposure over the SCB in 2019 and 2020 are 16,772 and 18,301, respectively (Tables S4 and S5). The O3 induced mortalities within May-June 2019 account for about 17.0% of those in the whole year 2019, and this percentage reaches about 30.0% in 2020 (Figure S10). The total premature mortalities from all non-accidental causes due to surface O3 exposure over the SCB increased by 1528 in the whole year 2020 vs. 2019 (Figure S11), which is 40.8% lower than the increase within May-June 2020 vs. 2019 (i.e., 2581). This indicates that the O3 level over the SCB decreased overall in all months except May-June in 2020 vs. 2019, which resulted in a decrease (by 1053) in O3 induced mortalities in those months.

We further investigated the O3 induced diseases in the two most densely populated cities over the SCB (i.e., Chengdu and Chongqing) during May-June in 2019 and 2020. The premature mortalities from all O3 induced diseases in 2020 vs. 2019 in each city depend on regional population, surface O3 level, and enhancement level (equation (9)). With the largest populations and highest O3 enhancements, the estimated total premature mortalities in Chengdu and Chongqing accounted for 46.9% of the total O3 induced mortalities over the SCB during May-June 2020 (Figure 8 (b)-(c)). Since the O3 levels and enhancement in Chengdu are larger than those in Chongqing, the total O3 induced mortalities in Chengdu are larger even with a smaller population. The change rates for all O3 induced diseases are about 75% in Chengdu and 160% in Chongqing during May-June 2020 vs. 2019, which are much higher than the enhancement of ozone levels in the two cities (29.9%). In order to reduce the O3 induced health risk, stringent O3 control policies are necessary in densely populated cities.
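The whole-year comparison above can be checked with a few lines of arithmetic. The Python sketch below uses only the annual and May-June totals quoted in the text; all variable names are illustrative assumptions. Recomputing from these rounded totals gives an annual increase and a rest-of-year decrease that each differ by one count from the reported 1528 and 1053, presumably because the reported values were derived from unrounded city-level results.

```python
# All-cause premature mortalities attributable to O3 exposure over the SCB,
# as quoted in the text (rounded totals).
annual_2019, annual_2020 = 16772, 18301
mayjun_2019, mayjun_2020 = 2874, 5455

# Year-over-year increases for the whole year and for May-June only.
annual_increase = annual_2020 - annual_2019
mayjun_increase = mayjun_2020 - mayjun_2019

# Change in the remaining ten months (negative means fewer mortalities).
rest_of_year_change = annual_increase - mayjun_increase

# Share of each year's total that falls in May-June, in percent.
share_2019 = 100.0 * mayjun_2019 / annual_2019
share_2020 = 100.0 * mayjun_2020 / annual_2020

# How much smaller the annual increase is than the May-June increase, in percent.
relative_gap = 100.0 * (mayjun_increase - annual_increase) / mayjun_increase

print(f"annual increase:        {annual_increase}")
print(f"May-June increase:      {mayjun_increase}")
print(f"rest-of-year change:    {rest_of_year_change}")
print(f"May-June share in 2019: {share_2019:.1f}%")
print(f"May-June share in 2020: {share_2020:.1f}%")
print(f"annual vs. May-June:    {relative_gap:.1f}% lower")
```

The negative rest-of-year change is what supports the statement that O3 induced mortalities fell outside May-June.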