Interpreting machine learning prediction of fire emissions and comparison with FireMIP process-based models

Annual burned areas in the United States have increased 2-fold during the past decades. With more large fires resulting in more emissions of fine particulate matter, an accurate prediction of fire emissions is critical for quantifying the impacts of fires on air quality, human health, and climate. This study aims to construct a machine learning (ML) model with game-theory interpretation to predict monthly fire emissions over the contiguous US (CONUS) and to understand the controlling factors of fire emissions. The optimized ML model is used to diagnose the process-based models in the Fire Modeling Intercomparison Project (FireMIP) to inform future development. Results show promising performance for the ML model, Community Land Model (CLM), and Joint UK Land Environment Simulator-Interactive Fire And Emission Algorithm For Natural Environments (JULESINFERNO) in reproducing the spatial distributions, seasonality, and interannual variability of fire emissions over the CONUS. Regional analysis shows that only the ML model and CLM simulate the realistic interannual variability of fire emissions for most of the subregions (r > 0.95 for ML and r = 0.14∼ 0.70 for CLM), except for Mediterranean California, where all the models perform poorly (r = 0.74 for ML and r < 0.30 for the FireMIP models). Regarding seasonality, most models capture the peak emission in July over the western US. However, all models except for the ML model fail to reproduce the bimodal peaks in July and October over Mediterranean California, which may be explained by the smaller wind speeds of the atmospheric forcing data during Santa Ana wind events and limitations in model parameterizations for capturing the effects of Santa Ana winds on fire activity. Furthermore, most models struggle to capture the spring peak in emissions in the southeastern US, probably due to underrepresentation of human effects and the influences of winter dryness on fires in the models. As for extreme events, both the ML model and CLM successfully reproduce the frequency map of extreme emission occurrence but overestimate the number of months with extremely large fire emissions. Comparing the fire PM2.5 emissions from the ML model with process-based fire models highlights their strengths and uncertainties for regional analysis and prediction and provides useful insights into future directions for model improvements. Published by Copernicus Publications on behalf of the European Geosciences Union. 3446 S. S.-C. Wang et al.: Interpreting machine learning prediction of fire emissions


Introduction
Large fires have increased across the United States over the past 2 decades, especially in the western US. While the total area burned in 2020 increased by 51 % compared to the 10-year average for 2010-2019, the total number of fires in 2020 is smaller than the 10-year average. This indicates the contribution of larger and more powerful fires to the growing burned areas (NIFC, 2020). Large fires can directly lead to property damage and pose a threat to human lives (Thomas et al., 2017). Meanwhile, fine particulate matter (PM 2.5 , particles with an aerodynamic diameter smaller than and equal to 2.5 µm) emitted from fires not only have negative impacts on human health but also affect climate and ecosystems (Johnston et al., 2012;Ward et al., 2012;Rap et al., 2013;Kaulfus et al., 2017;Liu et al., 2018;Wang et al., 2018;Stowell et al., 2019). Driven by stronger fire heating and with higher injection height, aerosols emitted from large fires can be transported to broader area and stay in the atmosphere longer. Given the increasing trend of fire emissions, fire smoke may become the predominant source of PM 2.5 in the US in the future (Yue et al., 2013;Liu et al., 2016;Ford et al., 2018). Thus, an accurate prediction of fire emissions is imperative for investigating the impacts of historical and future fires on air quality, human health, and climate.
One of the widely used methods for predicting fire emission is process-based fire parameterization. These processbased models generally employ universal functions depicting non-linear relationships between fires and the input variables and apply the same functions to all grid cells in a model (Pechony and Shindell, 2009;Thonicke et al., 2010). In addition, the parameters of the process-based model are usually determined by empirical or statistical functions, assuming that the same parameters apply to all the regions or regions with limited fire observations (Crevoisier et al., 2007;Parisien et al., 2016). Recently, Zou et al. (2019) developed the Region-Specific ecosystem feedback Fire (RES-Fire) model that includes region-and PFT-specific fire parameterizations in subregions over the globe. Their model shows improved spatial distributions and temporal variations in fire activities compared to the CLM fire model. Processbased models are usually included in the dynamic global vegetation models (DGVMs) to simulate fire dynamics, vegetation dynamics, and biogeochemistry driven by atmospheric forcing and socio-economic data (Li et al., 2013;Knorr et al., 2016). Fire emissions, including trace gases and aerosols, are calculated from the simulated fire carbon emissions and the emission factors, with the former computed as the product of the burned area, fuel load, and combustion completeness. The process-based models in DGVM coupled with other components of Earth system models can be used to assess the impacts of environmental factors on fires and the feedback between fire emissions, land processes, and climate (Kloster et al., 2010). In 2014, the Fire Model Intercomparison Project (FireMIP) was initiated to compare nine DGVMs that include fire modules to better understand the performances of the global fire models (Rabin et al., 2017). FireMIP enables comprehensive evaluation and comparison across various process-based models and provides a dataset of long-term fire simulations for regional and global analysis (Li et al., 2019a;Hantson et al., 2020).
Besides process-based fire models, data-driven statistical models are also commonly used to estimate fire activities using relationships between fires and predictor variables. Multiple linear regression (MLR) is a popular simple statistical method used for fire modeling (Spracklen et al., 2009;Morton et al., 2013;Urbieta et al., 2015;Williams et al., 2019). MLR can achieve a good performance, but it fails to capture the non-linear relationships between fires and predictors, and it is sensitive to the collinearity and combinations of predictors (Littell et al., 2009). Unlike MLR, machine learning (ML) is a novel tool for advancing fire modeling, given its strengths in resolving the complex relationships between the target and predictor variables. Different ML approaches have been used to estimate fire occurrence, burned areas, or emissions at various timescales and spatial scales (Cortez and Morais, 2007;Aldersley et al., 2011;Dillon et al., 2011;Birch et al., 2015;Kane et al., 2015;Coffield et al., 2019;Wang and Wang, 2020). Even though ML models generally achieve higher accuracy than simple statistical models, their decision processes are often inscrutable and hence lack interpretability. The development of explainable ML represents major advances for scientific applications beyond predictions (Gunning, 2019;Arrieta et al., 2020). For example, Wang et al. (2021a) used the extreme gradient boosting (XGBoost) algorithm and Shapley additive explanation (SHAP) to predict wildfire-burned area and revealed the relationships between burned areas and predictor variables. As process-based and data-driven models have their own advantages and weaknesses, as listed in Table 1, comparing these models and assessing their uncertainties in historical simulations and future projections are important. Yue et al. (2013) applied an MLR and a parameterization method to estimate burned areas in ecoregions of the western US and found that both models explained ∼ 50 % of the variance in the observed burned areas. Although they compared the burned areas estimated by the two methods and quantified their uncertainties in fire projections, both methods are only driven by meteorology while the effects of fuels and human activities are not considered.
The FireMIP dataset provides long-term simulations of multiple DGVMs with fire modules, allowing comparisons between process-based and data-driven models, with all models considering all the potential factors influencing fires, including climate, weather, vegetation, and human activities. This study aims to develop an ML model with game theory interpretation for fire emission prediction and to understand controls of fire emissions. The ML model and SHAP are then used to reveal the important factors controlling fire emissions and diagnosis of the process-based FireMIP models. The ML model predicts the monthly PM 2.5 emissions from  (CONUS). It uses the XGBoost algorithm and incorporates various predictors, including local and large-scale meteorology, land surface characteristics, and socioeconomic variables, which are common input variables also used by the FireMIP models while some are specifically related to fire activities in the CONUS. We acknowledge that different input variables between the ML and FireMIP models might cause additional uncertainty for comparison. This study aims to construct an ML model that predicts fire emissions over the CONUS and utilizes the ML model and SHAP to reveal the important factors contributing to fire emissions that might not be fully represented in the process-based models. In this context, the ML model and FireMIP models are optimized using different data or predictors at various scales, which enables us to use the ML to diagnose the performance of FireMIP models over the CONUS through the comparisons of their performances and variable importance from the ML model. We evaluate and compare the predicted fire emissions from the ML and FireMIP models against the GFED fire emission product, focusing on spatial distributions, seasonality, and interannual variability over selected regions in the CONUS. Additionally, the ML model and the SHAP importance are used to identify the important drivers of fire emissions in different regions and compare them with the corresponding parameterizations in the process-based models. Lastly, we compare the process-based and ML model performances in simulating extremely large fire emissions, including the spatial distributions of frequency and two case studies.

Data
2.1 Fire-induced PM 2.5 emission data Monthly fire PM 2.5 emission data are obtained from the Global Fire Emissions Database (GFED). GFED version 4 provides monthly burned area at 0.25 • spatial resolution from 1997 to present, based on a combination of the MODIS burned area product with active fire data from the Tropical Rainfall Measuring Mission (TRMM), Visible and Infrared Scanner (VIRS), and Along-Track Scanning Radiometer (ATSR) family of sensors . The GFED fire PM 2.5 emissions are estimated by combining the burned area boosted by small fires  and the emission factors based on Akagi et al. (2011) with a revised version of the Carnegie-Ames-Stanford Approach (CASA) biogeochemical model that estimates fuel loads and combustion completeness for each monthly time step (van der Werf et al., 2017). The emission factors are dependent on the fire types, including savanna, boreal forest, temperate forest, tropical forest, and agriculture (van der Werf et al., 2017). We use the GFED fire PM 2.5 emission as the target variable in the machine learning model development and for model evaluation.
To reduce spatial heterogeneity and help model learning, we apply the inverse distance weighting (IDW) (Bartier and Keller, 1996;Shepard, 1968) to interpolate the monthly grid-3448 S. S.-C. Wang et al.: Interpreting machine learning prediction of fire emissions ded fire PM 2.5 emission at 0.25 • × 0.25 • . The IDW method determines the value at a grid cell as the weighted average of the surrounding values within a search distance, with the weights proportional to the inverse of the distance raised to the power value p. Here we choose a value of 1 for p and a search distance of 35 km for IDW processing. Note that the total fire-emitted PM 2.5 within a search distance after IDW processing is constrained to be the same as the original data. In this study, we only include grids with more than 8 months of fire emissions larger than zero (in a total of 250 months), encompassing 90 % of the total fire emissions and ensuring sufficient data for the XGBoost model training. The interpolated fire emission is normalized based on its 21-year mean and standard deviation for each grid to reduce the skewness and improve data symmetry.

Predictor variables
We develop an empirical model at 0.25 • × 0.25 • grid resolution driven by various predictor variables at a monthly scale from January 2000 to October 2020. Given the fact that the datasets have different spatial resolutions, all the predictor variables are resampled to the spatial resolution of 0.25 • × 0.25 • by linear interpolation. The predictor variables used in the model along with their original spatial and temporal resolutions are included in Table 2. Most variables were also used in Wang et al. (2021a) for developing an ML model of fire-burned area over the contiguous US.

Local meteorology
The same as the local meteorological predictors used in Wang et al. (2021a), we include monthly data of mean surface temperature, relative humidity (RH) at 2 m, daily precipitation, zonal (U ) and meridional (V ) components of wind at 10 m from the North American Regional Reanalysis (NARR) (Mesinger et al., 2006), and 1000 h dead fuel moisture (FM1000), energy release component (ERC), and vapor pressure deficit (VPD) from the gridMET dataset (Abatzoglou and Kolden, 2013;Coffield et al., 2019). Drought is a natural phenomenon that influences fires through ignition efficiency, fuel availability, and fuel moisture. Thus, we include the monthly standardized precipitation evapotranspiration index (SPEI), a multiscalar drought index based on climatic data (Vicente-Serrano et al., 2010). Given that lightning is one of the major ignition sources of fires and makes up approximately 75 % of burned areas in western US (Pyne, 1984;Stephens, 2005), in this study, we add the cloud-toground (CG) lightning flash density from Severe Weather Data Inventory (SWDI) based on the National Lightning Detection Network (NLDN) (Cummins and Murphy, 2009;NOAA, 2006). The daily number of CG lightning flashes is summarized in 0.1 • tiles, and we aggregate the daily data to a monthly scale.

Large-scale meteorological patterns
Large-scale meteorological patterns at a synoptic scale have been linked to large fire events (Crimmins, 2006;Trouet et al., 2009;Zhong et al., 2020;Dong et al., 2021). Furthermore, it has been shown that including predictors of largescale meteorological patterns conducive to wildfires significantly improves the prediction of burned areas over the CONUS (Wang et al., 2021a). Thus, we follow the methods developed by Wang et al. (2021a) using the singular value decomposition (SVD) method to construct predictors representing the synoptic patterns driving fire emission variability. Note that the only difference between Wang et al. (2021a) and this study is that they used wildfire-burned-area data and we use fire emissions to construct the SVDs. Three regions where large fires periodically occur are selected for constructing SVDs: northern California, the southern Rocky Mountains, and the southeastern US, as defined in Wang et al. (2021a). For each region, we calculate the daily mean fire PM 2.5 emissions over the region and compute the day-to-day correlations between the regional mean fire PM 2.5 emissions and the five gridded daily meteorological variables (surface temperature, 2 m RH, U wind and V wind at 850 hPa, and geopotential height at 500 hPa) for all 1 • ×1 • grid cells within the large-scale domain, giving a correlation map for each meteorological variable. The correlation maps are then used to derive the SVD modes representing the large-scale meteorological patterns related to fires. Finally, we compute the monthly standard deviation of the daily SVD time series for the first two SVD modes, representing the month-to-month variations of synoptic fluctuations and atmospheric instability. The detailed methods and discussions about the SVDs are provided in Wang et al. (2021a). Overall, the identified SVDs for the three regions are similar to the SVDs in Wang et al. (2021a) calculated using wildfire-burned areas (Figs. S1-S3 in the Supplement).

Land surface properties
We use the same set of variables in the burned-area model that represent the effects of fuel and land surface states on fire emissions, including evapotranspiration (ET), surface soil moisture, land types, and topography (Wang et al., 2021a). Monthly mean ET, vegetation fraction, and surface soil moisture are obtained from the North American Land Data Assimilation System (NLDAS-2) (Xia et al., 2012). Land cover data of the LAI classification scheme are obtained from the Terra and Aqua combined MODIS Land Cover Climate Modeling Grid (CMG) version 6 data (Friedl, 2015). Since the land cover data are at yearly intervals from 2001 to 2020, we use the land cover data of 2001 for 2000. Topography data of slope and elevation are obtained from Amatulli et al. (2018).
Besides the above-mentioned variables that were also used in Wang et al. (2021a), in this study, we consider the effect  (2017) of fuel load on fire emissions, since fuel load is critical to fire emissions through its controls on fuel consumption and burned areas (Parks et al., 2012;Liu and Wimberly, 2015). As there are limited observations of fuel load, we use LAI to approximate the canopy bulk density, which is an important crown characteristic to predict crown fire spread, and vegetation fraction to represent the existing amount of vegetation (Keane et al., 2005). LAI is taken from MODerate resolution Imaging Spectroradiometer (MODIS) instruments (Myneni et al., 2015), and vegetation fraction is obtained from the NLDAS-2. As LAI may not fully represent the available biomass, we also include fuel load simulated by the Community Land Model (CLM). Monthly fuel load data from 2000 to 2015 are obtained from a simulation by CLM version 5 with biogeochemistry and prognostic crop, driven by atmospheric forcing from GSWP3v1 (Lawrence et al., 2019). The fuel load after 2015 is taken from a simulation under the SSP3 (shared socioeconomic pathways) scenario. CLM fuel load is validated by comparing with the fuel-measured fuel load from the global fuel consumption database (van der Werf et al., 2017;van Leeuwen et al., 2014), as shown in Fig. S4. The CLM-simulated fuel load is generally consistent with the measured fuel load for different vegetation types across the CONUS based on the limited measurements. Additionally, we include normalized fuel load as a predictor to capture the effects of temporal variation in fuel load, as the influence of fuel load on fire emissions is mainly attributed to its spatial variation rather than the temporal variation (Lasslop and Kloster, 2015).

Socioeconomic variables
We use population density and gross domestic product (GDP) per capita to represent human effects on wildfires.
The population density data are obtained from the Gridded Population of the World (GPW V4) data collection for the years 2000, 2010, 2015, and 2020, with a spatial resolution of 30 arcsec (CIESIN, 2017). The populations in other years are linearly interpolated between the abovementioned four years. The GDP per capita is taken from a gridded global dataset for 2000-2015 with a spatial resolution of 5 arcmin (Kummu et al., 2018). For the GDP after 2015, we use the data of 2015.

Process-based fire emission models
The Fire Model Intercomparison Project (FireMIP) includes a set of common fire modeling experiments from nine DGVMs driven by the same forcing data, allowing a better understanding of global fire models (Rabin et al., 2017). The FireMIP dataset provides global gridded burned area fraction and fire emissions, including carbon and 33 species of trace gases and aerosols over 1700-  (Rabin et al., 2017). The nine DGVMs in FireMIP are driven by the CRU-NCEP v5.3.2 atmospheric forcing data with a spatial resolution of 0.5 • and a 6-hourly temporal resolution (Wei et al., 2014;Rabin et al., 2017). Other forcing data, including annual global atmospheric CO 2 concentration, land use and land cover, and population density from 1700 to 2012, are taken from various data sources (Klein Goldewijk et al., 2010;Hurtt et al., 2011;Le Quéré et al., 2014). Monthly cloud-to-ground lightning frequency with a resolution of 0.5 • × 0.5 • over 1901-2012 is calculated based on the observed relationship between present-day lightning and convective available potential energy (CAPE) anomalies (Pfeiffer et al., 2013). Fire emissions in FireMIP are calculated considering the fire carbon emissions and vegetation characteristics based on the plant functional type (PFT) from the FireMIP historical transient control run (SF1). SF1 breaks the simulation period into three phases: the spin-up phase in 1700, the transient phase in 1701-1900, and the transient phase in 1901-2012 (see the detailed descriptions and model settings in Rabin et al., 2017;Li et al., 2019a;Hantsan et al., 2020). In the 1901-2012 transient phase, the models are driven by time-varying atmospheric forcing, CO 2 concentration, land use and land cover change (LULCC), population density, and lightning data. Note that the MC2 and CTEM runs start from 1901 and 1861, while the rest of the models start from 1700. As the spatial resolutions of the FireMIP models are different, the regridded model outputs with 1 • × 1 • resolution obtained from Li et al. (2019b) are used to compare with the GFED data and the ML model.

ML-based approach: an extreme gradient boosting (XGBoost) model
The extreme gradient boosting (XGBoost) is a decision-treebased ensemble machine learning method using the gradientboosting approach (Chen and Guestrin, 2016). The XGBoost model builds multiple decision trees that are added subsequently and learn the errors of the previous tree to reduce the loss and obtain the best prediction. Unlike the Gradient Boosting Machine (GBM) that also uses the gradientboosting approach, XGBoost utilizes a more regularized model formalization to prevent over-fitting and improve the computational efficiency. The formula for the prediction at step t and grid location i can be defined as follows: are the predictions at steps t and t −1, and x i represents the predictor variables. The parameters of the model f t (x i ) are selected by optimizing the objective function that measures how well the model fits the training data: which is composed of the loss function L t and the regularizing term t in each step. L t is defined as l(y iŷ ) and t is defined as γ T + 1 2 λ ω 2 , where γ is the regularization term which penalizes the number of leaves in the tree T and λ is the regularization term which penalizes ω, the weights of different leaves.
We use grid search to choose the set of suitable hyperparameters and achieve the best ML model performance. Grid search is a tuning technique for computing the optimal values of hyperparameters considering a range of numbers with a given increment. The parameter set that yields the best fivefold cross-validation score is selected as the final set of hyper-parameters. The considered hyper-parameters, their search domains, and the final values are denoted in Table S1.
The 10-fold cross-validation (CV) technique is applied to evaluate the model and avoid overfitting. First, we randomly divide the fire emission dataset (2000-2020 over the CONUS) into 10 equally sized splits. Then, we train the model with nine splits of the data and use the trained model to predict fire emissions for the remaining one split. This process is repeated 10 times for each split. Finally, the predictions are evaluated by grids and regions using root mean square error (RMSE), correlation coefficient (R), and the index of agreement (IoA). The IoA represents the ratio of the mean square error and the potential error, and the value closer to 1 indicates better agreement.

Shapley additive explanation (SHAP)
We utilize SHAP to identify the relative importance of the predictor variables. SHAP is a novel approach to resolve and explain variable importance based on game theory (Lundberg and Lee, 2017). Within the scope of game theory, the goal is a prediction for a single observation. Each predictor variable is referred to as a "player" in this game and contributes to the goal ("payout"). For each predictor, the SHAP variable importance measures the marginal contribution considering all possible combinations of the predictor variables. The marginal contribution is calculated by comparing the differences between the model fit f x (S ∪ {i}), including the predictor i and another model fit f x (S) without predictor i. When there is more than one predictor i, the marginal contribution also depends on the interactions with other predictors. Thus, the calculation repeats considering the whole set of the predictors. The final contribution φ i of predictor i is the weighted average of all marginal contributions: where F is the total number of features, S is the subset of predictors from all predictors except for predictor i, and is the weighting factor counting the number of permutations of the subset S. f x (S) is the expected output given the predictors' subset S. f Compared to the commonly used feature importance, such as gain, or split count, SHAP is more consistent and faithful to the model (Lundberg et al., 2019). More importantly, SHAP provides local importance that measures the variable importance for each sample, while most of the feature importance metrics only have global importance that measures variable contributions limited to the entire dataset. The global importance by SHAP is the average of the absolute SHAP values for each predictor, providing an overall picture of the predominant variables controlling fire emissions in the CONUS. The local importance will be used to identify the important predictors for large fire events in the ML model and diagnose the deficiency of the process-based models. Table 3 shows the whole CONUS and regional model performance, including RMSE, IoA, and correlation. The model performs well at the grid level over the CONUS, with an RMSE of 0.16 g m −2 and an IoA of 0.84. Figure 1a shows the map of correlation between the observed and predicted monthly fire emission time series for each grid over the CONUS. Overall, the results indicate the ML model can reproduce the interannual variability of fire emissions at 0.25 • resolution over the CONUS, with a mean correlation of 0.58 and more than 70 % of the grids having correlations larger than 0.4. To better assess model performance in different regions, Table 3 summarizes the model performance for several selected regions: (1) western forest area, (2) Mediterranean California, (3) the southwestern US, and (4) the southeast-ern US (color boxes in Fig. 1a). The regions where fires frequently occur are selected by the similarity of ecoregions, vegetation types, and fire regimes. Figure 1b-e show the time series of observed and predicted fire PM 2.5 emissions averaged over several regions. Generally, the ML model reproduces the interannual variability of fire emissions for the selected regions (r = 0.84-0.98). Among these regions, Mediterranean California has the smallest correlation coefficient and largest RMSE compared to other regions, which can be explained by the fact that fires in this region interact with multiple factors, including human activity, complex terrain, and Santa Ana winds (Syphard et al., 2008;Yue et al., 2014). The interactions between fires and these factors pose uncertainties and challenges in fire prediction over this region. It is also worth noting that the ML model captures the large fire events in September 2020 in Oregon and California but underestimates the peak values by ∼ 30 % (Fig. 1b  and c). In addition, we also test the ML model's ability to provide accurate predictions on unseen data (i.e., generalization) by using data from 2000 to 2019 as a training set and data from 2020 as a testing set. As shown in Fig. S5, the ML model can reproduce the spatial patterns of fire emissions well but underestimates the emissions of the peak in September 2020. The results are within our expectations because the ML model generally fails to make accurate predictions for the data outside of the training domain or has large uncertainties in extrapolation (Tsubaki and Mizoguchi, 2020;Hooker, 2004). Since 2020 features the largest fire emissions in the study period, we conducted another test using 2000-2017 and 2019-2020 to train the ML model and test on the data of 2018. We selected 2018 because 2018 had the largest fires on record before 2020. The ML successfully reproduces the temporal variability of fire emissions (r = 0.92) and captures the peak in August 2018, as well as the spatial distributions of fire emissions (r = 0.52).

XGBoost model performance and variable importance
To improve understanding of the ML prediction, we utilize the SHAP method to quantify the contributions of each predictor variable to the prediction and identify the key contributing factors of fire PM 2.5 emission. SHAP importance is chosen because it provides not only global importance but also local importance that helps understand which variables have larger contributions to specific events or regions. Here, we first demonstrate the global importance that considers all the samples. Figure 2 shows the 20 most important variables for the model ranked by the absolute mean SHAP values. The SHAP value for a feature indicates its contribution to the prediction, so larger absolute mean SHAP values indicate larger contributions to the fire emissions. Among the top 10 variables, seven of them are local meteorological variables, indicating local meteorology is the predominant control of fire emissions, as these variables control fire activity directly (Liu and Wimberly, 2015;Wang et al., 2021a). Besides local meteorology, the predictors of large-scale meteorology (SVD1_SElag2 and SVD2_SElag2) are identified as the 8th and 10th important variables, show-ing that meteorology is important not only at the local scale but also at the synoptic scale (Trouet et al., 2009;Pollina et al., 2013;Dong et al., 2021). Finally, in addition to meteorology, fuel load is identified as the fifth important variable in the model, as fuel load affects emission through controlling burned area and fuel consumption (Seiler and Crutzen, 1980). Considering the important variables in different regions, the selected regions in the western US (western forest area, Mediterranean California, and the southwestern US) generally share the common top 10 variables (Fig. S6). Over the western US, predictors controlling fuel dryness and fuel amount, including RH, fuel moisture (FM1000), ERC, vegetation fraction, and fuel load, contribute more to fire emissions. On the other hand, large-scale meteorological patterns (SVDs_SElag2) are more important for fire emissions in the southeastern US.
As the dominant drivers differ for different temporal scales, we aggregate the monthly SHAP values to obtain annual and seasonal time series of SHAP values for each variable. The annual and seasonal time series are the averaged SHAP values over the study period for each year and month, respectively. Figure S7 shows the mean |SHAP| values at seasonal and interannual timescales for the CONUS. Considering both the mean |SHAP| and larger correlations (r > 0.5) between the annual and seasonal time series of SHAP and mean fire emissions, temperature, VPD, RH, and ERC are the dominant variables controlling the seasonal variation in fire emissions. These factors have relatively stronger seasonality than other variables (e.g., VPD is usually higher in the summer). On the other hand, large-scale circulation patterns, including SVD1_SElag2, SVD2_SElag2, and SVD1_RM, are important variables controlling both the seasonal and interannual variability of fire emissions, while SVD2_RM and SVD2_NCA mainly control interannual variability. Some identified large-scale meteorology has significant seasonality (e.g., SVD1_SElag2 and SVD2_SElag2 are predominant in spring and SVD1_RM is strongest in summer), and most of them have interannual variability, as shown in Fig. 8. Overall, the SHAP analysis shows different dominant predictors for fire emissions at various timescales.

General comparison between GFED, ML, and FireMIP models
This section compares the performance of the ML and FireMIP models benchmarked against observations from GFED, and the evaluations are based on spatial distributions, seasonality, and interannual variability of fire PM 2.5 emissions. Since the spatial resolutions of the GFED data, ML models, and FireMIP models are different, they are all regridded to 1 • × 1 • using bilinear interpolation. Note that the simulation period of FireMIP models ends in 2012, so we use the overlapping period of 2000-2012 for comparison and exclude the MC2 model because its simulation ends in 2008.    (Table 4). The overestimations in some models may be explained by the sensitivities of fire emissions to individual meteorologi-  cal variables. Figure 4 shows the slopes for the dependence of annual mean fire PM 2.5 emissions on annual mean RH from the CRUNCEP atmospheric forcing data for GFED and the 10 models based on linear regression. Since the ML model uses NARR meteorology as predictors, we also include sensitivities of the fire emissions predicted by the ML model to the NARR meteorology (Fig. 4b). Almost all models capture the negative dependence of PM 2.5 emissions on RH over the western US (r = −0.06 ∼ 0.84), but the sensitivities in the models are much stronger (steeper negative slope) than in GFED. For temperature, positive sensitivity is shown over the western US in GFED (Fig. 5), with the largest slope in northern California. The sensitivities to temperature in models agree with the observed sensitivities (r = −0.06 ∼ 0.64), but some models show much stronger sensitivities over the western, central, and southeastern US. Generally, the spatial distributions of the long-term mean fire emissions shown in Fig. 3 match well with the spatial distributions of sensitivities to RH or temperature, suggesting an important role of the sensitivities in the model biases of predicting fire emissions. However, the correspondence of large fire emissions to the sensitivities to RH or temperature shows regional differences. For instance, in the western US, the stronger sensitivities to both RH and temperature correspond to the overestimations in this region for most models, including the ML model, CLM, CTEM, JSBACH, JULES, LPJ-SIM, and LPJ-SPI (Figs. 4 and 5). On the other hand, over the central US, larger PM 2.5 emissions simulated by CTEM and JSBACH only correspond to stronger sensitivity to temperature (Fig. 5). Similar to the central US, in the southeastern US, the overestimations in CLM and JULES only correlate with stronger sensitivity to temperature (Fig. 5). Regional differences in the correspondences between the predicted fire emissions and their sensitivity to meteorology can be explained by several factors. For the western US, the overestimations of fire emissions correspond to stronger sensitivities to both RH and temperature, given that fire activities are sensitive to fuel aridity that is controlled by temperature and fuel moisture (Abatzoglou and Williams, 2016;Holden et al., 2018). As for the southeastern US, fuels in this region typically burn at higher RH, and the interannual RH variation (standard deviation) is smaller (Balch et al., 2017;Brey et al., 2018). With higher RH values and less variation in RH, the fire emissions in the southeastern US show weaker sensitivity to RH than to temperature in observation (Table S2). The above analysis shows that the overestimation of fire emissions in the models may be attributed to the stronger sensitivities to meteorology. However, fire activities are controlled by meteorology and other factors such as vegetation and humans, so the analysis of fire emission sensitivity to meteorology only provides a potential explanation of the overestimation of fire emissions in the models (Forkel et al., 2019).

Seasonality and interannual variability over the CONUS
In addition to evaluating spatial distributions, it is also important to compare the models' ability to reproduce the temporal variability of fire emissions. As the models may systematically over-or underestimate fire emissions, we normalize the emissions by the mean and standard deviation and focus only   on its temporal variability. Figure 6a shows the seasonality of normalized fire PM 2.5 emission over the CONUS. Most models capture the seasonality of fire emission successfully (r > 0.85), except LPJ-SIM, which simulates peak emission in August-October (r = 0.65). Among the models, the ML model has the highest correlation coefficient between prediction and observation from GFED (r = 0.98) and successfully reproduces the peak in August. The seasonal peaks simulated by the FireMIP models are broader and flatter than the peak in GFED, with an early peak in June-July continuing to September (Fig. 6a).
In terms of interannual variability (Fig. 6b), the ML model, CLM, and JULES perform better than other models, with larger correlation coefficient between simulated and observed fire PM 2.5 emissions (r = 0.87, 0.71, and 0.55 for ML, CLM, and JULES, respectively; Table 4). Other models have relatively poor performance in capturing the interannual variability. The interannual variability of fire emissions shows several peaks in 2002, 2007, and 2012 (black line in Fig. 6b), when the western US contributes 76 % of the total emissions to the peaks in these years. Almost all models except ORCHIDEE capture the peak in 2012. However, most models miss the peaks in 2002 and 2007. Among all models, the LPJ-Glob model simulates the peaks in the two years, while ML, JULES, and CLM only capture the largest emission in 2007 (Fig. 6b).

Seasonality and interannual variability by regions
As the temporal variability of fire activities varies by region, we compare the performance between GFED and the ML and FireMIP models by the regions defined in Sec. 4.1. Figure 7 shows the seasonality and interannual variability of normalized fire PM 2.5 emission over western forest area, Mediterranean California, the southwestern US, and the southeastern US. All models generally capture the seasonality of the western forest area peaking in summer, with correlation coefficients larger than 0.8 (Table 4). Even though the FireMIP models generally reproduce the peaks in summer, the predicted peaks are broad and flat, indicating a relatively longer fire season starting in June and ending in September (Fig. 7a). When looking at the interannual variability, we find that the ML model has the best performance with a correlation coefficient of 0.93, and it successfully captures the largest fire emission in 2007. CLM, JULES, and LGJ-Glob perform better than the rest of the models (r = 0.70, 0.60, and 0.51 for CLM, JULES, and LPJ-Glob, respectively; Table 4), but all of them still miss the peaks in 2007 and overestimate fire emissions in 2001 and 2003 (Fig. 7b). The emission peak in 2007 is mainly attributed to the large fires in Idaho, which were associated with synoptic weather patterns characterized by positive geopotential height and temperature anomalies over the Pacific Coast and western US (Zhong et al., 2020). Consistent with prior findings, SHAP importance shows that in the ML model SVD predictors (SVD_NCA and SVD_RM in July and August 2007, Fig. 8a) are the dominant factors of fire emissions in 2007 (contribute 27 % and 28 % for July and August 2007, respectively), which are characterized by high pressure, low RH, and northeasterly winds over the western US (Figs. S1 and S2). Thus, the underestimation of peak emission in 2007 may be explained by the fact that the influences of large-scale meteorology on fire activity are not fully considered in the FireMIP models, which are point models driven only by local atmospheric forcing.
In Mediterranean California, the seasonality of fire emissions shows a bimodal pattern, peaking in August and October. The peak in October is mainly due to the extremely large fires associated with Santa Ana winds in 2003(Keeley et al., 2009Yue et al., 2014). The ML model simulates a flatter peak from July to October, while all the FireMIP models except ORCHIDEE capture the first emission peak in summer but fail to simulate the large fire emission in October (Fig. 7c). The underestimation associated with the Santa Ana winds is also shown in the interannual time series in Fig. 7d (15 % and18 % for October 2003 and2007, respectively) (Fig. 8b). The results indicate that the ML model captures the effect of synoptic weather patterns on fire activity by including the SVD predictors. Even though the wind speed is included in simulating fire spread in the FireMIP models, the spatial resolutions of the models and/or the atmospheric forcing data may not be fine enough to resolve the strengthened offshore winds through the complex terrain, and subsequently, they may not capture the effects of Santa Ana winds on fires well. As shown in Fig. S8, the wind speeds from NARR are significantly larger than from CRUNCEP for the strong wind days (daily wind speed  (Table S3). The results indicate the lower wind speeds in the CRUNCEP atmospheric forcing used in FireMIP may partially explain the model biases for the events associated with Santa Ana winds. Besides the abovementioned shortfall, all the models have problems reproducing the interannual variability of the fire emissions over Mediterranean California, with very low correlations (r < 0.25) for the FireMIP models and a relatively low correlation (r = 0.72) for the ML model (Table 4; Fig. 7d). The poor performance for this region may be due to the complex relations between fires and multiple factors, including meteorology, complex terrain, fuel, and human, which may not be fully represented in the models (Mann et al., 2016;Radeloff et al., 2018). Both the ML model and LPJ-SIM successfully reproduce the seasonality of fire emission in the southwestern US peaking in June (r = 0.99 and 0.94 for ML and LPJ-SIM, respectively), while other models simulate relatively smooth seasonality ( Fig. 7e and Table 4). The ML model, LPJ-SIM, and ORCHIDEE have better performance for the interannual variability, with correlation coefficients of 0.95, 0.40, and 0.45, respectively (Table 4). However, most FireMIP models show larger variability in fire emissions than the GFED, and they all fail to capture the extremely large fire emission in 2011 (Fig. 7f). The peak fire emission in 2011 over the southwestern US was caused by extremely low atmospheric moisture along with moderately high temperature, leading to record-breaking VPD and wildfire activities (Williams et al., 2015). To explain why the FireMIP models fail to capture the peak of 2011, we compare the VPD calculated from CRUN-CEP data and the VPD data from gridMET used in the ML model. As Fig. S9 shows, CRUNCEP shows smaller positive anomalies of VPD over the southwestern US in summer 2011, while gridMET data demonstrate a significantly larger VPD anomaly. The biases in CRUNCEP data may partially explain the underestimations in all FireMIP models.
For the southeastern US, the seasonal cycle of fire PM 2.5 emissions displays a bimodal pattern, peaking in spring (March-April) and fall (September and October) (Fig. 7g). Most models fail to reproduce the bimodal fire emissions, but the ML model, LPJ-SIM, and LPJ-SPI can capture the bimodal pattern. Although LPJ-SIM and LPJ-SPI predict the bimodal peaks, the first peak simulated by LPJ-SIM shows a 1-month delay, and the second peak simulated by LPJ-SIM and LPJ-SPI is 1 month early and 1 month late, respectively (Fig. 7g). In addition, the ML model, CLM, and JSBACH reproduce the interannual variability of fire PM 2.5 emissions relatively well (r = 0.96, 0.57, and 0.72 for the ML model, CLM, and JSBACH, respectively) ( Table 4 and Fig. 7h). Interestingly, CLM and JSBACH can capture several peaks in 2007, 2010, and 2011, but they do not simulate seasonality correctly, which may be explained by the fact that the underestimation in spring is compensated for by the overestimation in summer related to abnormal dryness or drought.

Performance in modeling extreme events
Fire activity in the US is becoming more hazardous, particularly over the western US, due to more frequent hotter and drier conditions as climate continues to warm (Williams et al., 2019). Thus, it is necessary to assess whether the ML model and process-based models can capture the extreme events in terms of their magnitude, frequency, timing, and location, which is essential to future projection and adaptation. As CLM performs relatively well among the FireMIP models, we select CLM for comparison with the ML model at 1 • × 1 • resolution, focusing on the spatial patterns of extreme event frequency and two case studies with extremely large fire emissions. Figure 9 shows the frequency maps of months with large fire emissions during 2000-2012 for GFED, the ML model, and CLM. Large fire emission is defined as monthly fire PM 2.5 emissions greater than the 95th percentile of fire PM 2.5 emission considering all the grids over the CONUS in 2000-2012. GFED shows hotspots with a higher frequency over northern California, the Pacific Northwest, and the southeastern US, with total counts ranging from 15 to 105 (Fig. 9a). The ML model captures the spatial patterns (r = 0.74), but it overestimates the number of months by a factor of 2 to 3 compared to GFED, especially over the western US (Fig. 9b). The spatial patterns of large fire emission occurrence simulated by CLM are generally consistent with the observed distribution by GFED (r = 0.35). However, it overestimates the frequency, particularly over Idaho and the northeastern US, and simulates more significant numbers of months with extreme events over large spatial extents, maybe due to its coarse spatial resolution (Fig. 9c).

Case studies
To evaluate how well the models simulate the large fire emissions, we compare model performance for the two recent cases reported to be the largest fire events during 2000-2012, including the fires in the southern US in 2011 and western US in 2012. During 2011, a severe drought leading to large wildfires was observed over the southern US, including Arizona, New Mexico, and Texas (NOAA, 2012;Wang et al., 2015). Figure 10 shows the maps of annual mean fire PM 2.5 emissions over the southern US from GFED, the ML model, and CLM. GFED shows the largest fire emissions close to the border of Arizona and New Mexico in conjunction with other small hotspots over New Mexico, Texas, and Louisiana (Fig. 10a). The ML model overall reproduces the spatial distributions of the fire emissions (r = 0.96) and captures the largest fire emission in Arizona and New Mexico in 2011 (Fig. 10b). However, CLM does not capture the hotspots observed in GFED over Arizona and New Mexico but simulates larger fire emissions in Louisiana instead (Fig. 10c). In terms of the time series, the ML model reproduces the temporal variability of fire emissions and successfully captures the peak of total fire PM 2.5 emissions in June 2011 (r = 0.98; Fig. 10d and e). Although CLM simulates the peak in June, it overestimates fire emissions in the following months by a factor of 4 (r = 0.52; Fig. 10d and e).
In 2012, the western US experienced several major wildfires (NOAA, 2013). The warm and dry conditions led to large wildfires in California, Oregon, New Mexico, and Colorado (Fig. 11). Both the ML model and CLM capture the hotspots with large fire emissions (Fig. 11b and c) and have correlation coefficients of 0.56 and 0.37, respectively. However, the ML model tends to overestimate fire emissions, especially in areas surrounding the grids with extremely large fire emissions (Fig. 11b). CLM misses some large fire emissions in Colorado and New Mexico and underestimates the larger fire emissions in several hotspots (Fig. 11a), which may be explained by its coarse resolution. The time series of normalized fire PM 2.5 emissions in 2012 show one peak in August. The ML model captures the peak and presents a high correlation between the simulated and observed normalized and total fire PM 2.5 emissions (r = 0.98). CLM captures the peak in August but overestimates emissions in September and October (r = 0.84; Fig. 11d and e). To test model generalization, we train the model using data of 2000-2009 and 2013-2020 and test on 2010-2012 and compare the ML performance with CLM (Figs. S10-S11). The ML model performance is as good as the 10-fold cross-validation, demonstrating that the ML model performs well on predicting unseen data.

Discussion and conclusions
This study provides the first assessment to evaluate the performance of data-driven and process-based models in predicting fire PM 2.5 emissions over the CONUS. We first demonstrate that the developed ML model performs well in predicting monthly fire PM 2.5 emissions nationwide at grid cells of 0.25 • × 0.25 • resolution from 2000 to 2020, with an RMSE of 0.16 g m −2 and IoA of 0.84. The ML model outperforms prior statistical models predicting fire activities at similar spatial and temporal scales (Carvalho et al., 2008;Bedia et al., 2014). Considering the performance at a regional scale, the ML model reproduces the interannual variability of fire emissions for the selected regions, with correlation coefficients ranging from 0.84 to 0.98. Therefore, the ML model has a promising performance in predicting fire emission over the CONUS at a relatively fine spatial resolution. Compared to the wildfire-burned-area model in Wang et al. (2021a), the fire emission model in this study shows slight degradations in capturing the interannual variability of fire emission at grid level (e.g., percentage of grids with correlations larger than 0.4). This may be explained by the fact that the fire emission model may not effectively resolve the  relationships between fires and predictors when more grids with less fire occurrence are included (i.e., more zeros or unburned grids) without reliable information about ignition. As a side note, both burned area and emission ML models have relatively poor performance over Mediterranean California, indicating the challenges in modeling fire activities in this region where the terrain and land use are complex. The SHAP variable importance shows that meteorology at both local and synoptic scales as well as fuel loads are important variables controlling fire emissions over the CONUS. Regional analysis of predictors indicates that fuel dryness such as fuel moisture and energy release component (ERC) and fuel load are important for predicting fire emissions in the western US, while large-scale meteorological patterns (SVDs_SElag2) contribute more to fire emissions in southeastern US.
We then compare the simulated fire PM 2.5 emissions from the ML model and FireMIP models against GFED from 2000 to 2012 at the spatial resolution of 1 • × 1 • . The ML model, CLM, and JULES reproduce the spatial distribution more reasonably than the rest of the FireMIP models (r = 0.83, 0.52, and 0.40 for the ML, CLM, and JULES, respectively). Both CLM and JULES simulate more fire PM 2.5 emissions over the southeastern US, which can be explained by several reasons. First, it has been shown that the satelliteobserved burned areas in the southeastern US are much smaller than the burned areas estimated from the groundbased fire records, which might have resulted from the small prescribed and agricultural fires (Hu et al., 2016;Nowell et al., 2018). In addition, large differences exist among different satellite-estimated fire PM 2.5 emissions in the southeastern US (Li et al., 2019a). As a consequence, these studies highlighted uncertainties about the GFED-estimated burned area and emission over the southeastern US. Second, cropland fires are one of the predominant fire types in this region. Among the FireMIP models, CLM is the only model that simulates cropland fires (Li et al., 2013). For JULES, even though it does not simulate cropland fires, it treats crop- lands as natural grasslands. The emission factors of grasslands and croplands used in the FireMIP models are larger than in GFED4s, thereby causing larger fire PM 2.5 emissions in the southeastern US in CLM and JULES (van der Werf et al., 2017;Li et al., 2019a). Furthermore, Li et al. (2019a) noted that CLM4.5 simulates higher fuel loads in croplands than the CASA model used by GFED4s, leading to higher fire carbon emissions estimated by CLM than by GFED. It is worth noting that the ML model incorporates fuel load simulated by CLM4.5, but it predicts fire emissions closer to GFED4s, indicating a smaller sensitivity of fire emission to fuel load in the ML model. Lastly, the overestimation of fire PM 2.5 emissions can also be explained by the sensitivity to meteorology. The spatial distributions of the long-term mean fire emissions shown in Fig. 3 correlate with the spatial distributions of sensitivities to RH and/or temperature, with regional differences. For the western US, large fire emissions are associated with stronger sensitivities to both RH and temperature in the ML and most FireMIP models. For the central and southeastern US, overestimation of fire PM 2.5 emissions only corresponds to stronger sensitivity to temperature in some FireMIP models. Besides comparing model performance aggregated over the CONUS, we analyze the model performance for several regions, including the western forest area, Mediterranean California, the southwestern US, and the southeastern US. For the western forest area, the ML model performs well in capturing both seasonality and interannual variability of fire PM 2.5 emissions, with correlation coefficients of 0.98 and 0.96, respectively. In contrast, the FireMIP models generally reproduce the seasonality well but do not simulate the interannual variability well, especially underestimating the peak in 2007, which related to large-scale meteorological patterns favorable for fires in the Pacific Northwest (Zhong et al., 2020). For Mediterranean California, all FireMIP models only capture the first peak in August but fail to simulate the second peak in October, which is caused by large fires related to Santa Ana winds in 2003 and 2007. Such lack of peak emission is also shown in the interannual variability, as all FireMIP models show limited ability to simulate the peaks in these two years. By contrast, the ML model successfully predicts the bimodal seasonality and the large fire emissions related to the Santa Ana winds in 2003 and 2007. The underestimation of the peak in the FireMIP models may be attributed to the underrepresentation of the effects of large-scale meteorology in the two regions, as the ML model and SHAP importance show that SVD predictors have larger contributions to the fire emissions in both events. The results of the two regions in the western US suggest that fire parameterization in the FireMIP models could be improved by including the effects of regional and large-scale meteorology (e.g., Santa Ana winds) on fire activity (Yue et al., 2014). Modeling the effect of Santa Ana winds on wildfires may be particularly challenging as the offshore Santa Ana winds exhibit variability related to both synoptic-scale pressure anomaly over the Great Basin and local thermodynamic forcing associated with strong desert-ocean temperature gradient (Hughes and Hall, 2010).
As for the southwestern US, the ML model and LPJ-SIM estimate the peak in June (r = 0.99 and 0.94 for ML and LPJ-SIM, respectively), which highly agrees with the GFED observation. Interestingly, most FireMIP models fail to capture the extremely large fire emission in the 2011 summer mainly due to the low biases of VPD anomalies in CRUN-CEP (Tang et al., 2017). Unlike the southwestern US, the seasonality of the southeastern US has peaks in March-April and September-October. The two peaks of fire emissions correspond to wildfires (March-April and September), cropland fires (February-March and August-October), and prescribed fires (February-April and October) that include burnings for pest controls and land cleaning (Knapp et al., 2009;Lin et al., 2014). Most models fail to reproduce the bimodal fire emissions, but the ML model, LPJ-SIM, and LPJ-SPI can capture the bimodal pattern. Even though the seasonality of fires over this region is not simulated accurately, the CLM and JSBACH reproduce the interannual variability of fire PM 2.5 emissions and predict the peaks well. The FireMIP models' shortfall in reproducing the bimodal seasonality can be explained by two reasons. First, the relationships between humans and fire spread implemented in the process-based models may not be realistic compared to the observed relationships. Parisien et al. (2016) demonstrated the large spatial variability of human impacts on burned areas in North America, which is not well represented in the FireMIP models (Li et al., 2019a). Second, drier conditions in winter would promote fires in springtime (Wear and Greis, 2013;Wang et al., 2021a), which may not be directly considered in the FireMIP models but are incorporated as SVD predictors in the ML model. Overall, the representations of the effects of human and large-scale meteorology on fires may explain why the models simulate the seasonality incorrectly in the southeastern US. In addition to the comparison of general model performance, we also compare the ability of the data-driven and processed-based models in predicting extremely large fire emissions. Both ML and CLM models reproduce the spatial pattern of extreme fire events and reasonably simulate the historical events of large fires in the southwestern and western US.
It is known that different fire emission inventories have their uncertainties, and prior studies have compared fire emission inventories over the globe or the CONUS (Urbanski et al., 2018;Liu et al., 2020). The GFED fire emissions used in this study are known to underestimate the fire emission peak in springtime over the southeastern US, which may be explained by the fact that other products such as FINN or QFED capture more small fire activity compared to the GFED approach (Koplitz et al., 2018;Carter et al., 2020). Although FINN can capture more small fires, it underestimates the intensity of large fires for some cases, which has been attributed to the cloud coverage on daily-scale detection (Paton-Walsh et al., 2012). QFED and GFAS, which estimate emissions using fire radiative power (FRP) from satellites, are also more sensitive to small fires than GFED. However, QFED tends to estimate much larger emissions than other products, which can be explained by the fact that the emission coefficients used to obtain emissions are constrained by MODIS AOD and the uncertainties within FRP (Pan et al., 2020). Despite the known discrepancy between GFED and other data products, the GFED data still show bimodal peaks in spring and fall over the southeastern US, while most FireMIP models fail to reproduce the first peak (Fig. 7g). For the western US, GFED and FINN are generally consistent regarding the magnitude and variability of fire emissions (Urbanski et al., 2018). As stated above, different fire emission inventories have uncertainties. Future works are required to include other fire emission datasets for model evaluation.
To summarize, we utilize the ML model with SHAP importance to diagnose the fire emissions simulated by processbased models and attributed model biases to several factors. First, the sensitivities of fire emissions to meteorology in the models are stronger than those observed, leading to overestimations. Second, the large-scale meteorological patterns conducive to fires are not fully considered in the processbased models, which are important contributors of large fire emissions in the western US and southeastern US. Third, the spatial resolutions of models and/or the atmospheric forcing they used may be too coarse to resolve the effects of regional weather phenomena such as Santa Ana winds. Fourth, biases in the atmospheric forcing data may result in biases of fire emission predictions. Last but not least, human activities are a critical component shaping fire regimes, but the human effects on fire activities in the FireMIP models may not reflect the human-fire relationships in the real world. This is also an issue in the ML model as the human-related predictors in the ML model may be too simple to represent the human influences. The underrepresentation of human effects in both types of models may cause additional uncertainties in projecting future fire activities and their impacts on climate. By training the ML model using the GFED emissions, the ML model is able to better explain fire emissions in the US, which makes it a useful tool for diagnosing processes or relationships that may be missing or not well represented in the process-based models to guide future development for improving their performance. Besides its use in diagnosing process-based models, the ML model with SHAP provides a different and novel approach to simulate fire emissions more accurately and identify the important predictor variables. While the ML model generally has higher accuracy than the FireMIP models, the feedbacks between fire emissions and climate are not included, which could potentially affect the reliability of ML-based models in fire emission prediction under future climate change scenarios (Zou et al., 2020). Lastly, due to limited training data, the ML model cannot predict fires in regions with longer fire return intervals, posing additional uncertainties in their use for making future projections. Data availability. The ML prediction and predictor dataset used in this study are publicly accessible online at https://doi.org/10.5281/zenodo.5076646 (Wang et al., 2021b).
Author contributions. SSCW, YZ, YQ, and LRL conceived the research ideas. SSCW wrote the initial draft of the paper and performed analyses and model development. All authors contributed to the interpretation of the results and the preparation of the paper.
Competing interests. At least one of the (co-)authors is a member of the editorial board of Atmospheric Chemistry and Physics. The peer-review process was guided by an independent editor, and the authors also have no other competing interests to declare.

Disclaimer.
Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
The views expressed in this document are solely those of the SEARCH Center and do not necessarily reflect those of the Agency. The EPA does not endorse any products or commercial services mentioned in this publication.

Special issue statement.
This article is part of the special issue "The role of fire in the Earth system: understanding interactions with the land, atmosphere, and society (ESD/ACP/BG/GMD/NHESS inter-journal SI)". It is a result of the EGU General Assembly 2020, 3-8 May 2020.
Acknowledgements. This research was performed at PNNL and funded under assistance agreement no. RD835871 by the U.S. Environmental Protection Agency to Yale University through the SEARCH (Solutions for Energy, AiR, Climate, and Health) Center. It has not been formally reviewed by the EPA. We thank the two anonymous reviewers and Xiaohong Liu for helpful comments on the paper. Financial support. This research has been supported by the U.S. Environmental Protection Agency (grant no. RD835871).
Review statement. This paper was edited by Xiaohong Liu and reviewed by two anonymous referees.