2010-2015 North American methane emissions, sectoral contributions, and trends: a high-resolution inversion of GOSAT satellite observations of atmospheric methane

. We use 2010-2015 GOSAT satellite observations of atmospheric methane columns over North America in a high-resolution inversion of methane emissions, including contributions from different sectors and long-term trends. The inversion involves analytical solution to the Bayesian optimization problem for a Gaussian mixture model (GMM) of the emission ﬁeld with up to 0.5 ◦ × 0.625 ◦ resolution in concentrated source regions. Analytical solution provides a closed-form characterization of the information content from the inversion and facilitates the construction of a large ensemble of solutions exploring 5 the effect of different uncertainties and assumptions. Prior estimates for the inversion include a gridded version of the EPA Inventory of U.S. Greenhouse Gas Emissions and Sinks (GHGI) and the WetCHARTS model ensemble for wetlands. Our best estimate for mean 2010-2015 US anthropogenic emissions is 30.6 (range: 29.4-31.3) 2010-2015 decrease in emissions from offshore oil production.

cover. Observations have a precision of 13 ppb and relative bias of 2 ppb compared to the Total Carbon Column Observing Network (Buchwitz et al., 2015). There is no significant spectral degradation over time (Kuze et al., 2016). Figure 1 shows the 156,110 retrievals over land used to optimize emissions in this study . Each retrieval comes with an estimated retrieval error (11 ppb on average). We use observations over land from January 2010 to December 2015, excluding data above 60N • for which model errors are large (Maasakkers et al., 2019). Most observations (95,365) are over the Contiguous 5 United States (CONUS). The data are spatially sparse but this reflects the observing strategy of repeated measurements at the same locations in the default mode. Thus most observation locations in Figure 1 have a large number of data points to inform temporal variability and trends (Sheng et al., 2018a).  (Etiope, 2015;Kvenvolden and Rogers, 2005) and areal seepage (Kvenvolden and Rogers, 2005;Etiope and Klusman, 2010) as described in Maasakkers et al. (2019).

Forward model
We use the nested version of the GEOS-Chem chemical transport model v11-01 at 0.5 • × 0.625 • resolution over North Amer-5 ica as forward model for the inversion. Earlier versions of this model for methane were described by Wecht et al. (2014) and . The model is driven with MERRA-2 meteorological fields (Bosilovich et al., 2016) from the NASA Global Modeling and Assimilation Office (GMAO). Methane loss from reaction with OH and Cl radicals, soil uptake, and stratospheric oxidation are described in Maasakkers et al. (2019). The simulation is initialized in January 2009 with concentration fields from . Three-hourly boundary conditions at the edges of the nested domain are from the 4 • × seasonally variable background bias likely caused by the extratropical stratosphere (Bader et al., 2016;Saad et al., 2016;Stanevich, 2018). The latitudinal correction term ξ (ppb) follows a quadratic form as in : with θ the latitude in degrees. The seasonal bias is corrected over rolling 8 • latitudinal bands. A sensitivity inversion without the seasonal bias correction is performed as part of the inversion ensemble.

State vector for the inversion and error covariances
Although we could technically carry out the inversion of the GOSAT data at the 0.5 • × 0.625 • resolution of the GEOS-Chem simulation, the data do not have sufficient information to constrain emissions on that grid and doing so would incur large smoothing error (Wecht et al., 2014). We use instead a 600-element Gaussian mixture model (GMM) as described by Turner 25 and  to optimally define the emission patterns that can be usefully constrained by the inversion. Each of the 600 Gaussian functions in the GMM is defined by an emission amplitude, mean location, and spread (standard deviation). These parameters are optimized using a similarity vector on the 0.5 • × 0.625 • grid that takes into account latitude, longitude, and the prior patterns of different source sectors. The state vector x for the inversion with dimension n = 2 × 600 consists of scaling factors adjusting the amplitudes of the Gaussians in the GMM and their 2010-2015 linear trends. This approach allows 30 for effective aggregation of regions with weak or homogeneous emissions while preserving high resolution for concentrated emissions. Each 0.5 • × 0.625 • grid cell is represented by a unique combination of the Gaussians, so that the optimization of x Prior emission error variances are defined for each Gaussian on the basis of its spatial distribution and the contributions from different sectors. Emission errors for individual anthropogenic sectors are estimated using the error curves from Maasakkers et al. (2016). The error standard deviation σ for a given source sector and Gaussian is given by: Where α 0 , k α , and α N are source sector specific error coefficients from , L is the effective spatial The error variances for all sectors contributing to a given Gaussian are added in quadrature to obtain the corresponding diagonal element of the prior error covariance matrix S A . Error variances for a given Gaussian are capped at 50% in the base 15 inversion and we also perform a sensitivity inversion without this cap. The mean relative error standard deviation is 37% in the base inversion. The 50% cap mainly affects Gaussians dominated by wetland emissions. For the 2010-2015 emission trends associated with each Gaussian, the prior estimate is set to zero and the prior error standard deviation is a 5% change per year, in line with uncertainties in trend estimates for North America Bruhwiler et al., 2017;Sheng et al., 2018a;Lan et al., 2019). We also perform sensitivity inversions with changes of 2.5% and 10% per year as prior error standard devia-20 tion. Off-diagonal elements of S A are assumed to be zero because  found no spatial error correlation for the gridded EPA inventory; this may be an underestimate for wetland emissions (Bloom et al., 2017).
Our calculation of S A leads to different error variances for each grid cell. To assess the impact of that choice, we also perform a sensitivity inversion using the mean error variance for all the Gaussians. The base inversion assumes normal errors, 25 but we also perform a sensitivity inversion assuming log-normal emission errors following the Levenberg-Marquardt method grid cell is assumed to be due to errors in emissions, to be corrected by the inversion. After subtracting this mean difference, the residual standard deviation is taken as estimate of the observational error standard deviation including contributions from instrument, representation, and forward model errors. If this estimate is less than the reported instrument error standard devia-tion , we use the latter instead (17% of observations). If it is less than 10 ppb we reset it to 10 ppb (6% of observations). The resulting average observational error standard deviation is 14 ppb. Off-diagonal terms of S O are assumed to be zero for lack of better information, but in fact some transport error correlation would be expected in the forward model.
We account for this error correlation with a regularization term γ in the inversion (Section 2.5).

Inversion procedure
We perform an analytical inversion minimizing the Bayesian cost function J(x) assuming normal errors (Rodgers, 2000): where x is the state vector to be optimized, consisting of 600 Gaussians for which we optimize both scaling factors for mean emissions and absolute linear emission trends, for a total of 1200 state vector elements; x A is the prior state vector; S A 10 is the prior error covariance matrix (Section 2.4); S O is the observational error covariance matrix (Section 2.4); and γ is a regularization factor to account for the lack of non-diagonal terms in S O and hence prevent overfitting. γ plays a similar role as the regularization parameter in Tikhonov methods (Brasseur and Jacob, 2017) and reflects our inability to precisely quantify error statistics in the Bayesian method. Here we find that γ = 0.5 provides the best balance of fitting the prior and observational terms in the cost function, following the L-curve approach of Hansen (1999). The value is higher than γ = 0.05 used in the The GEOS-Chem forward model (y = F (x)) as implemented here is strictly linear in its relationship between methane column concentrations (y) and the state vector of emissions (x). It can be expressed as F (x) = Kx + c where K = ∂y/∂x 20 is the Jacobian matrix and c an initialization constant. This allows the optimal posterior solution x which minimizes the cost function J(x) to be obtained analytically as: with posterior error correlation matrix S: The information content from the inversion can then be obtained from the averaging kernel matrix (A = ∂ x/∂x) which gives the sensitivity of the solution to the true state: 8 https://doi.org/10.5194/acp-2020-915 Preprint. Discussion started: 18 September 2020 c Author(s) 2020. CC BY 4.0 License.
The trace of A gives the degrees of freedom for signal (DOFS), which measures the number of independent pieces of information on the state vector that can be obtained from the inversion. The diagonal elements of A (averaging kernel sensitivities) measure the degree to which the inversion can constrain the true values of the corresponding state vector elements (1 = perfectly, 0 = not at all). We will use these measures of information in our presentation of results.

5
Analytical solution to the inverse problem requires explicit construction of the Jacobian matrix. We perform this construc- The posterior emissions when implemented in GEOS-Chem reduce the mean squared difference with GOSAT by 3.5%.
Overall changes are small because random errors in individual observations are large and because the background is already    The largest decrease is for US wetland emissions, mostly contributed by the Gulf and east coasts ( Figure 2). Such an overestimate in the mean of the WetCHARTS wetland inventory ensemble was previously identified in an inversion of aircraft observations over the Southeast US (Sheng et al., 2018b). It may be related to the low organic carbon content of the soil, the difficulty of distinguishing freshwater and saltwater wetlands, uncertainties in anaerobic CH 4 :CO 2 respiration rates, and the 25 accounting of partial wetland land-cover areas (Holmquist et al., 2018;Lehner and Döll, 2004;Bloom et al., 2017). We also find an overestimate of wetland emissions in (mainly eastern) Canada. The large uncertainty range is driven by the inversion ensemble member without seasonal correction. Based on the root mean square error and spatial correlation, our inversion results are most consistent with the WetCHARTs ensemble members that use GlobCover wetland extent, a q 10 = 2 value for the factor increase in the CH 4 to CO 2 emission ratio per 10 K temperature increase, and global scaling at the low end or middle of Here T −1 is the generalized pseudo-inverse of W i , and A is the averaging kernel matrix (Equation 6). a i = 1 means that the inversion can fully constrain the national total for that emission category, independent of the prior estimate, while a i = 0 means that the inversion provides no information and the estimate cannot depart from the prior. The averaging kernel sensitivities for individual sectors/subsectors in Table 2  and driven by oil/gas production as seen for example in Texas, Oklahoma, and offshore in the Gulf of Mexico. Our national estimates for the emissions from oil and gas production are 3.1 (2.7-3.6) and 5.4 (4.9-5.9) Tg a −1 , respectively, as compared Our scaling factors to the EPA GHGI are for the 2012 emissions as reported by EPA (2016) and used in the inversion as prior estimates. More recently, EPA (2020) updated its methodology for estimating emissions and applied it to a reanalysis of emissions from previous years including 2012. Changes are important for some oil/gas subsectors, as shown in Figure 5. Gas 5 production emissions are lower by 19% because of a downward correction to gathering and boosting emissions. Oil production emissions are 30% lower than previously reported, caused by a correction for wells that were previously double-counted. Our correction factor from the inversion increases oil production emissions by a factor 1.9 (1.7-2.3) and natural gas production emissions by a factor 1.5 (1.4-1.6) relative to the updated 2012 GHGI from EPA (2020). Similarly, emissions from gas processing in the updated GHGI are 55% lower than previously reported, but our inversion finds them to be higher. Our correction 10 factor from the inversion increases gas processing emissions by a factor of 2.9 (2.6-3.1) relative to the updated GHGI.
12 https://doi.org/10.5194/acp-2020-915 Preprint. Discussion started: 18 September 2020 c Author(s) 2020. CC BY 4.0 License.  has seen a large increase in production after 2015 and is currently the largest oil-producing Basin in the US (Zhang et al., 2020).
GOSAT provides little information over Canada and Mexico when it comes to trends. There are signs that oil/gas production emissions in both Alberta (Canada) and offshore in the Gulf of Mexico are decreasing, and the latter may be driven by decreasing oil production (Zhang et al., 2019). plus trend) and much higher than the 7.3 Tg a −1 national total from the latest GHGI (EPA, 2020). Similar to our subsector attribution, Alvarez et al. (2018) find the largest difference with the GHGI for the production subsector (factor 2), which they attribute to the GHGI emissions not accounting for emissions from abnormal operating conditions. While Alvarez et al. (2018) 10 did not distinguish between oil and gas production, our results point at a much larger relative discrepancy with the GHGI for oil production emissions than natural gas production emissions.

Comparison with other evaluations of the EPA inventory
In an inversion of data from two tower networks and one aircraft campaign, Cui et al. (2019)  representing a significant decrease from the prior estimate of 2.3 Tg a −1 . This is due to our large reduction of mean 2010-2015 emissions in the Los Angeles Basin. This reduction may be overestimated because of the coarseness of model CO 2 used in the proxy retrieval, underestimating CO 2 over Los Angeles . Using aircraft measurements, Ren et al. (2019) found 70% higher oil/gas production emissions in the Marcellus Shale (Pennsylvania/West Virginia) in 2015 compared to the 2012 gridded EPA inventory, which they attribute to an increase in production. We find a 2010-2015 increase in emissions for 20 that region, as discussed above, but 2015 emissions are still only 22% higher than the GHGI. Based on surface observations in the Uintah Basin in Utah, Foster et al. (2017) found good agreement with basin-wide emissions from Karion et al. (2013), and found the gridded EPA inventory to be 45% lower after adjusting emissions based on 2015 production data. Most of the emissions in the Uintah Basin are concentrated in one of our grid cells and that cell is optimized individually in our inversion with good constraints (Fig 3), finding 2015 posterior emissions that are 37(12-66)% higher than the prior and no significant 25 linear trend.
Based on aircraft data, Plant et al. (2019) found a factor 2 higher anthropogenic urban methane emissions compared to the gridded EPA inventory over five cities on the US East Coast. We find no such difference but evaluating urban emissions along the East Coast is difficult because of overlap with large wetlands emissions that are themselves highly uncertain. It should be the inversion ensemble), slightly higher than the EPA GHGI estimate of 28.7 (26.4-36.2) Tg a −1 . The difference is mainly from oil and gas production, which we find to be higher by 35% (19-59%) and 22% (11-33%) respectively compared to the GHGI.
The most recent version of the GHGI EPA (2020) revises emissions from oil/gas production and gas processing emissions downward, opposite from our results. Thus we find that the estimate of emissions from oil production by EPA (2020) is lower 15 than our result by a factor of 2.
Our best estimate of CONUS wetland emissions is 10.2 (5.6-11.1) Tg a −1 , representing 24% of total CONUS methane emissions. This is lower than the ensemble mean from the WetCHARTS inventory (14.2 Tg a −1 ) used as prior estimate, and is consistent with previous studies pointing to overestimates in US wetland emissions. More work is needed to understand the 20 underlying processes. We find a similar overestimate in wetland emissions over Eastern Canada. We estimate mean 2010-2015 anthropogenic emissions of 4.5 (4.4-4.7)Tg a −1 for Canada and 6.1 (5.5-6.3) Tg a −1 for Mexico. We find that oil/gas emissions in the IMP (2012) inventory reported to the UNFCCC are too low by 20%, mainly driven by oil production.
We find from the inversion a 2010-2015 increase in US anthropogenic emissions of 0.14 Tg a −1 (0.4 % a −1 ), much lower 25 than previous GOSAT-based estimates but at odds with the latest EPA GHGI that reports a 0.35 Tg a −1 decrease in emissions over that period. Our increase appears to be largely driven by the rapid growth of unconventional oil/gas production in the eastern US.