Inverse models use observations of a system (observation vector) to quantify the variables driving that system (state vector) by statistical optimization. When the observation vector is large, such as with satellite data, selecting a suitable dimension for the state vector is a challenge. A state vector that is too large cannot be effectively constrained by the observations, leading to smoothing error. However, reducing the dimension of the state vector leads to aggregation error as prior relationships between state vector elements are imposed rather than optimized. Here we present a method for quantifying aggregation and smoothing errors as a function of state vector dimension, so that a suitable dimension can be selected by minimizing the combined error. Reducing the state vector within the aggregation error constraints can have the added advantage of enabling analytical solution to the inverse problem with full error characterization. We compare three methods for reducing the dimension of the state vector from its native resolution: (1) merging adjacent elements (grid coarsening), (2) clustering with principal component analysis (PCA), and (3) applying a Gaussian mixture model (GMM) with Gaussian pdfs as state vector elements on which the native-resolution state vector elements are projected using radial basis functions (RBFs). The GMM method leads to somewhat lower aggregation error than the other methods, but more importantly it retains resolution of major local features in the state vector while smoothing weak and broad features.

Introduction

Inverse models quantify the state variables driving the evolution of a physical system by using observations of that system. This requires a physical model $F$, known as the forward model, that relates a set of input variables $x$ (state vector) to a set of output variables $y$ (observation vector), $y = F(x) + \epsilon$. The observational error $\epsilon$ includes contributions from both the forward model and the measurements. Solution of the inverse problem involves statistical optimization to achieve a best error-weighted estimate of $x$ given $y$.

A critical step in solving the inverse problem is determining the amount of information contained in the observations and choosing the state vector accordingly. This is a non-trivial problem when using large observational data sets with large errors. An example that will guide our discussion is the inversion of methane emissions on the basis of satellite observations of atmospheric methane concentrations. Methane concentrations can be predicted on the basis of emissions by using a chemical transport model (CTM) that solves the 3-D continuity equation for methane concentrations. Here the CTM is the forward model $F$, the satellite provides a large observation vector $y$, and we need to choose the resolution at which to optimize the methane emission vector $x$.

The simplest approach would be to use the native resolution of the CTM in order to extract the maximum information from the observations. However, the observations may not be sufficiently dense or precise to optimize emissions at that level of detail, resulting in an underdetermined problem, sometimes called the "resolution problem". The inverse solution must then rely on a prior estimate for the state vector and may not be able to depart sufficiently from that prior knowledge. The associated error is known as the smoothing error and increases with the size of the state vector. Previous inversions of methane emissions using satellite data illustrate the severity of this problem.

An additional drawback of using a large state vector is that analytical solution to the inverse problem may not be computationally tractable. Analytical solution requires calculation of the Jacobian matrix, $\nabla_x F$, and inversion and multiplication of the error covariance matrices. It has the major advantage of providing complete error statistics as part of the solution, but it becomes impractical as the state vector becomes too large. Numerical solutions using variational methods circumvent this problem but do not provide error characterization as part of the solution. Approximate error statistics can be obtained, but at the cost of additional computation.

Reducing the dimensionality of the state vector in the inverse problem thus has two advantages: it improves the observational constraints on individual state vector elements, and it facilitates analytical solution. Reduction can be achieved by aggregating state vector elements. For a state vector of gridded time-dependent emissions, the state vector can be reduced by aggregating grid cells and time periods. However, this introduces error in the inversion as the underlying spatial and temporal patterns of the aggregated emissions are now imposed from prior knowledge and not allowed to be optimized as part of the inversion. The resulting error is called the aggregation error.

Previous work has developed optimal grids that allow the transfer of information across multiple scales. These computationally efficient methods generally require the use of the native-resolution grid to derive the optimal representation. They also assume that the native-resolution prior error covariance matrices can be accurately constructed. However, in practice we are generally unable to specify realistic prior error correlations and must resort to simple assumptions.

Here we present a method for optimizing the selection of the state vector in the solution of the inverse problem for a given ensemble of observations without requiring an accurate specification of the native-resolution prior error covariance matrix. Instead, we use the expected error correlations between native-resolution state vector elements as criteria in the aggregation process. Relative to these optimal approaches, our method is suboptimal but more practical to implement. As the dimension of the state vector decreases, the smoothing error decreases while the aggregation error increases. Therefore, there is potentially an optimum dimension where the overall error is minimized. We derive an analytical expression for the aggregation error covariance matrix and show how this can guide selection of a reduced-dimension state vector where the aggregation error remains below an acceptable threshold. We also show how intelligent selection of the state vector can extract more information from the observations for a given state vector dimension.

Formulating the inverse problem

Inverse problems are commonly solved using Bayes' theorem, $P(x|y) \propto P(y|x)\,P(x)$, where $P(x|y)$ is the posterior probability density function (pdf) of the state vector $x$ ($n \times 1$) given a vector of observations $y$ ($m \times 1$), $P(x)$ is the prior pdf of $x$, and $P(y|x)$ is the conditional pdf of $y$ given the true value of $x$. Assuming Gaussian distributions for $P(y|x)$ and $P(x)$ allows us to write the posterior pdf as
$$P(x|y) \propto \exp\left[-\frac{1}{2}\left(y - F(x)\right)^T S_O^{-1} \left(y - F(x)\right) - \frac{1}{2}\left(x_a - x\right)^T S_a^{-1} \left(x_a - x\right)\right],$$
where $x_a$ is the $n \times 1$ prior state vector, $S_O$ is the $m \times m$ observational error covariance matrix, and $S_a$ is the $n \times n$ prior error covariance matrix. Here and elsewhere, our notation and terminology follow the standard formalism. The most probable solution $\hat{x}$ (called the maximum a posteriori, or MAP) is defined by the maximum of $P(x|y)$, i.e., the minimum of the cost function $J(x)$:
$$J(x) = \frac{1}{2}\left(y - F(x)\right)^T S_O^{-1} \left(y - F(x)\right) + \frac{1}{2}\left(x_a - x\right)^T S_a^{-1} \left(x_a - x\right).$$
This involves solving
$$\nabla_x J = \left(\nabla_x F\right)^T S_O^{-1} \left(F(x) - y\right) + S_a^{-1} \left(x - x_a\right) = 0.$$
This equation can be solved analytically if $F$ is linear, i.e., $F(x) = Kx + c$, where $K \equiv \nabla_x F = \partial y / \partial x$ is the Jacobian of $F$ and $c$ is a constant that can be set to zero in the general case by subtracting $c$ from the observations. This yields
$$\hat{x} = x_a + G\left(y - Kx_a\right),$$
where $G = \hat{S} K^T S_O^{-1}$ is the gain matrix and $\hat{S}$ is the posterior error covariance matrix,
$$\hat{S} = \left(K^T S_O^{-1} K + S_a^{-1}\right)^{-1}.$$
The MAP solution can also be expressed in terms of the true value $x$ as
$$\hat{x} = x_a + A\left(x - x_a\right) + G\epsilon,$$
where $A$ is the averaging kernel matrix that measures the error reduction resulting from the observations,
$$A = GK = I - \hat{S} S_a^{-1},$$
and $G\epsilon$ is the observation error in state space with error covariance matrix $G S_O G^T$. We have assumed here that errors are unbiased, as is standard practice in the inverse modeling literature. An observational error bias $b_O$ would propagate as a bias $G b_O$ in the solution $\hat{x}$.
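The analytical MAP machinery above can be sketched in a few lines of NumPy. This is a toy illustration with synthetic dimensions and a random linear forward model (all numbers are assumptions for illustration, not the methane inversion of this paper):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 5, 50                      # state and observation dimensions (synthetic)
K = rng.normal(size=(m, n))       # Jacobian of the (linear) forward model
x_true = rng.normal(size=n)
x_a = np.zeros(n)                 # prior estimate
S_a = np.eye(n)                   # prior error covariance
S_O = 0.1 * np.eye(m)             # observational error covariance
y = K @ x_true + rng.multivariate_normal(np.zeros(m), S_O)

# Posterior error covariance: S_hat = (K^T S_O^-1 K + S_a^-1)^-1
S_O_inv = np.linalg.inv(S_O)
S_hat = np.linalg.inv(K.T @ S_O_inv @ K + np.linalg.inv(S_a))
G = S_hat @ K.T @ S_O_inv         # gain matrix
x_hat = x_a + G @ (y - K @ x_a)   # MAP solution
A = G @ K                         # averaging kernel matrix
```

The identity $A = I - \hat{S} S_a^{-1}$ provides a useful consistency check on any implementation, since both sides can be computed independently.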

The analytical solution to the inverse problem thus provides full error characterization as part of the solution. It does require that the forward model be linear. The Jacobian matrix must generally be constructed numerically, requiring $n$ sensitivity simulations with the forward model. Subsequent matrix operations are also of dimension $n$. This limits the practical size of the state vector. The matrix operations also depend on the dimension $m$ of the observation vector, but this can be easily addressed by splitting that vector into uncorrelated packets, a method known as sequential updating.

The limitation on the state vector size can be lifted by finding the solution to $\nabla_x J = 0$ numerically rather than analytically, for example by using the adjoint of the forward model to calculate $\nabla_x J$ iteratively at successive approaches to the solution. This variational method allows for optimization of state vectors of any size because the Jacobian is not explicitly constructed, but it only yields the MAP solution, $\hat{x}$, with no error statistics. Several approaches have been presented to obtain approximate error characterization, but they can be computationally expensive. An excessively large state vector relative to the strength of the observational constraints also incurs smoothing error, as discussed above.

Quantifying aggregation and smoothing errors

The resolution of the forward model (e.g., grid resolution of the CTM) places an upper limit on the dimension for the state vector, which we call the native dimension. As we reduce the dimension of the state vector from this native resolution, the smoothing error decreases while the aggregation error increases. Here we present analytical expressions for the aggregation and smoothing error covariance matrices and show how they can be used to select an optimal state vector dimension.

Aggregation error

We define a restriction (aggregation) operator that maps the native-resolution state vector $x$ of dimension $n$ to a reduced-resolution vector $x_\omega$ of dimension $p$. We assume a linear restriction operator $\Gamma_\omega$, a $p \times n$ matrix relating $x_\omega$ to $x$: $x_\omega = \Gamma_\omega x$.
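As a concrete (hypothetical) example, a restriction operator for simple grid coarsening that sums pairs of adjacent elements can be written as follows; summing preserves total emissions, whereas averaging (entries of 0.5) would preserve mean scaling factors:

```python
import numpy as np

n, p = 6, 3                       # native and reduced dimensions (illustrative)
# Restriction operator Gamma_w (p x n) that sums pairs of adjacent elements
Gamma = np.zeros((p, n))
for i in range(p):
    Gamma[i, 2 * i : 2 * i + 2] = 1.0

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])  # native-resolution state vector
x_omega = Gamma @ x               # reduced-resolution state vector
# x_omega is [3., 7., 11.]: each element aggregates two native elements
```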

Previous analyses of aggregation error for reduced-resolution state vectors rely heavily on the construction of a prolongation operator $\Gamma$ mapping $x_\omega$ back to $x$: $x = \Gamma x_\omega$. However, the construction of this prolongation operator is not unique. We present here a simpler and more practical method.

Aggregation error is the error introduced by aggregating state vector elements in the inversion. The relationship between the aggregated elements is no longer optimized as part of the inversion; instead it becomes an unoptimized parameter in the forward model, effectively increasing the forward model error and inhibiting the ability of the model to fit the observations. The aggregation error is thus a component of the observational error.

The aggregation error can be quantified by comparing the observational error incurred by using the native-resolution state vector, $\epsilon = y - Kx$, to that using the aggregated state vector, $\epsilon_\omega = y - K_\omega x_\omega$. Here $y$ is the observation vector (common to both cases), $x$ and $x_\omega$ are the true values of the native-resolution and aggregated state vectors, and $K$ and $K_\omega$ are the native-resolution and reduced-dimension Jacobians. The only difference between $\epsilon$ and $\epsilon_\omega$ is the aggregation of state vector elements, so that $\epsilon_\omega = \epsilon + \epsilon_A$, where $\epsilon_A$ is the aggregation error. Rearranging,
$$\epsilon_A = \left(K - K_\omega \Gamma_\omega\right) x.$$
Obtaining the error statistics for $\epsilon_A$ requires knowledge of the pdf of $x$ over the ensemble of possible true states. Let $\bar{x}$ represent the mean value of this ensemble and $S_e$ the corresponding covariance matrix. The aggregation error covariance matrix is
$$S_A = E\left[\left(\epsilon_A - E[\epsilon_A]\right)\left(\epsilon_A - E[\epsilon_A]\right)^T\right],$$
where $E$ is the expected value operator and $E[\epsilon_A] = \left(K - K_\omega \Gamma_\omega\right)\bar{x}$ is the bias introduced by the aggregation. Substituting,
$$S_A = \left(K - K_\omega \Gamma_\omega\right) E\left[\left(x - \bar{x}\right)\left(x - \bar{x}\right)^T\right] \left(K - K_\omega \Gamma_\omega\right)^T = \left(K - K_\omega \Gamma_\omega\right) S_e \left(K - K_\omega \Gamma_\omega\right)^T.$$
In designing our inversion system we use $x_a$ as our best estimate of $\bar{x}$ and $S_a$ as our best estimate of $S_e$. Indeed, if $x_a = x$ there would be no aggregation error, since the prior relationship assumed between state vector elements would be correct, thus $Kx = K_\omega \Gamma_\omega x$ and the aggregation bias would be zero. Assuming $S_a = S_e$ allows us to calculate the aggregation error covariance matrix as
$$S_A = \left(K - K_\omega \Gamma_\omega\right) S_a \left(K - K_\omega \Gamma_\omega\right)^T,$$
and we will use this expression in the analysis that follows. Applying this expression requires computation of the native-resolution Jacobian $K$, but this can be done for a limited test period only. We give an example below.
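A minimal numerical sketch of the aggregation error covariance $S_A = (K - K_\omega\Gamma_\omega)\,S_a\,(K - K_\omega\Gamma_\omega)^T$ follows. The dimensions, the random Jacobian, and the replication-based construction of $K_\omega$ are illustrative assumptions only:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, m = 6, 3, 20                # synthetic native, reduced, observation sizes
K = rng.normal(size=(m, n))       # native-resolution Jacobian
S_a = np.eye(n)                   # prior error covariance (assumed equal to S_e)

# Restriction operator: average pairs of adjacent elements
Gamma_w = np.zeros((p, n))
for i in range(p):
    Gamma_w[i, 2 * i : 2 * i + 2] = 0.5

# Illustrative reduced Jacobian: a prolongation that replicates each
# aggregate value back to its members, so K_w = K @ Gamma
Gamma = 2.0 * Gamma_w.T           # n x p replication operator
K_w = K @ Gamma

D = K - K_w @ Gamma_w             # mismatch between native and aggregated models
S_A = D @ S_a @ D.T               # aggregation error covariance matrix
```

Note that if the columns of $K$ were identical within each aggregate (prior relationships exactly correct), $D$ would vanish and with it the aggregation error, consistent with the derivation above.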

Smoothing error

We can express the smoothing error on $\hat{x}$ by rearranging the MAP expressions above:
$$\hat{x} - x = \left(I - A\right)\left(x_a - x\right) + G\epsilon,$$
where $\epsilon_S = \left(I - A\right)\left(x_a - x\right)$ is the smoothing error. The smoothing error statistics must be derived from the pdf of possible true states, in the same way as for the aggregation error, characterized by the error covariance matrix $S_e$. For purposes of designing the inverse system we assume that $S_e = S_a$. Thus we have
$$S_S = \left(I - A\right) S_a \left(I - A\right)^T.$$
We can also express the smoothing error in observation space (i.e., as a difference between $y$ and $K\hat{x}$) by multiplying both sides by the Jacobian matrix:
$$K\left(\hat{x} - x\right) = K\left(I - A\right)\left(x_a - x\right) + KG\epsilon,$$
so that the observation-space smoothing error is $\epsilon_S = K\left(I - A\right)\left(x_a - x\right)$. The corresponding smoothing error covariance matrix in observation space is
$$S_S = K\left(I - A\right) S_a \left(I - A\right)^T K^T.$$
This expression can be generalized to compute the smoothing error covariance matrix in observation space for any reduced-dimension state vector $x_\omega$ with Jacobian $K_\omega$, prior error covariance matrix $S_{a,\omega}$, and averaging kernel matrix $A_\omega$:
$$S_S = K_\omega \left(I - A_\omega\right) S_{a,\omega} \left(I - A_\omega\right)^T K_\omega^T.$$
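The smoothing error covariances in state and observation space can be sketched numerically as follows, again with a synthetic Jacobian and covariances standing in for a real inversion:

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 5, 40                      # synthetic state and observation dimensions
K = rng.normal(size=(m, n))       # Jacobian
S_a = np.eye(n)                   # prior error covariance (assumed = S_e)
S_O = 0.25 * np.eye(m)            # observational error covariance

S_hat = np.linalg.inv(K.T @ np.linalg.inv(S_O) @ K + np.linalg.inv(S_a))
A = np.eye(n) - S_hat @ np.linalg.inv(S_a)   # averaging kernel matrix

IA = np.eye(n) - A
S_S_state = IA @ S_a @ IA.T                  # smoothing error, state space
S_S_obs = K @ S_S_state @ K.T                # smoothing error, observation space
```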

Total error budget

From Eq. () we can see that the total error on $\hat{x}$ without aggregation is $\epsilon_T = \epsilon_S + G\epsilon$ in the state space, or $\epsilon_T = \epsilon_S + KG\epsilon$ in the observation space. The $KG$ term appears in the observation space because we are interested in the error on $\hat{x}$. If $\hat{x} = x$ then $KG = I$ and $A = I$, thus $\epsilon_S = 0$ and the total error reverts to $\epsilon$:
$$\epsilon_T\big|_{\hat{x}=x} = K\left(I - A\right)\left(x_a - x\right) + KG\epsilon = \epsilon.$$

Additional consideration of aggregation error for a reduced-dimension state vector $x_\omega$ yields a total error in the state space of
$$\epsilon_T = \epsilon_S + G_\omega \epsilon + G_\omega \epsilon_A,$$
where
$$G_\omega = \left(K_\omega^T S_O^{-1} K_\omega + S_{a,\omega}^{-1}\right)^{-1} K_\omega^T S_O^{-1}$$
is the gain matrix for the reduced-dimension state vector. In the observation space we get
$$\epsilon_T = \epsilon_S + K_\omega G_\omega \epsilon + K_\omega G_\omega \epsilon_A.$$

From these relationships we derive the total error covariance matrix as
$$S_{T,\omega} = \underbrace{\left(I - A_\omega\right) S_{a,\omega} \left(I - A_\omega\right)^T}_{\text{smoothing error}} + \underbrace{G_\omega \left(K - K_\omega \Gamma_\omega\right) S_a \left(K - K_\omega \Gamma_\omega\right)^T G_\omega^T}_{\text{aggregation error}} + \underbrace{G_\omega S_O G_\omega^T}_{\text{observation error}}$$
in the state space, and
$$S_{T,\omega} = \underbrace{K_\omega \left(I - A_\omega\right) S_{a,\omega} \left(I - A_\omega\right)^T K_\omega^T}_{\text{smoothing error}} + \underbrace{K_\omega G_\omega \left(K - K_\omega \Gamma_\omega\right) S_a \left(K - K_\omega \Gamma_\omega\right)^T G_\omega^T K_\omega^T}_{\text{aggregation error}} + \underbrace{K_\omega G_\omega S_O G_\omega^T K_\omega^T}_{\text{observation error}}$$
in the observation space. A bias term should exhibit similar scale dependence to the observation error term and could be included following previous derivations.

Each of the three error terms above depends on state vector dimension. Because the smoothing error increases with state vector dimension while the aggregation error decreases, analysis of the error budget can potentially point to the optimal dimension where the total error is minimum. It can also point to the minimum state vector dimension needed for the aggregation error to be below a certain tolerance, e.g., smaller than the observation error. We give an example in Sect. .
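The error budget and its dependence on state vector dimension can be sketched numerically. The following toy example (all dimensions, the Jacobian, and the covariances are synthetic assumptions, not this paper's inversion) evaluates the state-space total error covariance for a family of block-averaging restriction operators and selects the dimension with minimum mean error variance:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 32, 120                          # synthetic native-state and observation sizes
K = rng.normal(size=(m, n))             # synthetic native-resolution Jacobian
S_a = np.eye(n)                         # prior error covariance
S_O = 4.0 * np.eye(m)                   # observational error covariance

def mean_total_error(p):
    """Mean diagonal of the state-space total error covariance
    (smoothing + aggregation + observation) when the n native
    elements are averaged into p equal contiguous blocks."""
    size = n // p
    Gamma_w = np.kron(np.eye(p), np.full((1, size), 1.0 / size))  # restriction
    K_w = K @ (size * Gamma_w.T)        # reduced Jacobian (replication prolongation)
    S_aw = (1.0 / size) * np.eye(p)     # prior variance shrinks with aggregation
    S_hat = np.linalg.inv(K_w.T @ np.linalg.inv(S_O) @ K_w + np.linalg.inv(S_aw))
    G_w = S_hat @ K_w.T @ np.linalg.inv(S_O)       # reduced gain matrix
    A_w = G_w @ K_w                                 # reduced averaging kernel
    I_p = np.eye(p)
    D = K - K_w @ Gamma_w                           # aggregation mismatch
    S_T = ((I_p - A_w) @ S_aw @ (I_p - A_w).T       # smoothing error
           + G_w @ D @ S_a @ D.T @ G_w.T            # aggregation error
           + G_w @ S_O @ G_w.T)                     # observation error
    return np.trace(S_T) / p

dims = [1, 2, 4, 8, 16, 32]
errors = {p: mean_total_error(p) for p in dims}
best_p = min(errors, key=errors.get)    # dimension with minimum mean total error
```

At $p = n$ the restriction operator is the identity and the aggregation term vanishes, consistent with the definition of aggregation error; whether an interior minimum appears depends on the assumed covariances.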

A caveat in the above expressions for the aggregation and smoothing error covariance matrices is that they are valid only if the prior $x_a$ is the mean value $\bar{x}$ of the pdf of true states and if the prior error covariance matrix $S_a$ is the covariance matrix of that pdf ($S_e = S_a$). Detailed discussions of the errors induced by failing to meet these assumptions are available in the literature. Since these assumptions define our prior, they can be taken as valid for the purpose of selecting an appropriate state vector dimension in an inverse problem. However, they should not be used to diagnose errors on the inversion results.

Illustration of different approaches for aggregating a state vector. Here the native-resolution state vector is a field of gridded methane emissions at 1/2° × 2/3° resolution over North America. Extreme reduction to eight state vector elements is shown, with individual elements distinguished by color.

Aggregation methods

Aggregation of state vector elements to reduce the state vector dimension introduces aggregation error, as described in Sect. . The aggregation error can be reduced by grouping elements with correlated errors. Analyzing the off-diagonal structure of a precisely constructed prior error correlation matrix would provide the most objective way to carry out the aggregation. We generally lack such information, but we do have some qualitative knowledge of prior error correlations that can be used to optimize the aggregation. By aggregating regions that have correlated errors we can exploit additional information that would otherwise be neglected in a native-resolution inversion assuming (by default) uncorrelated errors.

Previous work used tiling and tree-based aggregation methods, hierarchical clustering based on prior error patterns, and principal component analysis (PCA) coupled to a hierarchical grid to compute an optimal grid. Here we compare three aggregation methods: (1) simple grid coarsening, (2) PCA clustering, and (3) a Gaussian mixture model (GMM) with radial basis functions (RBFs) to project native-resolution state vector elements onto Gaussian pdfs. A qualitative illustration of these methods is shown in Fig.  for the aggregation of a native-resolution state vector of methane emissions with 1/2° × 2/3° native grid resolution over North America. We focus here on spatial aggregation and assume that the state vector has no temporal dimension; however, the same methods can be used for temporal aggregation.

The simplest method for reducing the dimension of the state vector is to merge adjacent elements, i.e., neighboring grid cells. This method considers only spatial proximity as a source of error correlation. It may induce large aggregation errors if proximate but otherwise dissimilar regions are aggregated together. In the case of methane emissions, aggregating neighboring wetlands and farmland would induce large errors because different processes drive methane emissions from these two source types.

The other two methods enable consideration of additional similarity factors besides spatial proximity when aggregating state vector elements. These similarity factors are expressed by vectors of dimension n describing correlative properties of the original native-resolution state vector elements. In the case of a methane source inversion, for example, we can choose as similarity vectors latitude and longitude to account for spatial proximity, but also wetland fraction to account for error correlations in the bottom-up wetland emission estimate used as prior.

Similarity matrix for aggregation

Table  lists the similarity vectors chosen for our example problem of estimating methane emissions. The first two vectors account for spatial proximity, the third represents the scaling factors from the first iteration of an adjoint-based inversion at native resolution, and the others are the source-type patterns from the bottom-up inventories used as prior. All similarity vectors are normalized and then weighted by judgment of their importance. We choose to include initial scaling factors from the adjoint-based inversion because they are available and can serve to correct any prior patterns that are grossly inconsistent with the observations, or to identify local emission hotspots missing from the prior. One iteration of the adjoint-based inversion is computationally inexpensive and is sufficient to pick up major departures from the prior.

Similarity vectors for inverting methane emissions in North America (a).

    Similarity vector                  Weighting factor (b)
     1. Latitude (c)                   1.00
     2. Longitude (d)                  1.00
     3. Initial scaling factors (e)    0.15
     4. Wetland                        0.31
     5. Livestock                      0.22
     6. Oil/gas                        0.16
     7. Waste                          0.15
     8. Coal                           0.06
     9. Soil absorption                0.05
    10. Termites                       0.02
    11. Biomass burning                0.02
    12. Biofuel                        0.01
    13. Rice                           0.01
    14. Other                          0.01

(a) The $K = 14$ similarity vectors describe prior error correlation criteria for the native-resolution state vector, representing here the methane emissions in North America at the 1/2° × 2/3° resolution of the GEOS-Chem chemical transport model. The criteria are normalized and then weighted (weighting factor). Criteria 4–14 are prior emission patterns used in the GEOS-Chem model. (b) The weighting factors (dimensionless) measure the estimated relative importance of the different similarity criteria in determining prior error correlations in the state vector. For the prior emission patterns these weighting factors are the fractional contributions to total prior emissions in North America. (c) Distance in kilometers from the equator. (d) Distance in kilometers from the prime meridian. (e) Initial scaling factors from one iteration of an adjoint inversion at the native resolution.

Let $c^{(1)}, \ldots, c^{(K)}$ represent the $K$ similarity vectors chosen for the problem ($K = 14$ in our example of Table ). We assemble them as the columns of an $n \times K$ similarity matrix $C$. We will also make use of the ensemble of similarity vector values for individual state vector elements, which form the rows $\{c_1, \ldots, c_n\}$ of $C$. Thus:
$$C = \begin{bmatrix} c^{(1)} & c^{(2)} & \cdots & c^{(K)} \end{bmatrix} = \begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_n \end{bmatrix}.$$
In this work, all of the aggregation methods except grid coarsening use the same similarity matrix to construct the restriction operator.
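Assembling $C$ amounts to normalizing each similarity vector and scaling it by its weighting factor. The sketch below uses three hypothetical similarity vectors with made-up values (the names and the 0.31 wetland weight echo Table , but the data are random placeholders):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100                           # native-resolution state vector elements

# Hypothetical raw similarity vectors (length n each); values are placeholders
raw = {
    "latitude":  rng.uniform(1000.0, 6000.0, n),   # km from equator
    "longitude": rng.uniform(5000.0, 12000.0, n),  # km from prime meridian
    "wetland":   rng.uniform(0.0, 1.0, n),         # prior wetland emission pattern
}
weights = {"latitude": 1.0, "longitude": 1.0, "wetland": 0.31}

# Normalize each vector to [0, 1], then scale by its weighting factor
cols = []
for name, v in raw.items():
    v_norm = (v - v.min()) / (v.max() - v.min())
    cols.append(weights[name] * v_norm)
C = np.column_stack(cols)          # n x K similarity matrix
```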

This approach of using a similarity matrix $C$ to account for prior error covariances bears some resemblance to the geostatistical approach to inverse modeling. The geostatistical approach specifies the prior estimate as $x_a = C\beta$, where $\beta$ is a vector of unknown drift coefficients to be optimized as part of the inversion. Here we instead use the similarity matrix to reduce the dimension of the state vector, rather than just as a choice of prior constraints.

Clustering with principal component analysis

In this method we cluster state vector elements following the principal components of the similarity matrix. It is generally not practical to derive the principal components in state vector space because the dimension $n$ is large. Instead we derive them in similarity space (dimension $K$) as the eigenvectors of $C^T C$, sorted in order of importance by their eigenvalues. The leading $j$ principal components are kept for clustering. The reduced state vector is then constructed by grouping state vector elements that have the same sign patterns for all $j$ principal components. Each unique $j$-dimensional sign pattern constitutes a cluster. The number of clusters defined in that way ranges between $j$ and $2^j$. Figure b shows an example of applying this method to methane emissions in North America with reduction of the state vector to eight elements. The separation into four quadrants reflects the importance of latitude and longitude as error correlation factors. The additional separation within each quadrant isolates large from weak sources as defined by the prior.
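The sign-pattern clustering can be sketched as follows, with a random similarity matrix standing in for the real one (dimensions and the choice $j = 3$ are illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
n, Kdim = 200, 5
C = rng.normal(size=(n, Kdim))    # placeholder similarity matrix

# Principal components in similarity space: eigenvectors of C^T C
evals, evecs = np.linalg.eigh(C.T @ C)
order = np.argsort(evals)[::-1]   # sort by decreasing eigenvalue
j = 3                             # keep the leading j principal components
V = evecs[:, order[:j]]           # K x j matrix of leading eigenvectors

scores = C @ V                    # projection of each element onto the PCs
signs = scores > 0                # j-dimensional sign pattern per element

# Each unique sign pattern defines one cluster (up to 2**j clusters)
patterns, labels = np.unique(signs, axis=0, return_inverse=True)
n_clusters = patterns.shape[0]
```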

Gaussian mixture model (GMM)

Here we use a Gaussian mixture model (GMM) to project the native-resolution state vector onto $p$ Gaussian pdfs using radial basis functions (RBFs). Mixture models are probabilistic models for representing a population composed of $p$ subpopulations. Each subpopulation is assumed to follow a pdf, in this case Gaussian. The Gaussians are $K$-dimensional, where $K$ is the number of similarity criteria. Each native-resolution state vector element is fit to this ensemble of Gaussians using RBFs as weighting factors.

The first step in constructing the GMM is to define a $p \times n$ weighting matrix $W = [w_1, w_2, \ldots, w_p]^T$. Each element $w_{i,j}$ of this weighting matrix is the relative probability for native-resolution state vector element $j$ to be described by Gaussian subpopulation $i$; i.e., "how much does element $j$ look like Gaussian $i$?". It is given by
$$w_{i,j} = \frac{\pi_i\, \mathcal{N}\!\left(c_j \mid \mu_i, \Lambda_i\right)}{\sum_{k=1}^{p} \pi_k\, \mathcal{N}\!\left(c_j \mid \mu_k, \Lambda_k\right)}.$$
Here $c_j$ is the $j$th row of the similarity matrix $C$, $\mu_i$ is a $1 \times K$ row vector of means for the $i$th Gaussian, $\Lambda_i$ is a $K \times K$ covariance matrix for the $i$th Gaussian, and $\pi = [\pi_1, \ldots, \pi_p]^T$ contains the relative weights of the $p$ Gaussians in the mixture. $\mathcal{N}(c_j \mid \mu_i, \Lambda_i)$ denotes the probability density of vector $c_j$ on the normal distribution of Gaussian $i$. We define a $p \times K$ matrix $M$ with rows $\mu_i$ and a $K \times K \times p$ third-order tensor $L = [\Lambda_1, \ldots, \Lambda_p]$ as the set of covariance matrices.

Projection of the native-resolution state vector onto the GMM involves four unknowns: $W$, $\pi$, $M$, and $L$. This is solved by constructing a cost function to estimate the parameters of the Gaussians in the mixture model by maximum likelihood:
$$J_{\mathrm{GMM}}(C \mid \pi, M, L) = \sum_{j=1}^{n} \ln \sum_{i=1}^{p} \pi_i\, \mathcal{N}\!\left(c_j \mid \mu_i, \Lambda_i\right).$$
Starting from an initial guess for $\pi$, $M$, and $L$, we compute the weight matrix $W$ from the expression for $w_{i,j}$ above. We then differentiate the cost function with respect to $\pi$, $M$, and $L$, and set the derivatives to zero to obtain
$$\mu_i = \frac{1}{\Psi_i} \sum_{j=1}^{n} w_{i,j}\, c_j, \qquad \Lambda_i = \frac{1}{\Psi_i} \sum_{j=1}^{n} w_{i,j} \left(c_j - \mu_i\right)^T \left(c_j - \mu_i\right), \qquad \pi_i = \frac{\Psi_i}{n},$$
where $\Psi_i = \sum_{j=1}^{n} w_{i,j}$. The weights are then re-calculated from the updated $\pi$, $M$, and $L$, and so on until convergence. The final weights define the restriction operator as $\Gamma_\omega = W$. The computational complexity of the expectation-maximization algorithm is $O(nK + pn^2)$; however, the actual runtime will be largely dictated by the convergence criteria. Here we use an absolute tolerance of $\tau < 10^{-10}$, where
$$\tau = \sum_{i}\sum_{j} \left|M_{i,j} - M^{*}_{i,j}\right| + \sum_{i}\sum_{j}\sum_{k} \left|L_{i,j,k} - L^{*}_{i,j,k}\right| + \sum_{i} \left|\pi_i - \pi^{*}_i\right|,$$
and the superscript star indicates the value from the previous iteration.
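The expectation-maximization iteration above can be sketched compactly in NumPy. This sketch uses diagonal covariances and a fixed iteration count instead of the paper's tolerance criterion, both simplifying assumptions; the returned responsibility matrix $W$ plays the role of the restriction operator $\Gamma_\omega = W$:

```python
import numpy as np

def gmm_restriction(C, p, n_iter=200, seed=0):
    """Fit a p-component Gaussian mixture to the rows of the similarity
    matrix C (n x K) by EM and return the p x n responsibility matrix W.
    Diagonal covariances are assumed for simplicity."""
    rng = np.random.default_rng(seed)
    n, K = C.shape
    mu = C[rng.choice(n, p, replace=False)]          # p x K initial means
    var = np.ones((p, K))                            # p x K diagonal covariances
    pi = np.full(p, 1.0 / p)                         # mixture weights
    for _ in range(n_iter):
        # E-step: responsibilities w_ij proportional to pi_i N(c_j | mu_i, var_i)
        diff = C[None, :, :] - mu[:, None, :]        # p x n x K
        log_pdf = -0.5 * np.sum(diff**2 / var[:, None, :]
                                + np.log(2 * np.pi * var)[:, None, :], axis=2)
        log_w = np.log(pi)[:, None] + log_pdf
        log_w -= log_w.max(axis=0, keepdims=True)    # numerical stability
        W = np.exp(log_w)
        W /= W.sum(axis=0, keepdims=True)            # each column sums to 1
        # M-step: update means, variances, and mixture weights
        Nk = W.sum(axis=1)                           # effective counts (Psi_i)
        mu = (W @ C) / Nk[:, None]
        diff = C[None, :, :] - mu[:, None, :]
        var = np.einsum('pn,pnk->pk', W, diff**2) / Nk[:, None] + 1e-6
        pi = Nk / n
    return W

# Usage: two well-separated synthetic subpopulations in similarity space
rng = np.random.default_rng(1)
C = np.vstack([rng.normal(-2, 0.5, (50, 2)), rng.normal(2, 0.5, (50, 2))])
W = gmm_restriction(C, p=2)
```

By construction each column of $W$ sums to one, so every native-resolution element distributes its full weight across the Gaussians, mirroring the RBF weighting described above.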

Gaussian mixture model (GMM) representation of methane emissions in Southern California with Gaussian pdfs as state vector elements. The Gaussians are constructed from a similarity matrix for methane emissions on the 1/2° × 2/3° horizontal resolution of the GEOS-Chem CTM used as forward model for the inversion. The figure shows the dominant three Gaussians for Southern California with contours delineating the 0.5, 1.0, 1.5, and 2.0σ spreads for the latitude-longitude dimensions. The RBF weights w1, w2, and w3 of the three Gaussians for each 1/2° × 2/3° grid square are also shown along with their sum.

The GMM allows each native-resolution state vector element to be represented by a unique linear combination of the Gaussians through the RBFs. For a state vector of a given dimension, defined by the number of Gaussian pdfs, we can achieve high resolution for large localized sources by sacrificing resolution for weak or uniform source regions where resolution is not needed. This is illustrated in Fig.  with the resolution of Southern California in an inversion of methane sources for North America. The figure shows the three dominant Gaussians describing emissions in Southern California and the corresponding RBF weights for each native-resolution grid square. Gaussian 1 is centered over Los Angeles and is highly localized, Gaussian 2 covers the Los Angeles Basin, and Gaussian 3 is a Southern California background. The sum of these three Gaussians accounts for most of the emissions in Southern California and Nevada (which is mostly background). Additional Gaussians (not shown) resolve the southern San Joaquin Valley (large livestock and oil/gas emissions) and Las Vegas (large emissions from waste).

Application

We apply the aggregation methods described above to our example problem of estimating methane emissions from satellite observations of methane concentrations, focusing on selecting a reduced-dimension state vector that minimizes aggregation and smoothing errors. The inversion is described in detail elsewhere and uses GOSAT satellite observations for 2009–2011 over North America. The forward model for the inversion is the GEOS-Chem CTM with 1/2° × 2/3° grid resolution. The native-resolution state vector of methane emissions defined on that grid includes 7366 elements.

For the purpose of selecting an aggregated state vector for the inversion, we consider a subset of observations for May 2010 ($m = 6070$) so that we can afford to construct the corresponding Jacobian matrix $K$ at the native resolution; this is necessary to derive the aggregation error covariance matrix following Eq. (). The prior error covariance matrix is specified as diagonal with 100 % uncertainty at the native resolution, decreasing with aggregation following the central limit theorem. The observational error covariance matrix is also diagonal and specified as the scene-specific retrieval error, which dominates the total observational error. We compare the three methods presented in Sect.  for aggregating the state vector in terms of the implications for aggregation and smoothing errors for different state vector dimensions. In addition to the GMM with RBFs, we also consider a "GMM clustering" method where each native-resolution state vector element is assigned exclusively to its dominant Gaussian pdf. This yields sharp boundaries between clusters (Fig. ), as in the grid coarsening and PCA methods.

Aggregation and smoothing error dependences on the aggregation of state vector elements in an inverse model. The application here is to an inversion of methane emissions over North America using satellite methane data with 7366 native-resolution state vector elements (Sect. ). Results are shown as the square roots of the means of the diagonal terms (mean error standard deviation) in the aggregation and smoothing error covariance matrices. Different methods for aggregating the state vector (Sect. ) are shown as separate lines. Note the log scale on the x axis.

Figure  shows the mean error standard deviation in the aggregation and smoothing error covariance matrices, computed as the square root of the mean of the diagonal terms, as a function of state vector dimension. The aggregation error is zero by definition at the native resolution (7366 state vector elements) and increases as the number $n$ of state vector elements decreases, following roughly an $n^{-0.7}$ dependence. Conversely, the smoothing error increases as the number of state vector elements increases, following roughly a $\log(n)$ dependence. The different aggregation methods of Sect.  yield very similar smoothing errors, suggesting that any reasonable aggregation scheme (such as k-means clustering) would perform comparably. The aggregation error is somewhat lower with the GMM method, and RBF weighting performs slightly better than GMM clustering (sharp boundaries). As discussed above, a major advantage of the GMM method is its ability to retain resolution of large localized sources after aggregation.

Figure  shows the sum of contributions from aggregation, smoothing, and observational error standard deviations as a function of state vector aggregation using the GMM with RBF weighting. In this application, aggregation error dominates for small state vectors (n<100), but drops below the observation error for n>100 and below the smoothing error for n>1000. The smoothing error remains smaller than the observational error even at the native resolution (n=7366). The observational error is not independent of aggregation, as shown in Eq. (), but we find here that the dependence is small.

Total error budget from the aggregation of state vector elements in an inverse model. The application here is to an inversion of methane emissions over North America using satellite methane data with 7366 native-resolution state vector elements (Sect. ). Results are shown as the square roots of the means of the diagonal terms (mean error standard deviation) in the aggregation, smoothing, and observational error covariance matrices, and for the sum of these matrices. Aggregation uses the GMM with RBF weighting (Sect. ). There is an optimum state vector size for which the total error is minimum, shown as the circle. Gray shading indicates the 90 % range for the total error on individual elements as diagnosed from the 5th and 95th quantiles of diagonal elements in the total error covariance matrix. Note the log scale on the x axis.

From Fig.  we can identify a state vector dimension for which the total error is minimum ($n = 2208$; circle in Fig. ). However, error growth is small until $n \approx 200$, below which the aggregation error grows rapidly. A state vector of 369 elements, as adopted in the full inversion, does not incur significant errors associated with aggregation or smoothing, and enables computation of an analytical solution to the inverse problem with full error characterization.

Previous work by , , , , and analyzed the scale dependence of different grids using the degrees of freedom for signal, DFS = Tr(I − S_{a,ω}^(−1) Ŝ_ω), where S_{a,ω} and Ŝ_ω are the prior and posterior error covariance matrices for the aggregated state vector. These studies found the total error to be a monotonic function of state vector dimension, implying that the native-resolution grid has the least total error and that there is no optimal resolution except from a numerical efficiency standpoint. Here we find a local minimum that is seemingly at odds with this previous work. The local minimum arises because our aggregation accounts for spatial error correlations that we are unable to specify at the native resolution. We thus take more information into account and obtain a minimum total error at a state vector size smaller than the native resolution. If the native-resolution error covariance matrices were correct then, as previous work showed, the only reason to perform aggregation would be to reduce the computational expense, and the grid used here would be suboptimal because it does not depend on the native-resolution grid.
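The DFS metric above reduces to a simple trace computation. A minimal sketch, using a toy diagonal example with hypothetical variances (for diagonal matrices, a well-constrained element with posterior variance much smaller than its prior contributes nearly 1 to the DFS, and a poorly constrained element contributes nearly 0):

```python
import numpy as np

def dfs(S_a, S_hat):
    """Degrees of freedom for signal: DFS = Tr(I - S_a^{-1} S_hat),
    with S_a the prior and S_hat the posterior error covariance matrix."""
    n = S_a.shape[0]
    # np.linalg.solve(S_a, S_hat) computes S_a^{-1} S_hat without
    # forming the explicit inverse.
    return np.trace(np.eye(n) - np.linalg.solve(S_a, S_hat))

S_a = np.diag([1.0, 1.0])     # prior variances (hypothetical)
S_hat = np.diag([0.1, 0.9])   # posterior variances (hypothetical)
print(dfs(S_a, S_hat))        # (1 - 0.1) + (1 - 0.9) = 1.0
```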

Conclusions

We presented a method for optimizing the selection of the state vector in the solution of the inverse problem for a given ensemble of observations. The optimization involves minimizing the total error in the inversion by balancing the aggregation error (which increases as the state vector dimension decreases), the smoothing error (which increases as the state vector dimension increases), and the observational error. We further showed how one can reduce the state vector dimension within the constraints from the aggregation error in order to facilitate an analytical solution to the inverse problem with full error characterization.

We explored different methods for aggregating state vector elements as a means of reducing the dimension of the state vector. Aggregation error can be minimized by grouping state vector elements with the strongest correlated prior errors. We showed that a Gaussian mixture model (GMM), where the state vector elements are multi-dimensional Gaussian pdfs constructed from prior error correlation patterns, is a powerful aggregation tool. Reduction of the state vector dimension using the GMM retains fine-scale resolution of important features in the native-resolution state vector while merging weak or uniform features.
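As a sketch of this aggregation strategy (not the implementation used in the paper), scikit-learn's GaussianMixture can supply the Gaussian pdfs, with its posterior responsibilities serving as the RBF-like weights that project native-resolution elements onto the components. The feature vectors here are synthetic stand-ins for whatever describes each grid cell (e.g. location and prior emission patterns):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Hypothetical feature vectors for 500 native-resolution grid cells,
# e.g. latitude, longitude, and a prior emission attribute, so cells
# with similar locations and source characteristics group together.
features = rng.normal(size=(500, 3))

# Fit a GMM with far fewer components than native-resolution elements;
# each component is a multi-dimensional Gaussian pdf.
gmm = GaussianMixture(n_components=20, random_state=0).fit(features)

# Posterior responsibilities act as radial-basis-function weights
# projecting each native-resolution element onto the components.
W = gmm.predict_proba(features)  # shape (500, 20), rows sum to 1
```

Hard clustering (sharp boundaries) would correspond to `gmm.predict`, i.e. assigning each cell entirely to its most probable component; the soft responsibilities in `W` are what give the RBF-weighted variant its edge.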

Acknowledgements

For advice and discussions, we thank K. Wecht (Harvard University). Special thanks to R. Parker and H. Boesch (University of Leicester) for providing the GOSAT observations. This work was supported by the NASA Carbon Monitoring System and by a Department of Energy (DOE) Computational Science Graduate Fellowship (CSGF) to A. J. Turner. We thank the Harvard SEAS Academic Computing center for access to computing resources. We also thank M. Bocquet and an anonymous reviewer for their thorough comments. Edited by: R. Harley

References

Bishop, C. M.: Pattern Recognition and Machine Learning, Springer, 1st Edn., New York, 2007.
Bocquet, M.: Towards optimal choices of control space representation for geophysical data assimilation, Mon. Weather Rev., 137, 2331–2348, doi:10.1175/2009MWR2789.1, 2009.
Bocquet, M. and Wu, L.: Bayesian design of control space for optimal assimilation of observations. Part II: Asymptotic solutions, Q. J. Roy. Meteor. Soc., 137, 1357–1368, doi:10.1002/qj.841, 2011.
Bocquet, M., Wu, L., and Chevallier, F.: Bayesian design of control space for optimal assimilation of observations. Part I: Consistent multiscale formalism, Q. J. Roy. Meteor. Soc., 137, 1340–1356, doi:10.1002/qj.837, 2011.
Bousserez, N., Henze, D. K., Perkins, A., Bowman, K. W., Lee, M., Liu, J., Deng, F., and Jones, D. B. A.: Improved analysis-error covariance matrix for high-dimensional variational inversions: application to source estimation using a 3D atmospheric transport model, Q. J. Roy. Meteor. Soc., doi:10.1002/qj.2495, online first, 2015.
Bousquet, P., Peylin, P., Ciais, P., Le Quere, C., Friedlingstein, P., and Tans, P. P.: Regional changes in carbon dioxide fluxes of land and oceans since 1980, Science, 290, 1342–1346, doi:10.1126/science.290.5495.1342, 2000.
Chen, Z., Haykin, S., Eggermont, J. J., and Becker, S.: Correlative Learning: A Basis for Brain and Adaptive Systems, John Wiley & Sons, 1st Edn., New York, 2007.
Chevallier, F., Breon, F. M., and Rayner, P. J.: Contribution of the Orbiting Carbon Observatory to the estimation of CO2 sources and sinks: theoretical study in a variational data assimilation framework, J. Geophys. Res.-Atmos., 112, D09307, doi:10.1029/2006jd007375, 2007.
Courtier, P., Thepaut, J., and Hollingsworth, A.: A strategy for operational implementation of 4D-Var, using an incremental approach, Q. J. Roy. Meteor. Soc., 120, 1367–1387, doi:10.1002/qj.49712051912, 1994.
Desroziers, G., Berre, L., Chapnik, B., and Poli, P.: Diagnosis of observation, background and analysis-error statistics in observation space, Q. J. Roy. Meteor. Soc., 131, 3385–3396, doi:10.1256/qj.05.108, 2005.
Gourdji, S. M., Mueller, K. L., Schaefer, K., and Michalak, A. M.: Global monthly averaged CO2 fluxes recovered using a geostatistical inverse modeling approach: 2. Results including auxiliary environmental data, J. Geophys. Res., 113, D21115, doi:10.1029/2007jd009733, 2008.
Henze, D. K., Hakami, A., and Seinfeld, J. H.: Development of the adjoint of GEOS-Chem, Atmos. Chem. Phys., 7, 2413–2433, doi:10.5194/acp-7-2413-2007, 2007.
Kaminski, T. and Heimann, M.: Inverse modeling of atmospheric carbon dioxide fluxes, Science, 294, p. 259, doi:10.1126/science.294.5541.259a, 2001.
Kaminski, T., Rayner, P. J., Heimann, M., and Enting, I. G.: On aggregation errors in atmospheric transport inversions, J. Geophys. Res., 106, 4703, doi:10.1029/2000jd900581, 2001.
Koohkan, M. R., Bocquet, M., Wu, L., and Krysta, M.: Potential of the International Monitoring System radionuclide network for inverse modelling, Atmos. Environ., 54, 557–567, doi:10.1016/j.atmosenv.2012.02.044, 2012.
Michalak, A. M., Bruhwiler, L., and Tans, P. P.: A geostatistical approach to surface flux estimation of atmospheric trace gases, J. Geophys. Res., 109, D14109, doi:10.1029/2003jd004422, 2004.
Michalak, A. M., Hirsch, A., Bruhwiler, L., Gurney, K. R., Peters, W., and Tans, P. P.: Maximum likelihood estimation of covariance parameters for Bayesian atmospheric trace gas surface flux inversions, J. Geophys. Res., 110, D24107, doi:10.1029/2005jd005970, 2005.
Miller, S. M., Kort, E. A., Hirsch, A. I., Dlugokencky, E. J., Andrews, A. E., Xu, X., Tian, H., Nehrkorn, T., Eluszkiewicz, J., Michalak, A. M., and Wofsy, S. C.: Regional sources of nitrous oxide over the United States: seasonal variation and spatial distribution, J. Geophys. Res., 117, D06310, doi:10.1029/2011jd016951, 2012.
Parker, R., Boesch, H., Cogan, A., Fraser, A., Feng, L., Palmer, P. I., Messerschmidt, J., Deutscher, N., Griffith, D. W. T., Notholt, J., Wennberg, P. O., and Wunch, D.: Methane observations from the Greenhouse Gases Observing SATellite: comparison to ground-based TCCON data and model calculations, Geophys. Res. Lett., 38, L15807, doi:10.1029/2011gl047871, 2011.
Rodgers, C. D.: Inverse Methods for Atmospheric Sounding, World Scientific, Singapore, 2000.
Schuh, A. E., Denning, A. S., Uliasz, M., and Corbin, K. D.: Seeing the forest through the trees: recovering large-scale carbon flux biases in the midst of small-scale variability, J. Geophys. Res., 114, G03007, doi:10.1029/2008jg000842, 2009.
Turner, A. J., Jacob, D. J., Wecht, K. J., Maasakkers, J. D., Lundgren, E., Andrews, A. E., Biraud, S. C., Boesch, H., Bowman, K. W., Deutscher, N. M., Dubey, M. K., Griffith, D. W. T., Hase, F., Kuze, A., Notholt, J., Ohyama, H., Parker, R., Payne, V. H., Sussmann, R., Sweeney, C., Velazco, V. A., Warneke, T., Wennberg, P. O., and Wunch, D.: Estimating global and North American methane emissions with high spatial resolution using GOSAT satellite data, Atmos. Chem. Phys., 15, 7049–7069, doi:10.5194/acp-15-7049-2015, 2015.
von Clarmann, T.: Smoothing error pitfalls, Atmos. Meas. Tech., 7, 3023–3034, doi:10.5194/amt-7-3023-2014, 2014.
Wecht, K. J., Jacob, D. J., Frankenberg, C., Jiang, Z., and Blake, D. R.: Mapping of North American methane emissions with high spatial resolution by inversion of SCIAMACHY satellite data, J. Geophys. Res.-Atmos., 119, 7741–7756, doi:10.1002/2014jd021551, 2014.
Wu, L., Bocquet, M., Lauvaux, T., Chevallier, F., Rayner, P., and Davis, K.: Optimal representation of source-sink fluxes for mesoscale carbon dioxide inversion with synthetic data, J. Geophys. Res., 116, D21304, doi:10.1029/2011jd016198, 2011.