Balancing aggregation and smoothing errors in inverse models

Inverse models use observations of a system (observation vector) to quantify the variables driving that system (state vector) by statistical optimization. When the observation vector is large, such as with satellite data, selecting a suitable dimension for the state vector is a challenge. A state vector that is too large cannot be effectively constrained by the observations, leading to smoothing error. However, reducing the dimension of the state vector leads to aggregation error as prior relationships between state vector elements are imposed rather than optimized. Here we present a method for quantifying aggregation and smoothing errors as a function of state vector dimension, so that a suitable dimension can be selected by minimizing the combined error. Reducing the state vector within the aggregation error constraints can have the added advantage of enabling analytical solution to the inverse problem with full error characterization. We compare three methods for reducing the dimension of the state vector from its native resolution: (1) merging adjacent elements (grid coarsening), (2) clustering with principal component analysis (PCA), and (3) applying a Gaussian mixture model (GMM) with Gaussian pdfs as state vector elements on which the native-resolution state vector elements are projected using radial basis functions (RBFs). The GMM


Introduction
Inverse models quantify the state variables driving the evolution of a physical system by using observations of that system. This requires a physical model $\mathbf{F}$, known as the forward model, that relates a set of input variables $\mathbf{x}$ (state vector) to a set of output variables $\mathbf{y}$ (observation vector),

$$\mathbf{y} = \mathbf{F}(\mathbf{x}) + \boldsymbol{\varepsilon}. \tag{1}$$

The observational error $\boldsymbol{\varepsilon}$ includes contributions from both the forward model and the measurements. Solution to the inverse problem involves statistical optimization to achieve a best error-weighted estimate of $\mathbf{x}$ given $\mathbf{y}$. A critical step in solving the inverse problem is determining the amount of information contained in the observations and choosing the state vector accordingly. This is a nontrivial problem when using large observational data sets with large errors. An example that will guide our discussion is the inversion of methane emissions on the basis of satellite observations of atmospheric methane concentrations (Turner et al., 2015). Methane concentrations can be predicted on the basis of emissions by using a chemical transport model (CTM) that solves the 3-D continuity equation for methane concentrations. Here the CTM is the forward model $\mathbf{F}$, the satellite provides a large observation vector $\mathbf{y}$, and we need to choose the resolution at which to optimize the methane emission vector $\mathbf{x}$.
The simplest approach would be to use the native resolution of the CTM in order to extract the maximum information from the observations. However, the observations may not be sufficiently dense or precise to optimize emissions at that level of detail, resulting in an underdetermined problem. Bocquet et al. (2011) refer to this as the "resolution problem". The inverse solution must then rely on some prior estimate for the state vector and may not be able to depart sufficiently from that knowledge. The associated error is known as the smoothing error (Rodgers, 2000; von Clarmann, 2014) and increases with the size of the state vector (Bousquet et al., 2000; Kaminski and Heimann, 2001; Kaminski et al., 2001; von Clarmann, 2014). Wecht et al. (2014) illustrate the severity of this problem in their inversion of methane emissions using satellite data.
An additional drawback of using a large state vector is that analytical solution to the inverse problem may not be computationally tractable. Analytical solution requires calculation of the Jacobian matrix, $\nabla_{\mathbf{x}}\mathbf{F}$, and inversion and multiplication of the error covariance matrices (Rodgers, 2000). It has the major advantage of providing complete error statistics as part of the solution, but it becomes impractical as the state vector becomes too large. Numerical solutions using variational methods circumvent this problem but do not provide error characterization as part of the solution. Approximate error statistics can be obtained (e.g., Bousserez et al., 2015), but at the cost of additional computation.
Reducing the dimensionality of the state vector in the inverse problem thus has two advantages. It improves the observational constraints on individual state vector elements and it facilitates analytical solution. Reduction can be achieved by aggregating state vector elements. For a state vector of gridded time-dependent emissions, the state vector can be reduced by aggregating grid cells and time periods. However, this introduces error in the inversion as the underlying spatial and temporal patterns of the aggregated emissions are now imposed from prior knowledge and not allowed to be optimized as part of the inversion. The resulting error is called the aggregation error (Kaminski and Heimann, 2001; Kaminski et al., 2001; Schuh et al., 2009).
Previous work by Bocquet (2009), Bocquet et al. (2011), Bocquet and Wu (2011), Wu et al. (2011), and Koohkan et al. (2012) developed optimal grids that allow the transfer of information across multiple scales. These computationally efficient methods (Bocquet and Wu, 2011) generally require the use of the native-resolution grid to derive the optimal representation. They also assume that the native-resolution prior error covariance matrices can be accurately constructed. However, in practice we are generally unable to specify realistic prior error correlations and must resort to simple assumptions.
Here we present a method for optimizing the selection of the state vector in the solution of the inverse problem for a given ensemble of observations without requiring an accurate specification of the native-resolution prior error covariance matrix. Instead, we use the expected error correlations between native-resolution state vector elements as criteria in the aggregation process. Relative to Bocquet et al. (2011), our method is suboptimal but is more practical to implement. As the dimension of the state vector decreases, the smoothing error decreases while the aggregation error increases. Therefore, there is potentially an optimum dimension where the overall error is minimized. We derive an analytical expression for the aggregation error covariance matrix and show how this can guide selection of a reduced-dimension state vector where the aggregation error remains below an acceptable threshold. We also show how intelligent selection of the state vector can extract more information from the observations for a given state vector dimension.

Formulating the inverse problem
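Inverse problems are commonly solved using Bayes' theorem,

$$P(\mathbf{x}|\mathbf{y}) \propto P(\mathbf{y}|\mathbf{x})\,P(\mathbf{x}), \tag{2}$$

where $P(\mathbf{x}|\mathbf{y})$ is the posterior probability density function (pdf) of the state vector $\mathbf{x}$ ($n \times 1$) given a vector of observations $\mathbf{y}$ ($m \times 1$), $P(\mathbf{x})$ is the prior pdf of $\mathbf{x}$, and $P(\mathbf{y}|\mathbf{x})$ is the conditional pdf of $\mathbf{y}$ given the true value of $\mathbf{x}$. Assuming Gaussian distributions for $P(\mathbf{y}|\mathbf{x})$ and $P(\mathbf{x})$ allows us to write the posterior pdf as

$$P(\mathbf{x}|\mathbf{y}) \propto \exp\left\{-\tfrac{1}{2}\left(\mathbf{y}-\mathbf{F}(\mathbf{x})\right)^T \mathbf{S}_O^{-1}\left(\mathbf{y}-\mathbf{F}(\mathbf{x})\right) - \tfrac{1}{2}\left(\mathbf{x}-\mathbf{x}_a\right)^T \mathbf{S}_a^{-1}\left(\mathbf{x}-\mathbf{x}_a\right)\right\}, \tag{3}$$

where $\mathbf{x}_a$ is the $n \times 1$ prior state vector, $\mathbf{S}_O$ is the $m \times m$ observational error covariance matrix, and $\mathbf{S}_a$ is the $n \times n$ prior error covariance matrix. Here and elsewhere, our notation and terminology follow that of Rodgers (2000). The most probable solution $\hat{\mathbf{x}}$ (called the maximum a posteriori or MAP) is defined by the maximum of $P(\mathbf{x}|\mathbf{y})$, i.e., the minimum of the cost function $J(\mathbf{x})$:

$$J(\mathbf{x}) = \tfrac{1}{2}\left(\mathbf{y}-\mathbf{F}(\mathbf{x})\right)^T \mathbf{S}_O^{-1}\left(\mathbf{y}-\mathbf{F}(\mathbf{x})\right) + \tfrac{1}{2}\left(\mathbf{x}-\mathbf{x}_a\right)^T \mathbf{S}_a^{-1}\left(\mathbf{x}-\mathbf{x}_a\right). \tag{4}$$

This involves solving

$$\nabla_{\mathbf{x}} J = \mathbf{S}_a^{-1}\left(\mathbf{x}-\mathbf{x}_a\right) - \left(\nabla_{\mathbf{x}}\mathbf{F}\right)^T \mathbf{S}_O^{-1}\left(\mathbf{y}-\mathbf{F}(\mathbf{x})\right) = \mathbf{0}. \tag{5}$$

Solution to Eq. (5) can be done analytically if $\mathbf{F}$ is linear; i.e., $\mathbf{F}(\mathbf{x}) = \mathbf{K}\mathbf{x} + \mathbf{c}$ where $\mathbf{K} \equiv \nabla_{\mathbf{x}}\mathbf{F} = \partial\mathbf{y}/\partial\mathbf{x}$ is the Jacobian of $\mathbf{F}$ and $\mathbf{c}$ is a constant that can be set to zero in the general case by subtracting $\mathbf{c}$ from the observations. This yields

$$\hat{\mathbf{x}} = \mathbf{x}_a + \mathbf{G}\left(\mathbf{y} - \mathbf{K}\mathbf{x}_a\right), \tag{6}$$

where $\mathbf{G} = \hat{\mathbf{S}}\mathbf{K}^T\mathbf{S}_O^{-1}$ is the gain matrix and $\hat{\mathbf{S}}$ is the posterior error covariance matrix,

$$\hat{\mathbf{S}} = \left(\mathbf{K}^T\mathbf{S}_O^{-1}\mathbf{K} + \mathbf{S}_a^{-1}\right)^{-1}. \tag{7}$$

The MAP solution can also be expressed in terms of the true value $\mathbf{x}$ as

$$\hat{\mathbf{x}} = \mathbf{A}\mathbf{x} + \left(\mathbf{I}-\mathbf{A}\right)\mathbf{x}_a + \mathbf{G}\boldsymbol{\varepsilon}, \tag{8}$$

where

$$\mathbf{A} = \mathbf{G}\mathbf{K} = \mathbf{I} - \hat{\mathbf{S}}\mathbf{S}_a^{-1} \tag{9}$$

is the averaging kernel matrix that measures the error reduction resulting from the observations and $\mathbf{G}\boldsymbol{\varepsilon}$ is the observation error in state space with error covariance matrix $\mathbf{G}\mathbf{S}_O\mathbf{G}^T$. We have assumed here that errors are unbiased, as is standard practice in the inverse modeling literature. An observational error bias $\mathbf{b}_O$ would propagate as a bias $\mathbf{G}\mathbf{b}_O$ in the solution $\hat{\mathbf{x}}$ in Eq. (8).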
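The analytical solution can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions: the function name `map_solution`, the toy dimensions, and all numerical values are hypothetical and are not taken from the methane inversion; the forward model is assumed to be exactly linear.

```python
import numpy as np

def map_solution(K, y, x_a, S_a, S_O):
    """Analytical MAP solution for a linear forward model y = K x + eps."""
    S_O_inv = np.linalg.inv(S_O)
    S_a_inv = np.linalg.inv(S_a)
    # Posterior error covariance matrix, Eq. (7)
    S_hat = np.linalg.inv(K.T @ S_O_inv @ K + S_a_inv)
    # Gain matrix G = S_hat K^T S_O^-1
    G = S_hat @ K.T @ S_O_inv
    # MAP estimate, Eq. (6)
    x_hat = x_a + G @ (y - K @ x_a)
    # Averaging kernel matrix A = G K
    A = G @ K
    return x_hat, S_hat, G, A

# Toy problem; every number below is illustrative only.
rng = np.random.default_rng(0)
n, m = 4, 30
K = rng.normal(size=(m, n))
x_true = np.array([1.0, 2.0, 0.5, 1.5])
x_a = np.ones(n)
S_a = np.eye(n)               # uncorrelated prior errors
S_O = 0.1**2 * np.eye(m)      # uncorrelated observational errors
y = K @ x_true + rng.normal(scale=0.1, size=m)

x_hat, S_hat, G, A = map_solution(K, y, x_a, S_a, S_O)
```

Explicit matrix inverses keep the correspondence with Eqs. (6) and (7) visible; for larger problems a linear solver would usually be preferred.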
The analytical solution to the inverse problem thus provides full error characterization as part of the solution. It does require that the forward model be linear. The Jacobian matrix must generally be constructed numerically, requiring $n$ sensitivity simulations with the forward model. Subsequent matrix operations are also of dimension $n$. This limits the practical size of the state vector. The matrix operations also depend on the dimension $m$ of the observation vector, but this can be easily addressed by splitting that vector into uncorrelated packets, a method known as sequential updating (Rodgers, 2000).
The limitation on the state vector size can be lifted by finding the solution to $\nabla_{\mathbf{x}} J = \mathbf{0}$ numerically, rather than analytically, for example by using the adjoint of the forward model to calculate $\nabla_{\mathbf{x}} J$ iteratively at successive approaches to the solution (e.g., Henze et al., 2007). This variational method allows for optimization of state vectors of any size because the Jacobian is not explicitly constructed. But it only yields the MAP solution, $\hat{\mathbf{x}}$, with no error statistics. Several approaches have been presented to obtain approximate error characterization (e.g., Courtier et al., 1994; Desroziers et al., 2005; Chevallier et al., 2007; Bousserez et al., 2015), but they can be computationally expensive. An excessively large state vector relative to the strength of the observational constraints also incurs smoothing error, as discussed above.
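The iterative route can be illustrated with plain gradient descent on the cost function. This is only a sketch under simplifying assumptions: the forward model is linear, the gradient is evaluated with an explicit toy Jacobian (an adjoint model would supply the same gradient without forming $\mathbf{K}$), and the fixed-step scheme, function names, and numbers are hypothetical rather than the algorithm used in the cited adjoint inversions.

```python
import numpy as np

def cost_gradient(x, K, y, x_a, S_a_inv, S_O_inv):
    """Gradient of J(x) for a linear forward model F(x) = K x (Eq. 5)."""
    return S_a_inv @ (x - x_a) - K.T @ S_O_inv @ (y - K @ x)

def minimize_cost(K, y, x_a, S_a, S_O, n_iter=2000):
    """Fixed-step gradient descent on J(x), starting from the prior."""
    S_a_inv = np.linalg.inv(S_a)
    S_O_inv = np.linalg.inv(S_O)
    # Step size from the largest Hessian eigenvalue guarantees descent;
    # the explicit Hessian is formed only to keep this toy example short.
    hessian = K.T @ S_O_inv @ K + S_a_inv
    step = 1.0 / np.linalg.eigvalsh(hessian).max()
    x = x_a.copy()
    for _ in range(n_iter):
        x = x - step * cost_gradient(x, K, y, x_a, S_a_inv, S_O_inv)
    return x

# Toy problem; all values are illustrative.
rng = np.random.default_rng(1)
m, n = 40, 5
K = rng.normal(size=(m, n))
x_a = np.ones(n)
S_a = np.eye(n)
S_O = 0.5**2 * np.eye(m)
y = K @ (x_a + rng.normal(size=n)) + rng.normal(scale=0.5, size=m)

x_iter = minimize_cost(K, y, x_a, S_a, S_O)

# Analytical MAP solution for comparison (Eqs. 6 and 7)
S_O_inv = np.linalg.inv(S_O)
S_hat = np.linalg.inv(K.T @ S_O_inv @ K + np.linalg.inv(S_a))
x_map = x_a + S_hat @ K.T @ S_O_inv @ (y - K @ x_a)
```

The iterative result matches the analytical MAP estimate, but the loop itself returns no posterior error covariance, which is the limitation noted above.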

Quantifying aggregation and smoothing errors
The resolution of the forward model (e.g., grid resolution of the CTM) places an upper limit on the dimension for the state vector, which we call the native dimension. As we reduce the dimension of the state vector from this native resolution, the smoothing error decreases while the aggregation error increases. Here we present analytical expressions for the aggregation and smoothing error covariance matrices and show how they can be used to select an optimal state vector dimension.

Aggregation error
As in Bocquet et al. (2011), we define a restriction (aggregation) operator that maps the native-resolution state vector $\mathbf{x}$ of dimension $n$ to a reduced-resolution vector $\mathbf{x}_\omega$ of dimension $p$. We assume a linear restriction operator $\boldsymbol{\Gamma}_\omega$ as a $p \times n$ matrix relating $\mathbf{x}_\omega$ to $\mathbf{x}$:

$$\mathbf{x}_\omega = \boldsymbol{\Gamma}_\omega\,\mathbf{x}. \tag{10}$$

Bocquet et al. (2011) provide a detailed analysis of aggregation error for reduced-resolution state vectors. Their analysis relies heavily on the construction of a prolongation operator ($\boldsymbol{\Gamma}_\omega^*$) mapping $\mathbf{x}_\omega$ back to $\mathbf{x}$: $\mathbf{x} = \boldsymbol{\Gamma}_\omega^*\,\mathbf{x}_\omega$. However, construction of this prolongation operator is not unique. We present here a simpler and more practical method.
Aggregation error is the error introduced by aggregating state vector elements in the inversion. The relationship between the aggregated elements is no longer optimized as part of the inversion and instead becomes an unoptimized parameter in the forward model, effectively increasing the forward model error and inhibiting the ability of the model to fit the observations. The aggregation error is thus a component of the observational error.
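The aggregation error can be quantified by comparing the observational error incurred by using the native-resolution state vector,

$$\boldsymbol{\varepsilon} = \mathbf{y} - \mathbf{K}\mathbf{x}, \tag{11}$$

to that using the aggregated state vector,

$$\boldsymbol{\varepsilon}_\omega = \mathbf{y} - \mathbf{K}_\omega\mathbf{x}_\omega. \tag{12}$$

Here $\mathbf{y}$ is the observation vector (common in both cases), $\mathbf{x}$ and $\mathbf{x}_\omega$ are the true values of the native-resolution and aggregated state vectors, and $\mathbf{K}$ and $\mathbf{K}_\omega$ are the native-resolution and the reduced-dimension Jacobians. The only difference between $\boldsymbol{\varepsilon}$ and $\boldsymbol{\varepsilon}_\omega$ is the aggregation of state vector elements. As such,

$$\boldsymbol{\varepsilon}_\omega = \boldsymbol{\varepsilon} + \boldsymbol{\varepsilon}_A, \tag{13}$$

where $\boldsymbol{\varepsilon}_A$ is the aggregation error. Rearranging,

$$\boldsymbol{\varepsilon}_A = \left(\mathbf{K} - \mathbf{K}_\omega\boldsymbol{\Gamma}_\omega\right)\mathbf{x}. \tag{14}$$

Obtaining the error statistics for $\boldsymbol{\varepsilon}_A$ requires knowledge of the pdf of $\mathbf{x}$ for the ensemble of possible true states (cf. Rodgers, 2000; von Clarmann, 2014). Let $\bar{\mathbf{x}}$ represent the mean value of this ensemble and $\mathbf{S}_e$ the corresponding covariance matrix. The aggregation error covariance matrix is

$$\mathbf{S}_A = E\left[\left(\boldsymbol{\varepsilon}_A - E[\boldsymbol{\varepsilon}_A]\right)\left(\boldsymbol{\varepsilon}_A - E[\boldsymbol{\varepsilon}_A]\right)^T\right], \tag{15}$$

where $E[\,\cdot\,]$ is the expected value operator and $E[\boldsymbol{\varepsilon}_A] = \left(\mathbf{K} - \mathbf{K}_\omega\boldsymbol{\Gamma}_\omega\right)\bar{\mathbf{x}}$ is the bias introduced by the aggregation. Replacing into Eq. (15):

$$\mathbf{S}_A = \left(\mathbf{K} - \mathbf{K}_\omega\boldsymbol{\Gamma}_\omega\right)\mathbf{S}_e\left(\mathbf{K} - \mathbf{K}_\omega\boldsymbol{\Gamma}_\omega\right)^T. \tag{16}$$

In designing our inversion system we use $\mathbf{x}_a$ as our best estimate of $\bar{\mathbf{x}}$ and $\mathbf{S}_a$ as our best estimate of $\mathbf{S}_e$. If the prior were exact, there would be no aggregation error since the prior relationship assumed between state vector elements would be correct, thus $\mathbf{K} = \mathbf{K}_\omega\boldsymbol{\Gamma}_\omega$ and the aggregation bias would be zero.
Assuming $\mathbf{S}_a = \mathbf{S}_e$ allows us to calculate the aggregation error covariance matrix as

$$\mathbf{S}_A = \left(\mathbf{K} - \mathbf{K}_\omega\boldsymbol{\Gamma}_\omega\right)\mathbf{S}_a\left(\mathbf{K} - \mathbf{K}_\omega\boldsymbol{\Gamma}_\omega\right)^T, \tag{17}$$

and we will use this expression in the analysis that follows. Application of Eq. (17) requires computation of the native-resolution Jacobian $\mathbf{K}$, but this can be done for a limited test period only. We will give an example below.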
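The aggregation error covariance matrix of Eq. (17) can be evaluated directly once a restriction operator and a reduced Jacobian are available. The sketch below makes two assumptions that the text does not spell out: the restriction operator sums the native elements within each cluster, and the reduced Jacobian is built by distributing each aggregated element over its members in proportion to the prior. All function names and numbers are hypothetical.

```python
import numpy as np

def restriction_from_labels(labels, p):
    """p x n restriction operator that sums native elements within each cluster."""
    n = len(labels)
    gamma = np.zeros((p, n))
    gamma[labels, np.arange(n)] = 1.0
    return gamma

def reduced_jacobian(K, gamma, x_a):
    """Reduced Jacobian K_w = K P, where the n x p matrix P spreads each
    aggregated element over its members following the prior pattern x_a
    (an assumption made for this sketch)."""
    totals = gamma @ x_a                 # prior total per cluster
    P = (gamma * x_a).T / totals         # each column sums to 1
    return K @ P

def aggregation_error_cov(K, K_w, gamma, S_a):
    """S_A = (K - K_w Gamma) S_a (K - K_w Gamma)^T, Eq. (17)."""
    D = K - K_w @ gamma
    return D @ S_a @ D.T

# Toy problem; all values are illustrative.
rng = np.random.default_rng(0)
m, n, p = 20, 6, 3
K = rng.normal(size=(m, n))
x_a = np.array([1.0, 2.0, 0.5, 0.5, 3.0, 1.0])
S_a = np.diag(x_a**2)                    # 100 % uncorrelated prior error
labels = np.array([0, 0, 1, 1, 2, 2])    # merge neighboring pairs

gamma = restriction_from_labels(labels, p)
K_w = reduced_jacobian(K, gamma, x_a)
S_A = aggregation_error_cov(K, K_w, gamma, S_a)
```

With this construction the aggregation bias vanishes when the true state equals the prior, and the aggregation error covariance is identically zero at the native resolution ($p = n$).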

Smoothing error
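Following Rodgers (2000), we can express the smoothing error on $\hat{\mathbf{x}}$ by rearranging Eqs. (6) and (1):

$$\hat{\mathbf{x}} - \mathbf{x} = \left(\mathbf{I}-\mathbf{A}\right)\left(\mathbf{x}_a - \mathbf{x}\right) + \mathbf{G}\boldsymbol{\varepsilon}, \tag{18}$$

where $\boldsymbol{\varepsilon}_S = \left(\mathbf{I}-\mathbf{A}\right)\left(\mathbf{x}_a - \mathbf{x}\right)$ is the smoothing error. As pointed out by Rodgers (2000), the smoothing error statistics must be derived from the pdf of possible true states, in the same way as for the aggregation error and characterized by the error covariance matrix $\mathbf{S}_e$. For purposes of designing the inverse system we assume that $\mathbf{S}_e = \mathbf{S}_a$. Thus we have

$$\mathbf{S}_S = \left(\mathbf{I}-\mathbf{A}\right)\mathbf{S}_a\left(\mathbf{I}-\mathbf{A}\right)^T. \tag{19}$$

We can also express the smoothing error in observation space, $\boldsymbol{\varepsilon}_S^*$ (i.e., as a difference between $\mathbf{y}$ and $\mathbf{K}\hat{\mathbf{x}}$), by multiplying both sides of Eq. (18) by the Jacobian matrix:

$$\mathbf{K}\left(\hat{\mathbf{x}} - \mathbf{x}\right) = \boldsymbol{\varepsilon}_S^* + \mathbf{K}\mathbf{G}\boldsymbol{\varepsilon}, \qquad \boldsymbol{\varepsilon}_S^* = \mathbf{K}\left(\mathbf{I}-\mathbf{A}\right)\left(\mathbf{x}_a - \mathbf{x}\right). \tag{20}$$

The corresponding smoothing error covariance matrix in observation space is

$$\mathbf{S}_S^* = \mathbf{K}\left(\mathbf{I}-\mathbf{A}\right)\mathbf{S}_a\left(\mathbf{I}-\mathbf{A}\right)^T\mathbf{K}^T. \tag{21}$$

This expression can be generalized to compute the smoothing error covariance matrix in observation space for any reduced-dimension state vector $\mathbf{x}_\omega$ with Jacobian $\mathbf{K}_\omega$, prior error covariance matrix $\mathbf{S}_{a,\omega}$, and averaging kernel matrix $\mathbf{A}_\omega$:

$$\mathbf{S}_S^* = \mathbf{K}_\omega\left(\mathbf{I}-\mathbf{A}_\omega\right)\mathbf{S}_{a,\omega}\left(\mathbf{I}-\mathbf{A}_\omega\right)^T\mathbf{K}_\omega^T. \tag{22}$$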
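The rearrangement that separates the smoothing error from the observation error can be written out step by step. This is a short worked derivation, assuming the linear forward model $\mathbf{y} = \mathbf{K}\mathbf{x} + \boldsymbol{\varepsilon}$ and the averaging kernel matrix $\mathbf{A} = \mathbf{G}\mathbf{K}$ of Rodgers (2000):

```latex
\begin{align*}
\hat{\mathbf{x}}
  &= \mathbf{x}_a + \mathbf{G}\left(\mathbf{y} - \mathbf{K}\mathbf{x}_a\right) \\
  &= \mathbf{x}_a + \mathbf{G}\left(\mathbf{K}\mathbf{x} + \boldsymbol{\varepsilon} - \mathbf{K}\mathbf{x}_a\right) \\
  &= \mathbf{x}_a + \mathbf{A}\left(\mathbf{x} - \mathbf{x}_a\right) + \mathbf{G}\boldsymbol{\varepsilon}, \\
\hat{\mathbf{x}} - \mathbf{x}
  &= \left(\mathbf{x}_a - \mathbf{x}\right) - \mathbf{A}\left(\mathbf{x}_a - \mathbf{x}\right) + \mathbf{G}\boldsymbol{\varepsilon} \\
  &= \left(\mathbf{I} - \mathbf{A}\right)\left(\mathbf{x}_a - \mathbf{x}\right) + \mathbf{G}\boldsymbol{\varepsilon}.
\end{align*}
```

The first term vanishes when $\mathbf{A} = \mathbf{I}$, i.e., when the observations fully constrain every state vector element.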

Total error budget
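From Eq. (18) we can see that the total error on $\hat{\mathbf{x}}$ without aggregation is $\boldsymbol{\varepsilon}_T = \boldsymbol{\varepsilon}_S + \mathbf{G}\boldsymbol{\varepsilon}$ in the state space, or $\boldsymbol{\varepsilon}_T^* = \boldsymbol{\varepsilon}_S^* + \mathbf{K}\mathbf{G}\boldsymbol{\varepsilon}$ in the observation space. The $\mathbf{K}\mathbf{G}\boldsymbol{\varepsilon}$ term in the observation space appears because we are interested in the error on $\hat{\mathbf{x}}$. If $\hat{\mathbf{x}} = \mathbf{x}$ then $\mathbf{K}\mathbf{G} = \mathbf{I}$ and $\mathbf{A} = \mathbf{I}$, thus $\boldsymbol{\varepsilon}_S = \mathbf{0}$ and our total error reverts to $\boldsymbol{\varepsilon}$:

$$\boldsymbol{\varepsilon}_T^* = \boldsymbol{\varepsilon}. \tag{24}$$

Additional consideration of aggregation error for a reduced-dimension state vector $\mathbf{x}_\omega$ yields a total error in the state space

$$\boldsymbol{\varepsilon}_T = \boldsymbol{\varepsilon}_S + \mathbf{G}_\omega\boldsymbol{\varepsilon}_A + \mathbf{G}_\omega\boldsymbol{\varepsilon}, \tag{25}$$

where

$$\mathbf{G}_\omega = \hat{\mathbf{S}}_\omega\mathbf{K}_\omega^T\mathbf{S}_O^{-1} \tag{26}$$

is the gain matrix for the reduced-dimension state vector. In the observation space we get

$$\boldsymbol{\varepsilon}_T^* = \boldsymbol{\varepsilon}_S^* + \mathbf{K}_\omega\mathbf{G}_\omega\boldsymbol{\varepsilon}_A + \mathbf{K}_\omega\mathbf{G}_\omega\boldsymbol{\varepsilon}. \tag{27}$$

From these relationships we derive the total error covariance matrix as

$$\mathbf{S}_T = \mathbf{S}_S + \mathbf{G}_\omega\mathbf{S}_A\mathbf{G}_\omega^T + \mathbf{G}_\omega\mathbf{S}_O\mathbf{G}_\omega^T \tag{28}$$

in the state space and

$$\mathbf{S}_T^* = \mathbf{S}_S^* + \mathbf{K}_\omega\mathbf{G}_\omega\mathbf{S}_A\left(\mathbf{K}_\omega\mathbf{G}_\omega\right)^T + \mathbf{K}_\omega\mathbf{G}_\omega\mathbf{S}_O\left(\mathbf{K}_\omega\mathbf{G}_\omega\right)^T \tag{29}$$

in the observation space, with the smoothing terms evaluated for the reduced-dimension state vector. A bias term should exhibit similar scale dependence to the observation error term and could be included by following the derivation from Rodgers (2000). Each of the three error terms above depends on state vector dimension. Because the smoothing error increases with state vector dimension while the aggregation error decreases, analysis of the error budget can potentially point to the optimal dimension where the total error is minimum. It can also point to the minimum state vector dimension needed for the aggregation error to be below a certain tolerance, e.g., smaller than the observation error. We give an example in Sect. 5.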
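The selection of a state vector dimension from the error budget can be sketched as a scan over candidate dimensions. This toy example rests on several assumptions that are not from the source: aggregation merges contiguous blocks of elements, the reduced Jacobian follows the prior pattern within each block, the aggregated prior error covariance is $\boldsymbol{\Gamma}_\omega\mathbf{S}_a\boldsymbol{\Gamma}_\omega^T$, and all numbers are made up. It reports the observation-space terms as mean error standard deviations.

```python
import numpy as np

def mean_std(S):
    """Square root of the mean of the diagonal terms of a covariance matrix."""
    return float(np.sqrt(np.mean(np.clip(np.diag(S), 0.0, None))))

def error_budget(K, x_a, S_a, S_O, labels):
    """Observation-space smoothing, aggregation, and observation error terms."""
    n = K.shape[1]
    p = int(labels.max()) + 1
    gamma = np.zeros((p, n))
    gamma[labels, np.arange(n)] = 1.0          # restriction operator
    P = (gamma * x_a).T / (gamma @ x_a)        # prior pattern within clusters
    K_w = K @ P                                # reduced Jacobian (assumption)
    S_aw = gamma @ S_a @ gamma.T               # aggregated prior covariance
    S_O_inv = np.linalg.inv(S_O)
    S_hat = np.linalg.inv(K_w.T @ S_O_inv @ K_w + np.linalg.inv(S_aw))
    G_w = S_hat @ K_w.T @ S_O_inv              # reduced-dimension gain matrix
    KG = K_w @ G_w
    R = K_w @ (np.eye(p) - G_w @ K_w)          # K_w (I - A_w)
    D = K - K_w @ gamma
    return (mean_std(R @ S_aw @ R.T),          # smoothing
            mean_std(KG @ D @ S_a @ D.T @ KG.T),  # aggregation
            mean_std(KG @ S_O @ KG.T))         # observation

# Toy problem; all values are illustrative.
rng = np.random.default_rng(2)
m, n = 60, 16
K = rng.normal(size=(m, n))
x_a = rng.uniform(0.5, 2.0, size=n)
S_a = np.diag(x_a**2)
S_O = 4.0 * np.eye(m)

budget = {}
for p in (1, 2, 4, 8, 16):
    labels = np.arange(n) * p // n             # merge contiguous blocks
    s, a, o = error_budget(K, x_a, S_a, S_O, labels)
    budget[p] = {"smoothing": s, "aggregation": a, "observation": o,
                 "total": float(np.sqrt(s**2 + a**2 + o**2))}

best_p = min(budget, key=lambda p: budget[p]["total"])
```

Because the mean of the diagonal is linear, the total is the root sum of squares of the three terms. Whether an interior minimum appears depends on the Jacobian, the error covariances, and how the prior error is aggregated, so `best_p` here says nothing about the methane application.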
A caveat in the above expressions for the aggregation and smoothing error covariance matrices is that they are valid only if the prior $\mathbf{x}_a$ is the mean value $\bar{\mathbf{x}}$ for the pdf of true states and if the error covariance matrix $\mathbf{S}_a$ is the covariance matrix for that pdf ($\mathbf{S}_e = \mathbf{S}_a$). Rodgers (2000, p. 49) and von Clarmann (2014) provide a detailed discussion of the errors induced by failing to meet this assumption. Since these assumptions define our prior, they can be taken as valid for the purpose of selecting an appropriate state vector dimension in an inverse problem. However, they should not be used to diagnose errors on the inversion results.

Aggregation methods
Aggregation of state vector elements to reduce the state vector dimension introduces aggregation error, as described in Sect. 3.1. The aggregation error can be reduced by grouping elements with correlated errors. Analyzing the off-diagonal structure of a precisely constructed prior error correlation matrix would provide the best objective way to carry out the aggregation, as described by Bocquet (2009), Bocquet et al. (2011), and Wu et al. (2011). We generally lack such information but do have some qualitative knowledge of prior error correlation that can be used to optimize the aggregation. By aggregating regions that have correlated errors we can exploit additional information that would otherwise be neglected in a native-resolution inversion assuming (by default) uncorrelated errors.
Previous work by Bocquet et al. (2011), Wu et al. (2011), and Koohkan et al. (2012) used tiling and tree-based aggregation methods, while Wecht et al. (2014) used a hierarchical clustering method based on prior error patterns. Bocquet and Wu (2011) also used principal component analysis (PCA) coupled to the hierarchical grid to compute an optimal grid. Here we compare three aggregation methods: (1) simple grid coarsening, (2) PCA clustering, and (3) a Gaussian mixture model (GMM) with radial basis functions (RBFs) to project native-resolution state vector elements to Gaussian pdfs. A qualitative illustration of these methods is shown in Fig. 1 for the aggregation of a native-resolution state vector of methane emissions with 1/2° × 2/3° native grid resolution over North America (Turner et al., 2015). We focus here on spatial aggregation and assume that the state vector has no temporal dimension. However, the same methods can be used for temporal aggregation.
The simplest method for reducing the dimension of the state vector is to merge adjacent elements, i.e., neighboring grid cells. This method considers only spatial proximity as a source of error correlation. It may induce large aggregation errors if proximal but otherwise dissimilar regions are aggregated together. In the case of methane emissions, aggregating neighboring wetlands and farmland would induce large errors because different processes drive methane emissions from these two source types.
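Grid coarsening can be expressed as a restriction operator with a few lines of code. The sketch below assumes a rectangular grid stored in row-major order and square merging blocks; the function names and the 4 × 6 example grid are hypothetical.

```python
def coarsen_labels(n_lat, n_lon, factor):
    """Cluster index for every native grid cell (row-major order) when
    blocks of factor x factor adjacent cells are merged."""
    blocks_per_row = -(-n_lon // factor)      # ceiling division
    labels = []
    for i in range(n_lat):
        for j in range(n_lon):
            labels.append((i // factor) * blocks_per_row + (j // factor))
    return labels

def restriction_operator(labels):
    """p x n matrix with a 1 where native cell j belongs to cluster k."""
    p = max(labels) + 1
    return [[1.0 if lab == k else 0.0 for lab in labels] for k in range(p)]

# A 4 x 6 native grid merged in 2 x 2 blocks gives 6 aggregated elements.
labels = coarsen_labels(4, 6, 2)
gamma = restriction_operator(labels)
```

Every native cell belongs to exactly one aggregated element, so each column of the operator contains a single 1. Only spatial proximity enters this construction, which is why dissimilar neighbors can end up in the same element.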
The other two methods enable consideration of additional similarity factors besides spatial proximity when aggregating state vector elements. These similarity factors are expressed by vectors of dimension $n$ describing correlative properties of the original native-resolution state vector elements. In the case of a methane source inversion, for example, we can choose as similarity vectors latitude and longitude to account for spatial proximity, but also wetland fraction to account for error correlations in the bottom-up wetland emission estimate used as prior.
Notes to Table 1: (a) […] resolution of the GEOS-Chem chemical transport model. The criteria are normalized and then weighted (weighting factor). Criteria 4-14 are prior emission patterns used in the GEOS-Chem model (Wecht et al., 2014; Turner et al., 2015). (b) The weighting factors (dimensionless) measure the estimated relative importance of the different similarity criteria in determining prior error correlations in the state vector. For the prior emission patterns these weighting factors are the fractional contributions to total prior emissions in North America. (c) Distance in kilometers from the equator. (d) Distance in kilometers from the prime meridian. (e) Initial scaling factors from one iteration of an adjoint inversion at the native resolution.

Similarity matrix for aggregation
Table 1 lists the similarity vectors chosen for our example problem of estimating methane emissions (Turner et al., 2015). The first two vectors account for spatial proximity, the third represents the scaling factors from the first iteration of an adjoint-based inversion at native resolution (Wecht et al., 2014), and the others are the source type patterns from the bottom-up inventories used as prior. All similarity vectors are normalized and then weighted by judgment of their importance. We choose here to include initial scaling factors from the adjoint-based inversion because we have them available and they can serve to correct any prior patterns that are grossly inconsistent with the observations, or to identify local emission hotspots missing from the prior. One iteration of the adjoint-based inversion is computationally inexpensive and is sufficient to pick up major departures from the prior. Let $\{\mathbf{c}_1, \ldots, \mathbf{c}_K\}$ represent the $K$ similarity vectors chosen for the problem ($K = 14$ in our example of Table 1). We assemble them into an $n \times K$ similarity matrix $\mathbf{C}$. We will also make use of the ensemble of similarity vector values for individual state vector elements, which we assemble into vectors $\{\mathbf{c}'_1, \ldots, \mathbf{c}'_n\}$ representing the rows of $\mathbf{C}$. Thus:

$$\mathbf{C} = \begin{pmatrix}\mathbf{c}_1 & \cdots & \mathbf{c}_K\end{pmatrix} = \begin{pmatrix}\mathbf{c}'_1 \\ \vdots \\ \mathbf{c}'_n\end{pmatrix}. \tag{30}$$

In this work all of the aggregation methods except for grid coarsening will use the same similarity matrix to construct the restriction operator. This approach of using a similarity matrix $\mathbf{C}$ to account for prior error covariances bears some resemblance to the geostatistical approach for inverse modeling (e.g., Michalak et al., 2004, 2005; Gourdji et al., 2008; Miller et al., 2012). The geostatistical approach specifies the prior estimate as $\mathbf{x}_a = \mathbf{C}\boldsymbol{\beta}$, where $\boldsymbol{\beta}$ is a vector of unknown drift coefficients to be optimized as part of the inversion. Here we use the similarity matrix to reduce the dimension of the state vector, rather than just as a choice of prior constraints.
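Assembling the similarity matrix can be sketched as follows. The text says the similarity vectors are normalized and then weighted but does not state the normalization, so this example assumes a z-score (zero mean, unit standard deviation) for each column. The function name, the three example vectors, and the weights are all hypothetical.

```python
import numpy as np

def build_similarity_matrix(vectors, weights):
    """Stack K similarity vectors (each of length n) into an n x K matrix,
    standardize each column (assumed normalization), then apply its weight."""
    C = np.column_stack(vectors).astype(float)
    C = (C - C.mean(axis=0)) / C.std(axis=0)
    return C * np.asarray(weights, dtype=float)

# Hypothetical values for n = 5 state vector elements and K = 3 criteria.
distance_north = np.array([3300.0, 3400.0, 4500.0, 5000.0, 5600.0])
distance_east = np.array([-9000.0, -8000.0, -9900.0, -7500.0, -8200.0])
wetland_prior = np.array([0.0, 0.1, 0.8, 0.05, 0.6])

C = build_similarity_matrix(
    [distance_north, distance_east, wetland_prior],
    weights=[1.0, 1.0, 0.5],
)
```

Each row of `C` collects the weighted similarity values of one state vector element, and each column is one similarity vector.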

Clustering with principal component analysis
In this method we cluster state vector elements following the principal components of the similarity matrix. It is generally not practical to derive the principal components in state vector space because the dimension $n$ is large. Instead we derive them in similarity space (dimension $K$) as the eigenvectors of $\mathbf{C}^T\mathbf{C}$ sorted in order of importance by their eigenvalues. The leading $j$ principal components are kept for clustering. The reduced state vector is then constructed by grouping state vector elements that have the same sign patterns for all $j$ principal components. Each unique $j$-dimensional sign pattern constitutes a cluster. The number of clusters defined in that way ranges between $j$ and $2^j$. Figure 1b shows an example of applying this method to methane emissions in North America with reduction of the state vector to $n = 8$. The separation into four quadrants reflects the importance of latitude and longitude as error correlation factors. The additional separation within each quadrant isolates large from weak sources as defined by the prior.
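The sign-pattern clustering can be sketched with an eigendecomposition of the $K \times K$ matrix $\mathbf{C}^T\mathbf{C}$. This example interprets "sign pattern" as the signs of each element's projections onto the leading eigenvectors, which is one reading of the description above. The random similarity matrix and the function name are hypothetical.

```python
import numpy as np

def pca_sign_clusters(C, j):
    """Cluster the n rows of C by the sign pattern of their projections
    onto the j leading eigenvectors of C^T C."""
    evals, evecs = np.linalg.eigh(C.T @ C)          # K x K eigenproblem
    leading = evecs[:, np.argsort(evals)[::-1][:j]] # sort by decreasing eigenvalue
    signs = (C @ leading) >= 0                      # n x j boolean sign patterns
    codes = signs.astype(int) @ (2 ** np.arange(j)) # one integer per pattern
    _, labels = np.unique(codes, return_inverse=True)
    return labels, signs

# Stand-in similarity matrix: n = 200 elements, K = 5 standardized criteria.
rng = np.random.default_rng(0)
C = rng.normal(size=(200, 5))
C = (C - C.mean(axis=0)) / C.std(axis=0)

labels, signs = pca_sign_clusters(C, j=3)
p = int(labels.max()) + 1        # number of clusters, at most 2**3 = 8
```

The overall sign of an eigenvector is arbitrary, which can relabel the clusters but does not change which elements are grouped together.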

Gaussian mixture model (GMM)
Here we use a Gaussian mixture model (GMM; Bishop, 2007) to project the native-resolution state vector onto $p$ Gaussian pdfs using radial basis functions (RBFs). Mixture models are probabilistic models for representing a population composed of $p$ subpopulations. Each subpopulation is assumed to follow a pdf, in this case Gaussian. The Gaussians are $K$-dimensional where $K$ is the number of similarity criteria. Each native-resolution state vector element is fit to this ensemble of Gaussians using RBFs as weighting factors.
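The first step in constructing the GMM is to define a $p \times n$ weighting matrix $\mathbf{W} = [\mathbf{w}_1, \mathbf{w}_2, \ldots, \mathbf{w}_p]^T$. Each element $w_{i,j}$ of this weighting matrix is the relative probability for native-resolution state vector element $j$ to be described by Gaussian subpopulation $i$; i.e., "how much does element $j$ look like Gaussian $i$?". It is given by

$$w_{i,j} = \frac{\pi_i\,\mathcal{N}\left(\mathbf{c}'_j\,|\,\boldsymbol{\mu}_i, \boldsymbol{\Lambda}_i\right)}{\sum_{k=1}^{p}\pi_k\,\mathcal{N}\left(\mathbf{c}'_j\,|\,\boldsymbol{\mu}_k, \boldsymbol{\Lambda}_k\right)}. \tag{31}$$

Here $\mathbf{c}'_j$ is the $j$th row of the similarity matrix $\mathbf{C}$, $\boldsymbol{\mu}_i$ is a $1 \times K$ row vector of means for the $i$th Gaussian, $\boldsymbol{\Lambda}_i$ is a $K \times K$ covariance matrix for the $i$th Gaussian, and $\boldsymbol{\pi} = [\pi_1, \ldots, \pi_p]^T$ is the relative weight of the $p$ Gaussians in the mixture. $\mathcal{N}(\mathbf{c}'_j\,|\,\boldsymbol{\mu}_i, \boldsymbol{\Lambda}_i)$ denotes the probability density of vector $\mathbf{c}'_j$ on the normal distribution of Gaussian $i$. We define a $p \times K$ matrix $\mathbf{M}$ with rows $\boldsymbol{\mu}_i$ and a $K \times K \times p$ third-order tensor $\mathbf{L} = [\boldsymbol{\Lambda}_1, \ldots, \boldsymbol{\Lambda}_p]$ as the set of covariance matrices. Projection of the native-resolution state vector onto the GMM involves four unknowns: $\mathbf{W}$, $\boldsymbol{\pi}$, $\mathbf{M}$, and $\mathbf{L}$. This is solved by constructing a cost function to estimate the parameters of the Gaussians in the mixture model using maximum likelihood:

$$\ln P\left(\mathbf{C}\,|\,\boldsymbol{\pi}, \mathbf{M}, \mathbf{L}\right) = \sum_{j=1}^{n}\ln\left\{\sum_{i=1}^{p}\pi_i\,\mathcal{N}\left(\mathbf{c}'_j\,|\,\boldsymbol{\mu}_i, \boldsymbol{\Lambda}_i\right)\right\}. \tag{32}$$

Starting from an initial guess for $\boldsymbol{\pi}$, $\mathbf{M}$, and $\mathbf{L}$ we compute the weight matrix $\mathbf{W}$ using Eq. (31). We then differentiate the cost function with respect to $\boldsymbol{\pi}$, $\mathbf{M}$, and $\mathbf{L}$, and set the derivative to zero to obtain (see Bishop, 2007)

$$\boldsymbol{\mu}_i = \frac{1}{N_i}\sum_{j=1}^{n} w_{i,j}\,\mathbf{c}'_j, \tag{33}$$

$$\boldsymbol{\Lambda}_i = \frac{1}{N_i}\sum_{j=1}^{n} w_{i,j}\left(\mathbf{c}'_j - \boldsymbol{\mu}_i\right)^T\left(\mathbf{c}'_j - \boldsymbol{\mu}_i\right), \tag{34}$$

$$\pi_i = \frac{N_i}{n}, \tag{35}$$

where

$$N_i = \sum_{j=1}^{n} w_{i,j}. \tag{36}$$

The weights are re-calculated from the updated guesses of $\boldsymbol{\pi}$, $\mathbf{M}$, and $\mathbf{L}$ from Eqs. (33) to (36), and so on until convergence. The final weights define the restriction operator as $\boldsymbol{\Gamma}_\omega = \mathbf{W}$. The computational complexity for the expectation-maximization algorithm is $O(nK + pn^2)$ (Chen et al., 2007); however, the actual runtime will be largely dictated by the convergence criteria. Here we use an absolute tolerance of […], where the superscript star indicates the value from the previous iteration.
The GMM allows each native-resolution state vector element to be represented by a unique linear combination of the Gaussians through the RBFs. For a state vector of a given dimension, defined by the number of Gaussian pdfs, we achieve high resolution for large localized sources by sacrificing resolution for weak or uniform source regions where resolution is not needed. This is illustrated in Fig. 2 with the resolution of Southern California in an inversion of methane sources for North America. The figure shows the three dominant Gaussians describing emissions in Southern California and the corresponding RBF weights for each native-resolution grid square. Gaussian 1 is centered over Los Angeles and is highly localized, Gaussian 2 covers the Los Angeles Basin, and Gaussian 3 is a Southern California background. The sum of these three Gaussians accounts for most of the emissions in Southern California and Nevada (which is mostly background). Additional Gaussians (not shown) resolve the southern San Joaquin Valley (large livestock and oil/gas emissions) and Las Vegas (large emissions from waste).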
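The iteration between the weights and the Gaussian parameters is the standard expectation-maximization scheme for mixture models (Bishop, 2007), and it can be sketched compactly. Several choices below are assumptions made only for this illustration: the initial guess (random rows of $\mathbf{C}$ as means, the full-sample covariance for every Gaussian), the small diagonal term added to keep covariances invertible, the convergence test on the change in the means, and the two-blob test data. The covariances are stored as a `p x K x K` array rather than the $K \times K \times p$ ordering used in the text.

```python
import numpy as np

def gaussian_pdf(C, mu, cov):
    """Multivariate normal density evaluated at each row of C."""
    K = C.shape[1]
    diff = C - mu
    sol = np.linalg.solve(cov, diff.T).T
    expo = -0.5 * np.sum(diff * sol, axis=1)
    norm = np.sqrt((2.0 * np.pi) ** K * np.linalg.det(cov))
    return np.exp(expo) / norm

def fit_gmm(C, p, n_iter=200, tol=1e-8, seed=0):
    """Expectation-maximization for a p-component Gaussian mixture."""
    n, K = C.shape
    rng = np.random.default_rng(seed)
    M = C[rng.choice(n, size=p, replace=False)]          # initial means
    L = np.array([np.cov(C.T) + 1e-6 * np.eye(K)] * p)   # initial covariances
    pi = np.full(p, 1.0 / p)                             # initial mixture weights
    for _ in range(n_iter):
        # Weights w_ij: relative probability of element j under Gaussian i
        dens = np.array([pi[i] * gaussian_pdf(C, M[i], L[i]) for i in range(p)])
        W = dens / dens.sum(axis=0)
        N = W.sum(axis=1)                                # effective counts N_i
        M_new = (W @ C) / N[:, None]                     # updated means
        for i in range(p):                               # updated covariances
            d = C - M_new[i]
            L[i] = (W[i, :, None] * d).T @ d / N[i] + 1e-6 * np.eye(K)
        pi = N / n                                       # updated mixture weights
        converged = np.max(np.abs(M_new - M)) < tol      # assumed criterion
        M = M_new
        if converged:
            break
    return W, pi, M, L

# Stand-in similarity matrix: two groups of 50 elements, K = 2 criteria.
rng = np.random.default_rng(0)
C = np.vstack([rng.normal(0.0, 1.0, size=(50, 2)),
               rng.normal(6.0, 1.0, size=(50, 2))])

W, pi, M, L = fit_gmm(C, p=2)
gamma = W                       # p x n restriction operator
hard_labels = W.argmax(axis=0)  # dominant Gaussian for each element
```

Each column of `W` sums to one, so every native-resolution element is shared among the Gaussians. Taking the dominant Gaussian per element, as in `hard_labels`, gives a hard-boundary variant of the aggregation instead.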

Application
We apply the aggregation methods described above to our example problem of estimating methane emissions from satellite observations of methane concentrations, focusing on selecting a reduced-dimension state vector that minimizes aggregation and smoothing errors. The inversion is described in detail in Turner et al. (2015) and uses GOSAT satellite observations for 2009-2011 over North America. The forward model for the inversion is the GEOS-Chem CTM with 1/2° × 2/3° grid resolution. The native-resolution state vector of methane emissions as defined on that grid includes 7366 elements.
For the purpose of selecting an aggregated state vector for the inversion, we consider a subset of observations for May 2010 ($m = 6070$) so that we can afford to construct the corresponding Jacobian matrix $\mathbf{K}$ at the native resolution; this is necessary to derive the aggregation error covariance matrix following Eq. (17). The prior error covariance matrix is specified as diagonal with 100 % uncertainty at the native resolution, decreasing with aggregation following the central limit theorem (Turner et al., 2015). The observational error covariance matrix is also diagonal and specified as the scene-specific retrieval error from Parker et al. (2011), which dominates the total observational error as shown by Turner et al. (2015). We compare the three methods presented in Sect. 4 for aggregating the state vector in terms of the implications for aggregation and smoothing errors for different state vector dimensions. In addition to the GMM with RBFs, we also consider a "GMM clustering" method where each native-resolution state vector element is assigned exclusively to its dominant Gaussian pdf. This yields sharp boundaries between clusters (Fig. 1) as in the grid coarsening and PCA methods.
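One way to read "decreasing with aggregation following the central limit theorem" is that the errors of the native-resolution elements are treated as independent, so the relative error of their sum shrinks as more elements are merged. The sketch below implements that reading for a single cluster with hard membership. It is an interpretation for illustration, not necessarily the exact rule applied in Turner et al. (2015), and the function name and emission values are hypothetical.

```python
import math

def aggregated_relative_error(prior_emissions, rel_error=1.0):
    """Relative prior error of the summed emissions of one cluster, assuming
    independent native-resolution errors of rel_error (1.0 means 100 %)."""
    variance = sum((rel_error * e) ** 2 for e in prior_emissions)
    return math.sqrt(variance) / sum(prior_emissions)

single = aggregated_relative_error([2.0])                  # one native cell
merged = aggregated_relative_error([2.0, 2.0, 2.0, 2.0])   # four equal cells
```

For $N$ equal cells the relative error falls as $1/\sqrt{N}$, so four equal cells with 100 % error each give 50 % for the merged element.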
Figure 3 shows the mean error standard deviation in the aggregation and smoothing error covariance matrices, computed as the square root of the mean of the diagonal terms, as a function of state vector dimension. The aggregation error is zero by definition at the native resolution (7366 state vector elements), and increases as the number $n$ of state vector elements decreases, following a roughly $n^{-0.7}$ dependence. Conversely, the smoothing error increases as the number of state vector elements increases, following roughly a $\log(n)$ dependence. The different aggregation methods of Sect. 4 yield very similar smoothing errors, suggesting that any reasonable aggregation scheme (such as k-means clustering; cf. Bishop, 2007) would perform comparably. The aggregation error is somewhat improved using the GMM method. RBF weighting performs slightly better than GMM clustering (sharp boundaries). As discussed above, a major advantage of the GMM method is its ability to retain resolution of large localized sources after aggregation.
Figure 4 shows the sum of contributions from aggregation, smoothing, and observational error standard deviations as a function of state vector aggregation using the GMM with RBF weighting. In this application, aggregation error dominates for small state vectors ($n < 100$), but drops below the observation error for $n > 100$ and below the smoothing error for $n > 1000$. The smoothing error remains smaller than the observational error even at the native resolution ($n = 7366$). The observational error is not independent of aggregation, as shown in Eq. (29), but we find here that the dependence is small.
From Fig. 4 we can identify a state vector dimension for which the total error is minimum ($n = 2208$; circle in Fig. 4). However, error growth is small until $n \approx 200$, below which the aggregation error grows rapidly. A state vector of 369 elements, as adopted by Turner et al. (2015), does not incur significant errors associated with aggregation or smoothing, and enables computation of an analytical solution to the inverse problem with full error characterization.
Previous work by Bocquet (2009), Bocquet et al. (2011), Bocquet and Wu (2011), Wu et al. (2011), and Koohkan et al. (2012) analyzed the scale dependence of different grids using the degrees of freedom for signal: $\mathrm{DFS} = \mathrm{Tr}\left(\mathbf{I} - \mathbf{S}_{a,\omega}^{-1}\hat{\mathbf{S}}_\omega\right)$. These past works found this error metric to be monotonically increasing. This implies that the native-resolution grid will have the least total error and there is no optimal resolution, except from a numerical efficiency standpoint. Here we find a local minimum that is, seemingly, at odds with this previous work. However, the reasoning for this local minimum is that we have allowed the aggregation to account for spatial error correlations that we are unable to specify at the native resolution. As such, we are taking more information into account and obtaining a minimum total error at a state vector size that is smaller than the native resolution. If the native-resolution error covariance matrices were correct, then, as previous work showed, the only reason to perform aggregation would be to reduce the computational expense and the grid used here would be suboptimal because it does not depend on the native-resolution grid.
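The degrees of freedom for signal can be computed directly from the prior and posterior error covariance matrices. This is a minimal sketch for a linear problem; the function name and the toy matrices are hypothetical, and the same call would apply to the reduced-dimension quantities $\mathbf{K}_\omega$ and $\mathbf{S}_{a,\omega}$.

```python
import numpy as np

def degrees_of_freedom(K, S_a, S_O):
    """DFS = Tr(I - S_a^-1 S_hat) for a linear inverse problem."""
    S_a_inv = np.linalg.inv(S_a)
    S_hat = np.linalg.inv(K.T @ np.linalg.inv(S_O) @ K + S_a_inv)
    return float(np.trace(np.eye(K.shape[1]) - S_a_inv @ S_hat))

# Toy problem; all values are illustrative.
rng = np.random.default_rng(3)
m, n = 25, 6
K = rng.normal(size=(m, n))
S_a = np.eye(n)

dfs_noisy = degrees_of_freedom(K, S_a, 10.0**2 * np.eye(m))
dfs_precise = degrees_of_freedom(K, S_a, 0.1**2 * np.eye(m))
```

The DFS equals the trace of the averaging kernel matrix and lies between 0 and the state vector dimension; more precise observations push it toward the upper bound.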

Conclusions
We presented a method for optimizing the selection of the state vector in the solution of the inverse problem for a given ensemble of observations. The optimization involves minimizing the total error in the inversion by balancing the aggregation error (which increases as the state vector dimension decreases), the smoothing error (which increases as the state vector dimension increases), and the observational error. We further showed how one can reduce the state vector dimension within the constraints from the aggregation error in order to facilitate an analytical solution to the inverse problem with full error characterization.
We explored different methods for aggregating state vector elements as a means of reducing the dimension of the state vector. Aggregation error can be minimized by grouping state vector elements with the strongest correlated prior errors. We showed that a Gaussian mixture model (GMM), where the state vector elements are multi-dimensional Gaussian pdfs constructed from prior error correlation patterns, is a powerful aggregation tool. Reduction of the state vector dimension using the GMM retains fine-scale resolution of important features in the native-resolution state vector while merging weak or uniform features.

Figure 1. Illustration of different approaches for aggregating a state vector. Here the native-resolution state vector is a field of gridded methane emissions at 1/2° × 2/3° resolution.

Figure 2. Gaussian mixture model (GMM) representation of methane emissions in Southern California with Gaussian pdfs as state vector elements. The Gaussians are constructed from a similarity matrix for methane emissions on the 1/2° × 2/3° grid.

Figure 3. Aggregation and smoothing error dependences on the aggregation of state vector elements in an inverse model. The application here is to an inversion of methane emissions over North America using satellite methane data with 7366 native-resolution state vector elements (Sect. 5 and Turner et al., 2015). Results are shown as the square roots of the means of the diagonal terms (mean error standard deviation) in the aggregation and smoothing error covariance matrices. Different methods for aggregating the state vector (Sect. 4) are shown as separate lines. Note the log scale on the x axis.

Figure 4. Total error budget from the aggregation of state vector elements in an inverse model. The application here is to an inversion of methane emissions over North America using satellite methane data with 7366 native-resolution state vector elements (Sect. 5 and Turner et al., 2015). Results are shown as the square roots of the means of the diagonal terms (mean error standard deviation) in the aggregation, smoothing, and observational error covariance matrices, and for the sum of these matrices. Aggregation uses the GMM with RBF weighting (Sect. 4). There is an optimum state vector size for which the total error is minimum and this is shown as the circle. Gray shading indicates the 90 % range for the total error on individual elements as diagnosed from the 5th and 95th quantiles of diagonal elements in the total error covariance matrix. Note the log scale on the x axis.

Table 1. Similarity vectors for inverting methane emissions in North America (a).