Technical Note: Methods for Interval Constrained Atmospheric Inversion of Methane

Atmospheric Chemistry and Physics Discussions This discussion paper is/has been under review for the journal Atmospheric Chemistry and Physics (ACP). Please refer to the corresponding final paper in ACP if available. Abstract Three interval constrained methods, including the interval constrained Kalman smoother, the interval constrained maximum likelihood ensemble smoother and the interval constrained ensemble Kalman smoother are developed to conduct inversions of atmospheric trace gas methane (CH 4). The negative values of fluxes in an uncon-5 strained inversion are avoided in the constrained inversion. In a multi-year inversion experiment using pseudo observations derived from a forward transport simulation with known fluxes, the interval constrained fixed-lag Kalman smoother presents the best results, followed by the interval constrained fixed-lag ensemble Kalman smoother and the interval constrained maximum likelihood ensemble Kalman smoother. Consistent 10 uncertainties are obtained for the posterior fluxes with these three methods. This study provides alternatives of the variable transform method to deal with interval constraints in atmospheric inversions.


Introduction
The atmospheric inversion modeling, often called the top-down approach, is an important way to quantify the magnitudes of various sources and sinks of trace gases (Enting, 2002).It proceeds by comparing the forward model simulations from an atmospheric transport model driven by sources and sinks from prior knowledge to the spatiotemporally discrete observations.The prior sources and sinks are then optimized to provide improved estimates through some optimization schemes, which are often reduced to minimizing a cost function that characterizes the differences between the forward model simulation and observations (e.g., Gurney et al., 2002).
Methods deduced from the Bayesian theorem (Tarantola, 2005), including the fixedlag Kalman smoother (KS) (Hartley and Prinn, 1993;Bruhwiler et al., 2005), fixed-lag ensemble Kalman smoother (EnKS) (Peters et al., 2005) and fixed-lag maximum likelihood ensemble filter (Zupanski et al., 2007), have been widely used to invert the various trace gas fluxes.When used properly, those methods give reasonable posterior inferences conditioned on the available observations.However, in applying these methods to do inversions, sometimes the inverted results are of some physically inaccessible values.Therefore, these methods should be improved to impose proper constraints, for instance, interval constraints as shown in this study, in addition to the constraints provided by observations.
To apply the interval constraints imposed over the state variables, the variable transform method can be applied efficiently, as was done in Tang and Zhuang (2010).However, the variable transform involved is non-linear, and is thus difficult to deal with using the KS, which is developed based on linear dynamics.Another problem with the variable transform method is the difficulty to interpret the posterior uncertainties, which are not defined in the same space of the inverted fluxes.In this note, we show the way to solve the problem using the interval constrained inversion methods.The proposed methods directly optimize the state variables in its original space, and the interval constraints are imposed either through a posterior correction or a direct constrained minimization.Three different methods developed are the interval constrained fixed-lag Kalman smoother (ICKS), interval constrained fixed-lag maximum likelihood ensemble smoother (ICMLES) and the interval constrained fixed-lag ensemble Kalman smoother (ICEnKS).These methods are evaluated with an inversion using pseudo observations of atmospheric CH 4 concentrations derived from a forward atmospheric transport of CH 4 with known fluxes.

The inversion problem and its lagged-form
where z is the vector of observations, s is the vector of sinks and sources, H is the sensitivity matrix that maps the flux s into measurement space, and v is the uncertainty 19983 An inversion is to solve for s in Eq. (1) using the Bayes theorem, by assuming variables z, s and v as random variables with certain probability distributions (Tarantola, 2005).

Discussion
In the lagged form, the forward equation Eq. ( 1) is where s u is defined by fluxes that are still being estimated, from time J back to time J−L+1, and s v is defined by fluxes that are no longer updated, from time J−L back to time 1.The observation operators H u and H v are defined accordingly for s u and s v .Since s v is no longer updated once it is obtained, we combine the term H v (s v ) into the measurement and denote the state variable by s to simplify the presentations hereinafter, unless explicitly stated otherwise.

The fixed-lag interval constrained Kalman smoother
By assuming normal distributions of the prior estimate and the measurements, the cost function solved by an interval constrained inversion is where m is the length of the state vector s, and the superscript − denotes the prior estimate.
There are different ways to solve Eq. ( 4), e.g. the L-BFGS-B algorithm (Zhu et al., 1997).However, a two-stage strategy is used in this study to obtain the posterior estimate of the state variable and the covariance.At the first stage, Eq. ( 4) is solved as an unconstrained problem, which uses the Kalman update where the Kalman gain is Extension to including correlations between on-line (variables are still being updated) and off-line (variables that are no longer updated) state variables is straightforward (see Bruhwiler et al., 2005;Tang and Zhuang, 2010).However, for the specific problem in our study, the gain from accounting for such correlation is rather small, as we showed in Tang and Zhuang (2010).
At the second stage, the estimate from Kalman update at the first stage is updated to satisfy the interval constraints by minimizing the cost function Eq. ( 8) is solved iteratively with the active set method (Murty, 1988), just as documented in Tang and Zhuang (2010).When a set of active constraints are identified, the constraints are set to equality, such that where the linear operator c chooses the proper active constraints.Then a new solution is found using the Kalman update After the new solution is obtained, if there are still constraints being violated, the iteration is repeated until all interval constraints are satisfied.Zigzag may happen during iterations.Anti-cycling rules are sometimes needed to stop the iteration (Murty, 1988).However, we have not experienced such situation in this study.

The fixed-lag interval constrained maximum likelihood ensemble smoother
The maximum likelihood ensemble filter for unconstrained problem was proposed in Zupanski (2005), and documented in Zupanski et al. (2008).It uses a set of N+1 ensemble simulations to approximate the covariance, and update the maximum likelihood estimation in the space spanned by the ensemble.In their formulation (Zupanski, 2005), a Hessian pre-conditioning is used to make the minimization problem wellposed.The posterior maximum likelihood estimation is obtained through the variable transform where the matrix C is defined as where q i =s i −s − , i =1,•••,N are the prior perturbations, derived from the ensemble simulations with respect to the prior maximum likelihood estimator s − .
The gradient function used in minimization is The posterior covariance is which is similar to that of the ensemble transform Kalman filter (Bishop et al., 2001).
To apply the maximum likelihood ensemble smoother for the interval constrained inversion in this study, we find it is useful to solve the problem in terms of s, rather than in term of the transformed variable ξ.This enables the direct application of available constrained minimization algorithms, e.g. the L-BFGS-B algorithm (Zhu et al., 1997).
The gradient function used for minimization is then Let the singular vector decomposition of Q − 1/2 be where D E is a diagonal matrix of size N×N, with diagonal values filled by the eigen values.
This provides an approximation to the gradient as For a full rank covariance matrix Q − , minimization of the cost function in Eq. ( 4) using s as an independent variable and using ξ as an independent variable will give same results.Numerical difficulty may be encountered when Q − is approximated with an ensemble whose size is less than the dimension of the system, or if Q − is dominated by some leading order eigen values.In such cases, a truncated form may be used.This only uses eigen vectors that contribute most of the variance, e.g.95% of the covariance, a criteria often used in screening the eigen values.The posterior covariance for the constrained estimation is still derived using Eq. ( 17).19987 Another feature of the ICMLES is that it is initialized by sampling the ensembles from a truncated multi-dimensional Gaussian distribution.This makes all prior ensemble members satisfy the constraints.After the posterior update, a new ensemble is created by sampling the updated truncated Gaussian distribution, and is used for next update.The methods to sample from a truncated multi-dimensional Gaussian distribution will be presented in Sect.2.5.

The fixed-lag interval constrained ensemble Kalman smoother
The fixed-lag interval constrained ensemble Kalman smoother (ICEnKS) is implemented to solve the minimization problem by Eq. ( 4).After initialized with a set of constrained ensemble simulations (see Sect. 2.5 for the method to sample from a truncated Gaussian distribution), the forecast statistics are computed as The ensemble is updated by solving the constrained minimization problem for each member simulation where is the perturbed observations, with j drawn from the distribution N(o,R).
After updating the ensemble members, the posterior ensemble statistics are computed as

Sampling from a truncated multi-dimensional Gaussian distribution
We now describe the method that is needed by ICMLES and ICEnKS to create ensembles from a truncated multi-dimensional Gaussian distribution.Our method is a generalization of the method presented in Prakash et al. (2010).
Given a sub-optimal approximation of the square root of covariance matrix Q 1/2 (of size m×n), the goal of sampling is to get n samples satisfying the condition where u is drawn from a n-dimensional normal distribution N(o,I), truncated with proper bounds determined by the interval constraint and Q 1/2 .
In implementation, Q 1/2 should be of the shape Usually from the posterior update, the structure of Q + 1/2 is different from the one in Eq. (30).A qr factorization of Q + T/2 can be performed first to get the required shape of matrix, i.e. the R matrix from a qr factorization.
Specifically, for j -th sample, the i -th (i ≤ n) component s j (i ) is obtained by sampling u j (i ) recursively from N(0,1), truncated with bounds The sampling is done with the rejection sampling algorithm (von Neumann, 1963), using a uniform distibution as the instrumental distribution.
The remaining components of s j (i ) with i >n, are determined by

Implementation and comparison experiment of inversions using pseudo observations
The A similar set up to that in Tang and Zhuang (2010) is used for the comparison experiment, so only necessary information is briefed here.Specifically, the sensitivity matrix is derived by running a group of tagged-CH 4 simulations using the GEOS-Chem model (Bey et al., 2001;Wang et al., 2004), driven by the GEOS-5 meteorology data at a resolution of 4 • ×5 • .Pseudo measurements are taken by sampling at 211 locations (excluding the towers) involved in the dataset of globalview-CH 4 2009 (GLOBALVIEW-CH4, 2009).Sampling errors are simulated by using the relative residual error method (Palmer et al., 2003) derived by an optimized reference run against the globalview data.
The prior fluxes are obtained by adding random perturbations to a set of known fluxes (11 seasonal fluxes and 7 yearly constant fluxes, see Tang and Zhuang, 2010), and are then used to obtain prior CH 4 concentrations at those sampling locations.The inversion is assessed by comparing the inverted fluxes and CH 4 concentrations to the known fluxes and CH 4 concentrations from the reference run.

Results and discussion
The prior fluxes and sampled prior CH 4 concentrations are shown in Fig. 1.Linear regression indicates the prior fluxes are rather poor approximations to the true fluxes.This makes the inversion difficult to get posterior fluxes to agree well with the known truth, as we will show below.
We first showed the results from the unconstrained inversion (Fig. 2).A lag length of 6 is used in all the inversions.It was shown in Tang and Zhuang (2010) such a lag length is sufficient to give stable inversions.All methods showed good posterior CH 4 concentrations compared to the observed data.However, because of the over-posed set up of the problem, unrealistic negative values of fluxes are inferred due to some spurious correlations among the different fluxes.We also compared the inversion using the MLES formulation in Zupanski (2005).It was found that their formulation provides slightly better posterior CH 4 concentrations.This can be explained by the use of eigen value screening in our study, which actually implies a greater null space of the state variables than that using the formulation in Zupanski (2005).Inclusion of correlation among on-line and off-line state variables does not help to keep the inverted fluxes within their feasible ranges (result not shown).Therefore, constrained inversion methods should be applied.This was achieved by using the variable transform technique in Tang and Zhuang (2010).There, reasonable posterior fluxes were obtained, but the posterior uncertainty is difficult to interpret.Thus, inversion methods that directly impose the constraint in the space of state variables are useful.
With the interval constrained methods, all inverted fluxes are in their feasible ranges (Fig. 3 and Table 1).The statistics of the posterior CH 4 concentrations are similar to that from the unconstrained inversion for KS and EnKS.The ICMLES did not provide better posterior CH 4 concentrations than other methods when compared to the observations.For the posterior fluxes, ICKS performed the best, followed by the ICEnKS and ICMLES.The posterior for the wetland emissions from the northwestern region (defined by 45 • N north, 180 • W to 0 • W) in a typical year inverted from the three different interval constrained methods is shown in Fig. 4. We found that the three methods showed consistent estimates.All three methods have allowed the flux to be negative in the presence of uncertainty in the low emission months.This is a problem for unconstrained inversions, but not an issue for interval constrained inversions.
In addition, we tested the impact of ensemble sizes on the two ensemble-based interval constrained approaches.For ICEnKS, increasing the ensemble size to 200 or decreasing the ensemble size to 50, both give inferior results to the one using an ensemble size of 100 (see Fig. 5, Table 2).Similar results are observed for the ICMLES when the ensemble size is increased or decreased (see Fig. 6, Table 2).The reason is that the covariance matrix is mostly determined by half of the eigen values, which is around 50 for the lag length of 6.This implies a pair-wised ensemble would be of size

Conclusions
When the inversion is done with unconstrained methods, the inverted fluxes are found sometimes to be of physically infeasible values.This problem is solved with the interval constrained methods, including the interval constrained Kalman smoother, the interval constrained maximum likelihood ensemble smoother and the interval constrained ensemble Kalman smoother.A comparison experiment to invert CH 4 fluxes against the pseudo observations of atmospheric CH 4 concentrations from a forward transport of CH 4 with known fluxes indicates that the three methods are able to constrain the inversion to provide physically feasible posteriors.These methods provide alternatives to the traditional variable transform method to deal with interval constraints in inversions.
Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | 11) 19985 Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper |

)
Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | 30) 19989 Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | three different methods are coded with Fortran 95.The linear algebra is carried out with publicly available packages of BLAS and LAPACK in the Intel compiler.The state variable is defined as a vector of scaling factors of the flux adjustments defined with respect to the prior fluxes.Thence, the posterior fluxes are the sum of prior fluxes and their inverted adjustments.The interval constraints are set by confining the scaling factors in the range [−0.95, 2.0] for all flux adjustments, except for that of the stratospheric destruction, which is set to [−0.2, 0.2].Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper |

19991
Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Paper | Discussion Paper | Discussion Paper | Discussion Paper | Fig. 1.(a) Prior fluxes; (b) prior CH 4 concentrations used in the inversion experiment.

Fig. 2 .Fig. 4 .
Fig. 2. (e) posterior fluxes from unconstrained MLES inversion, using formulae in this study; (f) posterior CH 4 concentrations from MLES inversion, using formulae in this study; (g) posterior fluxes from unconstrained EnKS inversion; (h) posterior CH 4 concentrations from EnKS inversion.A ensemble size of 50 is used for MLES inversion, and an ensemble size of 100 is used for EnKS inversion.The regressions are statistically significant with p<0.0001.20000

Table 1 .
Statistics of the interval constrained inversion experiments against the observations.

Table 2 .
Statistics of the interval constrained inversion experiments against the observations with different ensemble sizes.