Note: Adapting a fixed-lag Kalman smoother to a geostatistical atmospheric inversion framework

Inverse modeling methods are now commonly used for estimating surface fluxes of carbon dioxide, using atmospheric mass fraction measurements combined with a numerical atmospheric transport model. The geostatistical approach to flux estimation takes advantage of the spatial and/or temporal correlation in fluxes and does not require prior flux estimates. In this work, a previously-developed, computationally-efficient, fixed-lag Kalman smoother is adapted for application with a geostatistical approach to atmospheric inversions. This method makes it feasible to perform multi-year geostatistical inversions, at fine resolutions, and with large amounts of data. The new method is applied to the recovery of global gridscale carbon dioxide fluxes for 1997 to 2001 using pseudodata representative of a subset of the NOAA-ESRL Cooperative Air Sampling Network.


Introduction
Inverse modeling methods are now commonly used for estimating surface fluxes of carbon dioxide, using atmospheric mass fraction measurements combined with a numerical atmospheric transport model. The majority of recent studies have implemented a Bayesian synthesis inversion approach (e.g. Enting, 2002) applied to continental or sub-continental regions. In the majority of these applications, the errors associated with prior flux estimates were considered uncorrelated, as were the errors between the modeled and observed measurements. Researchers and policy makers are increasingly interested in estimating sources and sinks of greenhouse gases at finer spatial and temporal discretizations. This exacerbates two issues associated with the classical Bayesian setup. First, the assumption of uncorrelated errors becomes less tenable, because a priori flux estimates are likely to have consistent errors within regions. Second, the computational cost of the inversion increases, with a batch setup requiring the inversion of a matrix with dimensions of either the number of observations or the number of fluxes to be estimated. This computational cost becomes prohibitive as inversions are performed using more data, at finer scales, and over longer periods. One solution to the first of these problems was recently proposed by Michalak et al. (2004) in the form of a geostatistical formulation of the inverse problem. Such a setup does not require the use of prior flux estimates and takes advantage of the spatial correlation between fluxes, making it particularly well suited for inversion at small spatial scales. One solution to the second of these problems was recently proposed by Bruhwiler et al. (2005) in the form of a fixed-lag Kalman smoother (FLKS) that steps through an inversion in multiple steps while conserving information about the covariance between sequential sets of fluxes. This method builds upon the time-stepping approach presented in Law (2004), and dramatically increases the computational efficiency of inversions, while providing uncertainty estimates almost identical to those obtained using batch inversions. However, the method presented in Bruhwiler et al. (2005) is not applicable in a geostatistical setup, due to the lack of a priori estimates of fluxes. Other recently proposed numerical tools based on variational approaches (e.g. Chevallier et al., 2005; Baker et al., 2006) and ensemble methods (e.g. Peters et al., 2005; Zupanski et al., 2007) can solve large inverse problems, but are not designed to provide full information on flux uncertainties and their covariances.
The objective of this technical note is first to develop the geostatistical counterpart to the method of Bruhwiler et al. (2005), yielding a method that combines the desirable characteristics of a geostatistical setup with the computational efficiencies of the Kalman smoother. Second, the new method is tested by estimating global monthly-averaged fluxes at the 5.0° longitude by 3.75° latitude grid scale, using pseudodata generated at 44 observation sites from the NOAA-ESRL Cooperative Air Sampling Network (Tans and Conway, 2005), in order to verify that the proposed approach yields estimates consistent with those from a batch geostatistical inversion.

Correspondence to: A. M. Michalak (amichala@umich.edu)
Published by Copernicus Publications on behalf of the European Geosciences Union.

Geostatistical inverse modeling
The geostatistical approach to inverse modeling is a Bayesian approach in which the prior probability density function is based on an assumed form for the spatial and/or temporal correlation of the surface fluxes to be estimated. This differs from the traditional Bayesian approach, where the prior information is in the form of initial surface flux estimates. Geostatistical flux estimates are not subject to some of the limitations of traditional Bayesian inversions, such as potential biases created by the choice of prior fluxes and aggregation error resulting from the use of large regions with prescribed flux patterns (Michalak et al., 2004). The geostatistical approach is also ideally suited to inversions at fine spatial scales. The objective function used in the solution of a linear geostatistical inverse problem is the negative logarithm of the a posteriori probability density function $p''$:
$$L_{s,\beta} = \frac{1}{2}(z - Hs)^T R^{-1} (z - Hs) + \frac{1}{2}(s - X\beta)^T Q^{-1} (s - X\beta) \tag{1}$$
where H is an (N×M) matrix of sensitivities of the observations z (with dimensions N×1) to the discretized unknown surface flux distribution s (with dimensions M×1), R is the (N×N) model-data mismatch covariance matrix, Xβ is the model of the mean of the flux distribution, where X (with dimensions M×p) contains known information on the form of the mean trend of the fluxes and β (with dimensions p×1) are unknown drift coefficients (e.g. the fluxes can have a constant but unknown mean), and the (M×M) flux covariance matrix Q is based on a spatial and/or temporal correlation structure of flux deviations from the mean trend. The inverse problem involves solving for both β and s, and the form of the solution is therefore different from the classical Bayesian setup (Michalak et al., 2004).
The best estimates of s are obtained by finding the minimum of $L_{s,\beta}$ with respect to both s and β. After some algebra the system of linear equations can be expressed as:
$$\begin{bmatrix} HQH^T + R & HX \\ (HX)^T & 0 \end{bmatrix} \begin{bmatrix} \Lambda^T \\ \mathbf{M} \end{bmatrix} = \begin{bmatrix} HQ \\ X^T \end{bmatrix} \tag{2}$$
and, after solving for the observation weights $\Lambda$ and the Lagrange multipliers $\mathbf{M}$ (see Michalak et al., 2004 for a detailed discussion), the best estimate $\hat{s}$ and posterior uncertainty covariance $V_{\hat{s}}$ of s are defined as:
$$\hat{s} = \Lambda z \tag{3}$$
$$V_{\hat{s}} = -X\mathbf{M} + Q - QH^T\Lambda^T \tag{4}$$
The reader is referred to Michalak et al. (2004) for a detailed discussion of the geostatistical approach to the inverse problem as applied to the estimation of sources and sinks of atmospheric trace gases. For the discussion presented in this paper, we will be estimating a total of T months of fluxes, discretized to m regions globally, using T sets of monthly-averaged observations, sampled at n locations (i.e. M = T · m; N = T · n).
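As an illustration of the batch geostatistical solution described above, the following is a minimal NumPy sketch under stated assumptions. It solves the (N+p) × (N+p) system for the observation weights and Lagrange multipliers in the form given by Michalak et al. (2004), and then forms the best estimate and its posterior covariance. The function name `geostat_batch` and all variable names are hypothetical, and dense matrices are assumed, which is only practical for small problems.

```python
import numpy as np

def geostat_batch(H, z, R, Q, X):
    """Batch geostatistical inversion (illustrative sketch, dense algebra).

    H: (N, M) sensitivities, z: (N,) observations, R: (N, N) model-data
    mismatch covariance, Q: (M, M) flux covariance, X: (M, p) trend model.
    Returns the best estimate s_hat (M,) and posterior covariance V (M, M).
    """
    n, p = H.shape[0], X.shape[1]
    HQ = H @ Q                      # (N, M)
    HX = H @ X                      # (N, p)
    # Left-hand side of the (N+p) x (N+p) linear system.
    A = np.block([[HQ @ H.T + R, HX],
                  [HX.T, np.zeros((p, p))]])
    rhs = np.vstack([HQ, X.T])      # (N+p, M)
    sol = np.linalg.solve(A, rhs)
    lam_T, mult = sol[:n], sol[n:]  # Lambda^T (N, M), multipliers (p, M)
    s_hat = lam_T.T @ z             # best estimate: Lambda z
    V = Q - HQ.T @ lam_T - X @ mult # Q - Q H^T Lambda^T - X M
    return s_hat, V
```

The same estimate can be cross-checked against the generalized least squares form, in which the drift coefficients are estimated first and the flux residuals are then interpolated.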

Fixed-lag Kalman smoother
The size of the matrix that must be inverted in the solution of a synthesis Bayesian inversion is either (N×N) or (M×M), depending on the selected setup (see, for example, Enting, 2002). The cost of the geostatistical inversion is almost identical, with the typical inversion being set up in (N+p) × (N+p) format (see Eq. 2), and an equivalent (M+p) × (M+p) system being the alternative (not shown).
Given that the geostatistical approach to the inverse problem is particularly interesting when fluxes are to be estimated at fine spatial resolutions, the system is typically underdetermined (M>N), and the form presented in Eq. (2) is more computationally economical.
As the spatial and/or temporal resolution of the fluxes increases and as the total time period for which the fluxes are to be estimated becomes longer, M becomes very large and solutions in the (M×M) or (M+p) × (M+p) form become computationally prohibitive. Similarly, as the amount of data increases as a result of observation network expansions, an increase in the sampling frequency, and/or an increase in the total time period for which the fluxes are to be estimated, N becomes very large and solutions in the (N×N) or (N+p) × (N+p) form become computationally prohibitive. These two situations are currently happening simultaneously, as researchers strive to estimate more fluxes using more data.
Recently, Bruhwiler et al. (2005) proposed a fixed-lag Kalman smoother (FLKS) to remedy this situation for synthesis inversions. This method allows for the sequential estimation of a subset of $t_m$ sets of fluxes (e.g. monthly-average fluxes) using a subset of $t_n$ sets of data (e.g. monthly-average observations), while providing a rigorous method for tracking the inferred temporal and spatial covariance between subsets of fluxes. The method is illustrated in Fig. 1. In the example in the figure, each set of monthly fluxes is estimated a total of three times ($t_m = 3$), each time using one month of atmospheric observations ($t_n = 1$). For each iteration, the latest estimate available for each month of fluxes and its covariance are used as prior information. A covariance propagation scheme allows for correlations between fluxes being estimated and fluxes no longer being estimated to be conserved. Mathematically, each step of the FLKS proceeds as follows:
$$\hat{s} = s_p + QH^T (HQH^T + R)^{-1} (z - H s_p) \tag{5}$$
$$V_{\hat{s}} = Q - QH^T (HQH^T + R)^{-1} HQ \tag{6}$$
where $\hat{s}$ and $s_p$ now have dimensions ($t_m \cdot m$), z has dimensions ($t_n \cdot n$), and the other matrix dimensions are defined accordingly. In a typical setup, a single month of monthly-averaged observations would be used at a time, yielding a setup that requires the inversion of an n×n matrix. An equivalent form requiring an inversion of dimension ($t_m \cdot m$) × ($t_m \cdot m$) is:
$$\hat{s} = s_p + (H^T R^{-1} H + Q^{-1})^{-1} H^T R^{-1} (z - H s_p) \tag{7}$$
$$V_{\hat{s}} = (H^T R^{-1} H + Q^{-1})^{-1} \tag{8}$$
In this approach, $s_p$ are the most recent estimates of the subset of fluxes being estimated in a given step, Q is the most recent estimate of their covariance, z is the month of data being used to update these flux estimates, R is the covariance of model-data mismatch for these observations, and H relates the single month of observations to the several months of fluxes being estimated. In each iteration of the smoother, some fluxes are estimated for the first time, using a priori flux estimates in the corresponding portions of $s_p$. Other fluxes are estimated for at least the second time, using the latest estimates of these fluxes from previous iterations in the corresponding portion of $s_p$. The reader is referred to Bruhwiler et al. (2005) for additional details, including the equivalent equations for the case where the covariance is to be conserved between fluxes being estimated and fluxes no longer being estimated. Note that this approach does still require the calculation of the sensitivity of each observation to the estimated fluxes, but these sensitivities only need to be calculated for the number of months included in the lag of the Kalman smoother. These sensitivities can be calculated using an adjoint formulation of the atmospheric transport model in the case where M>N, yielding one model run per observation.
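The two equivalent forms of a single smoother step can be sketched as follows. This is a minimal illustration using the standard Bayesian update, not the implementation of Bruhwiler et al. (2005). The function names are hypothetical, and explicit matrix inverses are used only to make the size of each inverted matrix visible.

```python
import numpy as np

def flks_step_obs_form(s_p, Q, H, z, R):
    """Update in observation space: inverts a (t_n*n) x (t_n*n) matrix."""
    K = Q @ H.T @ np.linalg.inv(H @ Q @ H.T + R)   # gain matrix
    s_hat = s_p + K @ (z - H @ s_p)
    V = Q - K @ H @ Q
    return s_hat, V

def flks_step_flux_form(s_p, Q, H, z, R):
    """Equivalent update in flux space: inverts a (t_m*m) x (t_m*m) matrix."""
    R_inv = np.linalg.inv(R)
    V = np.linalg.inv(H.T @ R_inv @ H + np.linalg.inv(Q))
    s_hat = s_p + V @ H.T @ R_inv @ (z - H @ s_p)
    return s_hat, V
```

When the number of active fluxes exceeds the number of observations used in a step, the observation-space form inverts the smaller matrix and is therefore the cheaper choice.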

Derivation of the Geostatistical Kalman smoother
The form of the solution developed in Bruhwiler et al. (2005) is compatible with the classical Bayesian approach. For the case of monthly flux estimates, independently obtained flux estimates (typically from flux inventories and/or biospheric models) are used as prior information the first time a given month of fluxes is estimated, and the latest (a posteriori) estimate is updated in the subsequent steps using additional months of atmospheric data. In the geostatistical approach, the system needs to account for the unknown components of the model of the mean (β) in obtaining the first estimate of a given month's fluxes, but needs to use the latest (a posteriori) estimates for subsequent estimates of a given month's fluxes. This requires a substantial modification to the form of the Kalman smoother because each step through the smoother involves both flux periods being estimated for the first time (with no prior flux estimate), and months being estimated for at least the second time (with the latest flux estimates used as priors).
In the discussion that follows, the subscript k will refer to variables associated with the newest set of fluxes which have not yet been estimated, j will refer to variables associated with fluxes that have been estimated at least once, i will refer to variables associated with fluxes that are no longer being estimated, and p will refer to flux estimates from a previous iteration.
We start with two populations of fluxes currently being estimated: $s_j$ are the fluxes that have already been estimated at least once, and $s_k$ are the fluxes that have not yet been estimated. In Sect. 2.3.3, we will also refer to $s_i$, which represent one or more months of fluxes which are no longer being estimated, but whose inferred covariance with $s_j$ can be incorporated into the estimation. In the case where each iteration adds one month and removes one month of fluxes from the active state (i.e. the set of fluxes being estimated in that step), the dimensions of $s_j$ are $m(t_m - 1) \times 1$, and the dimensions of $s_k$ are $m \times 1$. The latest estimate of $s_j$ obtained in the previous iteration is designated $s_p$, whereas the model for the mean behavior of fluxes not yet estimated is designated $X_k \beta_k$. The latest estimate of the covariance of $s_j$ is designated $Q_{jj}$, the prior covariance of $s_k$ is designated $Q_{kk}$, and the cross-covariance between $s_j$ and $s_k$ is designated $Q_{jk}$. Jointly, these covariances are defined as
$$Q = \begin{bmatrix} Q_{jj} & Q_{jk} \\ Q_{kj} & Q_{kk} \end{bmatrix} \tag{9}$$
Note that given that the fluxes $s_k$ have not yet been estimated in the inversion, $Q_{jk}$ and $Q_{kj}$ represent any prior information on the temporal covariance between fluxes $s_j$ and $s_k$. In subsequent steps of the Kalman smoother, the covariance between consecutive months of fluxes will be determined based both on this prior information as well as temporal covariance information derived from the atmospheric data. If no temporal covariance is assumed a priori, $Q_{jk} = Q_{kj}^T = 0$. The objective function defining an inverse problem involving fluxes that have a prior estimate and others that do not can be written as:
$$L_{s_j, s_k, \beta_k} = \frac{1}{2} (z_k - H_j s_j - H_k s_k)^T R^{-1} (z_k - H_j s_j - H_k s_k) + \frac{1}{2} \begin{bmatrix} s_j - s_p \\ s_k - X_k \beta_k \end{bmatrix}^T Q^{-1} \begin{bmatrix} s_j - s_p \\ s_k - X_k \beta_k \end{bmatrix} \tag{10}$$
where $H_j$ is the sensitivity of the new observations $z_k$ to fluxes $s_j$, and $H_k$ is the sensitivity of these same observations to fluxes $s_k$. Note that throughout this derivation, the observations z have the background state (i.e. the effect of the months that we are no longer estimating) pre-subtracted.
In the next iteration, part of $s_j$ drops out of the active state and its estimate is treated as the final best estimate, whereas $s_k$ becomes part of $s_j$. For the example presented in blue in Fig. 1, $s_j = \{s_{l-1}, s_l\}$, and $s_k = \{s_{l+1}\}$. For the next iteration, presented in red, $s_j = \{s_l, s_{l+1}\}$, and $s_k = \{s_{l+2}\}$.
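The bookkeeping of which months fall into each group at a given step can be sketched as below. This is a hypothetical helper, assuming one new month of observations per step and months numbered from 1. The clipping at the start of the inversion period is an assumption of this sketch rather than something specified in the text.

```python
def partition_months(obs_month, t_m, n_corr=1, first_month=1):
    """Split flux months into groups i, j, k for one smoother step.

    obs_month: month of the newest observations, which is also the month
               of the fluxes estimated for the first time (group k).
    t_m:       number of flux months in the active state.
    n_corr:    number of retired months kept for the covariance
               correction (group i).
    """
    k = [obs_month]
    # Months already estimated at least once and still active.
    j = [t for t in range(obs_month - t_m + 1, obs_month) if t >= first_month]
    # Months no longer updated but retained for the covariance correction.
    i = [t for t in range(obs_month - t_m + 1 - n_corr, obs_month - t_m + 1)
         if t >= first_month]
    return i, j, k
```

With `t_m = 3` and the newest observations in month 4, this gives `i = [1]`, `j = [2, 3]` and `k = [4]`, and the next step shifts every group forward by one month.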

Best estimate
First, let us define the inverse of Q as: where To obtain the best estimate of the fluxes, we take the first derivative of the objective function in Eq. (10) with respect to $s_j$, $s_k$, and $\beta_k$ and set it to zero in order to minimize the objective function. Manipulating these three equations and putting them into a system of equations we obtain: where $H = [H_j \; H_k]$ is the full sensitivity matrix of the observations to all the fluxes being estimated. This linear system of equations is then inverted to obtain the best estimates. The above system of equations requires the inversion of a matrix of dimensions (($t_m \cdot m$) + p) × (($t_m \cdot m$) + p).
Following some linear algebra manipulations, a form analogous to the batch geostatistical inverse problem can be derived, which instead only requires the inversion of an (($t_n \cdot n$) + p) × (($t_n \cdot n$) + p) matrix: where the best estimate of the fluxes becomes: In subsequent iterations through the smoother, $s_k$ and the portions of $s_j$ that will be estimated again become the new priors $s_p$.

Posterior covariance
The inverse of the Hessian is typically used in inversions as an estimate of the posterior covariances and cross-covariances of fluxes. In this case, taking the second derivative of the objective function with respect to $s_j$, $s_k$, and $\beta_k$, individually and in combination, we obtain: where $V_{\cdot,\cdot}$ represents the a posteriori covariance components of $s_j$, $s_k$, and $\beta_k$. Following algebraic manipulations, the posterior covariance of the fluxes can be expressed in terms of the solution to Eq. (13): In subsequent iterations through the smoother, the portion of $V_{\hat{s}}$ corresponding to fluxes that will be estimated again becomes the new $Q_{jj}$.
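One way to sketch a single step of the geostatistical smoother, without the covariance correction, is shown below. The form used here is derived for this illustration by shifting the unknowns by the prior, so that $s_j - s_p$ and $s_k$ are estimated with a trend matrix that is zero for $s_j$ and $X_k$ for $s_k$. It minimizes the objective function defined earlier for this step, but its notation is an assumption and need not match the published equations. The function name `gflks_step` is hypothetical.

```python
import numpy as np

def gflks_step(s_p, Q, H_j, H_k, X_k, z, R):
    """One geostatistical fixed-lag step (sketch, no covariance correction).

    s_p: prior for s_j, Q: joint covariance of [s_j, s_k],
    H_j, H_k: sensitivities of z to s_j and s_k, X_k: trend model for s_k.
    Returns the estimate of [s_j, s_k] and its posterior covariance.
    """
    n, m_j = H_j.shape
    m_k, p = X_k.shape
    H = np.hstack([H_j, H_k])
    # Trend matrix: zero for fluxes with a prior, X_k for new fluxes.
    X_t = np.vstack([np.zeros((m_j, p)), X_k])
    s0 = np.concatenate([s_p, np.zeros(m_k)])
    HQ, HX = H @ Q, H @ X_t
    # System of size (n+p) x (n+p), analogous to the batch case.
    A = np.block([[HQ @ H.T + R, HX],
                  [HX.T, np.zeros((p, p))]])
    sol = np.linalg.solve(A, np.vstack([HQ, X_t.T]))
    lam_T, mult = sol[:n], sol[n:]
    s_hat = s0 + lam_T.T @ (z - H_j @ s_p)   # update relative to the prior
    V = Q - HQ.T @ lam_T - X_t @ mult        # posterior covariance
    return s_hat, V
```

For small test problems, the result can be compared against a direct solution of the normal equations of the objective function, where the posterior covariance is the flux block of the inverse Hessian.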

Covariance correction
As discussed in Bruhwiler et al. (2005), we want to include the covariance between fluxes no longer being estimated and those still being estimated to avoid underestimating the uncertainty associated with fluxes being estimated at each step.
In order to do so, we perform the derivation described above a second time, this time including the influence of fluxes no longer being estimated, $s_i$. First, let us define the inverse of Q as: where $Q_{ii}$ represents the final covariance of fluxes that are no longer being estimated, but that are temporally correlated to the current set of estimated fluxes. $Q_{ij}$, $Q_{ik}$, $Q_{ji}$, and $Q_{ki}$ represent the inferred or assumed covariance between these older fluxes and the currently-estimated set. The portion of the inverse corresponding to the fluxes currently being estimated is: The corresponding objective function in the case of a geostatistical Kalman smoother becomes: where $s_i$ are fluxes that we are no longer estimating but that are correlated with the current set of fluxes. For the example presented in blue in Fig. 1, assuming that a single month is used for the covariance correction, $s_i = \{s_{l-2}\}$, $s_j = \{s_{l-1}, s_l\}$, and $s_k = \{s_{l+1}\}$. To obtain the best estimate of the fluxes, we minimize this objective function with respect to $s_i$, $s_j$, $s_k$ and $\beta_k$. We then take into account the fact that, given that we are no longer updating $s_i$, $E[s_i - s_{p,i}] = 0$, and manipulate the resulting three equations as outlined in Sect. 2.3.1 to obtain: where and the estimated fluxes are: An analytical expression for the a posteriori uncertainty that takes into account the cross-correlation between fluxes no longer being estimated and those still being estimated can be derived in a manner analogous to the method presented in Bruhwiler et al. (2005). Given the influence of the uncertainty of $\beta_k$ on the uncertainty of the fluxes, however, the resulting expression becomes exceedingly cumbersome. A computationally equivalent but simpler solution is to present the resulting covariance as a subset of a larger covariance by solving the system in Eq. (13), but where $H = [H_i \; H_j \; H_k]$ and Q is as defined in Eq. (18). The solution of the system defines the posterior covariance: where we only keep the lower right-hand block for future iterations because we are no longer updating estimates of $s_i$ and its covariance.

Sample application
The following section describes an application of the geostatistical fixed-lag Kalman smoother (GFLKS) to the estimation of global monthly-averaged surface fluxes of CO2 on a 3.75° latitude by 5.0° longitude grid. Because the goal is to validate the proposed method, we choose a setup that is sufficiently small such that a batch geostatistical inversion can still be performed. We also use pseudodata (with added noise) to evaluate the ability of the method to recover the actual fluxes. The fluxes used to generate the pseudodata were selected to reflect a realistic set of fluxes for CO2. The estimates used for fossil fuel (FF), oceanic exchange (OE), and net ecosystem production (NEP) were the same as those applied as priors in the Atmospheric Tracer Transport Model Intercomparison Project 3 (TransCom3) (Gurney et al., 2002, 2003). All fluxes used to generate the pseudodata are constant from year to year, but OE and NEP fluxes have monthly within-year variations whereas FF fluxes are assumed constant. Note that although the fluxes used to generate the pseudodata do not exhibit year-to-year variability, the inversion does allow for such variability to be inferred. All flux data were defined on a 3.75° latitude by 5.0° longitude grid, which yields a 48×72 surface grid with a total of 3456 regions for which the surface fluxes are defined and will be estimated. Over the five year period, this results in 207 360 unknowns. Samples of the fluxes used to generate the pseudodata are presented in Fig. 2.
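The problem size quoted above follows directly from the grid spacing and the length of the inversion period, as this short check illustrates (variable names are illustrative only):

```python
n_lat = round(180 / 3.75)     # latitude bands of 3.75 degrees
n_lon = round(360 / 5.0)      # longitude bands of 5.0 degrees
m = n_lat * n_lon             # gridcells per month
n_months = 5 * 12             # five years of monthly fluxes
M = m * n_months              # total number of unknown fluxes

print(n_lat, n_lon, m, M)     # 48 72 3456 207360
```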
Note that these fluxes are used only to generate the pseudodata and are not used in any way in the inversion. The sensitivity of the atmospheric measurements to surface fluxes (represented by matrix H) is calculated using an adjoint implementation of the Tracer Model 3 (TM3) atmospheric transport model (Kaminski et al., 1999; Rödenbeck et al., 2003). Sensitivities relating monthly averaged CO2 observations at a subset of the NOAA observation network sites to monthly averaged grid-scale fluxes were calculated by Rödenbeck et al. (2003) for 1982-2001, and the 1997-2001 subset of this transport information is used for the work presented here. The model uses interannually varying ECMWF wind fields.

Data and basis functions
In an effort to generate a set of pseudodata that is consistent with the amount of data typically used in inversion studies, the available basis functions were used to generate pseudodata for months and NOAA-ESRL sites where actual CO2 data are available. Therefore, although the observational data have been numerically generated, their spatial and temporal distribution represents a subset of the NOAA-ESRL Cooperative Global Air Sampling Network's data collected for 1997 to 2001. Overall, the dataset consists of 2275 monthly-averaged datapoints, collected over 60 months at a total of 44 sites. Random error with a standard deviation of $\sigma_R = 0.25$ ppm was added to the pseudodata to simulate the effect of measurement and transport errors. Although this error is unrealistically low for real applications, the goal here was to magnify any differences between a batch approach and the GFLKS. Because using a low model-data mismatch increases the adjustments that must be made from the a priori to the a posteriori covariance matrix, and the degree to which flux estimates deviate from their overall trend Xβ, any approximations caused by the GFLKS would be easier to detect when using a low model-data mismatch. Note that not every station has data at every month. A map illustrating the sites at which data were modeled, as well as the number of months for which these sites were sampled, is presented in Fig. 3. Given the 2275 observations and the 207 360 fluxes to be estimated, the inversion is strongly underdetermined.
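The generation of noisy pseudodata can be sketched as follows, assuming a sensitivity matrix `H` restricted to the site and month combinations with available data and a vector `s_true` of prescribed fluxes. Both names, the function name and the seed are placeholders for illustration and do not reproduce the actual data set.

```python
import numpy as np

def make_pseudodata(H, s_true, sigma_r=0.25, seed=0):
    """Simulate observations z = H s_true + noise, noise ~ N(0, sigma_r^2)."""
    rng = np.random.default_rng(seed)
    noise = rng.normal(0.0, sigma_r, size=H.shape[0])
    return H @ s_true + noise

# Degree of underdetermination for the setup described above.
n_obs, n_unknowns = 2275, 207360
coverage = n_obs / (44 * 60)              # fraction of site-months with data
unknowns_per_obs = n_unknowns / n_obs
```

For the numbers quoted above, about 86% of the 2640 possible site-months have data, and there are roughly 91 unknown fluxes per observation.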

Inversion setup
We assume that the background concentration in the atmosphere prior to the start of the inversion period is known, in order to avoid the "ramp-up" period typically necessary where the first several months of estimated fluxes are nonsensical as they simply represent the inversion's attempts to reproduce the initial background concentration. As was done in Michalak et al. (2004), surface fluxes are estimated using a constant mean model with a different mean for land and ocean fluxes. These constants, however, are allowed to vary month to month. The restricted maximum likelihood approach (e.g. Michalak et al., 2004) can be used to estimate the covariance parameters in a geostatistical inversion, including the spatial and/or temporal covariance terms in Q and the model-data mismatch covariance parameters in R.
Given that this has been demonstrated previously and that we are working with pseudodata, we chose here to focus on the inversion step and have prescribed the covariance parameters based on the variability of the fluxes used in generating the pseudodata. A priori, temporal covariance is not considered in this case, land fluxes are assumed independent of ocean fluxes, and the spatial covariance was modeled as an exponential decay, leading to:
$$Q(x_u, x_v, t_u, t_v) = \begin{cases} \sigma_Q^2 \exp\left(-\dfrac{\|x_u - x_v\|}{l_Q}\right) & \text{if } t_u = t_v \text{ and } g_u = g_v \\ 0 & \text{otherwise} \end{cases}$$
where $\|x_u - x_v\|$ is the great circle distance between gridcells at locations $x_u$ and $x_v$, $t_u$ and $t_v$ are the dates of the estimated fluxes, and g is a binary variable identifying whether a particular gridcell is land or ocean. This setup leads to a block-diagonal Q matrix. The covariance parameters were $\sigma_Q^2 = 0.40$ (µmol/(m² s))² and $l_Q = 2700$ km for land fluxes, and $\sigma_Q^2 = 3.0 \times 10^{-3}$ (µmol/(m² s))² and $l_Q = 5730$ km for ocean fluxes. The model-data mismatch was modeled as independent with a fixed error variance, equal to the variance of the errors actually added to the generated pseudodata:
$$R = \sigma_R^2 I_n$$
where $I_n$ is an identity matrix of dimensions n. Based on the work of Bruhwiler et al. (2005), we chose to include 6 months of fluxes in the active state. This means that each month of fluxes is constrained by the subsequent 6 months of available atmospheric data.
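A minimal sketch of the spatial covariance for a single month of fluxes is given below, using the land and ocean parameters quoted above. The haversine formula and the mean Earth radius of 6371 km are assumptions of this sketch, as are the function names. The full Q for several months would be block-diagonal in time, because no temporal covariance is assumed a priori.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius

def great_circle_km(lat, lon):
    """Pairwise great-circle distances (km) for points given in degrees."""
    lat, lon = np.radians(lat), np.radians(lon)
    dlat = lat[:, None] - lat[None, :]
    dlon = lon[:, None] - lon[None, :]
    a = (np.sin(dlat / 2) ** 2
         + np.cos(lat)[:, None] * np.cos(lat)[None, :] * np.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))

def spatial_covariance(lat, lon, is_land,
                       var_land=0.40, len_land=2700.0,
                       var_ocean=3.0e-3, len_ocean=5730.0):
    """Exponential covariance with independent land and ocean fluxes."""
    d = great_circle_km(np.asarray(lat, float), np.asarray(lon, float))
    is_land = np.asarray(is_land, dtype=bool)
    same_type = is_land[:, None] == is_land[None, :]
    var = np.where(is_land, var_land, var_ocean)[:, None]
    length = np.where(is_land, len_land, len_ocean)[:, None]
    # Zero covariance between land and ocean gridcells.
    return np.where(same_type, var * np.exp(-d / length), 0.0)
```

Calling `spatial_covariance` with the centers of the 3456 gridcells and a land mask would give one diagonal block of Q.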

Results and discussion
The main goal of the proposed approach is to decrease the computational cost associated with solving large-scale geostatistical inverse problems aimed at constraining budgets of atmospheric trace gases, while providing a best estimate and estimated uncertainty equivalent to those obtained using a batch inversion, where all fluxes are estimated using all available measurements. Past work by Bruhwiler et al. (2005) has established that the vast majority of information about monthly-averaged fluxes can be derived from the six months of subsequent observations. Therefore, even for inversions covering many years, the dimension of the matrix to be inverted is limited to six months of observations, making the problem computationally manageable. Estimated gridscale fluxes for selected months of 2000 are presented in Fig. 4. These fluxes were obtained using the proposed geostatistical fixed-lag Kalman smoother method. Equivalent fluxes obtained using a batch inversion are visually very similar to those in Fig. 4, and are therefore not presented here. This similarity indicates that the proposed method is able to reproduce estimates obtained using the geostatistical batch inversion in cases where a sufficient number of observations (in this case 6 months) are used to estimate each month of fluxes. The estimated fluxes are smoother than the true fluxes presented in Fig. 2, which is indicative of the strongly underconstrained nature of the inverse problem.
Figure 5 presents the monthly flux estimates and uncertainties for the year 2000 aggregated to two TransCom3 regions, Temperate North America and the South Atlantic. The differences in the uncertainty estimates relative to the batch inversion are very small relative to the magnitude of the estimated fluxes and their uncertainties. The best estimates obtained using the proposed approach are very similar to those obtained using the batch inversion approach. The average a posteriori uncertainties, expressed as standard deviations, are 0.45 GtC/year and 0.46 GtC/year for the batch and GFLKS inversions, respectively. The uncertainties for the South Atlantic are 0.32 GtC/year and 0.34 GtC/year, respectively. The difference between these two sets of results could be further decreased by using additional months of observations to constrain each month of fluxes, if such a computational tradeoff were deemed appropriate. The differences between estimates are more pronounced relative to the magnitude of the total flux for underconstrained regions such as the South Atlantic, where more time is required for the flux signal to propagate to observations. This is also consistent with the fact that the relative a posteriori uncertainty is larger for the South Atlantic. Importantly, the uncertainty estimated using the GFLKS with the covariance correction reflects the information content of the observations used to constrain the fluxes. As such, the uncertainty estimated using the Kalman smoother is always slightly higher, correctly reflecting the slight loss of information content associated with using only six months of observations to constrain each month of fluxes. Without the correction, the uncertainty estimated with the GFLKS would in some cases be erroneously low, because it would ignore the inferred temporal covariance between fluxes. Figure 6 presents the estimated fluxes for two specific gridcells, to evaluate the impact of the Kalman smoother approach on estimates at the grid scale. As also seen in Fig. 5, the estimated fluxes and uncertainties are very similar to those obtained using the batch inversion. At the grid scale, the inferred uncertainty is sometimes marginally lower for the GFLKS, because the inferred temporal correlation at the grid scale spans more than a single month, whereas the implemented covariance correction included only one month. If more months had been included in the covariance correction, we could have achieved the intuitive result of the GFLKS uncertainty always being higher than that from the batch inversion. The uncertainty at the gridscale is quite high overall, due to the strongly underconstrained inversion setup used in this application. Therefore, results at the gridscale serve primarily as a basis for estimating fluxes at aggregated scales (Fig. 5), where a single month covariance correction was sufficient to accurately estimate the uncertainty.

Conclusions
The tools developed in this paper decrease the computational costs associated with the solution of a geostatistical inverse problem aimed at estimating fluxes of atmospheric trace gases. For each set of estimated fluxes, the method uses only observations that provide significant constraints on flux distributions. The covariance between consecutive sets of fluxes is directly incorporated into the estimation, including covariances with flux periods for which estimates are no longer being updated using the most recent observations. Overall, this method makes the solution of large-scale geostatistical inverse problems feasible, paving the way for additional studies on gridscale flux estimation. Note that the proposed approach does still entail the explicit calculation of the sensitivity of observations to estimated fluxes, in this case using an adjoint model, but these sensitivities are only required for the months included in the lag in each iteration. In short, the approach provides an accurate characterization of the a posteriori uncertainties, but the computational cost is higher relative to variational or ensemble-based methods that involve fewer model runs but provide a more approximate representation of the a posteriori uncertainties.
Whereas past work on the application of geostatistical inverse modeling to trace gas flux estimation focused on yearly-averaged fluxes, this example also demonstrates the applicability of the geostatistical approach to inverse modeling for estimating monthly-averaged fluxes. Results indicate that even the constant mean model yields flux estimates that agree well with independent flux information for well-constrained areas of the Earth (e.g. temperate North America). Ongoing work is exploring the use of auxiliary environmental data to inform a more sophisticated model of the trend, which will allow the geostatistical approach to represent more fine-scale spatial structure in the flux distribution, while still avoiding the use of prior flux estimates.

Fig. 2. Sample fluxes used in generating pseudodata. These fluxes represent the sum of the fossil fuel, oceanic exchange and net ecosystem production fluxes. Fluxes vary monthly, but only January, April, July, and October fluxes are presented here. Units are µmol/(m² s).

Fig. 3. Locations of pseudodata measurements. The numbers indicate the number of monthly averaged measurements available at each location. Note that the locations listing a sum are areas where two observation sites are too close to one another to be resolved on the map. This occurs for (i) St. Davids Head, Bermuda (BME), and Tudor Hill, Bermuda (BMW), and (ii) Mauna Loa, Hawaii (MLO), and Cape Kumukahi, Hawaii (KUM). Black squares designate gridcells for which flux estimates are compared to prescribed fluxes in Fig. 6. Shaded areas represent the Temperate North America and South Atlantic TransCom3 regions, for which flux estimates are compared to prescribed fluxes in Fig. 5.

Fig. 5. Monthly recovered flux estimates and uncertainties for the year 2000 aggregated to the TransCom3 regions. Results for Temperate North America and South Atlantic are presented in panels (a) and (b). Uncertainties on the GFLKS are shown as dashed lines; uncertainties for the batch inversion are shaded. Panels (c) and (d) represent the difference between estimates and uncertainties obtained using the Kalman smoother, those obtained using a batch inversion, and the true fluxes used in generating the pseudodata. Note that the differences between estimation uncertainties (dashed line) are amplified by an order of magnitude to make them visible on the same scale as the flux differences.

Fig. 6. Monthly recovered flux intensity estimates and uncertainties for the year 2000 for two sample gridcells. (a) Latitude=[41.25 N, 45.00 N], Longitude=[85.00 W, 80.00 W], surrounding Ann Arbor, Michigan. (b) Latitude=[33.75 S, 30.00 S], Longitude=[10.00 W, 5.00 W] in the South Atlantic. Uncertainties on the GFLKS are shown as dashed lines; uncertainties for the batch inversion are shaded. Panels (c) and (d) represent the difference between estimates and uncertainties obtained using the Kalman smoother and those obtained using a batch inversion.
Fig. 1. Representation of time stepping through fixed-lag Kalman smoother. The subscripts indicate month numbers. In the presented example, four consecutive steps through the GFLKS are presented in orange, blue, pink, and green, respectively. Notice that observations are only sensitive to fluxes occurring in the same or previous months, and the l'th month of observations is therefore used to constrain fluxes for months $l - t_m + 1$ through $l$.