Comparison of ensemble Kalman filter and variational approaches for CO 2 data assimilation

Introduction Conclusions References


Introduction
Data Assimilation (DA) is best known as a tool in Numerical Weather Prediction (NWP; e.g.Swinbank, 2010) and has been applied to analyze complex datasets and estimate parameters in a variety of fields, including atmospheric constituent (e.g.Lahoz and Errera, 2010;Elbern et al., 2010), oceanographic (e.g.Haines, 2010), and land surface Introduction

Conclusions References
Tables Figures

Back Close
Full (e.g.Reichle, 2008;Houser et al., 2010) assimilation problems.In all such applications, a DA system aims to optimally combine the information from available observations with a prior model estimate (or the background derived from a model forecast), based on their respective uncertainty estimates.DA methods for estimating CO 2 fluxes aim to constrain the spatial and temporal distributions of CO 2 sources and sinks by integrating atmospheric, terrestrial and oceanic data together into a common analysis framework.CO 2 -DA applications (e.g.ensemble filter based methods - Peters et al., 2005;Feng et al., 2009;Miyazaki et al., 2011;Chatterjee et al., 2012;Kang et al., 2012, variational based methods -Rayner et al., 2005;Chevallier et al., 2005;R ödenbeck, 2005;Baker et al., 2006, or hybrid approaches such as the Maximum Likelihood Ensemble Filter - Zupanski et al., 2007;Lokupitiya et al., 2008) have been in vogue for nearly a decade and are viewed as an alternative to more traditional batch inverse modeling schemes.Unlike DA, which uses a combination of numerical approximations and time-stepping approaches, the batch schemes directly solve the linear system of equations relating the fluxes and the atmospheric CO 2 observations in a single step.The DA approaches are attractive because of their computational efficiency (e.g.Rayner, 2010) but the impact of their underlying numerical approximations on the final estimates and their associated uncertainties is ambiguous.
Recently, Chatterjee et al. (2012) pointed out that because of fundamental differences between the carbon flux estimation (i.e., the inverse framework) and the NWP/constituent (i.e., assimilation framework) problems: (a) the performance of the DA methods are not necessarily equivalent for the two frameworks, and (b) only under specific inversion scenarios are the DA methods able to perform optimally.Differences between the two frameworks are mainly driven by the ill-conditioned and highly diffusive nature of the flux estimation problem, and the absence of an explicit dynamical model that can evolve a set of estimated fluxes forward in time.The lack of a dynamical model represents a loss of valuable information to the DA system.By propagating the state vector between different assimilation time steps, a dynamical model directly contributes to the growth of the eigenvalue spectrum of the state covariance matrix Introduction

Conclusions References
Tables Figures

Back Close
Full in certain preferred directions and decay in others (Bengtsson et al., 2003;Furrer and Bengtsson, 2007).For the CO 2 inverse problem, however, the dynamics are embedded within the atmospheric transport, and cannot be used explicitly to inform the temporal evolution of the state vector.The absence of this information, coupled with the availability of only sparse observational datasets, may result in the DA approaches performing sub-optimally.The authors are not aware of any study specifically related to the CO 2 flux estimation problem that attempts to evaluate the performance of DA techniques.This is unlike the weather forecasting community, where several studies have evaluated the strengths and weaknesses of ensemble and variational approaches for different weather-related applications ranging from simple to chaotic non-linear systems (e.g.Lorenc, 2003;Caya et al., 2005;Fertig et al., 2007;Kalnay et al., 2007;Liu et al., 2008;Whitaker et al., 2009;Buehner et al., 2010a,b;Jardak et al., 2010;Zhang et al., 2011; also see the special collection of papers on inter-comparison at http://journals.ametsoc.org/page/EnsembleKalman Filter).Apart from NWP-related comparison studies, DA methods have also been inter-compared for chemical (e.g.Carmichael et al., 2008) and constituent (e.g.ozone - Wu et al., 2008) assimilation problems.These comparison studies cannot be used as a baseline, however, because of differences between the flux estimation and the NWP/constituent data assimilation frameworks, as stated earlier.
The main purpose of this work is thus to fill this gap and build on the existing body of inter-comparison studies from the perspective of the CO 2 flux estimation problem.Specifically, this study aims to answer the following two questions: (a) what is the relative performance of two state-of-the art DA approaches (Ensemble Square Root Filter, EnSRF -e.g.Whitaker andHamill, 2002 and4-dimensional variational, 4D-VAR -e.g. Talagrand, 2010) for solving the CO 2 inverse problem, and (b) how well can the DA approaches reproduce the flux estimates from a batch inverse modeling scheme?
To facilitate the inter-comparison, we consider here a one-dimensional passive tracer transport problem.Similar to previous studies (e.g.Liu and Rabier, 2002;Park and Introduction

Conclusions References
Tables Figures

Back Close
Full The low computational cost associated with the 1D problem enables the implementation of a batch least-squares method in addition to the DA approaches.The DA estimates are thus compared to both the true signal and the batch estimates in order to isolate the degradation due to the underlying numerical approximations.This study assesses whether these approximations allow the examined DA methods to be suitable long-term replacements for batch inversion techniques under different inversion conditions.
When designing the 1D problem, we focus on a framework that allows us to examine an under-determined and fine-scale CO 2 flux estimation problem.This setup is necessary to mimic the challenges of a true CO 2 flux estimation problem in which atmospheric mixing coupled with the sparseness of available observations results in the inverse problem being highly under-determined and ill-posed.The under-determined nature of the problem is accentuated by the need for estimating CO 2 fluxes at fine spatial and temporal scales, which is necessary to not only avoid aggregation errors (e.g.Kaminski et al., 2001;Gourdji et al., 2012) but also to improve the understanding of the fine-scale processes driving the carbon cycle.This paradigm shift has brought about a computational bottleneck in solving the batch inverse problem, which requires the atmospheric transport model to be run either once per estimated flux region/period combination, or once per observation if an adjoint to the transport model is available.This in turn has prompted the use of computationally efficient alternatives, such as DA methods, in which the number of atmospheric transport model runs is proportional to the number of ensemble members (in the ensemble approach) or the number of descent iterations (in the variational approach), both of which are typically set to be much lower than the number of estimated parameters or available observations.Analogous to a real CO 2 flux estimation problem, no dynamical model is explicitly specified for solving the 1D problem.The experiments are specifically targeted to evaluate the impact of three factors on the two DA approaches: (a) the impact of the Introduction

Conclusions References
Tables Figures

Back Close
Full observational density and homogeneity, (b) the impact of the model-data mismatch covariance, and (c) the impact of the operational parameters of the DA system (i.e., ensemble size, number of iterations).While examining the first two factors, issues of sampling and convergence error are minimized by specifying a large number of ensemble members and descent iterations, for the EnSRF and the 4D-VAR 1 , respectively.
Operational constraints are subsequently imposed in the final set of experiments to not only evaluate the fundamental differences between the two DA methods but also the effect of the compromises necessary to make the algorithms practically feasible.This study is a first comparison of the EnSRF and the 4D-VAR for a simplified flux estimation problem, and is expected to guide the development of future inter-comparison experiments with real data, including satellite observations of atmospheric CO 2 .
2 Experimental framework

Estimation methods
In a Bayesian framework, prior information and likelihood are expressed as probability density functions or pdfs (e.g.Enting, 2002).If the pdfs can be approximated as Gaussian, then maximizing the posterior probability of the state is equivalent to finding the 1 Typically in the DA community the term 4D-VAR is used to represent the 3-dimensional space plus time.In this study, the variational approach is applied to a 1-dimensional space plus time, which may suggest that the term "1 + 1D-VAR" may be more appropriate.Within the geophysical community, however, the term 1D in 1D-VAR specifically refers to the vertical column, and is quite popular for radio occultation data (e.g.Eyre et al., 1993;Poli et al., 2002), total column water vapor (e.g.Mar écal and Mahfouf, 2002;Bauer et al., 2006), cloud (e.g.Janiskova et al., 2012) assimilation etc.Since in the current study, 1D refers to a single dimension along the horizontal space and not necessarily the vertical column, we persist with usage of the term 4D-VAR rather than the term 1D-VAR.Introduction

Conclusions References
Tables Figures

Back Close
Full where s is a m × 1 vector of the discretized state (e.g.CO 2 flux), z is the n × 1 vector of observations (e.g.CO 2 observations), h is a forward model that is often a combination of an observation operator and an atmospheric transport model with embedded dynamics, R is the n × n model-data mismatch covariance, s b is the m × 1 prior estimate of the state, Q b is the m × m error covariance of the prior estimate s b , and the superscript T denotes the matrix transpose operation.Note that in the case of atmospheric CO 2 inverse problem, h is linear and typically represented via its matrix form H (a.k.a. sensitivity matrix with dimensions n × m, or Jacobian matrix), which captures the sensitivity of the observations z to the fluxes s (i.e.H i ,j = ∂z i /∂s j ).The inverse problem as formulated via Eq.( 1) determines a least squares fit of the prior state estimate to the observations with the ultimate aim of estimating the pdf of the true state.In this study, the estimate of the pdf of the true state is obtained via a batch inverse modeling (BIM; e.g.Enting, 2002), a variational DA (4D-VAR; e.g.Talagrand, 2010), and an ensemble square root filter (EnSRF; e.g.Whitaker and Hamill, 2002) scheme.
A review of these methods and their underlying mathematical framework are outlined in Appendix A, along with an exposition of the algorithmic choices necessary to adapt the DA methods for solving the CO 2 flux estimation problem.at various locations and times within the domain.The locations and times of the observations as well as their precision can be regulated to simulate a variety of inversion scenarios.The inverse problem involves using the noisy tracer observations along with the transport information to infer the original tracer fluxes.

Problem description
In the following description, the units of mass, length and time are reported as [M], [L] and error with a pre-specified variance (10 [M 2 L −2 ]) is added to the tracer observations to simulate measurement, transport, aggregation, and representation errors.Later in the study, different configurations of x o and error variances (σ 2 R ) are prescribed to test the impact of these parameters.
The tracer observations (z) and the tracer fluxes (s) are related in the following fashion: where H is the sensitivity matrix that is generated using a 1D tracer transport model as: where x r and t r are the tracer flux release locations and times, x o and t o are the tracer observation locations and times, erfc represents the complimentary error function.The tracer transport model embedded in Eq. ( 4) assumes conservation of mass and is based on a well-known one-dimensional analytical solution for contaminant transport in the groundwater literature (e.g.Ogata and Banks, 1961;Runkel, 1996).The tracer observations obtained at a particular time step are sensitive to the tracer flux released at multiple previous time steps.Given that the total length of the domain is 300 [L] and the advection velocity is 50 [L T The 1D framework was designed to capture most of the characteristic features of the CO 2 flux estimation problem.For a real CO 2 inversion, units are most typically [µmol (m 2 s) −1 ] for s and σ Q , [ppm] for z and σ R , [ppm µmol −1 (m 2 s)] for H and [km] for l Q .

Experiments
Experiments are designed to explore the impact of three factors on the ability of the DA methods to solve the inverse problem: (a) the observational density and homogeneity, (b) the model-data mismatch covariance, and (c) the computational constraints of the DA system (i.e., ensemble size, number of iterations).In all the experiments, the size of the state vector or the total number of fluxes to be inferred is 10500 × 1 (i.e.300 locations × 35 times).The first set of experiments (Table 1 -Experiments A through C) aims to investigate the effect of the density and homogeneity in space and time of the observational network.Three different observational networks are designed (Fig. 2).In the first network configuration (denoted as REF -"reference observational set", same as outlined in Sect.2.2), observations are obtained throughout the domain (x o = 1, 2 . . .300 [L]) and for all 35 measurement times (t o = 1.5, 2.5, 3.5 . . .34.5, 35.5 [T]) (see, e.g.Fig. 2a).The total number of observations available is thus 10500 (i.e.300 locations × 35 times).In the second network configuration (denoted as HM -"homogeneous"), observations are obtained at 25 equally spaced locations within the  5, 2.5, 3.5 . . . 34.5, 35.5 [T]) (see, e.g.Fig. 2b).The total number of observations is thus reduced to 875 (i.e. 25 locations × 35 times).In the final configuration (denoted as HT -"heterogeneous"), observations are taken at 25 randomly selected locations, which vary arbitrarily over the 35 time periods (t o = 1.5, 2.5, 3.5 . . .34.5, 35.5 [T]) (see, e.g.Fig. 2c).Similarly to HM, the total number of observations in HT is 875 (i.e. 25 locations × 35 times), but the observations are neither uniform in space nor consistent over time.Note that unlike REF, both HM and HT represent under-determined inversion problems where the total number of observations is significantly lower than the number of unknowns in the state space to be estimated.In reality, the HT network configuration scheme is the closest to current CO 2 monitoring networks where different monitoring locations (ground-based or remote-sensing) can come online and go offline over different periods.For all the three network configurations, random error with a variance of 10 [M 2 L −2 ] is added to the observations to make the problem more realistic.This allows us to specify the model data mismatch covariance matrix R (in Eq. 1) with its diagonal values corresponding to variance of the errors actually introduced into the synthetic observations.In contrast to the prior error covariance, the model-data mismatch variance is constant for all observations and the errors have no spatial or temporal correlation.
The second set of experiments (Table 1 -Experiments AR through CR) examines the effect of the model-data mismatch variance on the best estimates and their associated uncertainties.The model data mismatch covariance matrix captures not only the errors in the observations but also errors due to other sources, such as the transport model, representation and aggregation errors (e.g.Engelen et al., 2002).For all the network configurations, the variance of the random errors is increased to 400 with the diagonal values of the model data mismatch covariance matrix R increased accordingly.All other parameters are kept the same as in the first set of experiments.
The third set of experiments (Table 1 -Experiments AO, BO and CO) explores the impact of operational constraints, which are always an important consideration in implementing a DA system.To minimize the numerical approximations and avoid any form of sampling or convergence error, the ensemble size (for the EnSRF) and the number Introduction

Conclusions References
Tables Figures

Back Close
Full

Results
In the sections that follow, results from the nine experiments are interpreted at both the native and aggregated spatial scales, and estimates from the EnSRF and 4D-VAR approaches are compared both to the truth and to the BIM estimates.Three metrics -the root mean square difference (RMSD), the correlation coefficient (CC) and the standard deviation (SD) are calculated between the flux estimates and the truth at the grid-scale (Fig. 4) and at spatially aggregated scales (Fig. 5).All metrics are averaged across 30 time periods (t r = 6, 7, 8 . . .34, 35 [T]) after discarding the first 5 time periods as spin-up.

Impact of observational density and homogeneity
For the REF-network, all three analyses methods perform extremely well in recovering the true flux (e.g.Fig. 3a), and in fitting the observations within the specified errors (results not shown).For the sample time period presented in Fig. 3a, both the 4D-VAR and the EnSRF estimates capture the flux profile, including its large and small peaks.These results are typical of the performance of the three analysis methods Introduction

Conclusions References
Tables Figures

Back Close
Full Despite the lower quality of the best estimates, both the BIM and the EnSRF are able to quantify the uncertainty associated with estimates accurately, as reflected by the fact that they capture the true flux profile within their 95 % confidence bounds.The EnSRF uncertainty estimates are also consistent with the BIM uncertainty estimates.
For the REF observational configuration, the ratio of the predicted standard deviation of the individual flux estimates in EnSRF (σ ŝEnSRF ) to those from the BIM (σ ŝBIM ) is approximately 0.98, i.e. on average EnSRF underestimates the posterior uncertainties by 2 % relative to BIM.Going from the REF to the HT scheme, the EnSRF over-estimates the uncertainty by 2 % and 6 % for HM and HT, respectively.
In the EnSRF framework, the uncertainties are directly related to the ensemble spread.In the absence of a dynamical model, there is little source of variability for the ensemble to maintain a consistent spread.As observations are assimilated, the ensemble tends to collapse to the ensemble mean and the adaptive inflation has to Introduction

Conclusions References
Tables Figures

Back Close
Full compensate for this degeneracy by inflating the ensemble spread.In the HT case, however, the inflation technique has a delayed response in adjusting to the heterogeneity in the observation network, as different observation locations come into and out of the network.For the adaptive inflation component to work properly, we find it beneficial to have a consistent set of observations to maintain a reasonable ensemble spread.It is worthwhile to mention here that the magnitudes of the inflation factors are very small in Experiments A-C.This is not surprising given that a large number of ensemble members have been specified and hence, the sampling error is quite low.
Even without the application of inflation and localization (results not shown here), the EnSRF estimates have a high CC (∼ 0.93), low RMSD (∼ 0.66 ).Note that posterior analysis covariances are not calculated for the variational approach (see Appendix A); the implications of this choice is further discussed in Sect.4, specifically keeping the CO 2 source-sink estimation problem in mind.
Overall, we find that both the 4D-VAR and the EnSRF match the performance of BIM for all three observational network configurations.Even though small discrepancies are noticeable, the impact of a sparse and/or heterogeneous observational network is similar for the DA approaches compared to the BIM approach.

Impact of model-data mismatch covariance
For all the network configurations, the quality of the estimates degrades when a higher model-data mismatch error is prescribed (comparing Figs. from all the three analyses methods have a lower CC, higher RMSD, and lower SD when compared to panels a-c.The standard deviation changes considerably between panels ar (∼ 1.45-1.5 [M L −1 T −1 ] for the three methods) and cr (∼ 0.94-1.00[M L −1 T −1 ] for the three methods).Increasing the σ 2 R to 400 [M 2 L −2 ] results in the analysis rejecting the information from the observations and giving more weight to the prior, yielding overly smooth a posteriori estimates.A typical example of this is seen by comparing the estimated peak around 50-100 [L] in Fig. 3c and cr.Estimates in both these panels are based on the same observational fields but the estimates in Fig. 3cr are unable to capture the amplitude of the peak in the true flux signal.
The inability to capture the true amplitude is slightly mitigated for BIM and EnSRF due to the fact that these approaches infer higher uncertainties on the estimates when a higher model-data mismatch error is prescribed.Consequently, both estimates do capture the true flux profile within their respective 95 % uncertainty bounds.As the observation network becomes sparser and more heterogeneous, the EnSRF slightly over-estimates the BIM uncertainty, by 3 % (HM) and 5 % (HT), while it underestimates the uncertainty by only 1 % for the reference network.In spite of over-estimating the uncertainties, the EnSRF exhibits a clear advantage over the 4D-VAR by providing analyses errors as part of the estimation procedure.
Experiments AR-CR reconfirm that in the absence of operational limitations, an increase (or decrease) in the model-data mismatch covariance does not affect the ability of the DA methods to reproduce the BIM estimates.

Impact of operational constraints
Operational constraints hinder the performance of the DA approaches, and the impact is further intensified as the observational network becomes more heterogeneous (Figs. 3 and 4, panels ao-co).We find that the 4D-VAR and the EnSRF performance is impacted differently as the numerical approximations come into play.Introduction

Conclusions References
Tables Figures

Back Close
Full For 4D-VAR, an inadequate number of iterations fails to find the global minimum of the quadratic objective function (convergence results not shown here).When the observation fields are heterogeneous, the minimization has more difficulty in finding the path towards the true minimum.Thus, comparing panels ao, bo and co in Fig. 4, the 4D-VAR estimates are worst for the HT network configuration.In general, we find that for the HT network, 4D-VAR needs approximately 150 iterations to converge completely.Conversely, for the REF network 4D-VAR required fewer than 50 iterations to reach full convergence.When only 25 iterations are allowed, the negative performance of the 4D-VAR is more visible for Experiment CO than Experiments AO and BO, which have a relatively homogeneous network.A separate set of experiments was run in which the minimization was preconditioned by defining a new variable to be optimized as: While preconditioning reduces the number of iterations required to reach convergence, we note that for the CO 2 inversion problem, the size and structure of the prior covariance matrix constitutes a significant impediment to taking its inverse square root as proposed in Eq. ( 6).Preconditioning approaches have only been applied in solving largescale CO 2 inversion problem under very specific assumptions of the correlation structure and sparsity (e.g.Chevallier et al., 2005Chevallier et al., , 2007)).Even without pre-conditioning, however, for all the three experiments, the magnitude of the posterior objective function is reduced relative to the prior, indicating that a more probable posterior state has been found relative to the prior.As pointed out by R ödenbeck ( 2005), the minimization determines the large-scale gradient in the initial iterations, while in subsequent iterations fine-scale tuning is performed to capture the optimum.By artificially limiting the number of iterations in Experiments AO and BO (Fig. 4ao and bo), even though the estimates are close to the BIM estimates, the ability of 4D-VAR to make small scale changes are hindered.
For the EnSRF, the degradation can be solely attributed to sampling error caused by a limited ensemble size.This reduces the estimation accuracy (both flux estimates and 12840 Introduction

Conclusions References
Tables Figures

Back Close
Full networks.Specifying a longer localization length scale than 30 [L] for the REF network led to a divergence of the EnSRF system.In this case the spurious noise in the ensemble outweighs the positive impact of the observations.The complicated interplay between the ensemble size and the observational density makes it difficult to identify a mathematical or physical basis for selecting an appropriate localization length scale.We refer the reader to Chatterjee et al. (2012), and the sensitivity tests presented therein, for a more detailed discussion of the role of localization for the CO 2 source-sink estimation problem.
In terms of the recovered uncertainty, the sampling error plays a dominant role in determining the ability of the EnSRF to correctly represent the error in the ensemble mean.The EnSRF uncertainty estimates are close to the BIM uncertainty estimates, but under/over-estimate them in different portions of the domain, rather randomly for the three network configurations.In the worst-case scenario (Experiment CO), the EnSRF estimates fail to capture the true tracer flux within the 95 % uncertainty bounds (see around 200-210 [L] in Fig. 3co).
An important caveat here is that the results for both the DA approaches could potentially be improved through further tuning of each algorithm.For example, the implementation of pre-conditioning algorithms to reach faster convergence, or of more sophisticated localization schemes to dampen the spurious noise in the ensemble, might Introduction

Conclusions References
Tables Figures

Back Close
Full provide slightly different responses and reduce the error incurred due to the numerical approximations.In spite of having state-of-the-art algorithms, however, once the underlying numerical approximations come into play: (a) the EnSRF fails to reproduce the BIM estimate, with the EnSRF performance decreasing as the observation network becomes sparser and more heterogeneous, and (b) the 4D-VAR fails to reproduce the BIM estimate when the observation network is heterogeneous but still performs better than the EnSRF.The better performance of the 4D-VAR is offset by the fact that the EnSRF provides explicit uncertainty bounds on the recovered flux estimates.
The DA approaches are particularly sensitive to the information flow from the observations because of the lack of an explicit dynamical model.As discussed in Sect. 1, the dynamical model adds to the information content of the system.Identifying/developing an appropriate dynamical model relevant to the CO 2 flux estimation problem, and subsequently repeating experiments such as those presented here, may further inform the assessment of the interplay between the operational constraints and the observational network.

Examining results at aggregated spatial and temporal scales
In the previous sections, the performance of the DA and the BIM approaches were analyzed and reported at the native estimation scales (both in space and time).Typically in current CO 2 inversion studies, the fine-scale analyses are averaged a posteriori spatially and/or temporally to coarser scales (for e.g., daily grid-scale fluxes averaged to monthly continental scales) and then examined.
Here, the domain is divided into two sub-regions, 1-150 [L] and 151-300 [L], and the true flux and the flux estimates are aggregated over each of these areas and examined across time.This is qualitatively analogous to aggregating fluxes a posteriori to "large regions" (e.g.biomes, continents) within the inversion domain.Because the true flux has different spatial variability between 50-100 [L] and 200-250 [L], and these differences themselves vary in time, it is possible to examine the ability of the inversion methods to capture these distinct spatiotemporal variations.Figure 5  comparison at the aggregated scales in the form of a Taylor diagram (Fig. 5).For all the 9 experiments, the 4D-VAR is able to match the temporal variation of the spatially aggregated BIM estimates better than the EnSRF.The 4D-VAR iterative algorithm reliably captures the large scale patterns, as a result of which at any given time period and for any experiment, the 4D-VAR estimates are closer to the BIM estimates than the estimates from EnSRF.Even when the number of descent iterations is reduced (panel ao-co in Fig. 5), the 4D-VAR estimates have negligible difference from the BIM at the spatially aggregated scales.Comparing panel co in Figs. 4 and 5 demonstrates that the differences observed at fine scales are substantially reduced when aggregating the estimates to a coarser resolution.
Because there is no explicit dynamical model to evolve the information between assimilation time periods, the EnSRF estimates are always contaminated with small sampling error, but these errors partially cancel when the estimates are spatially aggregated.Overall, the EnSRF estimates still exhibit more spurious variability relative to the 4D-VAR estimates.When the number of ensemble members is reduced (panel ao-co in Fig. 5), however, both the sampling error and the observational density and homogeneity start playing a role, exacerbating the situation.For example in Fig. 5, panels ao and co, the CC between the spatially aggregated EnSRF and the true flux estimates drops from 0.99 to 0.90, and the RMSD increases from 0.03 to 0.21 [M L −1 T −1 ].The standard deviation of these spatially aggregated EnSRF estimates also increases from 0.37 to 0.40 [M L −1 T −1 ], leading to an overestimation relative to the true flux, which has a standard deviation of 0.36 Analysis at aggregated scales demonstrates that when operational constraints are not imposed, both the DA approaches provide aggregated estimates that are close to the aggregated BIM estimates.Even when operational constraints are imposed, the aggregated flux estimates for the EnSRF and the 4D-VAR (Fig. 5co) have higher CC and lower RMSD than the corresponding estimates at the fine scale (Fig. 4co).This is encouraging from the perspective of a real CO 2 flux estimation problem, as it implies Introduction

Conclusions References
Tables Figures

Back Close
Full that the DA flux estimates may serve as reliable alternatives for the BIM flux estimates, at least when aggregated a posteriori to coarse resolutions.

Discussion
It is clear that the choice between 4D-VAR and EnSRF DA approaches for the CO 2 flux estimation problem, should be based on the carbon science questions being targeted, as well as the tradeoff between the impact of slow convergence of the minimization algorithm (for 4D-VAR) and the impact of sampling error (for EnSRF) on the estimated fluxes.When an adjoint for the atmospheric transport model is available, and if only the best estimates are needed, the 4D-VAR performs better than the EnSRF, especially when the measurement network is highly variable.With a small number of iterations, the 4D-VAR may not converge to the global minimum, but it still reliably captures the majority of the large-scale features, with the final estimates close to those expected from a BIM batch solution.In addition, the 4D-VAR framework may potentially be more advantageous for the CO 2 flux estimation problem due to its ability to more easily account for correlated observation errors (O. Talagrand, personal communication, 2012).
Although this specific issue has not been examined in this study, it is worth keeping in mind with the increasing use of satellite-based CO 2 measurements.Recent work by Brankart et al. (2009) has demonstrated techniques to cope with such correlations in an ensemble filter setting as well.Such techniques, however, are numerically efficient only for certain types of error correlation structures.The main disadvantages of the 4D-VAR, on the other hand, are its more cumbersome implementation and its inability to provide direct estimates of the analysis error.Algorithms have been proposed within the CO 2 flux estimation context (e.g.Chevallier et al., 2007) that can generate uncertainty estimates from the variational approach, but these are extremely computationally expensive and not currently viable for large-scale applications.
Conversely, and in the absence of correlated errors, the serial EnSRF is easier to implement than a 4D-VAR system, and does not require the development and Introduction

Conclusions References
Tables Figures

Back Close
Full maintenance of an adjoint model.Due to restrictions on the size of the ensemble, however, it is necessary to adapt and tune ancillary algorithms such as localization and inflation, which make the ensemble approximations to the full-rank Kalman filter computationally viable.Unlike adaptive inflation, the choices to be made in implementing a localization scheme remain highly subjective, however.Experiments in this study clearly showed that the localization length scale is dependent on both the ensemble size and the observational density.Will higher and higher volumes of observations push us towards specifying shorter localization length scales?If so, what is the limit beyond which decreasing the localization length scale may actually degrade the analysis?It is necessary to identify more rigorously a basis for selecting the localization parameters (e.g.Anderson, 2012) or develop approaches that may be less sensitive to variations in the observational network.
One critical advantage of the EnSRF over a 4D-VAR system is that it explicitly provides second-order statistical moments for the estimated system states.In an inverse problem framework, this makes it possible to quantify the uncertainty associated with estimates, and thereby to ascertain their reliability.Therefore, the EnSRF is more desirable for attribution purposes, wherein source/sink estimates with confidence bounds can be used to gain a better understanding of the mechanistic processes driving the carbon cycle, or reconcile estimates from top-down and bottom-up approaches.Recovering realistic a posteriori uncertainty bounds on the flux estimates makes it possible to better tune parameters in bottom-up biospheric models, or at a minimum aid in the identification of a range of suitable values for such parameters.
With both the 4D-VAR and EnSRF approaches there is a direct trade-off between computational savings and estimation accuracy, which is intensified when solving an under-determined problem with a heterogeneous observational network.For largescale flux estimation problems, operational constraints will always exist along with scarce and inconsistent observations, erroneous transport models (thus further limiting the use of available observations) etc.The HT scheme with a limited number of ensemble members/descent iterations (panel co in Figs. 3 and 4) serves as the closest Introduction

Conclusions References
Tables Figures

Back Close
Full analogue to how a real inversion problem is tackled.Even if we account for the increase in remote-sensing measurements of CO 2 , the observational network is going to be a complex hybrid between the REF and the HT scheme.In this scenario, the accuracy and precision of either of the DA approaches will be compromised relative to BIM, until and unless the issues of convergence (4D-VAR) and sampling error (EnSRF) are appropriately addressed.

Conclusions
We present a comparative assessment of two advanced data assimilation approaches with a batch inverse modeling scheme for the atmospheric CO 2 inversion problem.The performance of the DA approaches is found to depend on a complex interplay between the underlying numerical approximations, the lack of a dynamical model, and the information available from the observations.Beyond CO 2 source-sink estimation problems, the conclusions of this study are also applicable to other DA problems where a dynamical model is lacking.Overall, the 4D-VAR scheme is found to be more robust for obtaining the best estimates, while the EnSRF scheme has the advantage of providing explicit estimates of analysis error in addition to a best estimate.The relative performance of the 4D-VAR and the EnSRF, when a large and homogeneous set of observations are available, are consistent with the conclusions obtained from inter-comparison studies carried out by other DA communities.The sensitivity of the approaches to the observational scheme in the absence of an explicit dynamical model and specifically for solving an under-determined inverse problem, however, has not been thoroughly explored before.
The sensitivity experiments demonstrate that when a large number of ensemble members or descent iterations is specified, state of the art implementations of the 4D-VAR and the EnSRF yield analyses that are similar to BIM, irrespective of the observational characteristics.When operational constraints are imposed, on the other hand, both the characteristics of the observation network and the numerical approximations Introduction

Conclusions References
Tables Figures

Back Close
Full play a greater role in differentiating the performance of the two DA approaches.Because such operational constraints are always present for real CO 2 source-sink estimation problems, the choice of an approach should be based on: (a) the carbon science questions being targeted (i.e., is the science requirement only best estimates of fluxes or flux estimates with associated uncertainties?), and (b) the inversion conditions under which they are being applied (i.e., characteristics of the observational network, such as data density and heterogeneity).
Recently, the 4D-VAR and EnSRF approaches have begun to influence each other's development, and hybrid approaches may take center-stage for solving the CO 2 flux problem in the future.In addition, the emerging focus on the assimilation of satellite CO 2 observations will require additional inter-comparisons that go beyond the scope of the work presented here, including the consideration of temporal error correlations in the atmospheric CO 2 observations.Introduction

Conclusions References
Tables Figures

Back Close
Full studies, generation of the matrix H requires an atmospheric transport model to be run either once per estimated state, or once per observation.The large number of forward model runs ultimately makes the batch approaches, such as BIM, computationally intractable for solving large-scale problems.
In the variational approach, the solution ŝ to the objective function (Eq. 1) is sought iteratively by minimizing the misfit between a feasible state trajectory and the observations that are available over a given assimilation window.The overall approximation lays in the fact that the minimization can be stopped by artificially limiting the number of iterations or by requiring that the norm of the gradient decreases by a predefined amount during the minimization.Most minimization schemes (e.g.BFGS, Nocedal and Wright, 2006) rely on the availability of the gradient of the objective function with respect to the state (or control vector in 4D-VAR terms).
Instead of analytically calculating the gradient, the adjoint of the forward transport model is used to compute the term H In this study, the 4D-VAR is implemented with successive overlapping windows to account for the long residence times of CO 2 in the atmosphere.If a single long window is employed for a real CO 2 flux estimation problem, the computational cost associated with calculating the inverse of the prior covariance matrix Q b in Eq. (A4) increases significantly.In addition, the CO 2 flux errors are both spatially and temporally correlated (e.g.Chevallier et al., 2012), which implies that the prior error covariance matrix has offdiagonal terms.Any algebraic manipulations, such as taking inverses or calculating the square roots for preconditioning purposes, thus, become operationally cumbersome.
In the absence of a dynamical model, the implementation of the 4D-VAR approach is similar to the FGAT-3DVAR (Massart et al., 2010)  The main caveat with the variational approach is that a direct estimate of the analysis error is not available (no clear analogue of Eq.A2).Mathematically this can be obtained from the inverse of the Hessian (e.g.Le Dimet et al., 2002;R ödenbeck, 2005;Meirink et al., 2008).But computational challenges restrict the calculation and storage of the Hessian for large-scale problems.Recent applications of 4D-VAR for NWP problems have shown that computationally efficient alternatives do exist (e.g.Cheng et al., 2010;Gejadze et al., 2012), which may have potential applicability for the CO 2 flux estimation problem as well.
In the ensemble filter approach, the key innovation is to work in a reduced subspace of the error covariance matrices.Observations are assimilated to update the ensemble representation of the error covariance matrices.The optimal analysis states and an estimate of the analyses error are determined in a similar fashion to Eqs. (A1) and (A2) but the calculation of Q b H T and HQ b H T are approximated by running the transport model with the ensemble members directly: While several variants of the ensemble approach exist, here we have used a serial ensemble square root filter (Whitaker and Hamill, 2002) implemented in a fixed-lag smoother form (e.g.Whitaker and Compo, 2002;Chatterjee et al., 2012).Similar to an ensemble square-root filter, the ensemble smoother uses Monte-Carlo estimates of the error covariances to compute a Kalman smoother gain matrix.This is applied iteratively to a time series of observations, where the analysis at the first time step is equivalent to a an ensemble square root filter analysis as it only utilizes observations taken up to and including the analysis time.All subsequent time steps utilize observations taken a number of observing times past the analysis time.The localization scheme (i.e., to cut down spurious noise in the ensemble) is based on Houtekamer and Mitchell (2001) using a fifth-order Gaspari-Cohn function (Gaspari and Cohn, 1999), while the 12849 Introduction

Conclusions References
Tables Figures

Back Close
Full adaptive inflation algorithm (i.e., to counter spurious variance deficiency among the ensemble members) is based on Anderson (2009).Implementation of these algorithms within an ensemble smoother framework is described in further detail in Chatterjee et al. (2012).In spite of these ancillary algorithms, the overall implementation of EnSRF is quite simple and computationally efficient.
The setup of 4D-VAR (with overlapping time windows) and EnSRF (expressed as a fixed-lag smoother) reflect state of the art implementations of these two DA approaches, keeping in mind the nature of the atmospheric CO 2 process and the fact that CO 2 information has a finite residence time over any given domain.We tested other flavors of the ensemble filter (EnKF with perturbed observations; Evensen, 2003) and the variational approach (PSAS; Courtier, 1997) and found that the overall conclusions from the presented experiments remain consistent across these algorithmic choices.We encourage the reader to look at Lorenc (2003) and Nichols (2010) (and references therein) for a more detailed discussion on these DA methods.Introduction

Conclusions References
Tables Figures

Back Close
Full  Full Screen / Esc Printer-friendly Version Interactive Discussion Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Screen / Esc Printer-friendly Version Interactive Discussion Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Screen / Esc Printer-friendly Version Interactive Discussion Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper |Kalnay, 2004), a one-dimensional framework allows us flexibility in setting up the problem because multiple experiments can be simulated in a computationally efficient way.
Screen / Esc Printer-friendly Version Interactive Discussion Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | minimum of a quadratic objective function as shown in Eq. (1):

A 1 -
dimensional (1D) advection-diffusion problem of a passive tracer is selected to emulate the CO 2 flux estimation problem.In the 1D illustration, the passive tracer represents atmospheric CO 2 .Tracer fluxes get released from a series of locations over a finite duration and get transported by a tracer transport model that encapsulates the physics of advection and diffusion.No sink is specified, and there is therefore a gradual buildup of the passive tracer within the domain.Observations of the tracer are obtained Discussion Paper | Discussion Paper | Discussion Paper | [T] to keep the problem generic.Both the length of the 1D domain and the time period of the experiment are arbitrarily discretized.The parameters for the experiment are: the grid size ∆x = 1 [L]; the domain length x = 300 [L]; the time step of release ∆t = 1 [T]; the total number of time periods over which the tracer flux is released t = 35 [T]; the longitudinal dispersion coefficient D L = 2.0 [L 2 T −1 ]; and the advection velocity v = 50.0[L T −1 ].The tracer flux s [M L −1 T −1 ] that is released (Fig. 1a) is modeled as: s (x r , t r ) = 0.25 (36 − t r ) exp − (represents the locations along the 1D domain (x r = 1, 2, 3 . . .299, 300 [L]) over which the tracer flux is released at times t r (t r = 1, 2, 3 . . .34, 35 [T]).Equation (2) is designed to model two large peaks with fluctuating amplitudes (Fig. 1b) between 50-100 [L] and 200-250 [L], and a smaller consistent double peak (Fig. 1b) between 100-200 [L].Even though the spatial tracer flux profiles are different for each time period, the spatially averaged flux has a constant value of 0.84 [M L −1 T −1 ] across all time periods.Note that the true tracer flux s is used only for simulating the observations, but later assumed unknown during the analysis.The tracer is sampled at locations x o (x o = 1, 2, 3, . . ., 299, 300 [L]) for 35 consecutive time periods (t o = 1.5, 2.5, 3.5 . . .34.5, 35.5 [T]) to obtain the observational dataset z [M L −1 ], such that the observations times are offset from the release times.Initial random Discussion Paper | Discussion Paper | Discussion Paper | −1 ], the maximum residence time of the tracer within the domain is approximately 6 [T].Based on the form of Eq. (2), however, the majority of fluxes occur at x r > 50 [L] (Fig. 1b), and the typical residence time is therefore < 5 [T].This means that an observation taken at time t o [T] provides information about the tracer flux approximately up to time t o − 5 [T].In all subsequent experiments the lag window size for the DA schemes is set to 5 [T].The final piece of information necessary for setting up the inverse problem is the prior estimate (s b ) of the tracer flux and its error characteristics.s b is constant across matrix Q b is based on an exponential decay model, with a correlation length (3 l Q ) of 90 [L] and variance (σ 1D domain (x o = 10, 22, 34 . . .298 [L]), which are maintained for all 35 time periods 12834 Discussion Paper | Discussion Paper | Discussion Paper | (t o = 1.
Screen / Esc Printer-friendly Version Interactive Discussion Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | of descent iterations (for the 4D-VAR) are set to 1000 and 250, respectively, for the first two sets of experiments ( Screen / Esc Printer-friendly Version Interactive Discussion Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | across other time periods.The performance across the full examined time period is summarized in Fig. 4a, where all three methods show a high CC (∼ 0.97), low RMSD (∼ 0.3 [M L −1 T −1 ]) and similar standard deviation (∼ 1.5 [M L −1 T −1 ]) relative to the true fluxes.The performance of all three methods degrades as the observation density and homogeneity decrease in going from Experiments A to C. This is evident by looking at Fig. 3b,c where the methods fail to capture the smaller double peak around 100-200 [L], while the Taylor diagram in Figs.4b,c shows the resultant drop in CC and a corresponding increase in RMSD.In general, for observations with spatially uncorrelated error, decreasing the observation density is expected to decrease the analysis accuracy.The response of the two DA approaches mirrors the BIM estimate in such cases, including the inference of a wrong flux pattern for the HT network around 100 [L] in Fig. 3c.This indicates that in the absence of any operational limitations, if all other parameters of the inverse problem are the same, then the DA estimates are consistent with the BIM estimate.Neither the lack of a dynamical model nor the under-determined nature of the inverse problem impedes the ability of the DA methods relative to BIM.
Screen / Esc Printer-friendly Version Interactive Discussion Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | 3 and 4 panels a and ar, b and br, c and cr), albeit the heterogeneous network estimates show the most pronounced degradation.An increase in σ Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | the uncertainties) and makes the filter sensitive to the observational density.Note that a Schur-based localization scheme was implemented for the EnSRF (see Appendix A).Since the localization length scale is dependent on the ensemble size, when an ensemble size of 1000 was used (Experiments A-C, or AR-CR) a long localization length scale of 90-120 [L] is sufficient.The localization length scale is determined subjectively based on sensitivity tests, and hence a range of values (i.e., 90-120 [L]) is acceptable within which the EnSRF estimates are not contaminated by spurious noise.Reducing the ensemble size to 100 requires the use of a shorter localization length scale.It was found beneficial to have different length scales for the different observational networks, namely 10-30 [L] for the REF network and 45-60 [L] for the sparser Screen / Esc Printer-friendly Version Interactive Discussion Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Screen / Esc Printer-friendly Version Interactive Discussion Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Screen / Esc Printer-friendly Version Interactive Discussion Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Screen / Esc Printer-friendly Version Interactive Discussion Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Screen / Esc Printer-friendly Version Interactive Discussion Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Screen / Esc Printer-friendly Version Interactive Discussion Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | variant occasionally used within the NWP communityDiscussion Paper | Discussion Paper | Discussion Paper | Screen / Esc Printer-friendly Version Interactive Discussion Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Screen / Esc Printer-friendly Version Interactive Discussion Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper |

Fig. 2 .
Fig. 1. (A) Spatiotemporal variability of the tracer flux.(B) Flux profile for a particular time period corresponding to the dashed white-line in (A).

Table 1 -
Experiments A-C and Experiments AR-CR).The number of descent iterations is prescribed to be lower than the number of ensemble members, keeping in mind that 4D-VAR typically requires more model integrations (i.e., both forward and adjoint model run) than EnSRF.Given that it is not feasible to either run a large number of ensemble members or specify a large number of descent iterations for any real atmospheric application, these are reduced to 100 ensemble members for the EnSRF and 25 descent iterations for the 4D-VAR in the third set of experiments.The noise added to the observations is kept the same as in the first set of experiments, namely 10 [M 2 L −2 ], to allow for a direct comparison with Experiments A-C.

Table 1 .
Summary of the experiments outlined in Sect.2.3.The following parameters are held constant for all the experiments in this study: s b (Eq.5), 3 l Q = 90 [L] and σ

Table 1 -
Summary of the experiments outlined in Section 2.3.The following parameters are held constant for all the experiments in this study: s b (Equation11), 3l Q = 90 [L] and σ Q