A Joint Data Assimilation System (Tan-Tracker) to Simultaneously Estimate Surface CO2 Fluxes and 3-D Atmospheric CO2 Concentrations from Observations

We have developed a novel framework ("Tan-Tracker") for assimilating observations of atmospheric CO2 concentrations, based on the POD-based (proper orthogonal decomposition) ensemble four-dimensional variational data assimilation method (PODEn4DVar). The high flexibility and computational efficiency of the PODEn4DVar approach allow us to include both the atmospheric CO2 concentrations and the surface CO2 fluxes in the large state vector to be estimated simultaneously from the assimilation of atmospheric CO2 observations. Compared to most modern top-down flux inversion approaches, where only surface fluxes are considered as control variables, one major advantage of our joint data assimilation system is that, in principle, no assumption of a perfect transport model is needed. In addition, Tan-Tracker can use a complete dynamical model to consistently describe the time evolution of CO2 surface fluxes (CFs) and atmospheric CO2 concentrations, which makes better use of the observational information: the analyses at each assimilation step are recycled to improve the forecasts for the following assimilations. An experimental Tan-Tracker system has been built on a complete augmented dynamical model, where (1) the surface-atmosphere CO2 exchanges are prescribed by a persistence forecasting model for the scaling factors of the first-guess net CO2 surface fluxes and (2) the atmospheric CO2 transport is simulated by the GEOS-Chem three-dimensional global chemistry transport model. Observing system simulation experiments (OSSEs) assimilating synthetic in situ observations of surface CO2 concentrations are designed to evaluate the effectiveness of the Tan-Tracker system. In particular, detailed comparisons are made with its simplified version (referred to as TT-S), in which only the CFs are taken as prognostic variables. Tan-Tracker outperforms TT-S, with higher assimilation precision for both CO2 concentrations and CO2 fluxes, mainly because CO2 concentrations and CFs are estimated simultaneously. An experiment assimilating real column-averaged dry-air CO2 retrievals (XCO2) from the Japanese Greenhouse Gases Observing Satellite (GOSAT) further demonstrates its potential for wide application.


Introduction
Carbon cycle data assimilation systems offer a promising new tool for CO2 surface flux (CF) inversion (e.g., Peters et al., 2005; Feng et al., 2009), which aims to yield CO2 surface flux estimates by optimally combining information from both chemistry transport model (CTM) simulations and atmospheric CO2 observations. Previous studies have helped to improve our understanding of the contemporary carbon cycle (e.g., David et al., 2006; Peters et al., 2007; Feng et al., 2011; Kang et al., 2012). The ensemble Kalman filter (EnKF) has been widely adopted in carbon cycle data assimilation (e.g., Peters et al., 2007; Feng et al., 2009, 2011; Kang et al., 2012; Liu et al., 2012), largely due to its simple conceptual formulation and relative ease of implementation (Evensen, 2003). Peters et al. (2005) coupled the state-of-the-art atmospheric transport model TM5 (http://www.projects.science.uu.nl/tm5/) to the ensemble square root filter (EnSRF) to form the "CarbonTracker" data assimilation system, whose CF inversion results are fairly consistent with the majority of carbon inventories reported by the first North American State of the Carbon Cycle Report (SOCCR) (Peters et al., 2007). In CarbonTracker, a simple persistence forecasting operator is taken as the forecast model to represent the surface CO2 flux propagation. This implies that the CFs (actually the scaling factors) are essentially treated as the prognostic variables of the model (i.e., the simple persistence forecasting operator). Including a CF dynamical model in CarbonTracker means that any useful information for improving the CFs gained in the current data assimilation cycle can be used in the next one, so that the observed information is not wasted. However, the uncertainty of the initial CO2 concentration fields is ignored in CarbonTracker. In fact, this uncertainty has such a large effect on CF estimates that neglecting it might result in unpredictable consequences (Bousquet et al., 2000; McKinley et al., 2004; Peylin et al., 2005). Recently, Kang et al. (2011, 2012) also presented a system for the simultaneous data assimilation of surface CO2 fluxes and atmospheric CO2 concentrations by means of the local ensemble transform Kalman filter (LETKF-CDAS, where CDAS stands for carbon cycle data assimilation system). In LETKF-CDAS, the CFs are also treated as part of the model state (as in Peters et al., 2005), and essentially a simple persistence dynamical model is adopted to describe the CFs' integration. Similarly, Feng et al. (2009) developed an ensemble Kalman filter to estimate 8-day CO2 surface fluxes over geographical regions globally from satellite measurements of CO2.

X. Tian et al.: A joint carbon cycle data assimilation system (Tan-Tracker)
The four-dimensional variational data assimilation (4D-Var) method has also been introduced in this field (e.g., Baker et al., 2006a; Engelen et al., 2009). Compared with EnKF, 4D-Var has its own attractive features: for example, it can simultaneously assimilate observations at multiple times into the analysis fields (Tian and Xie, 2012). Nevertheless, the need for an adjoint model and for linearization of the forecast model limits the wider application of 4D-Var. Tian et al. (2008b, 2011) proposed the POD-based (proper orthogonal decomposition) ensemble four-dimensional variational data assimilation method (PODEn4DVar), built on the POD and ensemble forecasting techniques, which aims to exploit the strengths of the two forms of data assimilation (i.e., EnKF and 4D-Var) while offsetting their respective weaknesses. In PODEn4DVar, the control (state) variables appear explicitly in the 4D-Var cost function, so that the adjoint model is no longer needed and the data assimilation process is significantly simplified (Tian et al., 2008). Furthermore, PODEn4DVar largely retains the basic advantages of traditional 4D-Var. Its feasibility and effectiveness have been demonstrated in an idealized model with simulated observations (Tian et al., 2011; Tian and Xie, 2012), where PODEn4DVar performed better than both 4D-Var and EnKF, at lower computational cost than the EnKF (Tian et al., 2011). The method has been successfully applied to land data assimilation (Tian et al., 2009, 2010). In addition, we have built a PODEn3DVar (the three-dimensional version of PODEn4DVar)-based radar assimilation system on the WRF atmospheric transport model platform (Pan et al., 2012). This WRF-based data assimilation system indicates the potential of PODEn4DVar for atmospheric transport data assimilation.
In this study, we report on a new CF data assimilation system based on the PODEn4DVar approach, named Tan-Tracker (in Chinese, "Tan" means carbon). The system is developed by incorporating a joint PODEn4DVar assimilation framework into the GEOS-Chem model (V9-01-03, http://acmg.seas.harvard.edu/geos/). We choose an identity operator as the CF dynamical model to describe the CFs' evolution, and combine this CF dynamical model with the GEOS-Chem atmospheric transport model to constitute an augmented dynamical model. In this case, the large-scale state vector made up of both the CFs and the CO2 concentrations is taken as the prognostic variable, which is simultaneously constrained by the assimilation of atmospheric CO2 concentration observations. In Sect. 2, we describe the Tan-Tracker data assimilation system, including the joint assimilation framework, a brief review of the PODEn4DVar assimilation approach and its coupling with the joint assimilation framework, and the covariance localization scheme. Section 3 presents observing system simulation experiments (OSSEs) that evaluate the Tan-Tracker system against its simplified version, which takes only the CFs as prognostic variables. A further experiment assimilating real spaceborne CO2 dry-air mole fraction observations (XCO2) indicates the potential for wider application of the proposed Tan-Tracker system (Sect. 4). Finally, a summary and concluding remarks are provided in Sect. 5.

The Tan-Tracker joint data assimilation system
Joint or dual-pass assimilation schemes have been used to optimize model states and parameters simultaneously from noisy measurements through classical filters (e.g., the dual UKF or EnKF) (Tian et al., 2008; Tian and Xie, 2008). Tian et al. (2009) extended the dual-pass assimilation strategy to the PODEn4DVar approach and built a PODEn4DVar-based dual-pass microwave land data assimilation system (Tian et al., 2010). Similar to the usual joint assimilation schemes, the augmented vector used in LETKF-CDAS is a state-parameter-augmented one, and the CFs are treated as model parameters. However, it should be noted that the prognostic variable used in Tan-Tracker is the large-scale vector made up of CFs and CO2 concentrations, whose evolution is governed by the augmented dynamical model, consisting of an identity operator and the CTM.

The Tan-Tracker joint assimilation framework
An ordinary ensemble-based assimilation system (for example, CarbonTracker) usually begins by preparing an ensemble of $N$ CFs $F_{i,g}$ ($i = 1, \ldots, N$) from the first-guess net CO2 surface exchange $F^*(t)$ at the $r$th assimilation cycle, scaling $F^*(t)$ by $\lambda_{g,r}$, where $\lambda_{g,r}$ represents a set of linear scaling factors (Peters et al., 2005) for each day and each grid cell ($g$) to be estimated and the subscript "$r$" denotes the $r$th assimilation cycle. Usually, the CTM would then be integrated $N$ times from the same initial background CO2 concentration field, once for each member of the CF ensemble $F_{i,g}(t)$, to produce the 3-D CO2 concentration ensemble $U_{m,i}$ ($i = 1, \ldots, N$). Tan-Tracker, however, takes a different approach.
Figure 1 shows the flowchart of the Tan-Tracker joint assimilation system: Tan-Tracker is initiated by two CTM runs, one being the background run (the blue part in Fig. 1) and the other the sampling run (the red part in Fig. 1).
Figure 2 shows the makeup of the assimilation window in Tan-Tracker (i.e., the optimized window + the lag window + the observational window). $F^a_b$ ($F^s_b$) denotes the prior CF series over the assimilation (sampling) window, and $F^*_a$ ($F^*_s$) represents the first-guess CF series over the assimilation (sampling) window. In the background run, we integrate the CTM (GEOS-Chem), forced by the prior CF series $F^a_b$, over the assimilation window at the $r$th assimilation cycle to produce the background CO2 concentration fields $U_b$, which are used to prepare the background joint state vector $(\lambda_b, U_b)^T$. Here $L_a$ is the length of the assimilation window and $\lambda_{b,r}$ is the prior scaling factor at the $r$th assimilation cycle. As mentioned, the assimilation window consists of an optimized window (1 week), a lag window (5 weeks) and an observational window (1 week). In each assimilation cycle, the observations in the observational window are used to update the joint prognostic variables $(\lambda, U)^T$ in the optimized window.
Correspondingly, in the sampling run, we run the CTM from the background CO2 concentration field $U^s_b$ at the beginning of the sampling window (i.e., the Pre-Assim window + the Assimilation window + the Post-Assim window) (Fig. 2), driven by the prior CF series $F^s_b(t)$ ($t = 1, \ldots, L_s$) of the same ($r$th) assimilation cycle over the sampling window, to yield the sampling CO2 concentration series. Here $L_s$ ($= L_{Pre} + L_a + L_{Pos}$) is the length of the sampling window, and $L_{Pre}$ and $L_{Pos}$ are the lengths of the Pre-Assim and Post-Assim windows, respectively (see Fig. 2). Next, a 4-D moving sampling strategy (Fig. 2; Wang et al., 2010) is adopted to create the large-scale vector ensemble $(\lambda_{m,i}, U_{m,i})^T$ ($i = 1, \ldots, N$, $N = L_s - L_a + 1$).
As a result, the large-scale joint state vector $(\lambda, U)^T$ is viewed as the prognostic variable in Tan-Tracker, with the identity operator chosen as the CF dynamical sub-model describing the CFs' evolution, $M_{CF} = I$ (Eq. 4), where $I$ is the identity matrix. This CF persistence forecasting model (Eq. 4) follows Peters et al. (2005) and assumes that the prior (or background) scaling factors $\lambda_{b,r+1}$ for the next [($r+1$)th] assimilation cycle are equal to the optimized scaling factors $\lambda_{a,r}$ of the current ($r$th) assimilation cycle. In the actual implementation, the dynamical model of Eq. (5) is applied to the linear scaling factors $\lambda$, where $L_o$ is the length of the optimized window (Fig. 2) and $\lambda^j_{a,r}$ ($j = 1, \ldots, L_o$) are the daily optimized scaling factors. The CF dynamical sub-model $M_{CF}$ is thus combined with the CTM (GEOS-Chem) to constitute the augmented dynamical model for Tan-Tracker. By applying the observation operator $H$ to the modeled CO2 concentrations $U_{m,i}$ and the background CO2 concentrations $U_b$, we obtain the ensemble simulated observations $U^o_m = H(U_{m,i})$ (Eq. 8) and the background simulated observations $U^o_b = H(U_b)$ (Eq. 9). The background joint vector $(\lambda_b, U_b)^T$, the joint vector ensemble $(\lambda_{m,i}, U_{m,i})^T$, the simulated observations from Eqs. (8) and (9), and the real CO2 measurements are then input to the PODEn4DVar assimilation processor, which yields the assimilated $(\lambda_a, U_a)^T$ and, as a result, the optimized CFs $F_a = \lambda_a F^*$.
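The persistence step for the scaling factors can be illustrated with a minimal sketch. It assumes that the daily optimized factors of the optimized window are simply averaged to give the next cycle's background factor; the text does not preserve the exact form of Eq. (5), so this averaging, the function names and the sample values are all hypothetical.

```python
def next_background_scaling(daily_optimized):
    """Persistence forecast of the scaling factor for the next cycle.

    ASSUMPTION: the daily optimized factors of the optimized window are
    averaged; the source's Eq. (5) may combine them differently.
    """
    if not daily_optimized:
        raise ValueError("need at least one daily scaling factor")
    return sum(daily_optimized) / len(daily_optimized)


def scaled_flux(scaling_factor, first_guess_flux):
    """Apply a scaling factor to the first-guess flux (F = lambda * F*)."""
    return scaling_factor * first_guess_flux


# Hypothetical optimized factors for a 7-day optimized window.
lambda_a = [0.58, 0.57, 0.56, 0.56, 0.55, 0.55, 0.55]
lambda_b_next = next_background_scaling(lambda_a)
print(round(lambda_b_next, 3))
```

Because the sub-model is an identity (persistence) operator, no flux dynamics are simulated here: whatever the assimilation learned about the scaling factors in one cycle is carried unchanged into the next.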
In conclusion, Tan-Tracker works as follows. Two CTM runs forced by the background CF series are first carried out over the assimilation window and the sampling window, respectively: the background run is used to prepare the background joint vector, and the sampling run is used to produce the joint vector ensemble by applying a 4-D moving sampling strategy (Wang et al., 2010) to the sampling simulations throughout the sampling window. The background joint vector and the joint vector ensemble are then input into the PODEn4DVar processor, in which the usual observation operator (e.g., an interpolation function that maps the gridded model variables to the in situ observation locations) is used to compare the simulated CO2 concentrations with the observed ones according to the 4D-Var cost function. The assimilated CO2 concentrations are used to initialize the next assimilation cycle. Meanwhile, the scaling factors $\lambda$ in the optimized window are also optimized and passed to the next assimilation cycle through Eq. (5).
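The moving sampling step can be sketched as a sliding window over a single long model run. The example below is only an illustration: it uses a scalar daily series in place of the full joint $(\lambda, U)^T$ fields, and the 154-day sampling length is a hypothetical value chosen so that a 49-day (7-week) assimilation window gives $N = L_s - L_a + 1 = 106$ members.

```python
def moving_window_ensemble(series, window_len):
    """Return every contiguous window of `window_len` steps from `series`.

    With len(series) = L_s and window_len = L_a this yields
    N = L_s - L_a + 1 members, matching the count given in the text.
    """
    if not 1 <= window_len <= len(series):
        raise ValueError("window_len must be between 1 and len(series)")
    n_members = len(series) - window_len + 1
    return [series[start:start + window_len] for start in range(n_members)]


# Hypothetical sampling run: 154 daily global-mean CO2 values (ppm).
sampling_run = [390.0 + 0.01 * day for day in range(154)]

# 7-week assimilation window (1 + 5 + 1 weeks) expressed in days.
ensemble = moving_window_ensemble(sampling_run, window_len=49)
print(len(ensemble))  # 106
```

The appeal of this design is cost: one sampling run supplies all ensemble members, instead of one CTM integration per member.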

The PODEn4DVar and its coupling with the joint assimilation framework
The PODEn4DVar approach is derived from the incremental form of the 4D-Var cost function

$$J(x') = \frac{1}{2}(x')^T B^{-1} x' + \frac{1}{2}\sum_{k=0}^{S}\left[y'_k(x') - y'_{obs,k}\right]^T R_k^{-1}\left[y'_k(x') - y'_{obs,k}\right],$$

where $x' = x - x_b$ is the perturbation of the background field $x_b$ at the initial time $t_0$, $y'_k(x')$ is the corresponding simulated observation perturbation and $y'_{obs,k}$ is the observational increment at time $t_k$. Here index $k$ denotes the observation time; the superscript $T$ stands for a transpose; $b$ represents background values; $S$ is the total number of observational time steps in the observational window; $H_k$ acts as the observation operator; and matrices $R_k$ and $B$ are the observational and background error covariances, respectively. With the prepared background field $x_b$, the initial model perturbations (MPs) $x'$ $(x'_1, x'_2, \ldots, x'_N)$, the simulated observation perturbations $y'$ $(y'_1, y'_2, \ldots, y'_N)$, the observational increments $y'_{obs,k}$, and the background and observational error covariances $B$ and $R_k$, the final PODEn4DVar analysis solution $x_a$, without localization of its analysis error covariance $P_a$, is formulated through some necessary calculations (see Tian et al., 2010, 2011, for more details). In this solution, $V$ is derived from $(y')^T y' = V \Lambda^2 V^T$, and $P_y = y' V$. The background covariance $B$ is approximately estimated from the model perturbations as $B \approx x'(x')^T/(N-1)$. As mentioned, the model state to be optimized in Tan-Tracker is the joint vector, that is, $x = (\lambda, U)^T$.
We have coupled the joint assimilation framework with the PODEn4DVar assimilation processor through Eqs. (18)-(22) (see the green part of Fig. 1).

Covariance localization
As an ensemble-based assimilation system, Tan-Tracker also uses covariance localization to reduce the contamination resulting from spurious long-range correlations (Houtekamer and Mitchell, 2001). It uses an exponential decay of the covariance structure with the distance between state and observational variables (Gaspari and Cohn, 1999) to calculate the elements of the localization matrix, where $L_x$ and $L_y$ are the lengths of the state vector $x$ and the observational vector $y$, respectively; $d_{i,j}$ is the distance between the $i$th state and the $j$th observation locations; and $d_0$ is the horizontal covariance localization Schur radius.
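A minimal sketch of distance-based localization follows. The taper $\exp[-(d_{i,j}/d_0)^2/2]$ is an assumed form, used here only because the text describes an exponential decay with distance; the exact function in Tan-Tracker is not recoverable from this text, and the distances and covariances below are hypothetical.

```python
import math


def localization_weight(distance_km, radius_km):
    """Distance-based taper; ASSUMED form exp(-(d/d0)^2 / 2)."""
    return math.exp(-0.5 * (distance_km / radius_km) ** 2)


def schur_product(a, b):
    """Element-wise (Schur) product of two equally shaped nested lists."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("matrices must have the same shape")
    return [[x * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


# Hypothetical 2 states x 2 observations: distances (km) and raw covariances.
distances = [[0.0, 900.0], [1800.0, 4500.0]]
raw_cov = [[1.0, 0.8], [0.6, 0.4]]

# Weights for the reference radius d0 = 900 km, then the localized covariances.
rho = [[localization_weight(d, 900.0) for d in row] for row in distances]
localized = schur_product(rho, raw_cov)
print(localized)
```

Covariances at zero distance are left untouched, while the entry 4500 km away (five radii) is damped to nearly zero, which is the intended suppression of spurious long-range correlations.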
Consequently, the covariance localization in Tan-Tracker can be implemented by calculating the Schur product $\circ$ (i.e., element-wise multiplication) (Greybush et al., 2011).

OSSEs for the evaluations of Tan-Tracker
In this section, Tan-Tracker is evaluated comprehensively through a group of global observing system simulation experiments (OSSEs) over a given assimilation period.

Experimental setup
We simulate atmospheric CO2 concentrations using the global three-dimensional chemical transport model GEOS-Chem (version 9-01-03, http://acmg.seas.harvard.edu/geos/) driven by the assimilated meteorological data from the Goddard Earth Observing System (GEOS) of the NASA Global Modeling and Assimilation Office. The version of the model we use is driven by the GEOS-5 meteorological fields with a horizontal resolution of 2° latitude by 2.5° longitude and 47 vertical layers up to 0.01 hPa. The original GEOS-Chem CO2 simulation was described in Suntharalingam et al. (2004) and updated by Nassar et al. (2010). Our simulations include CO2 fluxes from monthly fossil fuel burning and cement production from the Carbon Dioxide Information Analysis Center (CDIAC) inventory for the year 2009 (Andres et al., 2010), monthly biomass burning from the third version of the Global Fire Emission Database (GFEDv3) for 2010 (van der Werf et al., 2010; Mu et al., 2011), climatological biofuel burning (Yevich and Logan, 2003), monthly ocean exchange (Takahashi et al., 2009), 3-hourly biospheric fluxes from the Carnegie-Ames-Stanford Approach (CASA) model for 2000 (Olsen and Randerson, 2004), annual climatological terrestrial biosphere exchange based on TransCom CO2 inversion results adjusted with GFEDv2 fire emissions (Baker et al., 2006b; van der Werf et al., 2006), the chemical production of CO2 from the atmospheric oxidation of other carbon species (Nassar et al., 2010), monthly emissions from shipping (Olivier and Berdowski, 2001), and aviation CO2 emissions (Friedl, 1997; Sausen and Schuman, 2000; Kim et al., 2005, 2007; Wilkerson et al., 2010). After the spin-up run, the obtained CO2 fields were used to drive the observing system simulation experiments. In all the following OSSEs, we first take the default surface CO2 fluxes released with the GEOS-Chem model as the true CF series $F_{True}$. We then run the GEOS-Chem model, driven by the true CF series $F_{True}$, to obtain the true CO2 concentrations from 1 January 2010 to 31 December 2010 (i.e., the assimilation period). The artificial CO2 observations are generated every day by sampling the daily true CO2 concentrations at the 136 observational sites used in this study (Fig. 3) and adding small random noise (with an error variance of 0.01 ppm$^2$). The first-guess CF series $F^*$ is set to $1.8F_{True}$ and drives the GEOS-Chem model at the same resolution (2° latitude × 2.5° longitude) to produce the background CO2 simulations from the spun-up equilibrium state.
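The construction of the pseudo-observations and the biased first guess can be sketched as follows. This is an illustration under stated assumptions: the noise is taken to be zero-mean Gaussian (the text only gives its variance, 0.01 ppm²), and the function names, the seed and the sample values are hypothetical.

```python
import random


def make_pseudo_observations(true_ppm, error_variance=0.01, seed=0):
    """Add zero-mean noise with the given variance (ppm^2) to true values.

    ASSUMPTION: Gaussian noise; the source only states the error variance.
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    sigma = error_variance ** 0.5
    return [value + rng.gauss(0.0, sigma) for value in true_ppm]


def first_guess_flux(true_flux, factor=1.8):
    """Build the deliberately biased first-guess fluxes F* = 1.8 * F_True."""
    return [factor * flux for flux in true_flux]


# Hypothetical "true" CO2 concentrations (ppm) sampled at three sites.
true_ppm = [388.2, 390.5, 392.1]
observations = make_pseudo_observations(true_ppm)
print(observations)
```

With a standard deviation of 0.1 ppm the observations stay very close to the truth, so the experiment mainly tests whether the system can remove the factor-of-1.8 flux bias.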
The performance of Tan-Tracker is examined by comparison with its simplified version (referred to as TT-S), which takes only the CFs as prognostic variables. TT-S is somewhat similar to CarbonTracker, except that the ensemble square root filter (EnSRF) is replaced by the PODEn4DVar approach and the GEOS-Chem model is used instead of the TM5 model. As in CarbonTracker, the GEOS-Chem model in TT-S actually serves as the observation operator linking the CFs with the CO2 observations. In TT-S, since the CO2 concentrations are not assimilated together with the CFs, we first obtain the optimized scaling factors by assimilating CO2 observations, and the CO2 concentrations are then updated by a GEOS-Chem simulation forced by the optimized CFs. All assimilation runs are initiated by the GEOS-Chem model with the first-guess CF series $F^*$ ($= 1.8F_{True}$) and proceed continuously by assimilating the daily pseudo-observations throughout the assimilation period. In all the assimilation experiments, the scaling factors are initialized as $\lambda_{b,0}(i, j) = 1.0$ (where $i$ and $j$ are the longitude and latitude indexes, respectively, and the subscript 0 denotes the assimilation cycle $r = 0$). In all the OSSEs, the default lag window is 5 weeks, and the observational window and optimized window are both 1 week. The reference ensemble size $N$ is 106 and the standard localization radius $d_0$ is 900 km. Changes in the assimilation parameters might influence the assimilation performance. We therefore investigate the effects of the horizontal localization Schur radius and the ensemble size in Tan-Tracker through several sensitivity experiments, the results of which are presented in Sect. 3.2. In all assimilation experiments, we use the adaptive inflation technique proposed by Li et al. (2009).

Experimental results
To evaluate Tan-Tracker's overall performance, time series of the daily global mean fluxes and CO2 concentrations from the background simulations, the TT-S assimilations and the TT (Tan-Tracker) assimilations are compared with the true simulations in Fig. 4. Not surprisingly, the background simulations (referred to as Sim) deviate seriously from the "true" simulations because of the predetermined background CF series $F_b$ ($= 1.8F_{True}$). Because both the CO2 concentrations and the CFs are assimilated simultaneously under the joint assimilation framework, Tan-Tracker can largely eliminate the effect of the uncertainty in the initial CO2 concentrations on the CO2 evolution during the assimilation window and maximize the observations' potential. Probably for this reason, Fig. 4 shows that Tan-Tracker works very well throughout the whole assimilation period, especially after the first few months, which can be considered a spin-up period. In contrast, the performance of TT-S is not very robust: although it is substantially better than the background simulation, its errors show no decreasing trend. The impact of the CO2 concentrations is not fully taken into account in the TT-S system, and non-negligible errors remain in the TT-S-optimized CO2 concentrations (Fig. 4b). The resulting errors in the initial CO2 concentrations in turn contaminate the TT-S assimilation of CO2 fluxes in the next assimilation cycle. In the following discussion, we focus on the results during the latter half of 2010 only, thereby excluding the spin-up period in the first half of the year. Figure 5 shows that the posterior uncertainties of the analyzed CFs decrease gradually as CO2 observations are assimilated. Furthermore, Fig. 6 shows time series of the daily globally averaged scaling factor, which also decreases and settles close to ~0.56 (i.e., 1/1.8) with small fluctuations during the latter half of 2010.
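The target value of the scaling factor follows directly from the OSSE design. As a short worked step (using only the relations $F_a = \lambda_a F^*$ and $F^* = 1.8F_{True}$ given in the text), a perfect recovery of the true fluxes requires:

```latex
F_a = \lambda_a F^{*} = \lambda_a \,(1.8\,F_{\mathrm{True}}) = F_{\mathrm{True}}
\quad\Longrightarrow\quad
\lambda_a = \frac{1}{1.8} \approx 0.556
```

A globally averaged scaling factor that starts at 1.0 and settles near 0.56 is therefore the signature of the assimilation removing the imposed flux bias.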
Similar to Peters et al. (2005), we also aggregated the daily, gridded (2° latitude × 2.5° longitude) simulation and assimilation results to 24 "super-regions" corresponding to the TransCom 3 regions given by Gurney et al. (2002). Figure 7 shows the aggregated mean CO2 concentrations and fluxes of the 24 super-regions during the latter half of 2010. Generally, Tan-Tracker reproduces the true fluxes well and outperforms TT-S in most of the 24 super-regions; the three exceptions, CT-09 (Tropical Asia), CT-12 (North Pacific Temperate) and CT-20 (Southern Ocean), have fluxes with very small absolute values (Fig. 7a). For the CO2 concentrations, the superiority of Tan-Tracker over TT-S is even more obvious (Fig. 7b): the differences between the "truth" and the TT-assimilated CO2 concentrations are much smaller than those between the "truth" and the TT-S-assimilated concentrations in the overwhelming majority of cases, which illustrates once more that the simultaneous assimilation of CO2 concentrations and CFs is indispensable. The time series of daily mean fluxes and CO2 concentrations from four selected super-regions (Temperate North America, Europe, Boreal Eurasia, and Southern Ocean) are shown in Figs. 8 and 9. As in the global mean case shown in Fig. 4, our assimilation system captures the seasonal peak-to-trough amplitudes of CO2 concentrations and fluxes, demonstrating its ability to make full use of the observations. By comparison, TT-S is considerably inferior to Tan-Tracker, especially in the Southern Ocean super-region during October-December 2010, where the TT-S-optimized CO2 concentrations are even worse than the background simulations (Fig. 9d).
To evaluate the performance of the Tan-Tracker data assimilation system comprehensively, we show the root-mean-square errors (RMSEs) of the daily, gridded (2° latitude × 2.5° longitude) TT- and TT-S-assimilated fluxes from 1 July to 31 December 2010 in Fig. 10. The corresponding RMSEs of the assimilated (optimized) CO2 concentrations are shown in Fig. 11. Compared with the Tan-Tracker case, larger RMSEs (> 300 × $10^{-11}$ kg C m$^{-2}$ s$^{-1}$) for the TT-S-assimilated fluxes are found in the central parts of South America, most of East Asia, and southern Africa (Fig. 10b). Encouragingly, the TT-assimilated flux RMSEs are largely kept at a very low level (≤ 80 × $10^{-11}$ kg C m$^{-2}$ s$^{-1}$); relatively larger RMSEs (but still much smaller than those of TT-S) appear only in a very small area in the central parts of South America (Fig. 10a). A similar pattern is found for the CO2 concentrations (Fig. 11). These results indicate that the uncertainty of the initial CO2 concentrations cannot be ignored and that the joint assimilation framework contributes substantially to the final Tan-Tracker performance. Moreover, the use of the advanced hybrid assimilation approach (i.e., PODEn4DVar) also contributes positively to this performance (Tian et al., 2011). The imbalance between CFs and CO2 concentrations in TT-S partly explains its inferior performance.
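The gridded RMSE maps can be thought of as one RMSE per grid cell, taken over the daily time series. A minimal sketch, assuming the data are stored as a list of daily fields with each field a flat list of grid-cell values (the layout, function name and numbers are hypothetical):

```python
import math


def rmse_per_cell(estimates, truth):
    """RMSE over time for each grid cell.

    Both inputs are lists over time steps; each entry is a flat list of
    grid-cell values (fluxes or concentrations) in the same cell order.
    """
    n_times = len(truth)
    n_cells = len(truth[0])
    result = []
    for cell in range(n_cells):
        squared = sum(
            (estimates[t][cell] - truth[t][cell]) ** 2 for t in range(n_times)
        )
        result.append(math.sqrt(squared / n_times))
    return result


# Hypothetical two-day, two-cell example.
assimilated = [[1.0, 2.0], [3.0, 2.0]]
true_fields = [[0.0, 2.0], [0.0, 2.0]]
print(rmse_per_cell(assimilated, true_fields))
```

Because the OSSE truth is known in every cell and on every day, this error measure is available everywhere on the grid, which is not the case in the real-data experiment.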
Another group of experiments using the Tan-Tracker system with different horizontal localization radii ($d_0$ = 100, 900, 1450, 2000 and 5000 km) is conducted to explore the sensitivity of the system to the horizontal radius. As suggested by Peters et al. (2005), we take 900 km as the default or reference radius. Figure 12 shows time series of the daily global CO2 concentrations and fluxes from the "truth" as well as from the TT assimilations using three different horizontal localization radii ($d_0$ = 900, 1450 and 2000 km). From these results, we can roughly judge that the Tan-Tracker system performs well with a horizontal localization radius of around 900 km. Two extremely inappropriate localization radii ($d_0$ = 100 and 5000 km) are also tested (not shown here); their poor performance demonstrates that the choice of an appropriate covariance localization radius is essential to Tan-Tracker's successful implementation.
Finally, to investigate the impact of the sample size on Tan-Tracker's assimilation results, we conduct another group of Tan-Tracker assimilation experiments with ensemble sizes $N$ = 60, 106 and 150. Figure 13 shows that the differences between the two assimilation experiments with $N$ = 106 and 150 are very small. However, if we decrease the ensemble size to 60 (not shown), the assimilation results diverge. From these results, we conclude that an ensemble size of at least about 100 generally guarantees robust performance of our system.

Real-data assimilation experiment with spaceborne observations
In this section, a preliminary real assimilation experiment is conducted by using spaceborne CO 2 dry-air mole fraction observations to illustrate the potential applications of Tan-Tracker in real-data assimilation.

Experimental setup
The basic experimental design (such as the GEOS-Chem model, ensemble size, assimilation window, localization radius, etc.) is exactly the same as that adopted in Sect. 3. In this real-data experiment, however, we took the default surface CO2 fluxes released with the GEOS-Chem model as the first-guess CF series $F^*$ and used spaceborne CO2 dry-air mole fraction observations (XCO2) instead of artificial CO2 observations. The spaceborne observations used here come from the Japanese Greenhouse Gases Observing Satellite (GOSAT), which was launched into orbit in 2009. TANSO-FTS, onboard GOSAT, operates in the shortwave infrared band (SWIR) between 758 and 2080 nm and the thermal infrared band (TIR) from 5.56 to 14.3 µm, providing information on CO2 and CH4 in the atmosphere. The Level 2 data, i.e., the column-averaged CO2 dry-air mole fraction XCO2, are taken from version 3.3 of the Atmospheric CO2 Observations from Space (ACOS) data product (O'Dell et al., 2012). Validation against ground-based TCCON data shows a mean bias of less than 1.4 ppm; these biases can be further reduced by applying the recommended data screening criteria and bias correction technique (for more details, please refer to the document "ACOS Level 2 Standard Product Data User's Guide", http://disc.sci.gsfc.nasa.gov/acdisc/documentation/ACOS_v3.3_DataUsersGuide.pdf). Furthermore, to keep the quality of the assimilated data as high as possible, we discarded the XCO2 data with observation errors ≥ 0.75 ppm.
In order to assimilate the spaceborne XCO2 directly, the following observation operator (Eq. 25) needs to be incorporated into Tan-Tracker to provide a link between the observational variable XCO2 and the GEOS-Chem-simulated CO2 concentrations (Feng et al., 2009):

$$XCO_{2,m} = XCO_{2,a} + h^T A\,(u_m - u_a), \qquad (25)$$

where $XCO_{2,m}$ is the simulated column value; $h$ is the pressure weighting function; $A$ is the full averaging kernel matrix; $u_a$ and $XCO_{2,a}$ are the prior CO2 profile and the associated column amount, respectively; and $u_m$ is the GEOS-Chem-produced CO2 profile. The experiment period is from 1 January 2010 to 31 March 2010. In particular, we chose the XCO2 data of one arbitrary day (15 March 2010 in this experiment) as the evaluation data set; these data are deliberately withheld from the assimilation to provide an independent evaluation of the Tan-Tracker system.
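A column observation operator of this kind can be sketched in a few lines. The example assumes the commonly used form in which the simulated column equals the prior column plus the pressure-weighted, kernel-smoothed difference between the model and prior profiles; the function name, the two-level profiles and all numbers are hypothetical.

```python
def simulate_xco2(model_profile, prior_profile, prior_xco2,
                  pressure_weights, averaging_kernel):
    """Map a model CO2 profile to a column value comparable with XCO2.

    ASSUMED form: prior_xco2 + h^T A (u_m - u_a), with h the pressure
    weighting function and A the averaging kernel matrix.
    """
    # Departure of the model profile from the retrieval prior, per level.
    diff = [m - a for m, a in zip(model_profile, prior_profile)]
    # Smooth the departure with the averaging kernel (matrix-vector product).
    smoothed = [
        sum(a_ij * d for a_ij, d in zip(row, diff)) for row in averaging_kernel
    ]
    # Collapse to a column value with the pressure weights.
    return prior_xco2 + sum(h * s for h, s in zip(pressure_weights, smoothed))


# Hypothetical two-level example with an identity averaging kernel.
xco2 = simulate_xco2(
    model_profile=[392.0, 390.0],
    prior_profile=[390.0, 390.0],
    prior_xco2=390.0,
    pressure_weights=[0.5, 0.5],
    averaging_kernel=[[1.0, 0.0], [0.0, 1.0]],
)
print(xco2)  # 391.0
```

Applying the averaging kernel to the model profile matters because the retrieval is not equally sensitive at all levels; comparing a plain pressure-weighted model column with the retrieved XCO2 would mix model error with retrieval smoothing.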

Experimental results
The lack of reliable independent CF estimates derived from GOSAT XCO2 retrievals (Chevallier et al., 2014) ... with $XCO^{s(a)}_{2,i}$ and $XCO^{o}_{2,i}$ being the simulated (assimilated) and observed XCO2 values for each valid footprint, respectively. However, the TT-assimilated case has only a very small bias (err = −0.45 ppm). Obviously, the above discussion can only demonstrate that our Tan-Tracker system is capable of yielding fairly good CO2 concentration results. It is encouraging to find that the performance of the TT-Sim case is slightly inferior to that of the TT case (RMSE = 1.45 ppm and r = 0.83), suggesting that Tan-Tracker does enhance the CO2 concentration and flux estimates. It provides a promising new tool for CO2 surface flux (CF) inversion. In addition, in Fig. 14, α (0.01) is the confidence coefficient. Certainly, extra effort is needed to give a more detailed assessment of Tan-Tracker satellite data assimilation, which will be provided in another study.
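The three evaluation statistics quoted above (bias, RMSE and correlation coefficient r) can be computed per footprint set as in the sketch below. It assumes that the bias is the mean of simulated-minus-observed differences and that r is the Pearson correlation; the exact definitions used for Fig. 14 are not preserved in this text, and the sample values are hypothetical.

```python
import math


def evaluation_stats(simulated, observed):
    """Return (bias, rmse, r) for paired simulated and observed values.

    ASSUMPTIONS: bias = mean(simulated - observed); r = Pearson correlation.
    Both series must vary, otherwise r is undefined.
    """
    n = len(observed)
    diffs = [s - o for s, o in zip(simulated, observed)]
    bias = sum(diffs) / n
    rmse = math.sqrt(sum(d * d for d in diffs) / n)

    mean_s = sum(simulated) / n
    mean_o = sum(observed) / n
    cov = sum((s - mean_s) * (o - mean_o) for s, o in zip(simulated, observed))
    var_s = sum((s - mean_s) ** 2 for s in simulated)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    r = cov / math.sqrt(var_s * var_o)
    return bias, rmse, r


# Hypothetical values: the simulation is uniformly 1 unit too low.
bias, rmse, r = evaluation_stats([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
print(bias, rmse, r)
```

Reporting all three together is useful because they capture different failures: a uniform offset shows up in the bias and RMSE but leaves r at 1, as in this example.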

Summary and concluding remarks
In this study, a new carbon cycle data assimilation system (i.e., Tan-Tracker) is developed based on an advanced hybrid assimilation approach (PODEn4DVar), as a part of the preparation for the launch of the Chinese carbon dioxide observation satellite (TanSat) (Liu et al., 2012;Cai et al., 2014).Tan-Tracker adopts a joint data assimilation framework: a simple persistence model is chosen to describe the CFs' evolution, which acts as the CF dynamical sub-model and constitutes an augmented dynamical model together with the GEOS-Chem atmospheric transport model.In such an augmented dynamical model, the large-scale state vector made up of CFs and CO 2 concentrations is actually the prognostic variable, which is designed to be simultaneously constrained by the observations of atmospheric CO 2 concentrations.As a step towards the application of Tan-Tracker, we carefully designed several groups of observing system simulation experiments (OSSEs) to comprehensively evaluate Tan-Tracker's performance in comparison to its simplified version (TT-S), taking only CFs as the prognostic variables.It is found that the simultaneous estimation of CO 2 concentrations and CFs plays a vital role in enhancing the Tan-Tracker system's performance: contamination in Tan-Tracker's performance in CF estimation from the uncertainty in the CO 2 concentration evolution has been gradually reduced through continuously fitting model CO 2 concentration simulations to the observations.Our future work will focus on the realization of XCO 2 assimilation in the first version of Tan-Tracker, which is a key step to extending Tan-Tracker with functions for assimilating satellite measurements.This goal could be achieved by introducing the observation operator to link the CO 2 concentration profiles with XCO 2 .As the Chinese TanSat has not yet been launched, we will focus our proposed Tan-Tracker on GOSAT and OCO-2 (O'Dell et al., 2012) measurements of CO 2 .Encouragingly, a preliminary real-data 
assimilation experiment conducted by using spaceborne (GOSAT) observations demonstrates its potential wider applications.

Figure 1 .
Figure 1.Flowchart of the Tan-Tracker joint data assimilation system.

Figure 3 .
Figure 3.The observational sites used in this study.
For this work, our model simulation was initialized on 01 January 2008 with a globally uniform 3-D CO 2 field of 383.76 ppm.According to the record of NOAA-ESRL Mauna Loa Observatory in Hawaii (http://www.esrl.noaa.gov/gmd/ccgg/),which is a marine surface site, the annual mean CO 2 at Mauna Loa in 2007 was 383.76 ppm, with monthly means of 383.89 ppm in December 2007 and 385.44 ppm in January 2008.A 2year spin-up simulation from this initialized state allows for model transport, sources and sinks to develop the global spatial patterns of CO 2 ; this approach was evaluated in Nassar et al. (

Figure 4 .
Figure 4. Time series of the global mean (a) CO 2 surface fluxes and (b) CO 2 concentrations from the "truth", simulations, TT-S (the simplified version of Tan-Tracker) and TT (Tan-Tracker) assimilations from 1 January to 31 December 2010.

Figure 5 .
Figure 5.Time series of the posterior uncertainties (shaded areas) of the analyzed surface fluxes (TT) from 1 January to 31 December 2010.

Figure 6 .
Figure 6.Time series of the averaged scaling factors from 1 January to 31 December 2010.

Figure 8 .
Figure 8.Time series of the daily mean CO 2 surface fluxes from the "truth", simulations, TT-S (the simplified version of Tan-Tracker) and TT (Tan-Tracker) assimilations aggregated to the selected four TransCom regions (i.e., CT-02, CT-07, CT-11 and CT-20) during the period from 1 July to 31 December 2010.

Figure 12 .
Figure 12.Time series of the daily global mean (a) CO 2 surface fluxes and (b) CO 2 concentrations from the "truth" and the TT (Tan-Tracker) assimilations using different covariance localization radii (900, 1450 and 2000 km), respectively, from 1 January to 31 December 2010.

Figure 13 .
Figure 13.Time series of the daily global mean (a) CO 2 surface fluxes and (b) CO 2 concentrations from the "true" and the TT (Tan-Tracker) assimilations with ensemble numbers N = 106 and 150, respectively, from 1 January to 31 December 2010.

Figure 14 .
Figure 14.Comparisons between the observed XCO 2 and the openloop GEOS-Chem-simulated (Sim), Tan-Tracker-assimilated (TT) and the TT-Sim (i.e., the GEOS-Chem model run without data assimilation forced by the TT-optimized CF series derived from the Tan-Tracker assimilation run with the TT-assimilated initial CO 2 fields at 1 January 2010) simulated XCO 2 on 15 March 2010.
forces us to seek an indirect way to evaluate the Tan-Tracker assimilations.Here, we performed a parallel free run of GEOS-Chem forward simulation without any data assimilation.Then, to examine Tan-Tracker's performance quantitatively, the simulated and assimilated CO 2 dry-air mole fraction observations of XCO 2 on 15 March 2010 were compared with the corresponding (independent) GOSAT observations.After the data quality control (observation error < 0.75 ppm) implemented in this experiment, there are still 163 valid footprints left for system evaluation.Compared with the Sim case, the TT-assimilated XCO 2 is improved considerably with higher correlation (0.83 vs. 0.77) and a smaller RMSE (1.38 ppm vs. 2.95 ppm).The GEOS-Chem model generally underestimates the XCO 2 values by a substantial negative bias of −2.46 ppm, where the mean bias is given by − XCO 2 , i o , with XCO 2 , i s(
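The evaluation statistics reported for the 15 March 2010 comparison (mean bias, RMSE and correlation over the footprints that pass the observation-error screening) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name, argument names and toy values are hypothetical, and the RMSE and Pearson correlation use their standard definitions, which the text does not spell out.

```python
import numpy as np


def evaluate_xco2(simulated, observed, obs_error, max_error=0.75):
    """Screen footprints by observation error and compute summary statistics.

    simulated, observed, obs_error: 1-D arrays in ppm, one entry per footprint.
    max_error: footprints with obs_error >= max_error are discarded.
    """
    simulated = np.asarray(simulated, dtype=float)
    observed = np.asarray(observed, dtype=float)
    obs_error = np.asarray(obs_error, dtype=float)

    keep = obs_error < max_error                 # quality control mask
    diff = simulated[keep] - observed[keep]      # simulated minus observed

    return {
        "n": int(keep.sum()),                    # number of valid footprints
        "bias": float(diff.mean()),              # mean bias (err)
        "rmse": float(np.sqrt(np.mean(diff ** 2))),
        "r": float(np.corrcoef(simulated[keep], observed[keep])[0, 1]),
    }


# Toy data (illustrative only): the last footprint fails the error screening.
observed = [390.0, 391.0, 392.0, 393.0]
simulated = [389.0, 390.0, 391.0, 395.0]
obs_error = [0.5, 0.6, 0.7, 0.9]
stats = evaluate_xco2(simulated, observed, obs_error)
```

In this toy case the three retained footprints are each underestimated by 1 ppm, so the bias is −1 ppm, the RMSE is 1 ppm, and the correlation is 1. A negative bias therefore indicates underestimation, as for the Sim case in the text.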