A regional carbon data assimilation system and its preliminary evaluation in East Asia

Abstract. In order to optimize surface CO2 fluxes at grid scales, a regional surface CO2 flux inversion system (Carbon Flux Inversion system and Community Multi-scale Air Quality, CFI-CMAQ) has been developed by applying the ensemble Kalman filter (EnKF) to constrain the CO2 concentrations and applying the ensemble Kalman smoother (EnKS) to optimize the surface CO2 fluxes. The smoothing operator is associated with the atmospheric transport model to constitute a persistence dynamical model to forecast the surface CO2 flux scaling factors. In this implementation, the "signal-to-noise" problem can be avoided; plus, any useful observed information achieved by the current assimilation cycle can be transferred into the next assimilation cycle. Thus, the surface CO2 fluxes can be optimized as a whole at the grid scale in CFI-CMAQ. The performance of CFI-CMAQ was quantitatively evaluated through a set of Observing System Simulation Experiments (OSSEs) by assimilating CO2 retrievals from GOSAT (Greenhouse Gases Observing Satellite). The results showed that the CO2 concentration assimilation using EnKF could constrain the CO2 concentration effectively, illustrating that the simultaneous assimilation of CO2 concentrations can provide convincing CO2 initial analysis fields for CO2 flux inversion. In addition, the CO2 flux optimization using EnKS demonstrated that CFI-CMAQ could, in general, reproduce true fluxes at grid scales with acceptable bias. Two further sets of numerical experiments were conducted to investigate the sensitivities of the inflation factor of scaling factors and the smoother window. The results showed that the ability of CFI-CMAQ to optimize CO2 fluxes greatly relied on the choice of the inflation factor. However, the smoother window had a slight influence on the optimized results. CFI-CMAQ performed very well even with a short lag-window (e.g. 3 days).


Introduction
Considerable progress has been made in recent years to reduce the uncertainties of surface CO 2 flux estimates through the use of an advanced data assimilation technique (e.g.Chevallier, 2007;Chevallier et al., 2005Chevallier et al., , 2007;;Baker et al., 2006;Engelen et al., 2009;Liu et al., 2012).Feng et al. (2009) showed that the uncertainties of surface CO 2 flux estimates can be reduced significantly by assimilating OCO X CO 2 measurements.Peters et al. (2005Peters et al. ( , 2007Peters et al. ( , 2009) ) developed a surface CO 2 flux inversion system, CarbonTracker, by incorporating the ensemble square-root filter (EnSRF) into the atmospheric transport TM5 model; the inversion results obtained by assimilating in situ surface CO 2 observations are in excellent agreement with a wide collection of carbon inventories that form the basis of the first North American State of the Carbon Cycle Report (SOCCR) (Peters et al., 2007).CarbonTracker has also been frequently used to constrain the surface CO 2 fluxes over Europe and Asia (eg., Zhang et al., 2014a, b).Kang et al. (2012) presented a simultaneous data assimilation of surface CO 2 fluxes and atmospheric CO 2 concentrations along with meteorological variables using a local ensemble transform Kalman filter (LETKF).They indicated that an accurate estimation of the evolving surface fluxes can be gained even without any a priori information.Recently, Tian et al. (2014) developed a new surface CO 2 flux data assimilation system, Tan-Tracker, by incorporating a joint PODEn4DVar assimilation framework into the GEOS-Chem model on the basis of Peters et al. (2005Peters et al. ( , 2007) ) and Kang et al. (2011Kang et al. ( , 2012)).They discussed in detail that the assimilation of CO 2 surface fluxes could be improved through the use of simultaneous assimilation of CO 2 concentrations and CO 2 surface fluxes.Despite the rigour of data assimilation theory, current CO 2 flux-inversion methods still face many challenging scientific problems, such as: (1) the well-known "signal-to-noise" problem (NRC, 2010); (2) large inaccuracies in chemical transport models (e.g.Prather et al., 2008); (3) vast computational expenses (e.g.Feng et al., 2009); and (4) the sparseness of observation data (e.g.Gurney et al., 2002).
The "signal-to-noise" problem is one of the most challenging issues for an ensemble-based CO 2 flux inversion system due to the fact that surface CO 2 fluxes are the model forcing (or boundary condition), rather than model states (like CO 2 concentrations), of the chemistry transport model (CTM).In the absence of a suitable dynamical model to describe the evolution of the surface CO 2 fluxes, most CO 2 flux-inversion studies have traditionally ignored the uncertainty of anthropogenic and other CO 2 emissions and focused on the optimization of natural (i.e.biospheric and oceanic) CO 2 emissions at the ecological scale (e.g.Deng et al., 2007;Feng et al., 2009;Peters et al., 2005Peters et al., , 2007;;Jiang et al., 2013;Peylin et al., 2013).
This compromise is acceptable to some extent.Indeed, the total amount of anthropogenic CO 2 emissions can be estimated by relatively well-documented global fuelconsumption data with a small degree of uncertainty (Boden et al., 2011), and the uncertainties involved in the total amount of anthropogenic CO 2 emissions are much smaller than those related to natural emissions.However, their spatial distribution, strength and temporal development still remain elusive because of their inherent non-uniformities (Andres et al., 2012;Gurney et al., 2009).Marland (2008) pointed out that even a tiny amount of uncertainty, i.e. 0.9 %, in one of the leading emitter countries like the U.S. is equivalent to the total emissions of the smaller emitter countries in the world.Furthermore, the usual values of anthropogenic CO 2 emissions in chemical transport models have thus far been simply interpolated from very coarse monthly-mean fuel consumption data.Therefore, great uncertainty in the spatiotemporal distributions of anthropogenic emissions likely exists, which could reduce the accuracy of CO 2 concentration simulations and subsequently increase the inaccuracy of natural CO 2 flux inversion results.In addition, current research approaches tend only to assimilate natural CO 2 emissions at the ecological scale, which is far from sufficient.Therefore, surface CO 2 fluxes should be constrained as a whole at a finer scale.
In CarbonTracker (Peters at al., 2007), a smoothing operator is innovatively applied as the persistence forecast model.In that application, the surface CO 2 fluxes can be treated as the model states and the observed information ingested by the current assimilation cycle can be used in the next assimilation cycle effectively.However, the "signal-to-noise" problem has not yet been resolved, and thus CarbonTracker also has to assimilate natural CO 2 emissions at the ecological scale only.In Tan-Tracker (Tian et al., 2014), a fourdimensional (4-D) moving sampling strategy (Wang et al., 2010) is used to generate the flux ensemble members, and so the surface CO 2 fluxes can be optimized as a whole at the grid scale.In this work, the persistence dynamical model taken by Peters et al. (2005) was further developed for the purpose of resolving the "signal-to-noise" problem, to optimize the surface CO 2 fluxes as a whole at the grid scale.This process is described in detail in Sect. 2 of this paper.
The surface CO 2 flux inversion system presented in this paper was developed by simultaneously optimizing the surface CO 2 fluxes and constraining the CO 2 concentrations.As we know, assimilating CO 2 observations from multiple sources can improve the accuracy of simulation results (e.g.Miyazaki, 2009;Liu et al., 2011Liu et al., , 2012;;Feng et al., 2011;Tangborn et al., 2013;Huang et al., 2014).In addition, previous studies showed that the simultaneous assimilation of CO 2 concentrations and surface CO 2 fluxes can largely eliminate the uncertainty in initial CO 2 concentrations on the CO 2 evolution (Kang et al., 2012;Tian et al., 2014).Therefore, we also use the simultaneous assimilation framework; the ensemble Kalman filter (EnKF) was used to constrain CO 2 concentrations and the ensemble Kalman smoother (EnKS) was used to optimize surface CO 2 fluxes.Since the regional chemical transport models, compared to global models, have some advantages in reproducing the effects of meso-microscale transport on atmospheric CO 2 distributions (Ahmadov et al., 2009;Pillai et al., 2011;Kretschmer et al., 2012), we choose a regional model, Regional Atmospheric Modeling System and Community Multi-scale Air Quality (RAMS-CMAQ) (Zhang et al., 2002(Zhang et al., , 2003(Zhang et al., , 2007;;Kou et al., 2013;Liu et al., 2013;Huang et al., 2014), to develop this inversion system.For simplicity, this system is referred to as CFI-CMAQ (Carbon Flux Inversion system and Community Multi-scale Air Quality).
Since this is the first introduction of CFI-CMAQ, we focus mainly on introducing the methodology in this paper.Nevertheless, in addition, Observing System Simulation Experiments (OSSEs) were designed to assess the system's ability to optimize surface CO 2 fluxes.The retrieval information of GOSAT X CO 2 are used to generate artificial observations because of the sparseness and heterogeneity of ground-based measurements.
The remainder of the paper is organized as follows.Section 2 describes the details of the regional surface CO 2 flux inversion system, CFI-CMAQ, including the developed persistence dynamical model, a simple review of the EnKS and EnKF assimilation approaches, and the process involved.The experimental designs are then introduced and the assimilation results shown in Sect.3. Finally, a summary and conclusions are provided in Sect. 4.
2 Framework of the regional surface CO 2 flux inversion system Suppose we have the prescribed net CO 2 surface flux, F * (x, y, z, t), which can be released from a climate model or be generated by other's methods, our ultimate goal is to optimize F * (x, y, z, t) by assimilating CO 2 observations from various platforms.As an ensemble-based assimilation system, CFI-CMAQ was also developed by applying a set of linear multiplication factors, similar to the approach by Peters et al. (2007) and Tian et al. (2014).The ith ensemble member of the surface fluxes, F i (x, y, z, t), from an N-member ensemble can be described by where λ i (x, y, z, t) represents the ith ensemble member of the linear scaling factors (Peters et al., 2007;Tian et al., 2014) for each time and each grid to be optimized in the assimilation.The notations are standard: the subscript i refers to the ith ensemble member.In the following, λ i (x, y, z, t) is referred to as λ i,t , F * (x, y, z, t) is referred to as F * t , and F i (x, y, z, t) is referred to as F i,t for simplicity.
At each optimization cycle, CFI-CMAQ includes the following four parts in turn (see Fig. 1): (1) forecasting of the linear scaling factors at time t, λ a i,t|t−1 ; (2) optimization of the scaling factors in the smoother window, (λ a i,t−M|t−1 , λ a i,t−M+1|t−1 , . .., λ a i,j |t−1 , . .., λ a i,t−1|t−1 , λ a i,t1|t−1 ), by EnKS, where λ a i,j |t−1 (j = t − 1 − M, . .., t − 1) refer to analysed quantities from the previous assimilation cycle at time j (see Fig. 1), |t − 1 means that these factors have been updated using observations before time t − 1, and the superscript a refers to the analysed; (3) updating of the fluxes in the smoother window, (F a i,t−M|t−1 , F a i,t−M+1|t−1 , . .., F a i,j |t−1 , . .., F a i,t−1|t−1 , F a i,t|t−1 ); and (4) assimilation of the forecast CO 2 concentration fields at time t, C f i (x, y, z, t) (referred to as C f i,t , and the superscript f refers to the forecast or the background), by EnKF.A flowchart illustrating CFI-CMAQ is presented in Fig. 2. The assimilation procedure is addressed in detail below.In addition, the observation operator is introduced, particularly for use in the GOSAT X CO 2 data in Sect.2.4.Furthermore, covariance inflation and localization techniques applied in CFI-CMAQ are introduced briefly in Sect.2.5.

Forecasting the linear scaling factors at time t λ a
i,t|t−1 In the previous assimilation cycle t − 1 − M∼ t − 1 (see Fig. 1), the optimized scaling factors in the smoother window are (λ a i,t−1−M|t−1 , λ a i,t−M|t−1 , λ a i,t−M+1|t−1 , . .., λ a i,j |t−1 , . .., λ a i,t−1|t−1 ) and the assimilated CO 2 concentration fields at time t − 1 are C a i (x, y, z, t − 1) (referred to as C a i,t−1 ).In the current assimilation cycle t − M ∼ t, the scaling factors in the current smoother window are (λ a i,t−M|t−1 , λ a i,t−M+1|t−1 , . .., λ a i,j |t−1 , . .., λ a i,t−1|t−1 , λ a i,t|t−1 ) and the forecast CO 2 concentration fields at time t are C f i,t .In order to pass the useful observed information onto the next assimilation cycle effectively, following Peters et al. (2007) the smoothing operator is applied as part of the persistence dynamical model to calculate the linear scaling factors λ a i,t|t−1 , where λ p i,t|t−1 refers to the prior values of the linear scaling factors at time t.The superscript p refers to the prior.This operation represents a smoothing over all the time steps in the smoother window (see Fig. 1), thus dampening variations in the forecast of λ a i,t|t−1 in time.In order to generate λ p i,t|t−1 , the atmospheric transport model (CMAQ) is applied and a set of ensemble forecast experiments are carried out.It integrates from time t − 1 to t to produce the CO 2 concentration fields Ĉf i (x, y, z, t) (referred to as Ĉf i,t hereafter to distinguish from C f i,t ) forced by the prescribed net CO 2 surface flux F * t with C a i,t−1 as initial conditions.Then, the ratio κ i,t = Ĉf i,t Ĉf i,t is calculated, where Ĉf i,t .Suppose that λ p i,t|t−1 = κ i,t due to the fact that the surface CO 2 fluxes correlate with its concentrations, the values for λ p i,t|t−1 are obtained and then λ a i,t|t−1 can finally be calculated (see the red arrows in the flowchart in Fig. 2).
The way the prior scaling factor λ p i,t|t−1 is updated by associating with the atmospheric transport model is the main improvement over the one used in CarbonTracker (Peters et al., 2007).In CarbonTracker, all λ p i,t|t−1 are set to 1 (Peters et al., 2007).The distribution of the ensemble members of the linear scaling factors at time t, λ p i,t|t−1 , are finally dependent on the distribution of the previous scaling factors because Eq. ( 2) is a linear smoothing operator.In this study, the values of λ p i,t|t−1 are updated by association with the atmospheric transport model.It is important to note that λ p i,t|t−1 in this study are rand fields with mean 1.However, the distribution of λ a i,t|t−1 are dependent on the distribution of all are the optimized scaling factors in the smoother window and C a i,t−1 are the assimilated CO 2 concentrations fields at time t − 1 in the previous assimilation are the scaling factors in the smoother window and C f i,t are the forecast CO 2 concentrations fields at time t which need to be optimized in the current assimilation cycle t − M∼ t. the scaling factors in the smoother window.An OSSE was designed to illustrate the difference between our method and the one in which λ p i,t|t−1 are set to 1 in Sect.3. It is also important to note that, similar to Peters et al. (2007), this dynamical model equation still does not in-clude an error term in the dynamical model, and the model error cannot be estimated yet.However, the covariance inflation is applied to compensate for model errors before optimization, which is addressed in Sect.2.5.

2.2 Optimizing the scaling factors in the smoother window by EnKS
Substituting λ a i,t|t−1 into Eq.( 1), the ith member of the surface fluxes at time t, F a i,t|t−1 , can be generated.Then, forced by F a i,t|t−1 , CMAQ was run from time t − 1 to t to produce the background concentration field C f i,t with C a i,t−1 as initial conditions.
In the current assimilation cycle t − M∼ t (see Fig. 1), the scaling factors to be optimized in the smoother window are (λ a , as stated in the first paragraph of Sect.2.1.Using the EnKS analysis technique, these scaling factors are updated in turn via where K e j,t|t−1 is the Kalman gain matrix of EnKS, y obs t is the observation vector measured at time t and y f i,t is the simulated values, υ i,t is a random normal distribution perturbation field with zero mean, S e j,t|t−1 is the background error cross-covariance between the state vector λ a i,j |t−1 and λ a i,t|t−1 , S e t,t|t−1 is the background error covariance of the state vector λ a i,t|t−1 , H (•) is the observation operator that maps the state variable from model space into observation space, R is the standard deviation representing the measurement errors, and φ(•) is the atmospheric transport model.
In actual implementations, it is unnecessary to calculate S e j,t|t−1 and S e t,t|t−1 separately.S e j,t|t−1 H T and H S e t,t|t−1 H T can be calculated as a whole by Then the ensemble mean values of the assimilated fluxes in the smoother window can be calculated via, Finally, those ensemble mean assimilated fluxes which are before the next smoother window and will not be updated by the succeeding observations are regarded as the final optimized fluxes.We referred to them as F a t for simplicity.

Assimilating the CO 2 concentration fields at time t by EnKF
The analysis of CO 2 concentrations fields at time t in the EnKF scheme is updated via where K is the Kalman gain matrix of EnKF, P f is the background error covariance among the background CO 2 concentration fields C f i,t .In the actual application, P f H T and H P f H T can be calculated as a whole by Finally, the ensemble mean values of the assimilated CO 2 concentrations fields can be gained via where C a t is regarded as the final analysing concentration field.

The observation operator
As mentioned above, the observation operator H (•) transforms the state variable from model space into observation space.Usually, it is the spatial bilinear interpolator for traditional ground-based observations.Since the GOSAT X CO 2 retrieval is a weighted CO 2 column average, the simulated Z. Peng et al.: A regional carbon data assimilation system X CO 2 should be calculated with the same weighted column average method (Connor et al., 2008;Crisp et al., 2010Crisp et al., , 2012;;O'Dell et al., 2012).Hence, the observation operator to assimilate the GOSAT X CO 2 retrieval is where y f i,t is the simulated X CO 2 ; y priori is the a priori CO 2 column average used in the GOSAT X CO 2 retrieval process; S(•) is the spatial bilinear interpolation operator that interpolates the simulated fields to the GOSAT X CO 2 locations to obtain the simulated CO 2 vertical profiles there; f priori is the a priori CO 2 vertical profile used in the retrieval process; h is the pressure weighting function, which indicates the contribution of the retrieved value from each layer of the atmosphere; and a CO 2 is the normalized averaging kernel.

Covariance inflation and localization
In order to keep the ensemble spread of the CO 2 concentrations at a certain level and compensate for transport model error to prevent filter divergence, covariance inflation is applied before updating the CO 2 concentrations.So, where α is the inflation factor of CO 2 concentrations and (C f i,t ) new is the final field used for data assimilation.
Similarly, covariance inflation is also used to keep the ensemble spread of the prior scaling factors at a certain level and compensate for dynamical model error.Hence, where β is the inflation factor of scaling factors and (λ p i,t|t−1 ) new is the final scaling factors used for data assimilation.
In addition, the Schur product is utilized to filter the remote correlation resulting from the spurious long-range correlations (Houtekamer and Mitchell 2001).Hence, the Kalman gain matrix K e j,t|t−1 and K are updated via where the filtering matrix ρ is calculated using the formula where c is the element of the localization Schur radius.The matrix ρ can filter the small background error correlations associated with remote observations through the Schur product (Tian et al., 2011); and the Schur product tends to reduce the effect of those observations smoothly at intermediate distances due to the smooth and monotonically decreasing of the filtering matrix.

OSSEs for evaluation of CFI-CMAQ
A set of OSSEs were designed to quantitatively assess the performance of CFI-CMAQ.The setup of the experiments and the results are described in this section.

Experimental setup
The chemical transport model utilized was RAMS-CMAQ (Zhang et al., 2002), in which CO 2 was treated as an inert tracer.The model domain was 6654 × 5440 km 2 on a rotated polar stereographic map projection centred at (35.0 • N, 116.0 • E), with a horizontal grid resolution of 64 × 64 km 2 and 15 vertical layers in the σ z -coordinate system, unequally spaced from the surface to approximately 23 km.The initial fields and boundary conditions of the CO 2 concentrations were interpolated from the simulated CO 2 fields of CarbonTracker 2011 (Peters et al., 2007).The prior surface CO 2 fluxes included biosphere-atmosphere CO 2 fluxes, ocean-atmosphere CO 2 fluxes, anthropogenic emissions, and biomass-burning emissions (Kou et al., 2013), F p (x, y, z, t) = F bio (x, y, z, t) + F oce (x, y, z, t) where F p (x, y, z, t) (referred to as F p t ) was the prior surface CO 2 flux; F bio (x, y, z, t) and F oce (x, y, z, t) were the biosphere-atmosphere and ocean-atmosphere CO 2 fluxes, respectively, which were obtained from the optimized results of CarbonTracker 2011 (Peters et al., 2007); F ff (x, y, z, t) was fossil fuel emissions, adopted from the Regional Emission inventory in ASia (REAS, 2005 Asia monthly mean emission inventory) with a spatial resolution of 0.5 • × 0.5 • (Ohara et al., 2007); and F fire (x, y, z, t) was biomassburning emissions, provided by the monthly mean inventory at a spatial resolution of 0.5 • × 0.5 • from the Global Fire Emissions Database, Version 3 (GFED v3) (van der Werf et al., 2010).Among all these fluxes, F bio (x, y, z, t), F oce (x, y, z, t) and F ff (x, y, z, t) had nonzero values at model level 1, while they all were zeros at the other 14 levels.However, F fire (x, y, z, t) had nonzero values at model level 1 ∼ 5 and they were all zeros at other the 10 levels.So, all fluxes in this paper were the function of (x, y, z, t) for convenience.
Firstly, the prior flux F ; and (f) . All column mixing ratios are column-averaged with real GOSAT X CO 2 averaging kernels at GOSAT X CO 2 locations.Each symbol indicates the monthly average value of all X CO 2 estimates in the model grid.C a t are the ensemble mean values of the assimilated CO 2 concentrations fields of a CFI-CMAQ OSSE, in which the lag-window was 9 days and β was 70.They are the same OSSE in Figs.3-6.
the following).Then, the artificial GOSAT observations y obs t (or X p CO 2 ) were generated by substituting C p t into the observation operator in Eq. ( 16).The retrieval information of GOSAT X CO 2 (y priori , f priori , h and a CO 2 ) needed in Eq. ( 16) were gained from the v2.9 Atmospheric CO 2 Observations from Space (ACOS) Level 2 standard data products, which only utilized the SWIR observations.Only data classified into the "Good" category were utilized in this study.During the retrieval process, most of the soundings (such as data with a solar zenith angle greater than 85 • , or data not in clear sky conditions, or data collected over the ocean but not in glint, etc.) were not processed, so typically data products for the "good" category contained only 10-100 soundings per satellite orbit (Osterman et al., 2011), and there were only 0 ∼ 60 samples per orbit in the study model domain generally.Fig. 3a also showed the total number of "good" GOSAT X CO 2 observations for each model grid in February in 2010.There were relatively more observations over most continental regions of the study domain except some regions in North-East and South China.The total numbers ranged from 1 to 8.However, there were almost no data over the oceans of the study domain.Secondly, the prescribed surface CO 2 fluxes series F * t were created by where δ was a random number.They were standard normal distribution time series at each grid in the integration period of our numerical experiment.Driven by F * t , the RAMS-CMAQ model was integrated to obtain the CO 2 simulations C f (x, y, , z, t) (referred to as C f t hereafter).Then, the column-averaged concentrations X f CO 2 were obtained using Eq. ( 16).
The performance of CFI-CMAQ was evaluated through a group of well-designed OSSEs, and the goal of each OSSE was to retrieve the true fluxes F p t from given true observa-tions X p CO 2 and "wrong" fluxes F * t .In all the OSSEs, we assimilated artificial observations X p CO 2 about three times a day since GOSAT has about three orbits in the study model domain.If there were some observations, CFI-CMAQ paused to assimilate.Otherwise, it continued simulating.The default ensemble size N was 48, the measurement errors were 1.5 ppmv, the standard localization Schur radius c was 1280 km (20 grid spacing), and the covariance inflation factor of concentrations α was 1.1.The referenced lag-window was 9 days and the covariance inflation factor of the prior scaling factors β was 70.Since the smoother window was very important for CO 2 transportation and β was a newly introduced parameter, both these parameters were further investigated by several numerical sensitivity experiments.The primary focus of this paper was to describe the assimilation methodology, so all the numerical experiments started on 1 January 2010 and ended on 30 March 2010.
As for the initialization of CFI-CMAQ, only the ensemble of background concentration fields C f i,0 needed to be initialized at the time t = 0 because the values of λ a i,t|t−1 were updated using the persistence dynamical model.In practice, the mean concentration fields at t = 0 are interpolated from the simulated CO 2 fields of CarbonTracker 2011 (Peters et al., 2007).The ensemble members of the background concentration fields were created by adding random vectors.The mean values of the random vectors were zero and the variances were 2.5 percent of the mean concentration fields.The atmospheric transport model then integrated from time t = 0 to t = 1 driven by F * t with C f i,0 as initial conditions to produce the CO 2 concentration fields Ĉf i,1 .Subsequently, the first prior linear scaling factors, λ p i,1|0 , could be calculated by applying Ĉf i,1 .Assuming that λ a i,1|0 = λ p i,1|0 , λ a i,1|0 are gained, finally.For the first assimilation cycle, the lag-window was only one (that is, only λ a i,1|0 needed to be optimized in the first assimilation cycle).It increased for the first dozens of assimilation cycles until it reached M + 1 as CFI-CMAQ continued to assimilate observations.Once the system was initialized, all future scaling factors could be created using the persistence dynamical model, which associated the smoothing operator with the atmospheric transport model.
In order to illustrate the limitation using only the smoothing operator as the persistence dynamical model to generate all future scaling factors, another OSSE (referred to as the reference experiment to distinguish it from the abovementioned CFI-CMAQ OSSEs) was designed to optimize the surface CO 2 fluxes at grid scale.The reference experiment was under the same assimilation framework as CFI-CMAQ except that all λ p i,t|t−1 were set to 1 (Peters et al., 2007).Beside that, the initialization procedure of the reference experiment was different from that of the CFI-CMAQ.In practice, both the ensemble of background concentration fields at t = 0, C f i,0 , and the ensemble members of the scaling factors at t = 1, λ a i,1|0 , needed to be initialized because they could not be generated in other ways (Peters et al., 2005).The initial concentration fields C f i,0 were created using the same method as that used to generate C f i,0 for the CFI-CMAQ OSSEs.The ensemble members of the scaling factors λ a i,1|0 were rand fields.Their mean values were 1 and their variances were 0.1.In addition, in order to keep the ensemble spread of the scaling factors λ a i,t|t−1 at a certain level and compensate for dynamical model error, covariance inflation was also used and the covariance inflation factor of the scaling factors λ a i,t|t−1 was 1.6.All other parameters are the same as used in the CFI-CMAQ OSSEs.The ensemble size N was 48, the measurement errors were 1.5 ppmv, the standard localization Schur radius c was 1280 km, the covariance infla- tion factor of concentrations α was 1.1, and the lag-window was 9 days.

Experimental results
Essentially, the assimilation part of CFI-CMAQ includes two subsections: one for the CO 2 concentration assimilation with EnKF, which can provide convincing CO 2 initial analysis fields for the next assimilation cycle; and the other for the CO 2 flux optimization with EnKS, which can provide better estimation of the scaling factors for the next time through the persistence dynamical model, except for optimized CO 2 fluxes.The performance of the EnKF subsection will be greatly influenced by the validation of the EnKS subsection, or vice versa.Firstly, the performance of CFI-CMAQ will be quantitatively assessed in detail using the assimilated re- sults of a CFI-CMAQ OSSE, in which the lag-window was 9 days and β was 70.The sensitivities of β and the lag-window will then be discussed in the following two paragraphs.Finally, the assimilation results of the reference experiment in whichλ p i,t|t−1 were set to 1 will be briefly described at the end of this subsection.
We begin by describing the impacts of assimilating artificial observations X p CO 2 on CO 2 simulations by CFI-CMAQ.As shown in Fig. 4a, b and d, the monthly mean values of the background CO 2 concentrations C f t produced by the magnified surface CO 2 fluxes F * t were much larger than those of the artificial true CO 2 concentrations C p t produced by the prior surface CO 2 fluxes F p t near the surface in February 2010.In the east and south of China especially, the magnitude of the difference between C p t and C f t was at least 6 ppmv.Also, as expected, the monthly mean X f CO 2 was much larger than the monthly mean artificial observations X p CO 2 , and the magnitude of the difference between X p CO 2 and X f CO 2 reached 2 ppmv in the east and south of China (see Fig. 3b, c, e).However, the impact of magnifying surface CO 2 fluxes on the CO 2 concentrations was primarily below the model-level 10 (approximately 6 km), and especially below model-level 7 (approximately 1.6 km).Above model-level 10, the differences between C p t and C f t fell to zero (see Fig. 5a, b).After assimilating X p CO 2 , the analysis CO 2 concentrations C a t was much closer to C p t (see Fig. 4c, e, f).The monthly mean difference between C  5c, d).The monthly mean X a CO 2 was also closer to X p CO 2 and the difference between X p CO 2 and X a CO 2 ranged from −0.5 to 0.5 ppmv.In order to evaluate the general impact of assimilating X p CO 2 in the surface layer, time series of the daily mean CO 2 concentration extracted from the background simulations and the assimilations were compared with the artificial true simulations at four national background stations in China and their nearest large cities.As shown in Fig. 3a, Waliguan is 150 km away from Xining, Longfengshan is 180 km away from Haerbin, Shangdianzi is 150 km away from Beijing, and Linan is 50 km away from Hangzhou.The assimilated results are shown in Fig. 6.The background time series were much larger than the artificial true time series, especially at Shangdianzi, Beijing, Linan and Hangzhou, which are strongly influenced by local anthropogenic CO 2 emissions.After assimilating X p CO 2 , the assimilated time series were very close to the true time series with negligible bias, as expected, at Waliguan, Xining, Shangdianzi, Beijing, Linan and Hangzhou, especially after the first 10 days, which can be considered the spin-up period.Meanwhile, the improvements at Longfengshan and Haerbin were limited due to the absence of observation data at those locations (see Fig. 3a).Nevertheless, in general, the substantial benefits to the CO 2 concentrations in the surface layer of assimilating GOSAT X CO 2 with EnKF are clear.All the results illustrated that CFI-CMAQ can provide a convincing CO 2 initial analysis fields for CO 2 flux inversion.
The impacts of assimilating X p CO 2 on surface CO 2 fluxes were also highly impressive by CFI-CMAQ.On the whole, the prescribed CO 2 surface fluxes F * t were much larger than the true surface CO 2 fluxes F p t in February 2010, especially in the east and south of China.The monthly mean difference between F * t and F p t reached 5 µmole m −2 s −1 in Jing-Jin-Ji, the Yangtze River delta, and the Pearl River Delta Urban Circle because of the strong local anthropogenic CO 2 emissions (see Fig. 7a, b, d).After assimilating X p CO 2 , the ensemble mean of the assimilated surface CO 2 fluxes F a t decreased sharply.Thus, the monthly mean values of F a t were much smaller than F * t in most of the model domain in February 2010.The pattern of the difference between F a t and F * t was similar to that of the difference between F p t and F * t (see Fig. 7d).The ensemble mean of the assimilated surface CO 2 fluxes F a t were also compared to the artificial true fluxes F p t , revealing that F a t was equivalent to F p t in most of the model domain.The monthly mean difference between F a t and F p t ranged from −0.1 to 0.1 µmole m −2 s −1 only (see Fig. 7e).
In addition, the root-mean-square errors (RMSEs) of the assimilated flux members were analysed.As shown in Fig. 8, the monthly mean RMSE was less than 0.5 µmole m −2 s −1 in most of the model domain, except in areas near to large cities such as Beijing, Shanghai and Guangzhou, indicating that the assimilated CO 2 fluxes were reliable.
In order to evaluate the ability of CFI-CMAQ to optimize the surface CO 2 fluxes comprehensively, the ratios of the monthly mean F * t to the monthly mean F p t were analysed.In actual implementation, we only analysed the ratios where the absolute values of the monthly mean F p t were larger than 0.1, to avoid random noise.As shown in Fig. 9a, the ratios of the monthly mean F * t to the monthly mean F p t are about 1.8 in most of China, except in the Qinghai-Tibet Plateau, where the absolute values of the monthly mean F p t in February were very small and were not analysed.In addition, the ratios of the monthly mean F a t to the monthly mean F p t are shown in Fig. 9b.This figure demonstrates that the impact of the assimilation of X p CO 2 by CFI-CMAQ on CO 2 fluxes was great in the east and south of China in general, but the influence was negligible in Northeast China due to the lack of observation data.
Time series of daily mean surface CO 2 fluxes extracted from F * t and F a t were also compared with that from F p t at four national background stations in China and their nearest large cities, similar to the CO 2 concentration assimilation.The results are shown in Fig. 10.The background time series were much larger than the artificial true time series, especially at Haerbin, Shangdianzi, Beijing, Linan and Hangzhou, which are strongly influenced by local anthropogenic CO 2 emissions.After assimilating X p CO 2 , the assimilated time series were near to the true time series with acceptable bias, as expected, at Waliguan, Xining, Shangdianzi, Linan and Hangzhou after the 10-day spin-up period.However, the improvements at Longfengshan and Haerbin were negligible because of a lack of observations at these locations.Also, this inversion system failed to show improvements in Beijing.One of the possible reasons was that the values of the ensemble spread of λ a i,t|t−1 in the Beijing area are too large (see Fig. 11c).Beijing is located in the Jing-Jin-Ji Urban Circle, which had strong local anthropogenic CO 2 emissions during January-March.So the values of the ensemble spread of C f i,t in the Beijing area at model-level 1   could be much larger than those in other areas, which had weak local anthropogenic CO 2 emissions (see Fig. 11a).As a result, the values of the ensemble spread of λ p i,t|t−1 before inflating in the Beijing area are much larger than those in other areas with small local anthropogenic CO 2 emissions (see Fig. 11b).After inflating, the ensemble spread of λ p i,t|t−1 in the Beijing area could be too large, compared to those in other areas with small local anthropogenic CO 2 emissions (see Fig. 11c), which led to the failure to reproduce the true fluxes in the Beijing area.Later, CFI-CMAQ will be improved by optimizing the covariance inflation method.
Since the impact of assimilation X p CO 2 by CFI-CMAQ on CO 2 fluxes was in general greater in the east and south of China than other model areas (see Figs. 7, 9), the time series of the daily mean CO 2 fluxes in that area averaged from F a t was compared with those from F * t and F p t (see Fig. 12).This figure indicates that CFI-CMAQ could in general reproduce the true fluxes with acceptable bias.
As stated in the above section, β was a newly introduced parameter.The prior scaling factors should have been inflated indirectly through the inflated CO 2 concentration forecast.However, the values of the ensemble spread of λ p i,t|t−1 before inflating were very small (ranging from 0 to 0.08 in most areas at model-level 1, see Fig. 11b), though the values of the ensemble spread of C f i,t after inflating could reach 1-14 ppmv in most areas at model-level 1 (see Fig. 11a).Consequently, we had to inflate them again before using them in Eq. (2).smaller than 50 (e.g.β = 10), the impact of assimilation was small due to the small ensemble spread; or if β was much larger than 80 (e.g.β = 100), the assimilated CO 2 fluxes deviated markedly from the "true" CO 2 fluxes.In other words, the performance of CFI-CMAQ greatly relies on the choice of β.
From the perspective of the lag-window, the differences among the four assimilation sensitivity experiments with lagwindows of 3, 6, 9 and 12 days were very small (see Fig. 13).Although Peters et al. (2007) indicated that the lag-window should be more than five weeks, it seemed that the smoother window had a slight influence on the assimilated results for CFI-CMAQ.It was clear that the assimilated results with a larger lag-window were better than those with a smaller lag-window; however, CFI-CMAQ performed very well even with a short lag-window (e.g. 3 days).
At the end of this subsection, the assimilation results of the reference experiment in which λ p i,t|t−1 were set to 1 will be addressed briefly.The impact of assimilation X p CO 2 on CO 2 fluxes was disordered.The monthly mean values of the difference between the prior true surface CO 2 fluxes and the ensemble mean values of the assimilated surface CO 2 fluxes were irregular noise (see Fig. 14).The main reason is that all the elements of the scaling factors to be optimized in the smoother window are only random numbers.As stated in the above section, only λ a i,1|0 needed to be optimized in the first assimilation cycle.However, λ a i,1|0 were rand fields (in other words, all the elements of λ a i,1|0 are random numbers) because they could not be generated in other ways in the first instance.Therefore their spatial correlations were too small.The correlations between the scaling factors and the observations were also too small.It was therefore impossible to systematically change the values of λ a i,1|0 in large areas where the observations located after assimilating observations at t = 1.Hence, the signal-to-noise problem arose.So, the elements of λ a i,1|1 are also only random numbers.Though λ a i,2|1 could be generated automatically by the smoothing operator when all λ p i,2|1 were set to 1, the elements of λ a i,2|1 are also random numbers because the smoothing operator is only a linear operator.Similarly, it was impossible to systematically change the values of λ a i,1|1 and λ a i,2|1 in large areas after assimilating observations at t = 2.As this inversion system continued assimilating observations, all future scaling factors could be created by the smoothing operator and then updated.But this inversion system could not ingest the observations effectively because all the elements of the scaling factors were always random numbers.Though the 9-day lagwindow in the reference experiment is too short compared to the 5 week lag-window recommended by Peters et al. (2007), this reference experiment could illustrate the limitation by only using the smoothing operator as the persistence dynamical model.If the lag-window was around 5 weeks, we could achieve better results because there were more observations in every assimilation cycle.However, the results could not be better than those obtained by CFI-CMAQ because most grids have no observations (see Fig. 3a) and the signal-tonoise problem still remained.

Summary and conclusions
A regional surface CO 2 flux inversion system, CFI-CMAQ, has been developed to optimize CO 2 fluxes at grid scales.It operates under a joint data assimilation framework by applying EnKF to constrain the CO 2 concentrations and applying EnKS to optimize the surface CO 2 flux, which is similar to Kang et al. (2011Kang et al. ( , 2012) ) and Tian et al. (2014).The persistence dynamical model, which was first introduced by Peters et al. (2007) by applying the smoothing operator to transport the useful observed information onto the next assimilation cycle, is developed further.We associated the smoothing operator with the atmospheric transport model to con- stitute the persistence dynamical model to forecast the surface CO 2 flux scaling factors for the purpose of resolving the "signal-to-noise" problem, as well as transporting the useful observed information onto the next assimilation cycle.In this application, the scaling factors to be optimized in the flux inversion system can be forecast at the grid scale without random noise.The OSSEs showed that the performance of CFI-CMAQ is effective and promising.In general, it could reproduce the true fluxes at the grid scale with acceptable bias.This study represents the first step in developing a regional surface CO 2 flux inversion system to optimize CO 2 fluxes over East Asia, particularly over China.In future, we intend to further develop the covariance localization techniques and inflation techniques to improve the performance of CFI-CMAQ.Furthermore, the uncertainty of the boundary conditions should be considered to improve the effectiveness of regional CO 2 flux optimization.

Figure 1 .
Figure 1.Schematic diagram of the smoother window.(λ a i,t−1−M|t−1 , λ a i,t−M|t−1 , λ a i,t−M+1|t−1 , . .., λ a i,j |t−1 , . .., λ a i,t−1|t−1 ) are the optimized scaling factors in the smoother window and C a i,t−1 are the assimilated CO 2 concentrations fields at time t − 1 in the previous assimilation cycle t −1−M∼ t −1.(λ a i,t−M|t−1 , λ a i,t−M+1|t−1 , . .., λ a i,j |t−1 , . .., λ a i,t−1|t−1 , λ a i,t|t−1 ) are the scaling factors in the smoother window and C f i,t are the forecast CO 2 concentrations fields at time t which need to be optimized in the current assimilation cycle t − M∼ t.

Figure 2 .
Figure 2. Flowchart of the CFI-CMAQ system used to optimize surface CO 2 fluxes at each assimilation cycle.The system includes the following four parts in turn: (1) forecasting of the linear scaling factors λ a i,t|t−1 (red arrows); (2) optimization of the scaling factors in the smoother window by EnKS (see Fig. 1) (blue arrows); (3) updating of the flux in the smoother window (green arrows); and (4) assimilation of the CO 2 concentration fields at time t by EnKF (black arrows).

pt
Figure 3. (a) Total number of observations in February 2010 in the model grid.Each symbol indicates the total number of all GOSAT X CO 2 measurements in the corresponding model grid.Monthly mean values in February 2010 of (b) X p CO 2 , column mixing ratio of C p t ; (c) X f CO 2 ,

Figure 4 .
Figure 4. Monthly mean values of (a) C p t , the artificial true simulations driven by the prior surface CO 2 fluxes F p t ; (b) C f t , the background simulations driven by magnified surface CO 2 fluxes F * t = (1.8+ δ(x, y, z, t))F p t ; (c) C a t , the ensemble mean values of the assimilated CO 2 concentrations fields; (d) C p t − C f t ; (e) C p t − C a t ; and (f) 100 * (C p t − C a t ) C p t at model-level 1 in February 2010.Black lines EF and GH indicate the positions of the cross sections shown in Fig. 5.

Figure 5 .
Figure 5. Monthly mean cross sections of C p t − C f t along line (a) EF and (b) GH, and monthly mean cross sections of C p t − C a t along line (c) EF and (d) GH (cross section lines shown in Fig. 4d) in February 2010.

Figure 7 .
Figure 7. Monthly mean values in February 2010 of (a) F p t , the prior true surface CO 2 fluxes; (b) F * t , the prescribed CO 2 surface fluxes, F * t = (1.8+ δ(x, y, z, t))F p t ; (c) F a t , the ensemble mean values of the assimilated surface CO 2 fluxes; (d) F p t − F * t ; and (e) F p t − F a t (units: µmole m −2 s −1 ).F a t are the assimilated results of a CFI-CMAQ OSSE, in which the lag-window was 9 days and β was 70.They are the same in Figs.7-10.

Figure 9 .
Figure 9. (a) Ratios of monthly mean F * t to monthly mean F p t ; and (b) ratios of monthly mean F a t to monthly mean F p t in February 2010.The white part indicates the ratios where the absolute values of monthly mean F p t are larger than 0.1, not analysed in this study.The black square labelled I indicates the domain where surface CO 2 fluxes were used for the results presented in Figs. 12 and 13.

Figure 10 .
Figure 10.Daily mean time series of CO 2 fluxes at national background stations in China and their nearest large cities from 1 January to 20 March 2010, extracted from the prior true surface CO 2 fluxes F p t (black), the prescribed CO 2 surface fluxes F * t (red), and the assimilated CO 2 fluxes F a t (blue).All time series were interpolated to the observation locations by the spatial bilinear interpolator method.The sites used are (a) Waliguan, (b) Xining, (c) Longfengshan, (d) Haerbin, (e) Shangdianzi, (f) Beijing, (g) Linan, and (h) Hangzhou.

Figure 12 .
Figure 12.Time series of daily mean CO 2 fluxes averaged in domain I (shown in Fig. 9a) from 1 January to 20 March 2010 with the inflation factor of scaling factors β = 50, 60, 70, 75 and 80.The black dashed line is the time series averaged from F * t and the black solid line is the time series averaged from F p t .
Fig. 11c shows the distribution of the ensemble spread of λ a i,t|t−1 at model-level 1 at 00:00 UT on 1 March 2010 when β = 70.It shows that the values of the ensemble spread of λ a i,t|t−1 ranged from 0.1 to 0.8 in most areas.In order to investigate the sensitivity of the inflation factor of the scaling factors β, a series of numerical experiments were conducted.As shown in Fig. 12, CFI-CMAQ worked rather well for β = 60, 70, 75, 80.However, if β was much

Figure 13 .
Figure 13.Time series of daily mean CO 2 fluxes averaged in domain I (shown in Fig. 9a) from 1 January to 20 March 2010 with different smoother windows (3, 6, 9 and 12 days).The black dashed line is the time series averaged from F * t and the black solid line is the time series averaged from F p t .

Figure 14 .
Figure 14.Monthly mean values of the difference between the prior true surface CO 2 fluxes and the ensemble mean values of the assimilated surface CO 2 fluxes (units: µmole m −2 s −1 ) of the reference experiment in which λ p i,t|t−1 were set to 1.