Influence of CO2 Observations on the Optimized CO2 Flux in an Ensemble Kalman Filter

In this study, the effect of CO2 observations on an analysis of surface CO2 flux was calculated using an influence matrix in CarbonTracker, an inverse modeling system that estimates surface CO2 flux based on an ensemble Kalman filter. The influence matrix represents the sensitivity of the analysis to observations. The diagonal element of the influence matrix (i.e., analysis sensitivity) is globally 4.8 % on average, which implies that the analysis extracts 4.8 % of its information from the observations and 95.2 % from the background in each assimilation cycle. Because the surface CO2 flux in each week is optimized by 5 weeks of observations, the cumulative impact over 5 weeks is 19.1 %, much greater than 4.8 %. The analysis sensitivity is inversely proportional to the number of observations used in the assimilation, which is distinctly apparent in continuous observation categories with a sufficient number of observations. The time series of the globally averaged analysis sensitivities shows seasonal variations, with greater sensitivities in summer and lower sensitivities in winter, which is attributed to the surface CO2 flux uncertainty. The time-averaged analysis sensitivities in the Northern Hemisphere are greater than those in the tropics and the Southern Hemisphere. The trace of the influence matrix (i.e., information content) is a measure of the total information extracted from the observations. The information content indicates an imbalance between the observation coverage in North America and that in other regions. Approximately half of the total observational information is provided by continuous observations, mainly from North America, which indicates that continuous observations are the most informative and that comprehensive coverage of additional observations in other regions is necessary to estimate the surface CO2 flux in these areas as accurately as in North America.

Recent studies on atmospheric CO2 inversion have focused on analyzing the difference between prior and optimized surface CO2 fluxes obtained by using new inversion methods or observations (Chevallier et al., 2009a; Basu et al., 2013), as well as the carbon cycle based on optimized surface CO2 fluxes. By contrast, the impact of various atmospheric CO2 observations on the estimation of surface CO2 fluxes has rarely been studied. One method employed to evaluate the impact of observations on atmospheric CO2 inversion is the calculation of the uncertainty reduction (Peters et al., 2005; Meirink et al., 2008; Chevallier et al., 2009b; Feng et al., 2009), which is a ratio between the variances of the prior and posterior state vectors. A large uncertainty reduction implies that observations have a large impact on the estimation of surface CO2 fluxes. However, the uncertainty reduction cannot measure the impact of individual observations on the estimated (i.e., analyzed) surface CO2 fluxes. Another method for assessing the impact of observations is to calculate the information content, which is the amount of information obtained from observations (Rodgers, 2000). Engelen and Stephens (2004) calculated the information content of infrared satellite sounding observations on atmospheric CO2 concentrations. To estimate the impact of simulated CO2 observations on surface flux analysis, Zupanski et al. (2007) calculated the information content using the information matrix in the ensemble subspace. However, similar to the uncertainty reduction, these methods calculate the impact of all observations together, rather than the impact of individual observations on surface CO2 flux analysis.
Data assimilation algorithms are fundamentally based on a linear statistical assumption (Talagrand, 1997). Both sequential and variational algorithms combine background and observation information to estimate parameters based on the linear assumption. Under this assumption, the influence matrix that measures the impact of individual observations on estimated parameters can be calculated in the observation space. Cardinali et al. (2004) suggested a method for calculating the influence matrix within the general data assimilation framework and applied the method to a forecast model of the European Centre for Medium-Range Weather Forecasts (ECMWF). The diagonal elements of the influence matrix are the analysis sensitivities (i.e., self-sensitivities), which are proportional to the spread of the analysis and inversely proportional to the predetermined observation error. The trace of the influence matrix (i.e., the sum of its diagonal elements) reflects the information content, which is the amount of information extracted from observations. The influence matrix provides objective diagnostics regarding the impact of observations on the analysis and hence the performance of the data assimilation system because inaccurate observations can be identified by analyzing the observation impact (Cardinali et al., 2004). Liu et al. (2009) suggested a method for calculating self-sensitivity and cross-sensitivity (i.e., off-diagonal elements of the influence matrix) within the ensemble Kalman filter (EnKF) framework and diagnosed the relative importance of individual observations within an observation system using the idealized Lorenz 40-variable model and a simplified hydrostatic model.
Although Cardinali et al. (2004) and Liu et al. (2009) suggested methods for calculating the impact of individual observations on an analysis, their studies focused on numerical weather prediction (NWP). Consequently, the impact of individual observations on surface CO2 flux analysis has not yet been diagnosed in a study on atmospheric CO2 inversion using a state-of-the-art data assimilation method. Because the analysis is more important than the forecast in atmospheric CO2 inversion, the methods suggested by Cardinali et al. (2004) and Liu et al. (2009) can be applied to diagnose the impact of observations on the CO2 flux analysis.
CarbonTracker is a system developed by the National Oceanic and Atmospheric Administration (NOAA), which optimizes the surface CO2 flux by assimilating mole fraction observations (i.e., concentration) of surface CO2 (Peters et al., 2005). CarbonTracker has been applied in studies on atmospheric CO2 inversion in North America (Peters et al., 2010), Europe (Peters et al., 2010), and Asia (Kim et al., 2014). To develop CarbonTracker for use in Asia, Kim et al. (2012) performed an experiment employing CarbonTracker in this region and demonstrated that CarbonTracker produces optimized surface CO2 fluxes for Asia. Kim et al. (2014) showed that the estimates of the surface CO2 flux are more consistent with observed CO2 concentrations in Asia when using the nesting domain of the transport model on Asia in CarbonTracker. Zhang et al. (2014) conducted a study on the assimilation of aircraft CO2 observations from the Comprehensive Observation Network for TRace gases by AIrLiner (CONTRAIL; Machida et al., 2008) in Asia using CarbonTracker.
In this study, an influence matrix is calculated in CarbonTracker to evaluate the impact of mole fraction observations of CO2 on the analyzed surface CO2 fluxes. The relative importance of each observation site and each observation site category is evaluated by analyzing the self-sensitivity and information content, and the characteristics of the self-sensitivity and information content are subsequently investigated. Section 2 presents the methodology, which covers CarbonTracker, the EnKF, the observations, the calculation of the influence matrix, and the experimental framework. Section 3 presents the results, and Sect. 4 provides a summary and conclusion.

CarbonTracker
CarbonTracker is an atmospheric CO2 inversion system that estimates the surface CO2 flux consistent with CO2 observations. In CarbonTracker, the optimized flux with a 1° × 1° horizontal resolution is calculated as

$$F(x, y, t) = \lambda_r F_{\mathrm{bio}}(x, y, t) + \lambda_r F_{\mathrm{oce}}(x, y, t) + F_{\mathrm{ff}}(x, y, t) + F_{\mathrm{fire}}(x, y, t), \tag{1}$$

where $F_{\mathrm{bio}}(x, y, t)$ is the prescribed prior biosphere flux from the Carnegie-Ames-Stanford Approach (CASA) model; $F_{\mathrm{oce}}$, $F_{\mathrm{ff}}$, and $F_{\mathrm{fire}}$ are the prescribed ocean, fossil fuel, and fire fluxes ([...] et al., 2010); and $\lambda_r$ is the scaling factor to be optimized in the data assimilation process, corresponding to 156 ecoregions around the globe. CarbonTracker adopts a smoother window to reflect the transport speed of CO2, which is based on the temporal relationship between the surface CO2 flux and atmospheric CO2 observations, as found in Bruhwiler et al. (2005) (Peters et al., 2005). For this reason, the scaling factor is optimized with 5 weeks of lag, which implies that the observations made in the most recent week also affect the optimized surface CO2 flux in the preceding 4 weeks. The optimization of the scaling factor during the data assimilation process is presented in Fig. 1. In each assimilation cycle, 5 weeks of analysis scaling factors are estimated by observations from the most recent week. After the fifth cycle, the scaling factor estimated by these 5 weeks of observations is saved as the optimized scaling factor and used to calculate the optimized surface CO2 flux in Eq. (1). During this process, a new mean background scaling factor for the next week is calculated from the estimated mean scaling factors of the previous 2 weeks using a simple dynamic model, as follows:

$$\lambda^b_t = \frac{\lambda^a_{t-2} + \lambda^a_{t-1} + \lambda^p}{3}, \tag{2}$$

where $\lambda^b_t$ is the prior mean scaling factor for the new analysis week; $\lambda^a_{t-2}$ and $\lambda^a_{t-1}$ are the posterior mean scaling factors estimated 2 weeks and 1 week previously, respectively; and $\lambda^p$ is a prior value fixed at 1. Thus, the information from the previous observations is included in $\lambda^b_t$. The TM5 model (Krol et al., 2005) is used as a transport model that calculates model CO2 concentrations corresponding to the observed CO2 concentrations. The TM5 model uses the surface CO2 fluxes calculated from Eq. (1) and the ECMWF meteorological field to calculate model CO2 concentrations and is used as the observation operator, which will be explained in Sect. 2.2.
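The dynamic model for the background scaling factor can be illustrated with a minimal sketch. It assumes that the "simple dynamic model" is the plain average of the two most recent posterior means and the fixed prior value of 1; the function name and the input values are hypothetical.

```python
def background_scaling_factor(lam_a_tm2, lam_a_tm1, lam_prior=1.0):
    """Mean background scaling factor for the new analysis week.

    Assumed form: average of the posterior means from 2 weeks and
    1 week earlier and the fixed prior value (1 by default).
    """
    return (lam_a_tm2 + lam_a_tm1 + lam_prior) / 3.0


# Hypothetical posterior means from the two preceding weeks.
lam_b = background_scaling_factor(1.3, 0.8)
print(round(lam_b, 4))
```

Under this assumed form, the fixed prior pulls the background back toward 1 whenever recent observations do not support a departure from it.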

Ensemble Kalman filter
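The EnKF data assimilation method used in CarbonTracker is the ensemble square root filter (EnSRF) suggested by Whitaker and Hamill (2002). The analysis equation for data assimilation is expressed as

$$\mathbf{x}^a = \mathbf{K}\mathbf{y}^o + (\mathbf{I}_n - \mathbf{K}\mathbf{H})\mathbf{x}^b, \tag{3}$$

where $\mathbf{x}^a$ is the $n$-dimensional analysis (posterior) state vector; $\mathbf{y}^o$ is the $p$-dimensional observation vector; $\mathbf{K}$ is the $n \times p$ dimensional Kalman gain; $\mathbf{I}_n$ is the $n \times n$ identity matrix; $\mathbf{H}$ is the linearized observation operator, which transforms the information in the model space to the information in the observation space; and $\mathbf{x}^b$ is the background state vector. In EnSRF, the ensemble mean and perturbed state vectors are updated independently using the following equations:

$$\bar{\mathbf{x}}^a = \bar{\mathbf{x}}^b + \mathbf{K}(\mathbf{y}^o - \mathbf{H}\bar{\mathbf{x}}^b), \tag{4}$$

$$\mathbf{x}'^a_i = \mathbf{x}'^b_i - \tilde{\mathbf{k}}\mathbf{H}\mathbf{x}'^b_i, \tag{5}$$

where $\bar{\mathbf{x}}^a$ and $\bar{\mathbf{x}}^b$ are the mean state vectors of the analysis and background, respectively, and $\mathbf{x}'^a_i$ and $\mathbf{x}'^b_i$ are the perturbation state vectors of the analysis and background, respectively. Many inflation techniques (e.g., Wang and Bishop, 2003; Bowler et al., 2008; Whitaker et al., 2008; Li et al., 2009; Anderson, 2009; Miyoshi, 2011; Kang et al., 2012) have been used to maintain proper ensemble spread and to improve the performance of EnKF data assimilation. Although the EnSRF in CarbonTracker does not use an inflation method, Kim et al. (2012) demonstrated that the ensemble spread measured by rank histograms is maintained properly. In CarbonTracker, the state vector corresponds to the scaling factor, as described in Sect. 2.1. $\mathbf{K}$ and the reduced Kalman gain, $\tilde{\mathbf{k}}$, are defined as

$$\mathbf{K} = \mathbf{P}^b\mathbf{H}^T(\mathbf{H}\mathbf{P}^b\mathbf{H}^T + \mathbf{R})^{-1}, \tag{6}$$

$$\tilde{\mathbf{k}} = \alpha\mathbf{K}, \tag{7}$$

where $\mathbf{P}^b$ is the background error covariance; $\mathbf{R}$ is the observation error covariance, which is predefined at each observation site; and $\alpha$ is a scalar value that varies each time an observation is used in the analysis process and is calculated as

$$\alpha = \left(1 + \sqrt{\frac{\mathbf{R}}{\mathbf{H}\mathbf{P}^b\mathbf{H}^T + \mathbf{R}}}\right)^{-1}. \tag{8}$$

$\mathbf{P}^b\mathbf{H}^T$ and $\mathbf{H}\mathbf{P}^b\mathbf{H}^T$ in Eqs. (6) and (8) can be calculated as

$$\mathbf{P}^b\mathbf{H}^T \approx \frac{1}{m-1}\sum_{i=1}^{m}\mathbf{x}'^b_i(\mathbf{H}\mathbf{x}'^b_i)^T, \tag{9}$$

$$\mathbf{H}\mathbf{P}^b\mathbf{H}^T \approx \frac{1}{m-1}\sum_{i=1}^{m}(\mathbf{H}\mathbf{x}'^b_i)(\mathbf{H}\mathbf{x}'^b_i)^T, \tag{10}$$

where $m$ is the number of ensemble members.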
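The EnSRF update described above can be sketched for a single scalar observation. This is a minimal illustration in the style of Whitaker and Hamill (2002), not the CarbonTracker code: it assumes observations are assimilated one at a time, that the ensemble covariances are estimated from the perturbations with a 1/(m − 1) factor, and that the perturbations are updated with the gain reduced by the scalar α. All names are hypothetical.

```python
import math


def ensrf_update(mean, perts, h_mean, h_perts, y_obs, r_var):
    """Assimilate one scalar observation with a square root filter.

    mean    : list of n state means (scaling factors)
    perts   : m lists of n state deviations from the mean
    h_mean  : modeled observation for the mean state
    h_perts : m modeled-observation deviations
    y_obs   : observed value
    r_var   : observation error variance
    """
    m, n = len(perts), len(mean)
    # Ensemble estimates of H P^b H^T (scalar) and P^b H^T (length n).
    hph = sum(hp * hp for hp in h_perts) / (m - 1)
    ph = [sum(perts[i][k] * h_perts[i] for i in range(m)) / (m - 1)
          for k in range(n)]
    gain = [v / (hph + r_var) for v in ph]
    # Scalar that reduces the gain for the perturbation update.
    alpha = 1.0 / (1.0 + math.sqrt(r_var / (hph + r_var)))
    new_mean = [mean[k] + gain[k] * (y_obs - h_mean) for k in range(n)]
    new_perts = [[perts[i][k] - alpha * gain[k] * h_perts[i]
                  for k in range(n)] for i in range(m)]
    return new_mean, new_perts


# Toy case: one state variable observed directly, three members.
mean, perts = ensrf_update([1.0], [[-1.0], [0.0], [1.0]],
                           1.0, [-1.0, 0.0, 1.0], 2.0, 1.0)
```

In this toy case the background and observation variances are equal, so the mean moves halfway to the observation and the ensemble variance is halved, as the Kalman filter analysis covariance requires.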
To reduce the sampling error and filter divergence due to the underestimation of background error covariance in EnSRF, the covariance localization method is used (Houtekamer and Mitchell, 2001). Because the physical distance between the scaling factors cannot be defined in CarbonTracker, correlations between the ensemble of the scaling factor and the ensemble of the model CO2 concentration are calculated, and a statistical significance test is performed on the correlations. The elements of the Kalman gain whose correlations are not statistically significant are then set to zero. This type of localization is applied to all observation sites except for marine boundary layer (MBL) sites, because the observations at MBL sites are considered to include information on large footprints of flux signals (Peters et al., 2007).
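The correlation-based localization can be sketched as follows. This is only an illustration of the idea, under the assumptions that the significance test is a two-sided t-test on the Pearson correlation and that the critical value is supplied by the user; the default `t_crit` below is a hypothetical choice (roughly the 95 % level for 150 members), not a value taken from this study.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equally long sequences."""
    m = len(a)
    ma, mb = sum(a) / m, sum(b) / m
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0.0 or vb == 0.0:
        return 0.0
    return cov / math.sqrt(va * vb)


def localize_gain(gain, param_ens, obs_ens, t_crit=1.976):
    """Zero gain elements whose correlation is not significant.

    gain      : list of n gain elements for one observation
    param_ens : m lists of n scaling factors (one list per member)
    obs_ens   : m modeled concentrations for this observation
    t_crit    : critical t value (assumed, see lead-in)
    """
    m = len(obs_ens)
    localized = []
    for k, g in enumerate(gain):
        r = pearson([member[k] for member in param_ens], obs_ens)
        denom = 1.0 - r * r
        t = math.inf if denom <= 0.0 else abs(r) * math.sqrt((m - 2) / denom)
        localized.append(g if t >= t_crit else 0.0)
    return localized
```

A gain element survives only when the scaling factor and the modeled concentration co-vary strongly enough across the ensemble, which replaces the distance-based cutoff used when a physical separation can be defined.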

Influence matrix
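The influence matrix for EnKF is calculated as in Liu et al. (2009). The projection of Eq. (3) onto the observation space becomes

$$\mathbf{y}^a = \mathbf{H}\mathbf{x}^a = \mathbf{H}\mathbf{K}\mathbf{y}^o + (\mathbf{I}_p - \mathbf{H}\mathbf{K})\mathbf{y}^b, \tag{11}$$

where $\mathbf{y}^a$ is the analysis value in the observation space, that is, the projection of the state vector $\mathbf{x}^a$ onto the observation space.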
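The influence matrix $\mathbf{S}^o$, representing the sensitivity of the analysis state vector $\mathbf{y}^a$ to the observation vector $\mathbf{y}^o$ (i.e., analysis sensitivity to observation) in the observation space, is calculated as follows:

$$\mathbf{S}^o = \frac{\partial \mathbf{y}^a}{\partial \mathbf{y}^o} = \mathbf{K}^T\mathbf{H}^T = \mathbf{R}^{-1}\mathbf{H}\mathbf{P}^a\mathbf{H}^T, \tag{12}$$

where $\mathbf{P}^a$ is the analysis error covariance, so that $\mathbf{S}^o$ is proportional to the analysis error covariance and inversely proportional to the observation error covariance. By contrast, the analysis sensitivity to background is

$$\mathbf{S}^b = \frac{\partial \mathbf{y}^a}{\partial \mathbf{y}^b} = \mathbf{I}_p - \mathbf{K}^T\mathbf{H}^T = \mathbf{I}_p - \mathbf{S}^o, \tag{13}$$

where $\mathbf{y}^b$ is the projection of the background on the observation space, and $\mathbf{I}_p$ is an identity matrix with the size of the number of observations. Consequently, the sum of the analysis sensitivity to observation in Eq. (12) and the analysis sensitivity to background in Eq. (13) is one. Applying the ensemble form of Eq. (10) to the analysis and substituting it into Eq. (12) gives

$$\mathbf{S}^o = \mathbf{R}^{-1}\mathbf{H}\mathbf{P}^a\mathbf{H}^T = \frac{1}{m-1}\mathbf{R}^{-1}(\mathbf{H}\mathbf{X}^a)(\mathbf{H}\mathbf{X}^a)^T, \tag{14}$$

where $\mathbf{H}\mathbf{X}^a$ is the analysis ensemble perturbation matrix in the observation space, and the $i$th column of $\mathbf{H}\mathbf{X}^a$ is calculated as

$$\mathbf{H}\mathbf{X}^a_i \cong h(\mathbf{x}^a_i) - \frac{1}{m}\sum_{i=1}^{m}h(\mathbf{x}^a_i), \tag{15}$$

where $\mathbf{x}^a_i$ is the $i$th analysis ensemble member; $m$ is the number of ensemble members (i.e., 150); and $h(\cdot)$ is the linear or nonlinear observation operator. More specifically, if the observation errors are not correlated, the diagonal elements of the influence matrix (i.e., self-sensitivity) are calculated as

$$S^o_{jj} = \frac{\partial y^a_j}{\partial y^o_j} = \frac{1}{m-1}\,\frac{1}{\sigma^2_j}\sum_{i=1}^{m}(\mathbf{H}\mathbf{X}^a_i)_j(\mathbf{H}\mathbf{X}^a_i)_j, \tag{16}$$

where $\sigma^2_j$ is the error variance of the $j$th observation. The cross-sensitivities are the off-diagonal elements of the influence matrix. The influence matrix is calculated for the most recent week of each cycle because the background at the most recent week of each cycle is updated once by observations.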
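The self-sensitivity calculation can be sketched directly from an ensemble of modeled observations. The sketch assumes uncorrelated observation errors, so that each diagonal element is the analysis ensemble variance in observation space divided by the observation error variance; the function name and the sample values are hypothetical.

```python
def self_sensitivities(h_ens, obs_var):
    """Diagonal of the influence matrix for uncorrelated errors.

    h_ens   : m lists of p modeled observations, one list per
              analysis ensemble member
    obs_var : p observation error variances
    """
    m, p = len(h_ens), len(obs_var)
    # Ensemble mean of the modeled observations.
    means = [sum(member[j] for member in h_ens) / m for j in range(p)]
    result = []
    for j in range(p):
        # Sum of squared analysis perturbations in observation space.
        ss = sum((member[j] - means[j]) ** 2 for member in h_ens)
        result.append(ss / ((m - 1) * obs_var[j]))
    return result


# Three members, two observations (values in ppm, hypothetical).
h_ens = [[399.0, 380.0], [400.0, 382.0], [401.0, 384.0]]
print(self_sensitivities(h_ens, [4.0, 8.0]))
```

The two factors discussed in the text appear explicitly: a larger analysis spread raises the self-sensitivity, and a larger observation error variance lowers it.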
The cumulative impact of the influence matrix for the 5 weeks of lag can be calculated because the background in the lagged window already includes the effect of previous observations. For example, Fig. 2 shows that $\mathbf{S}^b(5)$ is affected by $\mathbf{S}^o(1)$, $\mathbf{S}^o(2)$, $\mathbf{S}^o(3)$, and $\mathbf{S}^o(4)$, where the number in parentheses represents the week of the 5-week assimilation lag. If $\mathbf{S}^o(\cdot)$ has a value between 0 and 1, $\mathbf{S}^b(1)$ (i.e., the analysis sensitivity to background at the first week) represents information from a previous analysis cycle and is calculated as in Eq. (17). Using Eq. (13), the cumulative impact of the influence matrix is given by Eq. (18), where $\mathbf{S}^o_{\mathrm{cum}}$ is the cumulative impact of observations during the lagged window. The cumulative impact was defined within the 5-week assimilation lag and calculated when $\mathbf{S}^o(5)$ exists.
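The bookkeeping behind a cumulative impact can be illustrated with a minimal sketch. It assumes, as one reading of Eq. (13) rather than the formula used in this study, that each weekly update leaves a fraction $1 - S^o(k)$ of the sensitivity to the background and that these fractions multiply across the lag window. The function name and the input values are hypothetical.

```python
def cumulative_impact(weekly_sensitivities):
    """Cumulative observation impact over a lag window.

    Assumed form: one minus the product of the weekly background
    sensitivities (1 - S^o(k)); each S^o(k) must lie in [0, 1].
    """
    remaining_background = 1.0
    for s_obs in weekly_sensitivities:
        remaining_background *= 1.0 - s_obs
    return 1.0 - remaining_background


# Two updates that each take half of their information from
# observations leave one quarter of the original background.
print(cumulative_impact([0.5, 0.5]))
```

Under this assumption the cumulative impact always exceeds any single weekly sensitivity but grows more slowly than their sum, because later observations act on a background that already contains earlier information.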
The information content (i.e., degrees of freedom for signal), which is a measure of the information extracted from the observations, is calculated as the trace of the influence matrix. As suggested by Cardinali et al. (2004), the globally averaged influence (GAI) of the observations can be calculated by averaging the global self-sensitivities as

$$\mathrm{GAI} = \frac{\mathrm{tr}(\mathbf{S}^o)}{p} = \frac{1}{p}\sum_{j=1}^{p}S^o_{jj}, \tag{19}$$

where $p$ is the total number of observations used in each assimilation cycle. The partial influence (PAI) of a subset of observations is calculated as

$$\mathrm{PAI} = \frac{\sum_{j \in I}S^o_{jj}}{p_I}, \tag{20}$$

where $p_I$ represents the number of observations in subset $I$, which can either be set as specific observation types or as specific vertical and horizontal domains.
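These diagnostics reduce to sums and averages of the self-sensitivities, as the following sketch shows. The function names, the subset indices, and the sample values are hypothetical.

```python
def information_content(self_sens):
    """Trace of the influence matrix: sum of self-sensitivities."""
    return sum(self_sens)


def average_influence(self_sens, subset=None):
    """Average self-sensitivity over all observations or a subset.

    subset : optional list of observation indices (for example,
             all observations of one site category or region)
    """
    indices = range(len(self_sens)) if subset is None else subset
    values = [self_sens[j] for j in indices]
    return sum(values) / len(values)


# Hypothetical self-sensitivities for four observations in one cycle.
s = [0.02, 0.04, 0.06, 0.08]
print(information_content(s))        # total information
print(average_influence(s))          # global average
print(average_influence(s, [2, 3]))  # average over one subset
```

The information content grows with both the number of observations and their individual sensitivities, whereas the averages isolate the typical influence of a single observation.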

Observations
The observations used in this study are surface CO2 mole fraction data observed at sites distributed around the globe (Table 1, Fig. 3). As in Peters et al. (2007), the surface CO2 mole fraction data used in this study include surface air samples collected around the globe and from tall towers. These data were observed by NOAA, the Commonwealth Scientific and Industrial Research Organization (CSIRO), Environment Canada (EC), the National Center for Atmospheric Research (NCAR), and Lawrence Berkeley National Laboratory (LBNL) (Masarie et al., 2011). Observations from three additional sites made by the Japan Meteorological Agency (JMA) are also used in this study. The site categories and model-data mismatch values (i.e., observation error) are shown in Table 2. The model-data mismatch is determined such that the innovation $\chi^2$ in Eq. (21) becomes 1 at each observation site (Peters et al., 2007):

$$\chi^2 = \frac{(\mathbf{y}^o - \mathbf{H}\mathbf{x}^b)^2}{\mathbf{H}\mathbf{P}^b\mathbf{H}^T + \mathbf{R}}. \tag{21}$$
The innovation $\chi^2$ statistics for each observation site in Asia during the experimental period are presented in Table 3. The model-data mismatch for the TAP site (Tae-ahn Peninsula, South Korea; 36.73° N, 126.13° E, 20 m) was changed from the value of 7.5 ppm (parts per million) used in previous studies to 5 ppm because the innovation $\chi^2$ value obtained using 5 ppm was closer to 1. However, TAP was still included in the Difficult category in the statistical analysis in Sect. 3. The model-data mismatches of the three JMA sites were set to 3 ppm, as in Zhang et al. (2014).
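The tuning of the model-data mismatch can be sketched as a comparison of innovation $\chi^2$ values for candidate mismatches. The sketch assumes that $\chi^2$ is the squared innovation divided by the sum of the background variance in observation space and the observation error variance, averaged over the observations at a site, and that the mismatch in ppm is a standard deviation. All names and numbers are hypothetical.

```python
def innovation_chi2(obs, modeled, hph, mismatch_ppm):
    """Mean innovation chi-squared at one site.

    obs          : observed mole fractions (ppm)
    modeled      : background modeled mole fractions (ppm)
    hph          : background variances in observation space (ppm^2)
    mismatch_ppm : model-data mismatch, treated here as a standard
                   deviation, so the variance is its square
    """
    r_var = mismatch_ppm ** 2
    terms = [(y - hx) ** 2 / (s + r_var)
             for y, hx, s in zip(obs, modeled, hph)]
    return sum(terms) / len(terms)


# Compare two candidate mismatch values for hypothetical data.
obs = [400.0, 402.0]
modeled = [398.0, 401.0]
hph = [5.0, 5.0]
for candidate in (2.0, 1.0):
    print(candidate, innovation_chi2(obs, modeled, hph, candidate))
```

A value well below 1 suggests that the assumed errors are too large for the innovations actually observed, and a value well above 1 suggests that they are too small.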

Experimental framework
The surface carbon flux analysis system used in this study is based on the CarbonTracker 2010 release (CT2010). However, the system employed in this study differs from CT2010 in two respects: first, the nesting domain of the TM5 model, with 1° × 1° horizontal resolution, is centered on Asia rather than on North America, which enables a more detailed analysis of the surface CO2 fluxes over Asia, as shown in Kim et al. (2014); second, as mentioned in Sect. 2.4, three new JMA observation sites are added in this system, which also enhances the analysis of surface CO2 fluxes over Asia. The global horizontal resolution is 3° × 2°, as in CT2010. The experimental period is from 1 January 2000 to 31 December 2009. The number of ensemble members is 150, and the scaling factor includes 5 weeks of lag, as in Peters et al. (2007, 2010) and Kim et al. (2012, 2014).

Average self-sensitivity
Cardinali et al. (2004) demonstrated that the self-sensitivity is theoretically between 0 and 1 if observations are not correlated. For 4D-VAR, Cardinali et al. (2004) noted that an analysis error covariance based on the Hessian representation with truncated eigenvector expansion can produce self-sensitivities greater than 1 in a small percentage of cases. In contrast, the self-sensitivity in EnKF theoretically has a value less than 1 (Liu et al., 2009). Nevertheless, the self-sensitivity in this study occasionally exceeds 1 because the sparse observations cause insufficient reduction of the background and because the observation operator used is nonlinear in calculating the transport of CO2 concentrations. In this study, 13 of the 76 801 observations used for data assimilation present a value greater than 1. This is only 0.02 % of the total number of observations, which implies that the calculated self-sensitivity is generally valid.
Because the spatial coverage and number of observations vary during the experimental period, the average self-sensitivity throughout the experimental period was analyzed to evaluate the overall characteristics of the self-sensitivity at each observation site. As in previous studies (e.g., Peters et al., 2007, 2010; Kim et al., 2014), the results for the year 2000 were excluded from the data analysis because 2000 is considered the spin-up period.
Figure 4 shows the average self-sensitivities at each observation site during the experimental period. Different sizes of circles are used in some locations to distinguish sites at the same location or at geographically close locations. Globally, negative correlations between the spatial density of the observation sites and the self-sensitivities are not as apparent as those reported by Cardinali et al. (2004) and Liu et al. (2009). However, such negative correlations are apparent in the Northern Hemisphere (NH). In particular, some observation sites in Asia, where the spatial density of observation sites is low, show high sensitivities. The observation sites located in deserts, remote oceans, and high-altitude regions generally exhibit low sensitivities. The average self-sensitivities of each observation site category over the globe, in the NH, tropics, and Southern Hemisphere (SH) are shown in Fig. 5. The average global self-sensitivity is 4.8 % (Fig. 5a), which implies that the analysis extracts 4.8 % of its information from the observations and 95.2 % from the background in each assimilation cycle. Although the average self-sensitivity seems low, the background contains the observation information included in the previous analysis cycle, as reported in Cardinali et al. (2004). Moreover, the surface CO2 fluxes in CarbonTracker are optimized by 5 weeks of observations during the assimilation process. Therefore, the cumulative impact over 5 weeks is 19.1 %, much greater than the 4.8 % that represents only the most recent week of each cycle. Although the cumulative impact shows a higher value, the noncumulative impact measured in the most recent week of each cycle is used to discuss the impact of observations because the noncumulative impact has been generally used as the observation impact.
Globally, the Mixed site category shows the highest average self-sensitivity, and the Difficult site category shows the lowest average self-sensitivity (Fig. 5a), which is related to the model-data mismatch values shown in Table 2. The model-data mismatch for the Mixed site category is relatively low, while that of the Difficult site category is high. Although the MBL site category has the lowest model-data mismatch, the MBL site category does not show the highest average self-sensitivity due to the small spread of the analysis CO2 concentrations at MBL sites. As shown in Eq. (16), the model-data mismatch and the spread of the analysis CO2 concentrations are the two factors determining the self-sensitivity. Because MBL sites are located far from strong source and sink regions of CO2, the spread of the analysis CO2 concentrations at these sites is small. The average self-sensitivity in the NH is 5.3 %, which is the highest of all global regions (Fig. 5b). Similar to the global results, the average self-sensitivity is highest for the Mixed site category, while that for the Difficult site category is lowest. The average self-sensitivity in the tropics is 3.6 % (Fig. 5c); the Mixed site category shows the highest values, but they are not significantly higher than those of other categories. In the tropics, there is no Continuous site category. In the SH, the average self-sensitivity is 3.0 %, which is the lowest among the global regions (Fig. 5d); the MBL site category shows the highest values, and there is no Continuous site category.

Time series of self-sensitivity
Figure 6 shows the time series of the average self-sensitivity and number of observations around the globe and in each region. Globally, two apparent characteristics can be identified in the time series (Fig. 6a): first, the average self-sensitivity decreases as the number of observations increases, showing an inversely proportional relationship; second, there is seasonal variability in the average self-sensitivity, showing high values in summer and low values in winter. In the NH, the above two features are more apparent than in other regions (Fig. 6b). Because most of the observation sites are located in the NH, characteristics of the average global self-sensitivity are mostly determined by those in the NH. As the number of observations in the tropics increases in the late 2000s, a slight inversely proportional relationship between the average self-sensitivity and the number of observations appears in the tropics (Fig. 6c). However, the average self-sensitivity in the tropics does not show distinct seasonal variability. In the SH, an inverse relationship between the average self-sensitivity and the number of observations is not clearly shown (Fig. 6d), because the number of observations assimilated in the SH increases less than in the other regions. However, the seasonal variability of the average self-sensitivity appears clearly in the SH. Therefore, the inverse relationship is distinct only when the increase in the number of observations is large enough to decrease the average self-sensitivity.
Figure 7 shows the average self-sensitivity for each observation site category. Although the MBL site category has the second largest number of observations, the average self-sensitivity shows little variation with respect to time (Fig. 7a). Similarly, the average self-sensitivity for the Continental site category does not vary with respect to time (Fig. 7b). The average self-sensitivity of the Mixed site category shows distinct seasonal variation (Fig. 7c). The Continuous site category displays distinct seasonal variability in the average self-sensitivity and an inversely proportional relationship between the average self-sensitivity and the number of observations (Fig. 7d). Because Continuous sites are mostly located in North America with relatively large numbers (Fig. 3), the impact of a single observation decreases as the number of observations increases. Therefore, the inversely proportional relationships between the average self-sensitivity and the number of observations around the globe (Fig. 6a) and in the NH (Fig. 6b) are mainly attributed to the Continuous site category. The Difficult site category shows a slight inverse relationship between the average self-sensitivity and the number of observations (Fig. 7e).

Effect of the ensemble spread of the model surface CO2 flux on the average self-sensitivity
Despite the inversely proportional relationship between the self-sensitivity and the number of observations in the NH time series (Fig. 6b), the average self-sensitivity in the NH is higher than in the other regions (Fig. 5). In addition, the average self-sensitivities in the NH and SH are greater in summer than in winter (Fig. 6). These two characteristics imply that another factor in addition to the number of observations affects the self-sensitivity. As briefly mentioned in Sect. 3.2.1, this factor is the spread of the analysis CO2 concentrations. Therefore, the average standard deviations of surface CO2 fluxes are evaluated in Fig. 8 to investigate the influence of the surface CO2 flux uncertainties on the seasonal and regional characteristics of the self-sensitivities. The ensemble spread of the background surface CO2 fluxes reflects the uncertainties, which are projected onto the ensemble spread of the background and analysis CO2 concentrations (i.e., $\mathbf{H}\mathbf{X}^a$ in Eq. 16) by the transport model. The uncertainties of the background surface CO2 fluxes over the terrestrial portion of the NH are high in summer months (i.e., June, July, and August: JJA) (Fig. 8a) compared with those in winter months (i.e., December, January, and February: DJF) (Fig. 8b). Due to the high surface CO2 flux uncertainties in North America (Fig. 8a), the self-sensitivities in North America are not lower than those in the other regions (Fig. 4), regardless of the large number of observations in this region. By contrast, despite the high uncertainties of the surface CO2 fluxes in the Eurasian boreal region, the self-sensitivities in this region cannot be evaluated owing to the absence of observations. Instead, the self-sensitivities of the observation sites near the Eurasian boreal region show high values (Fig. 4).
The uncertainties of the optimized biosphere and ocean fluxes from 1 week of observations, shown in Fig. 8c and d, are reduced compared with those of the background fluxes, shown in Fig. 8a and b. The magnitude of the reduction of the surface CO2 flux uncertainties in North America is greater than in other regions, which is consistent with the greater self-sensitivities found in North America. By contrast, when using 5 weeks of observations, the magnitude of the reduction of the surface CO2 flux uncertainties is greater in Asia than in North America (Fig. 8e, f).
Therefore, the surface CO2 flux uncertainty is one of the factors that determine the magnitude and seasonal variation of the self-sensitivities.

Average information content
Figure 9 shows the average information content at each observation site during the experimental period. This value was calculated by averaging the ratio of information contents for each cycle at each site during the experimental period. Note that this average value is not the amount of information content extracted from observations but rather the relative ratio of each site's information content, normalized by the total influence of all observations. Because the magnitude of the information content at one observation site is proportional to the self-sensitivity and the number of observations, the observation sites with a high average self-sensitivity or a large number of observations show high information content. The number of observations at one station depends on the temporal resolution, missing rate, and total period of observations. Therefore, the observation sites located in North America and Asia generally show high average information content.
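The normalization described above can be sketched for a single assimilation cycle. The sketch assumes that a site's information content is the sum of the self-sensitivities of its observations and that the ratio is taken against the sum over all sites; averaging these ratios over cycles is left out. The site codes and values are hypothetical.

```python
def information_share(site_sensitivities):
    """Relative information content of each site in one cycle.

    site_sensitivities : dict mapping a site code to the list of
                         self-sensitivities of its observations
    """
    totals = {site: sum(values)
              for site, values in site_sensitivities.items()}
    grand_total = sum(totals.values())
    return {site: total / grand_total for site, total in totals.items()}


# Site A has two weakly influential observations, site B has one
# strongly influential observation; both contribute equally.
shares = information_share({"A": [0.1, 0.1], "B": [0.2]})
print(shares)
```

The example shows why a site can rank high either through many observations or through a few observations with high self-sensitivity.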
To investigate the distribution of the information content in each region, histograms of the average information content around the globe and in the NH, tropics, and SH were generated (Fig. 10). The average information content was 80.2 % in the NH, 13.3 % in the tropics, and 6.5 % in the SH, which implies that the observations in the NH are the most informative. This is due to the large number of observations and high self-sensitivities in the NH. Around the globe, the most informative observation site category is the Continuous category (Fig. 10a). The MBL, Continental, and Mixed site categories show a similar magnitude of information content, but the Difficult site category shows the lowest information content. As for the globe, the Continuous site category is the most informative in the NH (Fig. 10b). In the current CarbonTracker system, the observation sites of the Continuous site category are mainly located in North America, except for the three JMA sites, which are located in Asia. Therefore, most of the information extracted from the Continuous site category is used to constrain the surface CO2 fluxes of North America. In the tropics, the MBL and Mixed site categories provide the most information (Fig. 10c). In the SH, the MBL site category provides the most information, whereas little information is extracted from the Continental, Mixed, and Difficult site categories (Fig. 10d). In addition, the information from the Continuous site category is zero because there are no Continuous data in the SH.

Time series of information content
Figure 11 shows the time series of the weekly averaged information content for each site category in each region. Globally, the proportion of the information content of the Continuous site category increases steadily over time (Fig. 11a), which is associated with the steady increase in the number of observations of the Continuous site category over time. In the NH, the increase of the proportion of the information content and the number of observations of the Continuous site category is more readily apparent (Fig. 11b). In the tropics, the MBL and Mixed site categories provide the most information, while the Difficult site category provides limited information from 2004 onward (Fig. 11c) because, after this date, observations from only one Difficult observation site (Bukit Kototabang (BKT), Indonesia, 0.2° S, 100.32° E, 864 m) are used in the data assimilation. In the SH, most information is extracted from observations made in the MBL site category (Fig. 11d). Because the number of observations in the SH is much lower than in the other regions, the information content extracted from the observations made in this region is also lower. The information content in summer is greater than in winter in the SH owing to the seasonal variability in self-sensitivity.
To investigate the regional distribution of the information content in the NH, the time series of the information content in Asia, North America, and Europe are shown in Fig. 12. The information content in North America is greater than that in the other regions because the self-sensitivities are high and the number of observations increases with time in North America. However, the rate of increase in the information content is lower than that of the number of observations because self-sensitivity decreases as the number of observations increases in North America.
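The slower growth of the information content relative to the number of observations can be illustrated with a short worked example. Because the information content is the sum of the self-sensitivities, it equals the number of observations multiplied by their mean self-sensitivity. The numbers below are hypothetical and serve only to show the arithmetic.

```python
def information_content(n_obs, mean_self_sensitivity):
    """Information content as the sum of self-sensitivities (N times the mean)."""
    return n_obs * mean_self_sensitivity


# Hypothetical values: observations double while mean self-sensitivity drops.
early = information_content(n_obs=100, mean_self_sensitivity=0.06)
late = information_content(n_obs=200, mean_self_sensitivity=0.04)

obs_growth = 200 / 100
info_growth = late / early

print(f"observations grow by a factor of {obs_growth:.2f}")
print(f"information content grows by a factor of {info_growth:.2f}")
```

In this example the number of observations doubles, but the information content grows by only about one third because each observation contributes less.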

Relationship between the information content and the optimized flux
Because CarbonTracker is a system that optimizes the surface CO2 flux using measurements of surface CO2 concentrations, the effect of the observations on the optimized surface CO2 fluxes is important. To investigate the relationship between the information content and the optimized surface CO2 fluxes, the root mean square differences (RMSDs) between the optimized surface CO2 fluxes and the background fluxes were calculated (Fig. 13). The surface CO2 fluxes predicted by the dynamic model in Eq. (2) (i.e., background) show a high RMSD from the prior fluxes in the NH, with the highest values in North America, followed by Asia (Fig. 13a). In terms of seasonal variation, the impact of the observations in JJA is greater than in DJF (Fig. 13a, b). The large difference between the prior fluxes and the surface CO2 fluxes predicted by the dynamic model implies that the assimilation of previous observations substantially affects the results. The RMSD of the analyzed surface CO2 fluxes constrained by 1 week of observations from the background fluxes in JJA is greater in the NH than in the other regions. The JJA RMSD value for North America (especially in the midcontinental region of the US) is the highest in the NH (Fig. 13c). Although the RMSD of North America in DJF is lower than that in JJA, the RMSD of North America is still greater than that of other regions in DJF (Fig. 13d). The regions with a high average information content are consistent with the regions with a high RMSD (compare Figs. 9 and 13), which implies that the observations from North America provide more information in the first cycle than those from other regions because the observations in North America are characterized by high self-sensitivities and abundant observations. By contrast, the RMSD values obtained in the first cycle in other regions are not as high as those in North America. The RMSD in Asia and other regions increases after 5 weeks of optimization (Fig. 13e, f). In particular, the magnitude of the RMSD in the Eurasian boreal region increases after 5 weeks of optimization (Fig. 13e), which implies that, through the transport of CO2, the observation information from remote regions affects the optimization of the surface CO2 fluxes in the Eurasian boreal region. This remote influence is due to the absence of observations in this region. In addition, the 5-week assimilation lag is effective in optimizing the surface CO2 flux in this region. Therefore, a longer smoother window (i.e., a longer assimilation lag) is necessary to optimize the surface CO2 flux in Asia, where observations are sparse; this may imply that, in the current version of CarbonTracker, the uncertainty of the surface CO2 flux in Asia may be reduced by using a longer smoother window than that used for North America. A study on the effect of various assimilation window and ensemble sizes on the estimation of the surface CO2 flux in Asia is underway to investigate which lag window and ensemble sizes are appropriate for Asia in CarbonTracker.
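The RMSD diagnostic used in this comparison can be sketched as follows for a single grid cell. This is a minimal illustration, not the CarbonTracker implementation: the function name and the weekly flux values (in g C m⁻² week⁻¹) are hypothetical.

```python
import math


def rmsd(flux_a, flux_b):
    """Root mean square difference between two equally long flux series."""
    if len(flux_a) != len(flux_b) or not flux_a:
        raise ValueError("flux series must be non-empty and of equal length")
    squared = [(a - b) ** 2 for a, b in zip(flux_a, flux_b)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical weekly fluxes at one grid cell (g C m^-2 week^-1).
background = [10.0, 12.0, 8.0, 9.0]
posterior_1wk = [10.5, 11.5, 8.5, 8.5]   # after 1 week of observations
posterior_5wk = [11.5, 10.5, 9.5, 7.5]   # after 5 weeks of observations

rmsd_1wk = rmsd(background, posterior_1wk)
rmsd_5wk = rmsd(background, posterior_5wk)
print(f"RMSD after 1 week:  {rmsd_1wk:.2f}")
print(f"RMSD after 5 weeks: {rmsd_5wk:.2f}")
```

A larger RMSD after 5 weeks than after 1 week, as in this made-up example, corresponds to the behavior described for regions that are constrained mainly by remote observations.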

Summary and conclusion
In this study, the effect of observations of CO2 concentrations on the optimized surface CO2 fluxes in CarbonTracker was evaluated by calculating the influence matrix for a 10-year period from 2000 to 2009. CarbonTracker is a system used to optimize the surface CO2 flux using EnKF as a data assimilation algorithm. Most of the calculated influence values were within the theoretical limits of 0 to 1, which makes it possible to objectively diagnose the performance of the data assimilation system used in this study.
The average global self-sensitivity is 4.8 %, which implies that the impact of the background on the optimized flux is 95.2 %. The value of 4.8 % obtained in CarbonTracker is lower than the 15 % value obtained from NWP models, as reported by Cardinali et al. (2004) and Liu et al. (2009). However, as indicated by Cardinali et al. (2004), the background fluxes predicted by the dynamic model already include information extracted from earlier observations used in previous cycles. Because the state vector used in CarbonTracker includes 5 weeks of lag, the cumulative impact of the observations on the analysis is greater than the impact calculated for a single assimilation cycle. The cumulative impact over 5 weeks is 19.1 %, much greater than 4.8 %, and the large cumulative impact is confirmed by the RMSD of the surface CO2 fluxes associated with each assimilation process.
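To give a sense of why the cumulative impact exceeds the single-cycle value, the sketch below compounds per-cycle sensitivities under a simplifying assumption: in each cycle the background weight is one minus the observation sensitivity, and the weights multiply across the five cycles. This is an illustrative assumption, not necessarily the exact formulation used in this study, and the function name is hypothetical.

```python
def cumulative_impact(obs_sensitivities):
    """Compound per-cycle observation sensitivities into a cumulative impact.

    Assumes the background weight in each cycle is 1 - S_o and that the
    background weights of successive cycles multiply.
    """
    background_weight = 1.0
    for s_o in obs_sensitivities:
        background_weight *= 1.0 - s_o
    return 1.0 - background_weight


# Assumption: the global mean of 4.8 % applies in each of the 5 cycles.
constant_case = cumulative_impact([0.048] * 5)
print(f"cumulative impact with constant 4.8 %: {100 * constant_case:.1f} %")
```

With a constant 4.8 % per cycle, this simple compounding gives roughly 22 %, which is of the same order as the reported 19.1 % but not identical; the difference would be consistent with the sensitivity varying from one lag to the next rather than staying at the global mean.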
The self-sensitivity and spatial coverage of the observation sites are inversely correlated in the NH, whereas these factors are not apparently related in the tropics and SH. The lower correlation between the self-sensitivity and the spatial coverage of the observation sites in the tropics and SH is attributed either to the sparseness of the observation sites or to their locations, which are not appropriate for detecting the variability of CO2 concentrations with a high temporal resolution but are appropriate for detecting the global trend of the background CO2 concentrations. By contrast, the observation sites near the Eurasian boreal region show high self-sensitivity because there are no available observations in the Eurasian boreal region.
The self-sensitivity time series is characterized by seasonal variations. In both hemispheres, the self-sensitivity in summer is greater than in winter, which is clearly evident in the Mixed and Continuous site categories and is associated with the background surface CO2 flux uncertainties.
The number of observations used in data assimilation increases over time, which causes the average self-sensitivities to decrease. The decreasing trend of the self-sensitivity over time for the Continuous site observations in North America may indicate the limited impact of additional observations in this region. Schuh et al. (2013) reported that additional tower measurements (i.e., observations in the Continuous site category) in the Corn Belt region of the US did not significantly alter the surface CO2 flux estimates for 2008, which is consistent with the low self-sensitivity detected over North America in the same period. Therefore, under the current CarbonTracker framework, to obtain the beneficial effect of additional observations on the surface CO2 flux analysis, new observations should be added in regions with a low spatial density of observation sites (e.g., Asia).
The observation sites with a high average self-sensitivity and a small number of observations show low average information content, whereas the observation sites with a low average self-sensitivity and a large number of observations show high average information content. This is because the average self-sensitivity is bounded between 0 and 1, whereas the number of observations varies over a much larger range. Therefore, the Continuous site category shows high average information content. In general, the information extracted from observations is concentrated in the NH, especially in North America. A strong correlation exists between the information content and the optimized surface CO2 fluxes. The high information content found in regions with a large number of observations implies that much of the information is extracted from observations and, as a result, the fluxes are optimized within a relatively short period. However, the surface CO2 fluxes in regions without local observation sites (e.g., Siberia) are optimized by remote observations during relatively long assimilation windows with a lag.
The effect of various observations on the analyzed surface CO2 fluxes can be calculated using the method suggested in this study. More CO2 observations are becoming available for data assimilation to estimate the surface CO2 fluxes. These additional sources include CONTRAIL data, which are aircraft observations (Machida et al., 2008); column-averaged CO2 concentrations retrieved from the Japanese Greenhouse gases Observing SATellite (GOSAT) (Yokota et al., 2009); and data from the Total Carbon Column Observing Network (TCCON), which are observed by ground-based Fourier transform spectrometers (Wunch et al., 2010). As a next step, the impact of these observations on the optimization of surface CO2 fluxes can be evaluated using the method suggested in this study.

Figure 1. Schematic diagram of the assimilation process employed in CarbonTracker. In each analysis cycle, observations made within 1 week are used to update the state vectors with a 5-week lag. The dashed line indicates how the simple dynamic model uses analysis state vectors from the previous 1 and 2 weeks to produce a new background state vector for the current analysis time. The TM5 model is used as the observation operator to calculate the model CO2 concentration for each corresponding observation location and time.

Figure 2. Schematic diagram of calculating cumulative impact in CarbonTracker. S_b(·) indicates the analysis sensitivity to background at each analysis cycle within 5 weeks of lag, where · denotes each week from 1 to 5. S_o(·) indicates the analysis sensitivity to observation at each analysis cycle.

Figure 3. Observation network of CO2 concentrations around the globe and the nested domain of the TM5 transport model over Asia (dashed box). Each observation site is assigned to one of five categories, each shown with a different symbol (MBL; Continental; Mixed land/ocean and mountain; Continuous; Difficult).

Figure 5. Histograms of the average self-sensitivity for each observation site category from 2001 to 2009 (a) around the globe and in the (b) Northern Hemisphere, (c) tropics, and (d) Southern Hemisphere.N(obs) in the upper right corner represents the number of observations used in data assimilation.

Figure 6. Time series of the average self-sensitivity (red solid line with blue dots) and the number of observations (black solid line) with a weekly temporal resolution (a) around the globe and in the (b) Northern Hemisphere, (c) tropics, and (d) Southern Hemisphere from 2001 to 2009. The dashed lines represent the regression lines for the average self-sensitivity (red dashed line) and the number of observations (black dashed line).

Figure 7. Time series of the average self-sensitivity (red solid line with blue dots) and the number of observations (black solid line) with a weekly temporal resolution for the (a) MBL, (b) Continental, (c) Mixed, (d) Continuous, and (e) Difficult observation site categories from 2001 to 2009. The dashed lines represent the regression lines for the average self-sensitivity (red dashed line) and the number of observations (black dashed line).

Figure 8. Average standard deviation of background biosphere and ocean fluxes in (a) JJA and (b) DJF; the posterior biosphere and ocean fluxes optimized by 1 week of observations in (c) JJA and (d) DJF; and the posterior biosphere and ocean fluxes optimized by 5 weeks of observations in (e) JJA and (f) DJF. The units are grams of carbon per square meter per week (g C m⁻² week⁻¹).

Figure 9. Average normalized information content for each observation site from 2001 to 2009.The overlapping observation sites at the same locations or at close locations are distinguished using different sizes of circles.

Figure 10. Histograms of the average information content for each observation site category (a) around the globe and in the (b) Northern Hemisphere, (c) tropics, and (d) Southern Hemisphere from 2001 to 2009. N(obs) in the upper right corner represents the number of observations used in data assimilation.

Figure 11.Time series of the average information content for each observation site category (a) around the globe and in the (b) Northern Hemisphere, (c) tropics, and (d) Southern Hemisphere from 2001 to 2009.

Figure 12. Time series of the (a) weekly averaged information content and (b) number of observations in Asia (black line), Europe (blue line), and North America (red line) from 2001 to 2009.

Figure 13. RMSD between the background flux and prior flux in (a) JJA and (b) DJF; RMSD between the background flux and posterior flux optimized by 1 week of observations in (c) JJA and (d) DJF; and RMSD between the background flux and posterior flux optimized by 5 weeks of observations in (e) JJA and (f) DJF. The units are g C m⁻² week⁻¹.

Table 1. Information on the observation sites used in this study. MDM represents the model-data mismatch, which is the observation error.

Table 2. Observation site categories and corresponding MDM values.

Table 3. Information on the observation sites located in Asia, including the number of observations, number of rejected observations, MDM values, innovation χ² statistics, and the average bias of the model CO2 concentrations calculated by optimized fluxes. For the TAP_01D0 site, the numbers in parentheses are values used in previous studies, and the numbers without parentheses are the modified values based on the innovation χ² statistics in this study.