On the statistical optimality of CO 2 atmospheric inversions assimilating CO 2 column retrievals

Introduction Conclusions References


Introduction
CO 2 surface fluxes at the Earth's surface can be inferred from accurate surface measurements of CO 2 concentrations, but the sparseness of the current global network still leaves the flux horizontal and temporal gradients, and even their latitudinal distribution, very uncertain (Peylin et al., 2013).This limitation has provided a major incentive to develop the monitoring of CO 2 concentrations from space.First retrievals were obtained from existing instruments measuring either the thermal infrared radiation emitted by the atmosphere (Chédin et al., 2003) or the reflected sunlight in the near-infrared (NIR)/shortwave infrared (SWIR) spectral regions (Buchwitz et al., 2005).The latter technique allows retrieving the column-average dry air-mole fraction of CO 2 (XCO 2 ) while the former is not sensitive to CO 2 in the lower atmosphere, near the CO 2 sources and sinks.Since active (lidar) measurement techniques for XCO 2 from space are still in development (e.g., Ingmann et al., 2009), NIR/SWIR measurements currently offer the best prospect to provide "retrievals of CO 2 of sufficient quality to estimate regional sources and sinks", as phrased by objective A.8.1 of the Global Climate Observing System programme (GCOS, 2010), in the short term.However, they are hampered by uncertain knowledge about scatterers in the atmosphere at the corresponding wavelengths (aerosols and cirrus clouds) with an effect that varies with surface albedo, which is itself uncertain (e.g., Aben et al., 2007).Such interference in the XCO 2 signal seen in the NIR/SWIR measurements is of concern because even sub-ppm systematic errors (corresponding to less than 0.25 % of the signal) can severely flaw the inversion of CO 2 surface fluxes (Chevallier et al., 2007;Miller et al., 2007).This risk motivated dedicated developments of the retrieval Published by Copernicus Publications on behalf of the European Geosciences Union.F. Chevallier: On the statistical optimality of CO 2 atmospheric inversions algorithms in order to de-convolve the spectral signatures of the involved compounds as much as possible (e.g., Reuter et al., 2010;Guerlet et al., 2013b).
The Japanese GOSAT, launched in January 2009, and the USA second Orbiting Carbon Observatory (OCO-2), launched in July 2014, observe the NIR/SWIR radiation with unprecedented spectral resolution in order to specifically address this remote sensing challenge.The GOSAT archive already covers 6 years and can provide good insight into the adequacy of NIR/SWIR retrievals for CO 2 source-sink inversion.In terms of random errors, raw GOSAT retrievals now reach single shot precision better than 2 ppm (1σ ) in fair measurement conditions (e.g., Nguyen et al., 2014).This performance is better than what pre-launch studies suggested: for instance Maksuytov et al. (2008) expected 2.5-10 ppm single shot precision only.Systematic errors are difficult to quantify or else they would be removed.They are likely state-dependent with absolute values varying in time and space about the ppm before any bias correction (Nguyen et al., 2014).They also depend on the retrieval algorithm (e.g., Oshchepkov et al., 2013).As expected, the remaining uncertainty has a profound impact on CO 2 source-sink inversions (Basu et al., 2013;Chevallier et al., 2014), but XCO 2 retrievals have already served as a basis to study the carbon budgets of some regions (Guerlet et al., 2013a;Basu et al., 2014;Reuter et al., 2014).For instance, 25 scientists analysed several XCO 2 retrievals over continental Europe and concluded that the current understanding of the European carbon sink brought by bottom-up inventories had to be revisited (Reuter et al., 2014).
This paper aims at contributing to the debate about the relevance of current GOSAT retrievals for atmospheric inversions.Our starting point is a critical review of the basic principles behind the current processing chains that go in successive steps from GOSAT measured radiance spectra to surface flux estimates (Sect.3).We then focus on the GOSAT retrievals provided by NASA's Atmospheric CO 2 Observations from Space project (ACOS, build 3.4, described in Sect.2) for the period between June 2009 and May 2013.They are of particular interest because they have been processed in a way that prefigures the official OCO-2 retrievals in terms of spectral bands and available simultaneous observations (O'Dell et al., 2012).In Sect.4, we analyse the residuals between the ACOS-GOSAT retrievals and the simulated CO 2 concentration fields of the Monitoring Atmospheric Chemistry and Climate atmospheric inversion product (MACC, version 13r1, also described in Sect.2) that assimilated surface air sample measurements from various networks.Concluding discussion follows in Sect. 5.

ACOS-GOSAT retrievals
GOSAT is a joint venture by the Japan Aerospace Exploration Agency (JAXA), the National Institute for Environmental Studies (NIES) and the Ministry of the Environment (MOE) in Japan.This spacecraft is operated in a sunsynchronous polar orbit that crosses the Equator at about 13:00 local time during daytime and that repeats every 3 days.As described by O'Dell et al. (2012) and Osterman et al. (2013), the ACOS algorithm retrieves XCO 2 from a selection of GOSAT measurements of reflected sunlight made in the same spectral bands than OCO-2.Over land, such measurements are made by pointing the instrument to the Earth on both sides of the satellite track.Given the low reflectivity of water surfaces, ocean measurements are only possible when the instrument is pointed to the sun-glint spot, which is only done within 40 • from the Equator in the summer hemisphere.GOSAT also carries a cloud and aerosol imager that can help filtering difficult scenes out, but unlike other GOSAT retrieval algorithms, ACOS does not use it since OCO-2 does not contain a similar instrument.
Following Boesch et al. (2006) and Connor et al. (2008), the ACOS algorithm relies on optimal estimation (i.e.Bayesian methods) to retrieve the vertical profile of the CO 2 dry air mole fraction together with variables interfering in the measurements: the surface pressure and the surface albedo, some variables describing temperature, water vapour, clouds and aerosols in the atmosphere, and channel offsets for the instrument.The retrieved XCO 2 is simply obtained by integrating the retrieved CO 2 profile.In this Bayesian formulation of the retrieval, prior information about CO 2 is given an artificially small weight in order to maximize the observation contribution to the result: for instance, the standard deviation of the uncertainty assigned to the prior XCO 2 is larger than 10 ppm (O'Dell et al., 2012), i.e. larger than typical variations of XCO 2 at the continental scale (e.g., Keppel-Aleks et al., 2011).We will discuss the impact of this choice later and for simplicity, we will call XCO b 2 and XCO a 2 the prior (background) and the retrieved (analysed) XCO 2 , respectively.XCO a 2 can be compared with model simulations, as will be done here, or with other measurements via the associated CO 2 averaging kernel profiles and prior profiles (e.g., Connor et al., 1994).For nadir viewing, XCO a 2 is representative of a volume that has a circular footprint at the Earth's surface of diameter about 10 km.
Previous comparisons between XCO a 2 and model simulations or reference ground-based XCO 2 measurements from Total Carbon Column Observing Network (TCCON) highlighted some systematic dependency of the error of XCO a 2 as a function of a series of internal variables of the algorithm (Wunch et al., 2011b).This feature reveals some limitations of the algorithm but also allows for correcting them empirically, for instance before they are assimilated in atmospheric inversion systems (Crisp et al., 2012).We will call XCO a,c 2 the bias-corrected retrievals.

MACC CO 2 inversion
Since year 2011, the MACC pre-operational service (www.copernicus-atmosphere.eu)has been delivering a CO 2 inversion product with biannual updates.Release 13r1 primarily describes the CO 2 surface fluxes over more than 3 decades, from 1979 to 2013, at resolution 3.75 • × 1.9 The three databases include both in situ measurements made by automated quasi-continuous analysers and irregular air samples collected in flasks and later analyzed in central facilities.The detailed list of sites is provided in Tables S1  and S2 in the Supplement.
The MACC Bayesian inversion method is formulated in a variational way in order to estimate the CO 2 surface fluxes at the above-described relatively high resolution over the globe (Chevallier et al., 2005(Chevallier et al., , 2010)).For v13r1, the system used a single 35-year inversion window, therefore enforcing physical and statistical consistency in the inverted fluxes.Fluxes and mole fractions are linked in the system by the global atmospheric transport model of the Laboratoire de Météorologie Dynamique (LMDZ, Hourdin et al., 2006) with 39 layers in the vertical and with the same horizontal resolution than the inverted fluxes.LMDZ is nudged to ECMWF-analysed winds for flux inversion.
The MACC inversion product also contains the 4-D CO 2 field that is associated to the inverted surface fluxes through the LMDZ transport model.Simulating the GOSAT retrievals from this field is nearly straight-forward.The only difficulty lies in the interpolation from the LMDZ 39-level vertical grid to the 20-level vertical grid of the retrievals, before the retrieval averaging kernels are applied.Indeed, the model orography at resolution 3.75 • × 1.9 • significantly differs from the high-resolution orography seen by the retrievals.For the interpolation, we assume that CO 2 concentrations vary linearly with the pressure in the vertical.When the model surface pressure is smaller than the retrieved surface pressure, the profile is artificially extended at constant value below the model surface.In the opposite case, model levels below the sounding surface are ignored.We compensate this artificial change of mass in the profile by systematically adjusting the interpolated profile so that its drypressure-weighted mean equals that of the profile before the interpolation.

Theoretical aspects
Like the other retrieval and inversion systems (see, e.g., Oshchepkov et al., 2013;Peylin et al., 2013), ACOS-GOSAT and MACC both follow the Bayesian paradigm in its Gaussian linear form (e.g., Rodgers, 2000) in order to estimate the most likely state, in a statistical sense, of the CO 2 profile and of the CO 2 surface fluxes, respectively.In mathematical terms, given x the vector that gathers the variables to be inferred (either a 1-D CO 2 profile or 2-D + 1-D CO 2 surface fluxes), given x b an a priori value of x (coming from a climatology or from a model), and given y the vector that gathers all relevant observations (either radiances or retrievals), the most likely state of x is written H is a linearized observation operator that links variables x and y (i.e.essentially a radiative transfer model or a transport model).K is the following "Kalman gain" matrix: B and R are the error covariance matrices of x b and y, respectively.
The error covariance matrix of x a is obtained by with I as the identity matrix with appropriate dimension.For simplicity, Eq. ( 1) does not make other variables that are simultaneously inferred appear, like clouds, aerosols or surface variables for the retrievals, or the 3-D state of CO 2 at the start of the assimilation window for the inversion.
The current processing chains that go from radiances to surface fluxes are two-step processes (let aside some attempts to introduce an additional intermediate step in the form of a short-window analysis of the 3-D concentrations; Engelen et al., 2009).We now distinguish the retrieval process and the inversion process by putting breves on all symbols related to the former and hats on all symbols related to the latter.In a first step, the CO 2 profiles and their uncertainty { xa , Ȃ} are retrieved for each sounding { y, Ȓ} separately.The resulting ensemble forms the observations to be simultaneously assimilated { y, R} for the second step.The presence of prior information x b in both steps complicates the transition between the two.Following Connor et al. (1994) and the current practice, we can technically eliminate the influence of xb (but not of its uncertainty) by the following adaptation of Eq. ( 1 The retrieval error variances should consistently be reduced (e.g., Connor et al., 2008, paragraph 37) and is then called R hereafter.
For simplicity, and without loss of generality in our linear framework, let us consider the assimilation of a single sounding { y, Ȓ} using its averaging kernel.By definition, given the changes made to H and R, the gain matrix changes as well and we call K the new one.By applying Eq. ( 1) in this configuration, the analysed surface fluxes can be directly expressed in a concise form: This equation has the desired shape of Eq. ( 1), i.e. the sum of the prior value and of a linear function of model-minusmeasurement misfits.By construction, it does not depend on the retrieval prior xb .However, to follow the optimal estimation framework, we still need to be able to develop the product of the gain matrices consistently with Eq. ( 2), i.e. like (neglecting errors in the observation operators): In practice, we see that In the usual case when H = I, Eqs. ( 5)-( 6) can be made consistent in general provided and (by developing H and using Eq.7) Equation ( 7) simply expresses consistency between the prior error statistics within the information content of the retrievals: the uncertainty of the retrieval prior and of the flux prior should be the same in radiance space.This condition is not achieved by current satellite retrieval algorithms, at least because they artificially maximize the measurement contribution in the retrievals through the use of very large prior error variances (see Sect. 2.1 or Butz et al., 2009;Reuter et al., 2010).However, if enough intermediate variables were saved by the retrieval schemes, it would be possible to reconstruct the retrievals with appropriate prior error variances and correlations.
Equation ( 8) can be satisfied in general if the retrieval averaging kernel K H is close to unity.In practice, the retrieval averaging kernel for profiles is far from unity because current radiance measurements do not provide any vertical resolution for CO 2 .The situation is better if the state vector x is the integrated column (in that case H includes an operator to distribute the column in the vertical).
As a consequence of deviations from Eqs. ( 7)-( 8), the effective gain matrix K K significantly differs from the optimal one for GOSAT, resulting in a wrong balance between prior flux information and measured radiances.Overall, K pulls too much towards the measured radiances and K pulls too much towards the prior.This suboptimality very likely flaws the 4-D information flow from the radiance measurements to the surface flux estimates.In particular, the sub-optimality of K affects the retrieval averaging kernel, that may not peak at the right height.Migliorini (2012) proposed a sophisticated alternative to the averaging kernel assimilation of Connor et al. (1994), where the retrievals are assimilated after a linear transformation of both the retrievals and the observation operator.The transformation is heavier to implement than the above approach because it involves the retrieval signal-to-noise matrix Ȓ−1/2 H B1/2 .It avoids the requirement of Eq. ( 8), but still requires consistent prior error statistics (Eq.7).
The situation complicates even further if we account for the facts that inversion systems assimilate bias-corrected retrievals (thereby implicitly re-introducing xb that had been neutralized by the use of averaging kernels, in the equations), that H and H are imperfect operators, the uncertainty of which should be reported in Ȓ, following Eq.( 5), and that H is usually non-linear.The need to report all observation operator uncertainties in Ȓ means that retrieval configuration should in principle be tailored to the retrieval endapplication, i.e. to the precision of the observation operator that is used in this end-application.For flux inversion, the transport model uncertainty in XCO 2 space is about 0.5 ppm (1σ , Houweling et al., 2010).When optimizing parameters of a flux model rather than for the fluxes themselves (in Carbon Cycle Data Assimilation Systems, Rayner et al., 2005), the uncertainty of the flux model equations has also to be reported in Ȓ: when projected in the space of XCO 2 , they are comparable to transport model uncertainties (Kuppel et al., 2013).

Practical aspects
Given the particular concerns raised about the optimality of XCO 2 retrievals and of their averaging kernels in the previous section, we now focus on one specific retrieval product, ACOS-GOSAT, in order to look for some practical evidence of this sub-optimality.

Mean differences
Figure 1 shows the mean bias-corrected retrievals XCO a,c 2 and the mean corresponding posterior XCO 2 field of the MACC inversion over the June 2009-May 2013 period per 3.75 • × 1.9 • grid cell.All retrievals are used, provided they are found good by the ACOS standard quality control.The data density (Fig. 2b) follows the frequency of favourable retrieval conditions: more sunlight in the Tropics, less cloud over desert areas or over subsidence ocean regions.The longterm mean of the retrieval-minus-model differences (Fig. 2a) is usually about 1 ppm.Interestingly, it appears to be organized spatially.Over land, large positive values (> 0.5 ppm, ACOS-GOSAT being larger) are seen over boreal forests, over most of South America, over grassland/cropland regions in Central Africa and over the West coast of the USA.Negative values occur over most of the other lands, with smaller values (up to ∼ −1 ppm) mostly over South and East Asia.Over the oceans, values are mostly positive north of 30 • N and south of 10 • S, and negative in between.Both errors in ACOS-GOSAT and errors in the model simulations contribute to these differences, which complicates the interpretation of Fig. 2a.For instance, the zonal structure of the differences over the oceans could well be caused by the model, either because of too few surface air-sample sites in the Tropics or because the LMDZ transport model would not represent the inter-hemispheric exchange well enough (Patra et al., 2011).Alternatively, misrepresented clouds around the convergence zones could also induce them in the retrievals.Some of the patterns of Fig. 2a are similar to the surface cover, like the gradient between the Sahel and the African savannahs, or the one between the equatorial Atlantic and the African savannahs, while we expect the true XCO 2 fields to be first driven by large-scale horizontal advection (Keppel-Aleks et al., 2011).The main local spatial gradients in the mean differences are also seen on monthly means despite less data density (Fig. 3).They mostly reflect the retrieval gradients (Fig. 1a), because the model XCO 2 simulation is spatially smoother (Fig. 1b), even though it uses the retrieval averaging kernels (that change from scene to scene as a function, among other factors, of surface conditions) and even though it is sampled like the retrievals (i.e. with a spatially heterogeneous data density, also varying as a function, among other things, of surface conditions).
The jump of the long-term mean difference from the African savannahs to Sahel or equatorial Atlantic (while there is no jump between subtropical Atlantic and Western Sahara for instance) mostly corresponds to data from March (Fig. 3a), at the end of the savannah burning season (e.g.van der Werf et al., 2010).The model shows elevated values (Fig. 1b), but much less than the retrievals (Fig. 1a).If the model was underestimating the intensity of the fire, we would expect the mean difference to take the shape of a plume, i.e. to spread downstream the source region, but this is not the case.This suggests that the retrievals are affected by systematic errors over this region.
The positive differences of Fig. 2a in Eurasia notably follow the boreal forests, while negative values are found over the neighbouring regions of sparse tundra vegetation north of Siberia, or those of grassland/cropland south of them.The same remark applies to North America.The link with boreal forests is less obvious when looking at one isolated year because of the relatively small number of retrievals in these regions (not shown).The misfit pattern in Siberia and in North America contains many values larger than 1 ppm corresponding to relatively large retrieved XCO 2 (Fig. 1a).These large values are all the more surprising that retrievals in these high latitudes are obtained during the growing season and that boreal forests in Eurasia are identified as large carbon sinks by bottom-up inventories (Pan et al., 2011;Dolman et al., 2012).By comparison, we can look at agricultural regions, where the model could miss gradients during crop growth, both because the MACC inversion prior fluxes do not explicitly represent agricultural practices and because the location of the assimilated surface air-sample measurements only provides rough information about crop fluxes: the differences are marginal (−0.1 ppm on average, whether we compute the mean at the global scale or only for latitudes above 40 • N) for retrievals located in crop regions, as identified by the high-resolution land cover map of ESA's Land Cover Climate Change Initiative project (http: //www.esa-landcover-cci.org/).In the Corn Belt, the intensively agricultural region in the Midwest of the USA, dif-ferences are negative, but they are much smaller in absolute value (the differences are larger than −0.4 ppm) than over the boreal forests, and the Corn Belt boundaries do not sharply appear, in particular on its eastern side.The Corn Belt does not particularly appear in monthly means either (e.g., Fig. 3b).These elements suggest that the long-term mean differences over boreal forests come from a retrieval artifact rather than from the MACC inversion product.
From a radiative transfer point of view, boreal forests are largely covered with needle-leaved trees with low albedo in the strong CO 2 spectral band of GOSAT near 2.1 µm (Fig. 4): these low values hamper the XCO 2 retrieval.O'Dell et al. ( 2012) already showed that large positive biases can occur for needle-leaved evergreen forests, with the retrieval exchanging surface albedo for very thin cloud or aerosol.Extreme cases are filtered out by the ACOS-GOSAT quality control, but Fig. 2a suggests that the remaining retrievals over boreal forests, including the region in Siberia East of 100 • E which is dominated by deciduous needle-leaved trees with slightly larger albedos, are still biased.In temperate regions, south of 50 • N, the differences for needle-leaf cover (mainly in Southeast USA and Southeast China) have the opposite sign, but they do not show up distinctly in the difference map like the boreal forests.Tropical forests in South America and in Africa also have low albedo and correspond to negative  differences.They are more identifiable in Fig. 2a, but could be explained by an insufficient carbon sink in the model as well as by a retrieval artifact.

Link to the retrieval increment
We now look at the XCO 2 misfit statistics over land and for the high-gain mode as a function of the size of the retrieval increment to its prior information (XCO a 2 −XCO b 2 ) in Fig. 5.We look at the misfits of the model to XCO a 2 , to XCO a,c 2 and to XCO b 2 , in order to visualize the added value brought by the retrieval process and by the bias correction, successively, on top of the prior estimate.This prior estimation about atmospheric CO 2 has been provided to the retrieval scheme by a data-driven empirical model (Wunch et al., 2011a).In Fig. 5, each bin along the abscissa encompasses a large diversity of times during the 4 years and a large diversity of locations over the globe, over which the model simulation should be overall more accurate (smaller root mean square error) than XCO b 2 , XCO a 2 and even XCO a,c 2 (Chevallier and O'Dell, 2013).Further, we expect the model error to be un- The model values are raw pressure-weighted columns and do not account for the averaging kernels in order not to correlate the two axes (in practice, using the averaging kernels actually does not significantly affect the standard deviations shown).The grey shade shows the distribution of the retrieval density (axis not shown).
correlated with the error of XCO b 2 , XCO a 2 and XCO a,c 2 so that a smaller standard deviation of the misfits (e.g., using XCO a 2 rather than XCO b 2 ) can be interpreted in terms of better precision of the corresponding retrieval quantity.
The mean difference significantly varies with the increment size: starting at 0.7 ppm for the smallest increments it reaches about 2 and −1 ppm, for XCO b 2 and XCO a 2 respectively.As expected, the mean difference is systematically better with XCO a 2 than with XCO b 2 .The bias correction (XCO a,c 2 ) further reduces the mean difference to a small extent.
The standard deviation for XCO b 2 is 1.1 ppm for small increments and smoothly increases to 2 ppm for retrieval increments of size 6 ppm.This trend demonstrates some skill of the retrieval algorithm to characterize the error of XCO b 2 from the GOSAT radiances and to generate a sizeable increment accordingly.By comparison, the model variability for a given increment size over the 4 years ranges between 3 and 4 ppm (1σ ), the prior variability is about 3 ppm and the retrieval variability ranges between 3 and 7 ppm.The standard deviation that uses XCO a 2 is 1.1 ppm for small increments.It smoothly increases to 4 ppm for retrieval increments of size 6 ppm: it is systematically larger than the standard deviation that uses XCO b 2 (despite a smaller mean difference).The standard deviation that uses XCO a,c 2 is also 1.1 ppm for small increments and is also systematically larger than the standard deviation that uses XCO b 2 , but it performs better than XCO a 2 .The worse standard deviation of the misfit of XCO a 2 and XCO a,c 2 to the model compared to XCO b 2 cannot be explained by a common lack of variability in the model and in XCO b 2 (that would correlate the model error with the that of XCO b 2 ), because (i) at the large scale, thinning the retrievals (for instance by keeping only one retrieval for every nine model grid boxes for a given day) only marginally changes the figure (not shown), and (ii) at the sub-grid scale, the variability of XCO 2 is usually well below the ppm (Alkhaled et al., 2008;Corbin et al., 2008), i.e. 1 order of magnitude smaller than the prior-to-retrieval degradation.Some, but not all, of the degradation is purely random and disappears after enough averaging (see Fig. 6 of Kulawik et al., 2015).
The fact that the standard deviation smoothly increases with increment size suggests that the increment size is systematically overestimated.Figure 6 presents a simple test where we halve the retrieval increments, without any bias correction: we call XCO a,r 2 = XCO b 2 + (XCO a 2 − XCO b 2 )/2 the result.The reduction is seen to cancel most of the dependency of the statistics of the observation-minus-model misfits to the increment size: the standard deviation and the mean are then stable around 1.1 and −0.3 ppm, respectively for increments up to 4 ppm without any bias correction.The standard deviation is systematically better than for XCO b 2 , which shows added value brought by the radiance measurements, in contrast to the previous results.This result also empirically confirms that the initial increments are in the right direction but are too large.
For the medium-gain retrievals (Fig. 7) and for the ocean glint retrievals (Fig. 8), the standard deviation of the misfits using XCO a,c 2 is not significantly larger than that using XCO b 2 .

Discussion and conclusions
Small uncertainties in aerosols, cirrus cloud or surface albedo are known to heavily affect the quality of the XCO 2 satellite retrievals and to propagate into biases in the fluxes inverted from them, even when the parasite signal in XCO 2 is sub-ppm.This weakness lead the science team of NASA's OCO, a satellite that failed at launch in February 2009, to conclude that the space-based NIR/SWIR measurements of XCO 2 could not be used alone for CO 2 source-sink inversions and that they had to be combined with observations from a reasonable number of surface stations (Miller et al., 2007).However, so much improvement has been obtained in these issues by various institutes during the last few years, that it is sometimes thought that the space-borne XCO 2 retrievals have reached sufficient quality for source-sink inversion.The present paper discusses where we stand in this respect both from general theoretical considerations and from one of the most advanced GOSAT retrieval products.
From the theory, we have shown that a two-step approach to infer the surface fluxes from the GOSAT measured radiances, with XCO 2 retrievals as an intermediate product, can-not be optimal.This suboptimality corrupts the 4-D information flow from the radiance measurements to the surface flux estimates.It is amplified by the current retrieval strategy where prior errors are much larger (by an order of magnitude in terms of variances) than the performance of prior CO 2 simulations used in atmospheric inversions.Indeed, the use of averaging kernels makes atmospheric inversion insensitive to the choice of a particular retrieval prior CO 2 profile (Connor et al., 1994) if retrievals are assimilated without any bias correction, but it does not make the retrieval prior error statistics disappear from the inverse modelling equations.The current strategy likely generates retrieval averaging kernels that are inappropriate for atmospheric inversions in their default configurations, with a wrong vertical distribution and an excessive weight towards the measured radiances.Paradoxically, empirical bias correction of the retrievals (e.g., Wunch et al., 2011b) also contributes to the degradation of the 4-D information flow, because it carries the imprint of the retrieval prior and of the retrieval prior error statistics.Direct assimilation of the measured radiances would solve the inconsistency, but would increase the computational burden of atmospheric inversions by several orders of magnitude.Alternatively, we could adapt the inversion systems to the current retrieval configuration by using minimal prior information about the surface fluxes, typically a flat prior flux field, but the result would still over-fit the measured radiances due to the absence of other (compensating) information.
We have compared the ACOS-GOSAT retrievals with a transport model simulation constrained by surface air-sample measurements in order to find some evidence of retrieval sub- optimality.Flaws in this transport model and in these inverted surface fluxes necessarily flaw the simulation in many places over the globe and at various times of the year.We therefore carefully selected some of the relatively large discontinuities in the XCO 2 fields that the simulation unlikely generated.We found some evidence of retrieval systematic errors over the dark surfaces of the high-latitude lands and over African sa-vannahs.We note that the mean differences over the African savannahs during the burning season could be explained by retrieval averaging kernels not peaking low enough in the atmosphere further to the assignment of inappropriate prior error correlations.Biomass burning aerosols that would not be well identified by the retrieval scheme could also play a role.We also found some evidence that the high-gain retrievals over land systematically over-fit the measured radiances, as a consequence of the prior uncertainty overestimation and of an underestimation of the observation uncertainty (as seen by the underlying radiative transfer model).This over-fit is partially compensated by the bias correction.An empirical test indicates that halving the retrieval increments without any posterior bias correction actually cancels the dependency of the statistics of the observation-minus-model misfits to the increment size and makes the standard deviation systematically better than for the retrieval prior XCO b 2 , which shows added value brought by the radiance measurements, in contrast to the previous results.We argue here that the optimalestimation retrieval process and, consequently, its posterior bias correction need retuning.
Given the diversity of existing satellite retrieval algorithms, our conclusions cannot be easily extrapolated to other GOSAT retrieval products and even less to XCO 2 retrievals from other instruments, but we note that such mistuning like the one highlighted here is common practice, both because the errors of the retrieval forward model are difficult to characterize and because satellite retrievals are usually explicitly designed to maximize the observation contribution, at the risk of over-fitting radiance and forward model noise.A primary consequence of this mistuning is the usual underestimation of retrieval errors: for instance, O'Dell et al. ( 2012) recommended inflating this error by a 2-fold factor for ACOS-GOSAT b2.8.More importantly, our results show that the mistuning generates excessive (unphysical) space-time variations of the retrievals up to ∼ 1 %.This noise level would not matter for short-lived species, but for CO 2 it is enough to significantly degrade the assimilation of the retrievals for flux inversion and may explain some of the inconsistency seen between GOSAT-based top-down results and bottom-up results for CO 2 (Chevallier et al., 2014;Reuter et al., 2014).Therefore, with the current mistuning, we reiterate previous recommendations to take GOSAT-based CO 2 inversion results particularly cautiously.But we also suggest that this situation may dramatically improve by simply retuning the retrieval schemes.Ultimately, internal statistical consistency of the retrievals and of the inversion schemes is needed to establish the credibility of their end product.
The Supplement related to this article is available online at doi:10.5194/acp-15-11133-2015-supplement.
) in the second step: we assimilate y = xa − (I − K H) xb = K y F. Chevallier: On the statistical optimality of CO 2 atmospheric inversions rather than y and change the observation operator from H to H = K H H. K H is called the retrieval averaging kernel.

Figure 1 .
Figure 1.(a) Mean ACOS-GOSAT bias-corrected retrievals in the model grid over 4 years (June 2009-May 2013).(b) Corresponding mean CO 2 4-D field associated to the MACC CO 2 inversion (computed using the averaging kernels and the prior profiles of the retrievals).

Figure 2 .
Figure 2. (a) Mean difference between the maps of Fig. 1 (retrievals minus model).(b) Corresponding number of retrievals.

Figure 3 .
Figure 3. Same as Fig. 2a (retrievals minus model), but focussing on the months of March and June.

Figure 4 .
Figure 4. Mean surface albedo retrieved in the strong CO 2 band by ACOS-GOSAT in the model grid over 4 years (June 2009-May 2013).The blue scale focuses on the values less than 0.1.

Figure 5 .
Figure5.Mean and standard deviation of the retrieval-minus-model misfits between June 2009 and May 2013 for the high-gain mode retrievals over land as a function of the retrieval increment size.The statistics are also shown for the prior-minus-model misfit.The model values are raw pressure-weighted columns and do not account for the averaging kernels in order not to correlate the two axes (in practice, using the averaging kernels actually does not significantly affect the standard deviations shown).The grey shade shows the distribution of the retrieval density (axis not shown).

Figure 6 .
Figure6.Same as Fig.5(high-gain mode over the lands) but we reduce the retrieval increment size by 50 % without any bias correction (we call XCO a,r 2 the result).The abscissa shows the unperturbed increment.