Diagnosing spatial error structures in CO2 mole fractions and XCO2 column mole fractions from atmospheric transport

Atmospheric inversions inform us about the magnitude and variations of greenhouse gas (GHG) sources and sinks from global to local scales. Deployment of observing systems such as spaceborne sensors and ground-based instruments distributed around the globe has started to offer an unprecedented amount of information to estimate surface exchanges of GHG at finer spatial and temporal scales. However, all inversion methods still rely on imperfect atmospheric transport models whose error structures directly affect the inverse estimates of GHG fluxes. The impact of spatial error structures on the retrieved fluxes increase concurrently with the density of the available measurements. In this study, we diagnose the spatial structures due to transport model errors affecting modeled in situ carbon dioxide (CO2) mole fractions and total-column dry air mole fractions of CO2 (XCO2). We implement a cost-effective filtering technique recently developed in the meteorological data assimilation community to describe spatial error structures using a small-size ensemble. This technique can enable ensemblebased error analysis for multiyear inversions of sources and sinks. The removal of noisy structures due to sampling errors in our small-size ensembles is evaluated by comparison with larger-size ensembles. A second filtering approach for error covariances is proposed (Wiener filter), producing similar results over the 1-month simulation period compared to a Schur filter. Based on a comparison to a reference 25member calibrated ensemble, we demonstrate that error variances and spatial error correlation structures are recoverable from small-size ensembles of about 8 to 10 members, improving the representation of transport errors in mesoscale inversions of CO2 fluxes. Moreover, error variances of in situ near-surface and free-tropospheric CO2 mole fractions differ significantly from total-column XCO2 error variances. We conclude that error variances for remote-sensing observations need to be quantified independently of in situ CO2 mole fractions due to the complexity of spatial error structures at different altitudes. However, we show the potential use of meteorological error structures such as the mean horizontal wind speed, directly available from ensemble prediction systems, to approximate spatial error correlations of in situ CO2 mole fractions, with similarities in seasonal variations and characteristic error length scales.


Introduction
Atmospheric carbon dioxide (CO 2 ) mole fraction has been increasing steadily since the first industrial revolution, primarily due to fossil fuel emissions and land use change (IPCC, 2015).Recent estimates of sources and sinks at the global scale suggest a coincidental reinforcement of natural sinks balancing the continuously increasing anthropogenic emissions (Le Quéré et al., 2016;Keenan et al., 2016).Therefore, the fraction of fossil fuel CO 2 remaining in the atmo-Published by Copernicus Publications on behalf of the European Geosciences Union.
T. Lauvaux et al.: CO 2 and XCO 2 error structures sphere 1 was kept constant at 2 ppm yr −1 , excluding shorttime anomalies such as El Niño events (Feely et al., 1999;Kim et al., 2016).In the objective of characterizing the natural sink mechanisms, atmospheric inversion methods have provided some evidence of a fertilization effect possibly increasing the effective absorption by plants of the exceeding CO 2 in the atmosphere (Schimel et al., 2015).But large uncertainties still affect atmospheric inversions of CO 2 fluxes and limit the interpretation of continental-scale CO 2 budgets (Peylin et al., 2013).Therefore, more robustness in these findings first requires better characterization of the error affecting inverse estimates (Baker et al., 2007;Stephens et al., 2007;Díaz-Isaac et al., 2014).
Atmospheric inversions of greenhouse gases (GHG) are now widely used to infer surface fluxes from natural (e.g., Enting, 2002;Gurney et al., 2002;Lauvaux et al., 2012;Peylin et al., 2013) and anthropogenic (e.g., McKain et al., 2015;Lauvaux et al., 2016) sources at global, regional and local scales.However, key information in carbon cycle science lies in multiyear timescales, therefore confining the development of inverse methodologies to cost-effective approaches (e.g., Bruhwiler et al., 2005).Based on similar methodologies to those of meteorology or geophysics, atmospheric inversions have used primarily fast approaches to produce multi-decadal fluxes such as variational approaches (Baker et al., 2006;Chevallier et al., 2010), avoiding large ensembles of simulations based on Monte Carlo formulation (Evensen, 1994).In parallel, assumptions made in prior flux errors and transport errors impact the inverse solution in similar ways (Engelen et al., 2002).Concerning the prior flux errors, few studies have proposed to constrain the spatial and temporal structures more rigorously (Wu et al., 2013;Ganesan et al., 2014), some of them based on terrestrial biogeochemical models and eddy-covariance flux measurements to estimate the spatial structures in the prior flux errors of CO 2 (e.g., Chevallier et al., 2006;Hilton et al., 2013).For transport errors, correlations remained small at the global scale, primarily due to sparse atmospheric GHG observation networks.However, denser tower networks (Andrews et al., 2014) and recent satellite missions have significantly increased the sampling density (e.g., the Greenhouse gases Observing SATellite (GOSAT; Yokota et al., 2009;Houweling et al., 2015) and the Orbital Carbon Observatory (OCO-2) missions; Crisp et al., 2004) requiring the characterization of their correlated errors in inversion systems.
The increased density in existing tower networks and the availability of fine-scale satellite retrievals raised concerns about spatial and temporal structures in transport model errors (Rayner and O'Brien, 2001;Lauvaux et al., 2009;Miller et al., 2015).The proximity of the measurements (e.g., a couple of kilometers between OCO-2 retrievals) means that spatial correlations in model errors are significant and can 1 http://www.esrl.noaa.gov/gmd/ccgg/trends/,last access: 29 August 2019.
no longer be ignored (Chevallier, 2007).This issue becomes critical to greenhouse gas inversion problems when applied to urban scales (Lauvaux et al., 2016) but remains poorly studied to date.Recent deployment of path-integrated instruments also increased the complexity of the problem from the ground when trying to invert for emissions from single facilities such as a large dairy (Viatte et al., 2017).
Ensemble approaches are useful to describe flowdependent errors (e.g., Anderson, 2001;Evensen, 2003) but remain computationally expensive due to the number of model simulations required to correctly represent model error statistics (Houtekamer and Mitchell, 1998).In general, a small number of members leads to incomplete descriptions of error structures which require the use of localization to avoid spurious correlations due to sampling errors (Houtekamer and Mitchell, 2001;Raynaud and Pannekoucke, 2013).But small-size ensembles are efficient computationally and able to provide information on flow-dependent error structures compared to prescribed static error structures (Brousseau et al., 2012).With the development of new perturbation methods, the number of members may decrease significantly thanks to optimal perturbations combining physics, parameter sensitivity and energy-based perturbations (Jankov et al., 2017).In any case, small-size ensembles remain affected by sampling noise, which has to be removed before extracting spatial structures, either by modeling (Pannekoucke et al., 2008;Lauvaux et al., 2009) or by filtering unphysical structures (Hamill et al., 2001;Houtekamer and Mitchell, 2001).Here, we apply a newly developed approach based on local filtering and a localization technique (Ménétrier et al., 2015a, b).There are only a few approaches for the optimal localization of covariance matrices in the field of data assimilation for the geosciences (Lei and Anderson, 2014;Flowerdew, 2015;De La Chevrotière and Harlim, 2017).To our knowledge, the method is the only one so far which is both (i) mathematically consistent and (ii) a priori, i.e., not based on learning on past or present datasets.Besides, in spite of its sophistication, the filtering approach is rather straightforward to implement.
In this study, we apply the filter of variances and the covariance localization developed in Ménétrier et al. (2015a) and propose an additional solution to using the optimality condition, both for Gaussian and non-Gaussian error statistics cases.The filter is applied to several calibrated ensembles of different sizes to evaluate the impact of our filter on small (5 members) to larger (25 members) to the full ensemble (45 members) based on multi-physics simulations (Díaz-Isaac et al., 2018a).Results are presented for in situ CO 2 mole fractions, XCO 2 dry air mole fractions, mean horizontal winds and planetary boundary layer height.We discuss the results in Sect. 4.

Calibration of WRF-CO 2 ensembles
We generate an ensemble using the Weather Research and Forecasting (WRF) model version 3.5.1 (Skamarock et al., 2008), including the chemistry module modified in this study for CO 2 (Lauvaux et al., 2012).The ensemble consists of 45 members that were generated by varying the different physics parameterization and meteorological data.The land surface models, surface layers, planetary boundary layer schemes, cumulus schemes, microphysics schemes and meteorological data (i.e., initial and boundary conditions) are alternated in the ensemble (Díaz-Isaac et al., 2018b).All the simulations use the same radiation schemes, both long-and shortwave.All simulations were run using the one-way nesting method, with two nested domains.The coarse domain uses a horizontal grid spacing of 30 km and covers most of the United States and part of Canada.The inner domain uses a 10 km grid spacing, is centered in Iowa and covers the Midwest region of the United States.The vertical resolution of the model is described with 59 vertical levels, with 40 of them within the first 2 km of the atmosphere.This work focuses on the WRF simulation with the higher resolution; therefore only the 10 km domain will be analyzed.Simulations were performed from 27 June to 21 July 2008, with a 10 d spin-up for initial conditions.The CO 2 fluxes for summer 2008 were obtained from NOAA Global Monitoring Division's CarbonTracker version 2009 (CT2009) data assimilation system (Peters et al., 2007; with updates documented at http://carbontracker.noaa.gov,last access: 30 January 2019).The different fluxes that CT2009 propagates into the models are fossil fuel burning, terrestrial biosphere exchange and the exchange with oceans.The CO 2 lateral boundary conditions were obtained from CT2009 mole fractions.Only the meteorological transport fields vary between each model configuration or ensemble member.
The ensemble was calibrated over the Midwest US using the available meteorological observations and the 10 km model simulation as described in Díaz-Isaac et al. (2018a).The measurements used included balloon soundings collected over the Midwest region (http://weather.uwyo.edu/upperair/sounding.html, last access: 20 April 2017) for 14 rawinsonde stations.The ensemble was calibrated for three different meteorological variables: wind speed, wind direction and planetary boundary layer height (PBLH) in the late afternoon data (i.e., 00:00 UTC) from the different rawinsondes.Daytime data were used to represent well-mixed conditions, at the selected time when CO 2 mole fractions are assimilated into atmospheric inversions to avoid stable conditions near the surface.The calibration algorithm is described in Garaud and Mallet (2011), selecting optimal ensembles of different sizes using simulated annealing and genetic algorithm techniques.The metric used in Díaz-Isaac et al. (2018a) is the flatness of the rank histogram, which is a mea-sure of the ensemble dispersion.By eliminating members with redundant information, smaller ensembles were able to better match the variability in the observations.We refer to Díaz-Isaac et al. (2018a) for a full description of the calibration process and the final selection of optimal ensembles.
Here, we will compare the different ensembles generated in Díaz-Isaac et al. (2018a) from 5 to 8 to 10 members.An additional ensemble was created for our study with a larger number of members in order to address the potential lack of representativeness of model errors with small-size ensembles.Therefore, we generated a 25-member ensemble and applied the same calibration process.This ensemble has not been documented in Díaz-Isaac et al. (2018a) but is described here in Appendix A. We compare the results of the filtering for small sizes to the 25-member calibrated ensemble instead of the original 45-member ensemble that was not calibrated.et al. (2015a, b) have proposed a new theory for the optimal filtering of sample variances and covariances.These are defined by the following empirical second-order moment statistics.Assume we have an ensemble of states x k ∈ R n for k = 1, . .., N of mean x, from which to infer the statistics.Define the associated anomalies δx k = x k − x, also called perturbations.Then, the sample covariance matrix is

Ménétrier
which is an unbiased estimator of the true covariance matrix B ; i.e., E B = B where E is the expectation operator over the reference distribution from which the x k are sampled.In the following, we denote by B ij the entries of this covariance matrix.
Filtering of variances and covariances is made necessary because of the finite size of the sample ensembles, which can generate significant sampling errors.The sampling errors can be filtered out by applying a linear filter to the variances and covariances.The most general linear filter is of the form where B is the filtered error covariance matrix.Typical examples are the application of a convolution to the vector of variances to smooth them out or the application of a Schur product with a nondegenerate, short-range correlation matrix to the sample covariance matrix.The linear filter often requires parameters, a correlation length typically, that must be tuned for the filter to be optimal.The theory proposed in Ménétrier et al. (2015a) to achieve optimality of the filter is based on three key ingredients.
1.The first one consists in requiring that the residual sampling error be minimal.Assume that we have an estimator x of some statistics of a reference distribution with true statistics x , obtained from sampling from this distribution.We regularize x with a linear filter F (a matrix here) in order to minimize the sampling error: x = F x.
A typical criterion to minimize is The variation of this criterion with respect to a variation δF of F is δL(F) = −2Tr δF T E (x − F x) x T , which implies, that, at the minimum, we have an optimality condition in the form of an orthogonality of random vectors: This is a linear equation in F, whose solution is If F is a Schur filter, i.e., x = f • x, given by the Schur or Hadamart product (which is a subcase of the above problem) -hence F is now a vector f -then the solution has the form where the division of vectors is component-wise.
Eqs. ( 5) and ( 6) can be applied to the filtering of B, storing the entries B ij in x.Hence, they provide optimality conditions for linear filtering of B. They are known in the signal scientific community as Wiener filters.We note that F , or f , still depends on the unknown true statistics x 2. The second ingredient is to exploit the structure relationships that bind the moments of sample estimators of the reference distribution.For any reference distribution (referred to as the non-Gaussian case in the following), the second-order moments of the sample covariances B ij are functions of the second-order and fourth-order moments of the reference distribution.If, in addition, one assumes this reference distribution to be a Gaussian one, then the covariances of the sample covariances B ij are only functions of the second-order moments of the reference distribution.This will be naturally referred to later as the Gaussian case.For instance, in the Gaussian case, the relation has the well-known form: 3. In spite of the above key ideas, some local spatial averaging will additionally be needed to obtain robust estimators for the filters and their correlation lengths.Such averaging can be justified by ergodic assumptions regarding the statistics of the errors.
In the following, we make the difference between the cases where the true distribution is assumed to be Gaussian or not, since we saw it has an impact on the structure function such as Eq. ( 7) and could yield distinct optimal filtering results.

Gaussian case
It turns out that it is more convenient to filter the variance and the correlation independently, in particular using a general linear filter for the variances and a Schur filter for the correlation (Ménétrier et al., 2015a).
We denote v the vector of variances; i.e., v i ≡ B ii .Combining the optimality criterion (Eq.4) with the structure relationship (Eq.7), without reference to any explicit filter at this stage, the filtered and the sampled variances are related by (see Eq. 50 of Ménétrier et al., 2015a): with C G i = 0 the optimality criterion in Gaussian conditions.If we filter the covariances with a Schur filter, i.e., B = F • B, then one obtains (see Eq. 64 of Ménétrier et al., 2015a):

Non-Gaussian case
In the non-Gaussian case, the structure relationship incorporates a term that depends on the fourth-order moments ij kl of the true error statistics.Using these relationships and the optimality criterion (Eq.4), without reference at this stage to any particular filter, one obtains (see Eq. 48 of Ménétrier et al., 2015a): Again, but in the non-Gaussian case, if we regularize the covariances with a Schur filter, i.e., B = F • B, then ones obtains the optimal filter (see Eq. 62 of Ménétrier et al., 2015a): For the localization of the covariances and hence the correlations, Eqs. ( 9) and ( 11) provide the optimal Schur localization.For the filtering of the variances, one uses Eqs. ( 8) and ( 10) but still needs to specify a filter, such as a convolution with a short-range kernel of correlation length l.Then Eqs. ( 8) and ( 10) are implicit equations for l, which can be solved iteratively using, for instance, a fixed-point method.
We note that all these formulae still depend on some statistical expectation, such as E[ B 2 ij ].To make those formulae practical, we identify these expectations as local, if not global, spatial averages.

Wiener filter
There is an alternative to using the optimality condition (Eq.4) in conjunction with the structure relationships of the moments of B. We propose to solely use the optimality condition (Eq.4) and upon choosing the generic form of the filter use the optimal filters given by Eqs. ( 5) or ( 6).We will call them Wiener filters in the following.
For instance, assuming Schur regularization, we obtain the following Wiener filter: Using the sample estimator we obtain Both Wiener and Schur filters will be applied to subdomains defined around instrumented tower locations measuring continuously CO 2 mole fractions in the US Upper Midwest (Miles et al., 2012).The sub-domains cover an area of 400 km × 400 km around each site (here seven sites across the domain), which also correspond to the spatial extent of the local spatial averaging (third item in Sect.2.2).Due to computational limitations, we performed additional experiments with larger sub-domains for our 25-member ensemble, as shown in Sect.3.5.

Meteorology and CO 2 error structures
We want to explore the relationships between the different variables especially in situ mole fractions of CO 2 , totalcolumn XCO 2 and PBLH.We will compare both the error variances and covariances to identify possible links between error structures in PBLH and CO 2 /XCO 2 mole fractions.We will explore the spatial correlation lengths for CO 2 mole fractions, mean horizontal wind (zonal and latitudinal components) and PBLH to quantify and possibly utilize error structures in meteorological fields to generate CO 2 and XCO 2 error structures.Most ensemble prediction systems (EPSs) provide spatial error correlations for meteorological variables which could be used to construct error covariances for CO 2 and XCO 2 .Error covariances of CO 2 mole fractions depend on the CO 2 fluxes, but error structures in the atmospheric models should remain independent of the CO 2 flux distribution.Díaz-Isaac et al. (2018a) show that first-order discrepancies in PBLH seem related to large errors in CO 2 .Here, we investigate further the links between errors across different variables.We present the results in Sect.3.4 for the variances and in Sect.4.3 for the error correlations.

Sampling noise due to ensemble size
We computed the sample variances over the domain from the 5-, 8-, 10-and 25-member ensembles as shown in Fig. 1.The increase in variances and the presence of additional finescale structures are visible in small-size ensembles (5 to 10 members) compared to the 25-member ensemble.Fine-scale structures reflect the sampling noise in the small-size ensembles, reaching a maximum in the five-member ensembles (see Fig. 1d).These spurious structures appear with small-size ensembles and are to be filtered later.The range of values for error variances increases for small-size ensembles, independently of the calibration process.In addition, the variances in calibrated ensembles with more members are smaller because the inflation of the variance is a direct consequence of removing members.Hence, the calibration process better inflates the deviation of members from the mean for small ensembles.Díaz-Isaac et al. (2018a) have shown that the calibration process yields smaller-size ensembles to better represent model errors.Here, the variance from the 25-member ensemble remains harder to inflate by the calibration process and also less affected by sampling noise, which is likely to be less representative of the actual transport model errors.

Filtering of sampling noise
We show here the values of the length scales in our filter resulting from the optimality criteria, applying both Gaussian (see Eq. 8) and non-Gaussian (see Eq. 10) filters to the raw variances.We implemented the dichotomy algorithm proposed in Ménétrier et al. (2015b) to obtain the optimal length scales of the filter, dividing (or multiplying) the length scale by a factor of 2 until convergence.The algorithm solves for the optimal length scale of the filter by scanning the space of solutions iteratively (applying a multiplicative factor at each time step to minimize the cost function).For all our cases, we defined the upper bound of the diagnosed length scale at 750 km, to represent about half the size of our simulation domain (square of 1600 km wide).This large value means that the extent of noisy structures would encompass the entire domain and therefore would not be recoverable.Here, the sampling noise is characterized by length scales of sizes ranging from a few kilometers to several hundred kilometers.In practice, the algorithm always converges but length scale might be larger than the domain size, meaning that all spatial structures in the variances are regarded as noise.This situation happens for two main reasons: spatial structures in the noise are similar to spatial gradients in the true variances, or  the noise is larger than the true variances.Hence, the length scale falls beyond the limit of our simulation domain.In the method described by Ménétrier et al. (2015b), the ergodic assumption is necessary to diagnose robust estimators of the filter (see Sect. 2.2).
For CO 2 mole fractions (see Fig. 2), the algorithm for the calibrated 25-member ensemble systematically converges to small length scales (< 50 km), indicating that noise structures are very small in our optimal ensemble.When using the Gaussian filter, the algorithm systematically converges below our 750 km threshold for all cases except for 30 % of the days with the smallest ensemble (five members).In the non-Gaussian case, the filter converges to larger length scales beyond our threshold with the 5-and 10-member ensembles for less than 30 % of the days.Typically, length scales beyond our 750 km threshold are temporally coherent over periods of several days suggesting weather-related structures possibly inherited from synoptic-scale systems.These periods might be caused by high sampling noise compared to the true variances or by similar scales in spatial structures for both noise and true variances.Overall, the non-Gaussian filter shows a lower rate of convergence below 750 km compared to the Gaussian filter for CO 2 mole fractions.
For XCO 2 column mole fractions (see Fig. 3), the optimal length scales are larger and the non-Gaussian filter converges beyond our threshold more frequently (about 50 % of the days for eight members or less).Even with the optimal 25-member ensembles, error structures of about 50 to 200 km are filtered out, significantly larger than for the CO 2 mole fractions.We discuss in Sect.4.2 the possible physical reasons behind these larger length scales, possibly due to large-scale structures in the free troposphere or to the complexity in noise structures as XCO 2 data integrate noises from different altitudes.Figure 4 shows the results of the PBLH for which both filters converge beyond our threshold for half of the days in the Gaussian case.However, the filter converges more often below our threshold with the non-Gaussian filter applied to 10-member ensembles.Variance noise for PBLH present skewed distributions (not shown here) requiring the use of a non-Gaussian filter.We conclude here that 8-and 10-member ensembles are the minimum sizes with which sub-threshold convergence can be obtained on most days for most variables.Five-member ensembles will still be studied later on for covariances, as the localization of covariances does not depend on the filtered variances, but the low rate of convergence below 750 km might limit the use of the filtered variances.

Variance filtering and ensemble sizes
The filtered variances shown in Fig. 5, here with the Gaussian filter, for the different ensemble sizes are in better agreement with the variances of the 25-member ensemble both in term of spatial patterns and magnitudes among the different ensembles.The filter successfully removed noisy structures, therefore decreasing the dependence on the number of members used in each case.Despite length scales beyond our threshold for 30 % of the days with the five-member ensemble, filtered variances at the monthly timescale show similar structures to 8-or 10-member ensembles, with nearly all of the noisy structures being removed by the filter.Compared to earlier results, the ensemble size does not seem to fundamentally limit the capacity of the filter to remove the  noise, despite days with convergence beyond 750 km.The variance magnitude remains slightly larger for 10 members or less, with a relative overestimation of about 15 %.Our 5member ensemble provided the best match with only 10 % higher than the 25-member filtered variances.The averaging over a whole month compensates for days with convergence beyond 750 km, producing reasonable estimates of the optimal variance even with five members.This result suggests that climatological error variances from small-size ensembles can be a good first approximation of the true variance when filtered correctly over most days.

Error variances in CO 2 , XCO 2 and PBLH
We show in Fig. 6 the spatial distribution of error variances from the 25-member calibrated ensemble for in situ CO 2 mole fractions in the planetary boundary layer (PBL; 100 m a.g.l.), in situ CO 2 mole fractions in the free troposphere (about 5 km a.g.l.), total column of XCO 2 dry air mole fractions and PBLH (in m a.g.l.).The four variables display very distinct spatial patterns.XCO 2 variance spatial patterns (see Fig. 6c) exhibit distinct maximum values located in the eastern and southeastern part of the domain, whereas high CO 2 variances are observed in the northeastern part of the domain for free-tropospheric CO 2 (see Fig. 6b) or centrally located for CO 2 variances in the PBL (see Fig. 6a).Finally, PBLH variances (see Fig. 6d) show no indication of direct relationship between large errors in the western part of the domain and the other three CO 2 variables.We conclude here that no direct relationship can be utilized to construct CO 2 variances based on PBLH.Similarly, maximum variances  among the three CO 2 variables are also significantly different in distribution and magnitudes.

Covariance localization: Schur and Wiener filters
Error covariances in CO 2 mole fractions scale with the magnitude of the surface CO 2 fluxes.They are therefore difficult to interpret.Instead, we present here the error correlations to highlight the spatial structures inherited from the transport models, independent of the magnitude of the underlying CO 2 surface fluxes.We show in Fig. 7 the hourly correlation structures from the full 45-member ensemble (ac) and our 25-member calibrated ensemble (d-f) at one of the instrumented towers in the US Midwest, i.e., Centerville, Iowa.We applied both Schur (see Fig. 7b, e) and Wiener (see Fig. 7c, f) filters to compare the impact of both filters on the raw correlations (see Fig. 7a, d).The Schur filter has less impact on the correlations compared to the Wiener filter which attenuates significantly the magnitude of the correlations for both ensembles.In Sect.2.2, the local averaging of the optimal length scale assumes that the sampling noise is spatially homogeneous (third ingredient of the methods).This homogeneity assumption is required for the ergodicity assumption to apply, therefore yielding a domain-averaged filtering approach.The sub-domain used for the covariance filtering (here 400 km × 400 km) limits the spatial extent to 200 km around the observation location.The size of the domain was defined primarily for computational efficiency and based on the size of correlation structures, usually of about 100-200 km in length scale.To evaluate this assumption, we compared the size of the sub-domain to filter the covariances with a large area of 900 km × 900 km for the 45member ensemble to a smaller area of 400 km × 400 km for the 25-member ensemble.Filtered correlations show similar results for Schur and slightly larger values for the smaller sub-domain when applying the Wiener filter.We conclude here that the spatial local averaging has a minor impact on the results and that our 25-member ensemble has similar spatial structures to the original 45-member ensemble, with larger correlations at short distances.We extend this analysis to the monthly timescale by showing monthly averaged error correlations, superimposed from different tower locations on the same map to aggregate the results at multiple locations (see Fig. 8).When averaged over longer timescales (see Fig. 8), the filtered correlations become isotropic, distributed around each location.The magnitudes remain larger with the Schur filter (see Fig. 8b) compared to the Wiener filter (see Fig. 8c) but the differences are noticeably smaller.The unfiltered correlations (see Fig. 8a) are noticeably larger due to noisy structures.After filtering, the spatial structures are distributed around the observation locations following a pseudo-Gaussian pattern.The magnitude of the error correlations, i.e., the length scale of the errors, is reduced in both cases compared to the raw correlations (see Fig. 8a).This result confirms that the sub-domain used here (400 km × 400 km) is sufficient to represent the error correlation structures around each measurement location and describes fully the error structures.
We show in Fig. 9 the results for the different ensemble sizes using the Schur filter.The 10-and 8-member ensembles show similar magnitude and patterns for the different sites, but the correlations are smaller than with the original ensemble.In comparison, the Wiener filter (see Fig. 10) generates consistent patterns with 25-, 10-and 8-member ensembles.In both cases, the filters decrease significantly the correlations in the five-member ensemble, revealing the inability of the filter to separate the noise from the actual error correlations.We present the localized correlation length scales for each tower and for each day in Fig. 12a.For both Center and Mead, length scales are noticeably larger than for the other towers and decrease rapidly until 2 July, before converging back to the same values diagnosed for other measurement sites.The differences across towers suggest local differences in error correlations, even across the same region for a single day (up to 70 km across our sites).These differences correspond to the beginning of summer, when both weather and ecosystem fluxes vary rapidly especially in agricultural areas.We discuss in Sect.4.3 the relationship between meteorological and CO 2 variables for both error variances and error correlations.

Minimum size of calibrated ensembles
We discussed in Sect.3.3 the dependence of the success of the variance filtering on the ensemble size.From these results, an ensemble of at least 8 to 10 members seems required to reach convergence below our 750 km convergence and hence filter the error variances.However, the spatial representation of the averaged filtered variances using a calibrated five-member ensemble (see Fig. 5) indicates a reasonable recovery of the error variances at the monthly timescale but not at the daily timescale.Theoretically, the minimum number of members for the covariance filtering is four, based on Eq. ( 11) with a factor (N − 3) in the denominator, which shows that five-member ensembles are close to this limit and are not recommended in a more general context.The application here suggests 5-member ensembles are acceptable, but 8-to 10-member ensembles would be a minimum both in practice and in theory over different seasons and regions.Convergence beyond our threshold over single days has a limited impact on the monthly mean filtered variances.We conclude here that the filter produces satisfactory results to generate first-order estimates of the CO 2 mole fraction errors.To achieve a systematic daily convergence below threshold, we recommend a larger number of members in the ensemble.One important point here is the calibration step performed before filtering, which optimizes the information content in each member relative to the other members.Therefore, a randomly generated ensemble may require additional members in order to represent the actual error variances.We tested the filtering technique on a random ensemble (i.e., uncalibrated) and found that beyond-threshold convergence is more frequent with 10 or less members (not shown here).

Impact of calibration on small-size ensembles
We have explored the impact of the calibration process on the error variances and covariances by filtering non-calibrated sub-ensembles of 8 and 10 members (see Fig. 11).These random ensembles have no member in common with their calibrated counterpart and are composed of simulations using various physics configuration randomly selected among the 45 original model configurations.The optimal length scale of the variance filter is systematically lower for non-calibrated ensembles (see Fig. 11, in gray and light blue) compared to calibrated ensembles (see Fig. 11, in black and royal blue), suggesting lower levels of noise with larger spatial structures, similar to larger ensemble sizes (25 members or more).Be-cause members of the calibrated ensembles were selected to maximize the information content, calibrated-ensemble members differ more from each other than non-calibrated ensemble members.As shown in Díaz-Isaac et al. (2018a), one member with higher or lower PBLH statistics (usually the monthly mean model estimate) is systematically selected in order to generate calibrated ensembles with enough variance, and therefore capture the spatial and temporal variability in observed PBLH's from 14 radiosondes.Calibration might suggest different combinations of model physics for every single time period.In future studies, we recommend a combination of several preselected model physics with added perturbations to produce a sufficient ensemble spread but not perform the calibration for every time period.The calibration procedure might still be applied to existing ensembles but remain insufficient to sample the full spatial and temporal variability as discussed in Díaz-Isaac et al. (2018a).Some of these members introduce different spatial structures compared to the original ensemble, increasing the spread significantly.However, the small size of our ensembles with a larger variance may affect the sampling of these structures and hence our ability to differentiate noise from actual error structures.The approach proposed by Ménétrier et al. (2015a) was initially designed for ensembles with independently distributed members, which is not the case here.Some true correlation structures in the calibrated ensemble may be regarded as noise by the filter.Another approach would generate an ensemble based on the localized ensemble-based error covariance matrix.This approach will be developed further in future studies but is beyond the scope of this paper.

Spatial structures in CO 2 and meteorological errors
In Fig. 6, significant differences in filtered error variances were observed between in situ CO 2 mole fractions (at different altitudes), XCO 2 mole fractions and PBLH.Therefore, we conclude here that transport errors of meteorological variables are not transferable to CO 2 and XCO 2 error variances.This finding is in agreement with Miller et al. (2015), who found no direct relationship between errors in the meteorology and in situ CO 2 mole fractions.However, considering the covariances, the spatial structures in CO 2 mole fraction errors inherited from transport model errors exhibit welldefined patterns (e.g., Fig. 10).By fitting a simple Gaussian function in the form of e − x L to the filtered covariance fields, we diagnosed the characteristic length scale of the spatial error structures for the different variables, here CO 2 100 mhigh mole fractions, the mean horizontal zonal wind component and PBLH. Figure 12 shows the daily length scales at the seven measurement locations over the simulation period (i.e., 27 June to 21 July).The length scales L for the three variables increase rapidly between 27 June and 4 July from less than 100 to 150 km or higher.As there is no long-term spin-up (re-initialization of the perturbations every 5 d), the asymptotic behavior is most likely due to the seasonality of the errors from early to late summer.The seasonal changes in the atmospheric dynamics impact the spatial structures in the errors for the three variables.Across the seven sites, the characteristic length scale of spatial error structures also vary significantly, in particular for PBLH with large differences across sites.Both CO 2 mole fractions and the mean zonal wind component reach a maximum value over July: respectively, 140 and 120 km.The comparison of the mean length scales (Fig. 12d) highlights the differences between the three variables.Both the CO 2 100 m high mole fractions and the mean horizontal zonal wind component show similar variations and converge towards the same values but differ significantly from 29 June to 10 July.We conclude here that firstorder estimates for CO 2 spatial error correlations may be derived from meteorological error structures, in particular from the mean horizontal wind speed.But these approximations may be valid only for specific time periods.As presented in Sect.3.4, error variances in CO 2 and XCO 2 mole fractions are decoupled from PBLH errors.Here, we suggest that error correlations may be derived from wind errors, but error variances should still be computed independently.

Evaluation and modeling of error correlations
This study presents a methodology to filter the noise in error structures from a small-size ensemble.The evaluation of the filtered structures would benefit from dense measurement campaigns sampling spatial structures across large domains, such as the Atmospheric Carbon and Transport (ACT)-America campaigns.2Previous studies have shown the utility of aircraft measurements to diagnose error correlations (Gerbig et al., 2003), but the separation of spatial structures induced by surface flux errors and atmospheric transport errors remains challenging in order to construct observation error covariance matrices.The combination of ensemble systems such as ensemble Kalman filter (EnKF) systems and intensive aircraft campaigns will provide additional insights to evaluate filtering approaches (e.g., Chen et al., 2019).To introduce the findings of our study into an atmospheric inversion system, an additional step would be required in order to construct a regularized error covariance matrix.In this study, we acknowledge here that we have applied Schur and Wiener filters using the raw filtering matrices (Eqs.9, 11 and 6), which may not produce semi-positive definite covariances.Future studies should include an additional step by adding a regularization of the covariances before filtering.For example, Lauvaux et al. (2009) proposed to model the error structures using a diffusion equation able to represent anisotropic structures, following the methodology described in Pannekoucke et al. (2008).Here, we presented a local filter able to remove the noise in the error structures.The diag-  nosed error structures can be approximated by different spatial functions of varying degrees of complexity, further regularized to generate positive-definite error covariance matrices.This next step is beyond the scope of this paper but will be conducted in the future to generate an efficient model of the corresponding error covariances based on our current results.

Conclusions
We have diagnosed the error variances and the spatial error structures from our mesoscale transport models at daily and monthly timescales.Applied to both CO 2 mole fractions and meteorological variables, we implemented a costeffective filtering technique currently used in meteorologi-cal data assimilation systems (Ménétrier et al., 2015b) to describe spatial error structures using a small-size ensemble.The approach remains affordable for multiyear inversions of sources and sinks at continental or regional scales.The removal of noisy structures in our small-size ensembles is evaluated by comparison with larger-size ensembles, both the original 45-member ensemble and our optimal calibrated sub-ensemble of 25 members.A second filtering approach for error covariances was successfully applied using the Wiener filter, producing similar results compared to the Schur filter over the 1-month simulation period.Differences were noticeable at shorter timescales (i.e., daily).The spatial distribution of error variances and spatial error structures are recoverable from small-size ensembles of 8 to 10 members, daily for in situ CO 2 mole fractions and monthly for total-column XCO 2 , providing a more realistic representation of transport errors in future mesoscale inversions of CO 2 fluxes.We noted that error variances of in situ CO 2 mole fractions and total-column XCO 2 differ significantly, even when varying the altitudes or considering PBLH error structures.We conclude that error variances for remote-sensing observations need to be quantified independently of PBL or free-tropospheric mole fractions.
We have discussed the potential use of meteorological error structures such as the mean horizontal wind to approximate spatial error correlations of in situ CO 2 mole fractions.The seasonal variations in wind, PBLH and in situ CO 2 mole fractions are highly correlated, while the typical length scales in error structures vary from 100 to 150 km in the middle of summer depending on the variable.We conclude here that meteorological error structures may provide a first-order estimation of correlation length scales in CO 2 inversions when no ensemble of CO 2 simulations is available.

Appendix A: Calibration of the reference 25-member ensemble
In this study, we generate a reference ensemble to evaluate the sampling noise in the ensembles of smaller sizes.The original 45-member ensemble, uncalibrated, cannot be used as a reference as it underestimates model errors.Yet, the reference ensemble needs to include enough members to limit the sampling noise.Based on the same original ensemble of 45 members as in Díaz-Isaac et al. (2018a), we reduce it to a calibrated 25-member ensemble using the simulated annealing (SA) algorithm.The selection of the optimal calibrated ensemble is based on three meteorological variables (i.e., wind speed, wind direction and PBLH) and follows the same procedure described in Díaz-Isaac et al. (2018a).The SA for 25 members uses 40 000 iterations to reach convergence, significantly larger than the 20 000 iterations for 10-, 8-and 5member ensembles.The same criterion used by Díaz-Isaac et al. (2018a) was applied to the selection process of the calibrated 25-member ensemble, improving the flatness of the rank histograms (Fig. A1).This selection is based on two criteria.First, we selected all the 25-member sub-ensembles with a rank histogram score smaller than six for each individual meteorological variable.In a second step, we filtered out the 25-member ensembles accepted by the SA algorithm but corresponding to a bias (i.e., mean model-data mismatch over 25 d) larger than the bias in the original 45-member ensemble.These criteria are applied to the three meteorological variables.This procedure is described in more detail in Díaz-Isaac et al. (2018a).The rank histograms in Fig. A1 show a limited under-dispersion of the 25-member ensemble, significantly reduced after calibration from 6.1, 6.2, 3.2 to 5.1, 4.9, and 2.5 for wind speed, wind direction and PBLH, respectively.

Figure 2 .
Figure 2. Length scale (in km) of the variance filter for in situ CO 2 mole fractions at about 100 m a.g.l.using Gaussian (solid lines) and non-Gaussian (dash lines) equations from 27 June to 21 July 2008.

Figure 3 .
Figure 3. Length scale (in km) of the variance filter for XCO 2 total-column dry air mole fractions using Gaussian (solid lines) and non-Gaussian (dash lines) equations from 27 June to 21 July 2008.

Figure 4 .
Figure 4. Length scale (in km) of the variance filter for planetary boundary layer heights using Gaussian (solid lines) and non-Gaussian (dash lines) equations from 27 June to 21 July 2008.

Figure 7 .
Figure 7. Hourly raw (a, d) and filtered correlations using Schur (b, e) and Wiener (c, f) filtered correlations of in situ CO 2 mole fractions at the Centerville tower (at about 100 m a.g.l.) on 26 June 2008.The correlations are based on the original 45-member ensemble (a, b, c) with a large subdomain (900 km × 900 km) and the 25-member ensemble (d, e, f) over the reference subdomain (400 km × 400 km).The highlighted domain also corresponds to the domain used for the spatial averaging of the correlations in the optimality conditions.

Figure 8 .
Figure 8. Raw (a) and filtered correlations of CO 2 in situ mole fractions at about 100 m a.g.l.based on 25 members using the Schur localization (b) and Wiener and Schur filter (c).

Figure 11 .
Figure 11.Length scale (in km) of the Gaussian variance filter for 10-member random (in gray) and calibrated (in black) ensembles, and 8-member random (in light blue) and calibrated (royal blue) ensembles, for in situ CO 2 mole fractions at about 100 m a.g.l.using Gaussian equations from 27 June to 21 July 2008.

Figure 12 .
Figure 12.Characteristic length scales L of filtered error correlations (from the 25-member ensemble) using an exponential function (e − x L ) for CO 2 100 m high mole fractions (a), the mean horizontal zonal wind component (U ; b), PBLH (c) and the mean values for the three variables (d) for every day of the simulation period (27 June to 21 July).

Figure A1 .
Figure A1.Rank histograms of the 25-member ensemble after calibration for wind speed (a), wind direction (b) and planetary boundary layer height (c) over 35 d (18 June to 21 July 2008) using 14 rawinsonde sites available over the US Midwest.The horizontal dashed line corresponds to the ideal value for a flat rank histogram with respect to the number of members.The rank histogram score (δ) defines the flatness of the rank histogram, 1 being the ideal value.