Atmospheric inversions are widely used in the optimization of
surface carbon fluxes on a regional scale using information from
atmospheric

The continuous rise in the abundance of greenhouse gases in the
atmosphere, especially due to fossil fuel combustion, has prompted the
scientific community to monitor these emissions systematically. The
challenge is not limited to revealing the spatial distribution of

An atmospheric inverse modeling system provides the link from atmospheric concentrations to surface fluxes. However, the number of available observations is small compared to the number of unknowns (spatially and temporally resolved fluxes), which makes the inverse problem strongly under-determined. To solve the inverse problem, the system incorporates Bayes' theorem and uses a priori knowledge, provided by biosphere models and emission inventories and accompanied by corresponding uncertainty estimates. The system then optimizes the a priori fluxes by minimizing the difference between model predictions and observed concentrations. For the current study only the biospheric fluxes are optimized; emissions from fossil fuel combustion are assumed to be known much better, as is the case in almost all published regional inversion studies. Inversion systems have been extensively used to derive spatiotemporal flux patterns on global (e.g., Enting et al., 1995; Kaminski et al., 1999a; Gurney et al., 2003; Mueller et al., 2008) and regional scales (e.g., Gerbig et al., 2003a; Peylin et al., 2005; Lauvaux et al., 2012; Broquet et al., 2013).
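As an illustration of the Bayesian step described above, the following minimal sketch solves a generic linear-Gaussian inverse problem analytically. It is not the implementation of any particular inversion system: the transport operator `H`, the covariances `B` and `R`, and all numbers are hypothetical and chosen only to show how a priori knowledge regularizes an under-determined problem.

```python
import numpy as np


def bayesian_inversion(H, y, x_prior, B, R):
    """Analytic minimum of the quadratic Bayesian cost function
    J(x) = (y - Hx)^T R^-1 (y - Hx) + (x - x_prior)^T B^-1 (x - x_prior).

    H       : (m, n) linear transport operator (fluxes -> mixing ratios)
    y       : (m,)   observed mixing ratios
    x_prior : (n,)   a priori fluxes
    B       : (n, n) prior flux error covariance (symmetric)
    R       : (m, m) model-data mismatch covariance (symmetric)
    """
    # Innovation covariance in observation space
    S = H @ B @ H.T + R
    # Gain K = B H^T S^-1, computed with a linear solve instead of an inverse
    # (valid because B and S are symmetric)
    K = np.linalg.solve(S, H @ B).T
    # Posterior mean: prior plus weighted model-data mismatch
    x_post = x_prior + K @ (y - H @ x_prior)
    # Posterior covariance: prior covariance minus the information gained
    A = B - K @ H @ B
    return x_post, A


# Toy under-determined case: 3 unknown fluxes, 2 observations (made-up numbers)
H = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0]])
x_true = np.array([1.0, 2.0, 3.0])
y = H @ x_true                 # synthetic, noise-free observations
x_prior = np.zeros(3)
B = np.eye(3)                  # uncorrelated prior errors, unit variance
R = 0.01 * np.eye(2)           # small model-data mismatch uncertainty
x_post, A = bayesian_inversion(H, y, x_prior, B, R)
```

In this toy case the observations alone cannot determine three fluxes, but the posterior still fits the data more closely than the prior and has a smaller total variance. Off-diagonal elements in `B` would additionally spread the correction between correlated unknowns.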

The challenge in regional inversions is to reconstruct, at high
resolution, the spatiotemporal flux patterns, usually of the net
ecosystem exchange (NEE). For that purpose currently deployed global
or regional inverse modeling schemes use different state spaces
(i.e., the set of variables to be optimized through the inversion
process). Peters et al. (2007) split the domain of interest into
regions according to ecosystem type. Subsequently, fluxes are optimized
by using linear multiplication factors to scale NEE for each week and
each region. The pitfall of this system is that a zero prior flux has
no chance to be optimized and remains zero. Zupanski et al. (2007)
divided the NEE into two components, i.e., the gross photosynthetic
production (GPP) and ecosystem respiration (

Introducing proper prior flux uncertainties is crucial for meaningful
posterior estimates, as these uncertainties weight the prior knowledge
between different locations and times, as well as with respect to the
data constraint. The uncertainties have the form of a covariance
matrix and can be categorized into uncertainties of the prior fluxes
and uncertainties of the observational constraint, which includes
measurement and transport model uncertainties. While the uncertainty
in the observational constraint is usually defined through
the main diagonal of the covariance matrix, representing the
uncertainty of the observations and the model at a specific time and
location, our knowledge of the prior uncertainty is limited,
especially regarding temporal and spatial correlations that
effectively control the state space. Early inversions assumed fully
uncorrelated flux uncertainties (Kaminski et al., 1999b), while
spatial and temporal correlations were used later by Rödenbeck
et al. (2003), who investigated the autocorrelation of monthly

Daily NEE flux residuals from model–data comparisons showed
temporal correlations of up to 30 days but very short spatial
correlations of up to 40

This study primarily aims to use the information extracted from the
model–EC data residuals (spatiotemporal error structure) to define
a data-driven error covariance rather than simply assuming one or
adopting a conservative or expert-knowledge solution. To that end,
we apply our previous methodology and findings regarding the prior
uncertainty to atmospheric inversions, following Kountouris
et al. (2015). As explained above, we implement two uncertainty terms:
the first one to reflect the true spatiotemporal error structure and
the second one to reflect a bias. We use the Jena inversion
system (Rödenbeck, 2005; Rödenbeck et al., 2009) for the
regional scale consisting of a fully coupled system as described in
Trusilova et al. (2010) that couples the global three-dimensional
atmospheric tracer transport model TM3 (Heimann and Körner, 2003)
and the regional stochastic Lagrangian transport model STILT (Lin
et al., 2003). This scheme allows the retrieval of surface fluxes at a much
finer resolution (0.25

This paper is structured as follows. In Sect. 2 we present the inversion scheme and introduce the settings of the atmospheric inversions. In Sect. 3 we present the results from a synthetic inversion experiment aimed at assessing the prior error setup, which we consider a step towards atmospheric inversions using real atmospheric data with an objective, state-of-the-art prior error formulation. Discussion and conclusions follow in Sects. 4 and 5, respectively.

The Jena inversion system (Rödenbeck, 2005; Rödenbeck et al.,
2009) was used for the current study. The scheme is based on
Bayesian inference and uses two transport models, the TM3 model
(Heimann and Körner, 2003) for global simulations and the STILT model (Lin
et al., 2003) for regional simulations. The advantage of the system is
that it combines a global transport model with a regional one without
the need for direct coupling along the boundaries. The global transport model is used
to calculate fluxes from the far field (outside of the regional domain
of interest), and subsequently this information can be used to provide
lateral boundary information for the regional model. The primary input of
the system is the observed mixing ratios

In the following, we briefly describe the inverse modeling approach. For more details the reader is referred to Rödenbeck (2005).

In grid-based atmospheric inversions the number of unknowns (spatially
and temporally resolved fluxes) is larger than the number of
measurements (hourly dry mole fractions at different sites), making
the inverse problem ill-posed. In the Bayesian concept this can be
remedied by adding a priori information. This information can be
written as

The inversion system seeks to minimize the following cost function
that combines the observational constraint (Eq. 3) and the prior flux constraint

The a priori

Domain of the inversions (dashed rectangle). Locations of the atmospheric measurement stations are shown with blue marks.

Optimized VPRM parameters

Additionally, biogenic

The a priori flux in a real-data inversion would have three components
including fossil fuel and ocean fluxes:

The EC station locations used for this analysis were exactly the same as in Kountouris et al. (2015), ensuring that the error structure for the synthetic data inversions was derived consistently. By using exactly the same information to characterize the prior uncertainties, we also ensure that results from the synthetic experiment are informative for a real-data inversion. Note that for the synthetic data inversions, prior fluxes from the VPRM model were not optimized against the GBIOME-BGCv1 true fluxes.

The implicitly defined prior error covariance matrix contains diagonal
elements of 1.45

The inversion system optimizes additive corrections to 3-hourly
fluxes in a sense that the posterior flux estimate can be given by the
sum of a fixed a priori term (first term of the right-hand side in
Eq. 8) and an adjustable term (second term in Eq. 8). The latter has
a priori a zero mean. The biogenic fluxes can be defined as follows:

Note that the a priori error covariance matrix (

For the S1 case the posterior flux estimates can be expressed by
adding the optimized bias flux field to Eq. (8)

Following Rodgers (2000), the posterior flux uncertainties are contained
in the covariance matrix of the posterior probability distribution,
which can be estimated from Eq. (

The observation vector

Monthly data coverage plot for the atmospheric stations used in the regional inversions. The left column shows the code name and the right columns show the station class and the assigned uncertainty in units of ppm. “C” stands for continental sites near the surface, “T” for continental tall towers, “S” for stations near shore, “M” for mountain sites, “MU” for mountain sites with diurnal upslope winds and “UP” for urban pollutant.

Information on the stations used for the regional inversions. The same network was applied to the synthetic data inversions and the real-data inversions in Kountouris et al. (2018). In the first column the term “type” stands for continuous (C) or flask (F) data.

The model–data mismatch uncertainty associated with each measurement is expressed as a diagonal covariance matrix and contains measurement errors and errors from different components of the modeling framework (i.e., model errors due to imperfect transport, aggregation errors) (Gerbig et al., 2003b). For the current study, all sites are classified according to their characteristics (e.g., tall tower, mountain sites), and uncertainties are defined depending on the site class (Fig. 2, legend on the right). The uncertainties are considered representative of current inverse modeling systems. Although the measurement error covariance is a diagonal matrix, transport error correlations might be present. We do not explicitly introduce off-diagonal terms in the measurement error covariance matrix, but we do account for temporal correlations via a data density weighting function that inflates the uncertainty (see Sect. 2.1 and, for more information, Rödenbeck, 2005).
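The idea of a diagonal mismatch covariance with density-dependent inflation can be sketched as follows. This is a minimal illustration only: it assumes that the inflation factor is the square root of the number of observations from the same station that fall within a fixed time window, so that densely sampled periods do not dominate the cost function. The function name, the one-week window and all numbers are hypothetical; the actual weighting is specified in Rödenbeck (2005).

```python
import numpy as np


def inflated_sigma(times_h, base_sigma, window_h=168.0):
    """Inflate a station's base uncertainty according to local data density.

    times_h    : observation times in hours for ONE station
    base_sigma : class-dependent model-data mismatch uncertainty (ppm)
    window_h   : width of the counting window in hours (assumed: one week)
    """
    times = np.asarray(times_h, dtype=float)
    # Number of observations within +/- window_h / 2 of each observation
    # (every observation counts itself, so counts >= 1)
    counts = np.array([(np.abs(times - t) <= window_h / 2.0).sum()
                       for t in times])
    # Assumed inflation: sqrt(N), so N correlated values in a window carry
    # roughly the weight of a single independent one
    return base_sigma * np.sqrt(counts)


# Three hourly observations close together and one isolated observation
times_h = [0.0, 1.0, 2.0, 1000.0]
sigma = inflated_sigma(times_h, base_sigma=1.5)
# Diagonal model-data mismatch covariance built from the inflated values
R = np.diag(sigma ** 2)
```

With this choice the three clustered observations each receive a larger uncertainty than the isolated one, which approximates the effect of temporal error correlations without introducing off-diagonal terms.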

For the synthetic data study only the regional atmospheric model STILT
was used to create the observations with a forward run and to perform
the inversion. This was feasible since the synthetic

With respect to the assumed model height, STILT uses surface elevation
maps from ECMWF (European Centre for Medium-Range Weather Forecasts)
with a resolution of

The purpose of the synthetic study is to evaluate the system setup with a realistic approach. To evaluate the ability of the system to retrieve the synthetic true fluxes, we visualize spatially distributed fluxes and we study spatially integrated (domain and national scale) as well as temporally (annual and monthly scale) integrated fluxes.

A comparison of true and modeled

Daily nighttime (23:00–04:00 UTC) averages for prior, true
and posterior

Taylor diagram for daily averaged modeled and measured
time series (annual basis) of

To estimate the goodness of fit we consider the station-specific

Another important aspect is the reduced

RMSD (first column in ppm) and correlation coefficients (second
column) between known truth and prior and posterior

Annual spatial distribution for the prior, true and
posterior biogenic flux estimates for the two synthetic inversions
S1 and B1

Annual integrated influence for 2007 of the current
atmospheric network. Footprint influence is presented on
a logarithmic scale and units are in

Monthly and annual carbon flux budget, integrated over the European domain. Note that both inversions share the same annual prior uncertainty but monthly uncertainties differ. Blue and red error bars denote the prior uncertainty for the B1 and S1 scenarios, respectively.

In flux space, we evaluate the inversion performance by comparing the retrieved flux estimates against the synthetic fluxes (true) on different temporal and spatial scales: annually and monthly integrated fluxes, domain-wide and on a country scale. In particular we are interested in capturing the true fluxes down to country scale. For that we assess monthly posterior retrievals, which we compare to reference data (true fluxes), country aggregated, using a Taylor diagram. This diagram provides a concise statistical summary of how well patterns match each other in terms of their correlation and the ratio of their variances.
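The statistics summarized by a Taylor diagram can be computed with a few lines of code. The sketch below is illustrative and uses hypothetical function and variable names; it returns the correlation, the ratio of standard deviations (the variance ratio mentioned above is its square) and the centered RMSD, which are linked geometrically through the law of cosines.

```python
import numpy as np


def taylor_statistics(model, reference):
    """Return (correlation, std ratio, centered RMSD) of model vs. reference."""
    model = np.asarray(model, dtype=float)
    reference = np.asarray(reference, dtype=float)
    # Pearson correlation between the two series
    corr = np.corrcoef(model, reference)[0, 1]
    # Ratio of standard deviations; its square is the ratio of variances
    std_ratio = model.std() / reference.std()
    # Centered RMSD: the means are removed, so a constant bias does not count
    diff = (model - model.mean()) - (reference - reference.mean())
    crmsd = np.sqrt(np.mean(diff ** 2))
    return corr, std_ratio, crmsd


# Made-up monthly country-aggregated fluxes (arbitrary units)
reference = np.array([1.0, 2.0, 3.0, 4.0])
prior = 2.0 * reference + 1.0      # perfectly correlated, but too variable
corr, std_ratio, crmsd = taylor_statistics(prior, reference)
```

Because the means are removed, a Taylor diagram is insensitive to a constant offset between the compared fluxes; biases have to be assessed separately, for example through integrated budgets.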

The spatial distributions of the annual biosphere–atmosphere exchange
fluxes for the prior, the known truth and the posterior cases are
presented in Fig. 5. Note that annual fluxes between the two biosphere
models used for prior fluxes and true fluxes are substantially
different. The inversion significantly adjusts the spatial flux
distribution mainly in central Europe and in southern Scandinavia,
where a denser atmospheric network exists. The absolute annual mean
difference in fluxes (

Performance of the two error structures expressed as the spatial
RMSD of the optimized monthly and annual NEE fluxes compared to the truth
for the whole domain in

Temporal evolution of monthly NEE for selected European countries for the synthetic data inversion.

Overview of the model performance (S1 case) summarized in a Taylor diagram. Posterior and prior monthly- and country-scale aggregated biospheric fluxes are compared against the reference fluxes (true). Each line corresponds to a different country. The starting point of each arrow shows the prior and reference comparison and the ending point shows the posterior and reference comparison. Ideally the ending point should coincide with the green point, which represents the reference model.

We are specifically interested in the ability of the inversion system to capture integrated fluxes over time and space. Figure 7 shows an overview of the domain-integrated fluxes on monthly and annual scales. Despite the markedly larger a priori (VPRM) sink compared to the synthetic truth (GBIOME-BGCv1) during the growing season, both inversions, with and without the bias term, produce posterior flux estimates that capture the true monthly and annually integrated fluxes. While the monthly posterior estimates give no clear evidence of which inversion performs better, retrievals on an annual scale slightly favor the inversion without the bias term (B1 case). The prior uncertainties differ between the two inversions: while both were scaled to have the same prior annual uncertainty, the B1 inversion has systematically larger prior monthly uncertainties than S1 as a result of the inflated spatiotemporal component of the prior error covariance. Posterior uncertainties were found to be similar and include or are close to including (S1 case) the true flux estimates. The uncertainty reduction for annually and domain-wide integrated fluxes, defined as the difference between prior and posterior uncertainties normalized by the prior uncertainty, was found to be 73 and 69 % for S1 and B1, respectively. Note that while the prior uncertainty refers only to the flux space, the posterior uncertainty depends on the uncertainty of prior fluxes, measurements and transport.
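The uncertainty reduction defined above can be illustrated with a short sketch. It assumes that the uncertainty of an integrated flux is obtained from an error covariance matrix by standard linear error propagation; the function names, the two-element covariances and the aggregation weights are hypothetical and serve only as an example.

```python
import numpy as np


def aggregated_uncertainty(cov, weights):
    """Uncertainty of a weighted sum of fluxes: sqrt(w^T C w)."""
    w = np.asarray(weights, dtype=float)
    return float(np.sqrt(w @ np.asarray(cov, dtype=float) @ w))


def uncertainty_reduction(sigma_prior, sigma_post):
    """(prior - posterior) / prior, as a fraction between 0 and 1."""
    if sigma_prior <= 0:
        raise ValueError("prior uncertainty must be positive")
    return (sigma_prior - sigma_post) / sigma_prior


# Made-up covariances for two flux elements that are summed (weights of 1)
prior_cov = np.array([[1.0, 0.5],
                      [0.5, 1.0]])   # positively correlated prior errors
post_cov = np.array([[0.2, 0.0],
                     [0.0, 0.2]])
weights = [1.0, 1.0]

sigma_prior = aggregated_uncertainty(prior_cov, weights)
sigma_post = aggregated_uncertainty(post_cov, weights)
reduction = uncertainty_reduction(sigma_prior, sigma_post)
```

Positive prior error correlations increase the uncertainty of the integrated flux, so the same posterior covariance yields a larger uncertainty reduction when the prior errors are correlated than when they are independent.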

To assess how well the posterior estimates agree with the true fluxes, the RMSD between true and posterior monthly integrated gridded fluxes was computed (Table 4). Both the B1 and S1 inversions show a similar reduction in the RMSD values compared to the prior. The same picture emerges for the annually integrated fluxes.

Of particular interest is the performance of the system on a regional scale, specifically at national level. Figure 8 shows monthly fluxes for selected European countries, including the prior, true and posterior estimates with the corresponding uncertainties. Both error structures show a similar performance. Despite the large prior misfit, the system succeeded in retrieving monthly fluxes at country level. Better constrained regions, mainly located in central Europe, broadly capture the temporal flux variation on a monthly scale. Figure 9 summarizes the inversion performance for the S1 case and for each EU-27 country in a Taylor diagram, showing the improvement of monthly and country-aggregated fluxes (in a perfect match the head of the arrow would coincide with the reference point, marked as a green bullet). It is worth mentioning that even for regions that are less constrained by the network, such as Great Britain, Spain, Poland and Romania, the inversions improved the posterior estimates compared to the prior estimates.

Mean monthly NEE averaged over the 53 different eddy
covariance site locations as reported in Kountouris
et al. (2015). A priori (black), true (green) and posterior fluxes
for scenarios B1 (blue) and S1 (red) are shown. Units are in

In order to investigate the potential of using EC
measurements for evaluating the retrieved

Results from the synthetic experiment showed both the strengths and the weaknesses of the system in retrieving the true spatial flux distribution. Although the error structure applied to this experiment was statistically consistent with the mismatch between prior and true fluxes, we note a limited ability of the current atmospheric network to retrieve fluxes on local scales. For coarser spatial scales (country level) the carbon budget estimates in the synthetic inversion performed well on monthly and annual temporal scales. Further, we observed an average reduction of the monthly uncertainties of 65 % for the B1 case and 64 % for the S1 case. In combination with the fact that the flux estimates reproduce the truth within the posterior uncertainties, this gives us confidence in the accuracy of our estimates.

The current study does not focus on the transport error quantification but rather includes it as diagonal elements in the measurement error covariance, which is typical in atmospheric inversions. The chi-square values confirm that there is no underestimation of the uncertainties. We note, though, that erroneous flux estimates are likely, especially on finer spatial scales on which the transport model is not able to resolve the real transport (e.g., individual eddies, complicated terrain). For coarser spatial scales, however, transport models are expected to perform better. This seems to be in line with the comparisons of the prior and posterior with the true flux estimates, which agree better on largely aggregated scales.

Prior error correlation in time and space limits the scale on which
information can be retrieved from the inversion. The spatial
correlation of several hundreds of kilometers implies that fluxes on
scales smaller than this cannot be significantly improved by the
inversion, as the results clearly showed. To assess this more
quantitatively, the spatial correlation between a priori or retrieved
and true monthly fluxes is calculated for different spatial
aggregation scales (starting at 0.25

The annual spatial flux distribution of the B1 and S1 cases was found
to be quite similar, indicating that inflating the uncertainty by
a factor of 1.5 (B1 case, see also Sect. 2.2.1) or adding a bias
component to compensate for the inflation (S1 case) lead to a similar flux
constraint. This could be explained due to the long correlation length
(566

The true fluxes were used to validate the posterior flux
estimates. In this synthetic experiment, both fluxes share the same
spatial resolution (25

The high RMSD reduction, in combination with the high correlation values and the captured variability between posterior and true dry mole fractions in the synthetic experiment, suggests that the inversion system performs well in retrieving the true mixing ratios. Nevertheless, this is not surprising, as the atmospheric data are fitted by the inversion. Furthermore, the forward and the inverse runs used identical transport, without any impact from imperfections in transport simulations.

The uncertainties in the flux space are statistically consistent with
the model–model flux mismatch. However the reduced

This technical note describes the setup and the implementation of
prior uncertainties as derived from model–eddy covariance data
comparisons into an atmospheric

Significant flux corrections and error reductions were found for larger aggregated regions (i.e., domain-wide and countries), giving us confidence in the reliability of the results for a real-data inversion, at least for aggregated scales down to the country level. We found a similar performance for both error structures. A more detailed analysis of the spatial and temporal scales on which the inversion provides a significant gain in information on the distribution of fluxes clearly confirms that (a) fluxes on spatial scales much smaller than the spatial correlation length used for the a priori uncertainty cannot be retrieved; (b) the inversion performs best on approximately monthly temporal scales; and (c) the small spatial scales in particular need to be realistically represented in the a priori fluxes.

The Jena inversion system is available from Christian Roedenbeck upon request (christian.roedenbeck@bgc-jena.mpg.de). The prior terrestrial fluxes (VPRM and GBIOME-BGC models) are available from Christoph Gerbig upon request (cgerbig@bgc-jena.mpg.de).

This work contributed to the European Community's Seventh Framework Program (FP7) project ICOS-INWIRE, funded under grant agreement no. 313169. The authors would also like to thank the Deutsches Klimarechenzentrum (DKRZ) for the use of its high-performance computing facilities. This publication is an outcome of the International Space Science Institute (ISSI) Working Group on “Carbon Cycle Data Assimilation: How to consistently assimilate multiple data streams”.

Edited by: Yafang Cheng
Reviewed by: three anonymous referees