A comparison of different inverse carbon flux estimation approaches for application on a regional domain



Abstract.
We have implemented six different inverse carbon flux estimation methods in a regional carbon dioxide (CO2) flux modeling system for the Netherlands. The system consists of the Regional Atmospheric Mesoscale Modeling System (RAMS) coupled to a simple carbon flux scheme, run at a relatively high resolution (10 km). Using an Ensemble Kalman filter approach, we try to estimate spatiotemporal carbon exchange patterns from atmospheric CO2 mole fractions over the Netherlands for a two-week period in spring 2008. The focus of this work is the different strategies that can be employed to turn first-guess fluxes into optimal ones, a fundamental design choice that is known to affect the outcome of an inversion significantly.
Different state-of-the-art approaches with respect to the estimation of net ecosystem exchange (NEE) are compared quantitatively: (1) where NEE is scaled by one linear multiplication factor per land-use type, (2) where the same is done for photosynthesis (GPP) and respiration (R) separately with varying assumptions for the correlation structure, (3) where we solve for those same multiplication factors but now for each grid box, and (4) where we optimize physical parameters of the underlying biosphere model for each land-use type. The pattern to be retrieved in this pseudo-data experiment is different in nearly all aspects from the first-guess fluxes, including the structure of the underlying flux model, reflecting the difference between the modeled fluxes and the fluxes in the real world. This makes our study a stringent test of the performance of these methods, which are currently widely used in carbon cycle inverse studies.
Our results show that all methods struggle to retrieve the spatiotemporal NEE distribution, and none of them succeeds in finding accurate domain averaged NEE with correct spatial and temporal behavior. The main cause is the difference between the structures of the first-guess and true CO2 flux models used. Most methods display overconfidence in their estimate as a result. A commonly used daytime-only sampling scheme in the transport model leads to compensating biases in separate GPP and R scaling factors that are readily visible in the nighttime mixing ratio predictions of these systems.
Overall, we recommend that the estimate of NEE scaling factors should not be used in this regional setup, while estimating bias factors for GPP and R for every grid box works relatively well. The biosphere parameter inversion performs well compared to the other inversions at simultaneously producing space and time patterns of fluxes and CO2 mixing ratios, but non-linearity may significantly reduce the information content in the inversion if true parameter values are far from the prior estimate. Our results suggest that, of the tested inversions, a carefully designed biosphere model parameter inversion or a pixel inversion of the respiration and GPP multiplication factors are the most promising tools to optimize spatiotemporal patterns of NEE.
Correspondence to: L. F. Tolk (lieselotte.tolk@acaciawater.com)
[...] derived from observations using Bayesian statistical methods to minimize the difference between model predictions and observations. Recent inverse modeling studies include, for instance, efforts to derive the net carbon exchange across the globe or at smaller scales from mixing ratio observations of CO2 (e.g. Bousquet et al., 2000; Gurney et al., 2003; Rödenbeck et al., 2003; Mueller et al., 2008; Lauvaux et al., 2009; Ciais et al., 2010; Göckede et al., 2010) or attempts to constrain biophysical parameters from eddy-covariance methods (Papale and Valentini, 2003; Carvalhais et al., 2008).
One recurring issue in inverse studies is the large number of choices to be made concerning, among others, the selection and weighting of observations, the magnitude and correlations of the uncertainties, the treatment of the time and space domain, and even which unknowns to solve for in the application. As a result, no two inverse studies use the same assumptions, and the outcome of inverse studies always needs to be evaluated within the limits of the modeling framework chosen.
Many authors have used atmospheric CO2 mole fraction observations from around the globe to reconstruct the spatiotemporal patterns of net ecosystem exchange (NEE), each in a different way. Peters et al. (2007) used NEE multipliers across large areas of similar vegetation to scale calculated NEE from a biosphere model over each week over many years. Lokupitiya et al. (2008) estimated NEE for a similar time frame, but estimated separate multipliers for simulated gross photosynthetic production (GPP) and ecosystem respiration (R), and for each model grid box. Rayner et al. (2005), in contrast, chose to modify a set of physical parameters in the underlying biosphere model directly, thus adjusting NEE to match observed CO2 mole fractions. That study, however, did not estimate separate parameters for separate parts of the globe. The large resulting NEE differences between these three estimates show that the methodology used is an important part of the final result. Clearly, the question of which method is most appropriate to estimate NEE has not yet been resolved (if it ever will be) and remains one of the critical issues in estimating source and sink distributions from observations.
In this paper, we want to further investigate the impact of different methodological choices on the estimated reanalysis of NEE. The application we chose for this purpose is a regional inversion of CO2 mole fractions using a high-resolution transport model and a realistic spatiotemporal distribution of NEE. We plan to use this framework for actual NEE inversions at a later stage, after determining the optimal approach through a set of pseudo-data inversions where the true answer is known. The regional character of this inversion allows us to disregard some of the issues related to carbon cycling on longer time scales and to long-range atmospheric transport of CO2. The four methods we want to test are related to the studies mentioned in the previous paragraph: (1) where we estimate NEE multipliers per vegetation type, (2) where we estimate GPP and R multipliers per vegetation type, (3) where we estimate GPP and R multipliers for each grid box, and (4) where we estimate biophysical model parameters for each vegetation type. The specific questions we want to address are: what is the best strategy to determine the spatiotemporal pattern and magnitude of NEE in our domain? What are the strengths and weaknesses of each method?
Pseudo-data studies always carry the danger of oversimplifying the real problem, or of being designed in a way that favors one outcome over another. We have tried to design our study to prevent this issue by using a "true" CO2 exchange distribution from a different biosphere model (FACEM; Pieterse et al., 2007) than the one we use to retrieve the exchange patterns (5PM; Groenendijk et al., 2011). Differences exist between the models in physical formulations, plant functional types, driving parameter values, and in driving meteorological input data, minimizing the a-priori expected similarity. However, both models are based on similar principles and equations, and even though they do not share the same land-use map for the domain, the prescribed land-use maps in each model are realistic and thus similar.
New in our approach is that we test all inversion approaches, each with different underlying assumptions, at a high resolution with the same meteorology, whereas in previous comparisons (e.g. Gurney et al., 2003) both the inversion method and the transport model could differ. This enables us to isolate the impact of the inversion methodology. We expect our results to be applicable to similar setups (short time periods, large flux heterogeneity, large CO2 variations, small transport errors) but caution against extrapolation to larger scales.
After describing the details of each of the methods included in our tests in Sect. 2.1, we will describe the general characteristics of all inversions in Sects. 2.2 to 2.5. We present our results next in Sect. 3, using a set of five metrics applied to each solution. Special attention is given to the non-linearity of the biosphere model parameter inversion. The strengths and weaknesses revealed in the result sections are further discussed in Sect. 4. Finally, we revisit our research questions and we present general conclusions and recommendations in Sect. 5.

Inversion methods
In this study we compare four different optimization methods that are used in current state-of-the-art inverse systems for CO2. We will briefly describe their main characteristics, which are also summarized in Table 1. For each inversion we applied the same methodology (Ensemble Kalman filter). This approach was necessary as the biosphere model parameter inversion is non-linear and could not be solved [...] Peters et al. (2005) and references therein for a description of additional details of the inversion procedure.
Atmos. Chem. Phys., 11, 10349-10365, 2011 www.atmos-chem-phys.net/11/10349/2011/
The first inverse method is one where pre-calculated patterns of NEE from a biosphere model are linearly scaled across larger areas, similar to the CarbonTracker system. This can be denoted as:
$$\mathrm{NEE}_{\mathrm{post}}(x,y,t) = \beta(e)\,\mathrm{NEE}_{\mathrm{prior}}(x,y,t)$$
where β is a scaling factor for each land-use type (e), with an a-priori value of 1.0, and NEE_prior(x,y,t) is a high resolution NEE field from a biosphere model. An advantage of this method is that it is straightforward to implement and needs few extra assumptions to stabilize the solution. A disadvantage is that the β factors offer little possibility to change sources into sinks (the sign of β then needs to change), or to scale small (near zero) fluxes to large fluxes (β needs to change a lot).
To overcome some of these disadvantages, systems were suggested that linearly scale gross fluxes instead (Zupanski et al., 2007; Lokupitiya et al., 2008; Schuh et al., 2009):
$$\mathrm{NEE}_{\mathrm{post}}(x,y,t) = \beta_{\mathrm{resp}}(e)\,R_{\mathrm{prior}}(x,y,t) - \beta_{\mathrm{GPP}}(e)\,\mathrm{GPP}_{\mathrm{prior}}(x,y,t)$$
R and GPP, which are large and stem from mostly independent processes at short time scales, then each carry a scaling factor β. An advantage is that this system does more justice to the actual processes in the carbon cycle, but a disadvantage is that the separation of the large and opposing fluxes using atmospheric CO2 is very difficult, and might need extra regularization of the solution in the form of prescribed covariances. In addition to the uncorrelated version of this inversion, we therefore also test this option with correlations of 0.5 and 1.0 between β_resp and β_GPP for each land-use type.
A final variant of the approach above is to make β_resp and β_GPP spatially explicit:
$$\mathrm{NEE}_{\mathrm{post}}(x,y,t) = \beta_{\mathrm{resp}}(x,y)\,R_{\mathrm{prior}}(x,y,t) - \beta_{\mathrm{GPP}}(x,y)\,\mathrm{GPP}_{\mathrm{prior}}(x,y,t)$$
This offers the advantage that the β_resp and β_GPP spatial patterns are allowed to vary within each ecoregion. However, additional regularization is necessary because the number of unknown bias parameters is too large to estimate from the limited atmospheric observations. In this study, we apply a spatial covariation between all grid points in the same land-use type that decreases exponentially with distance, similar to methods used in Rödenbeck et al. (2003), Peylin et al. (2005), Peters et al. (2005) and Schuh et al. (2009), but with a smaller length scale (L = 100 km) to fit the more detailed regional setup of the inversion.
Another class of inverse methods uses atmospheric CO2 not to constrain the surface exchange patterns, but to directly optimize the parameters of the underlying biosphere model (Scholze et al., 2007). The optimized parameters then control the new surface CO2 exchange. An advantage of this approach is the seamless extrapolation of information across the space and time domain and the physical relevance of the optimized result (new biosphere model parameter values instead of scaling factors). Disadvantages come in the form of several pitfalls: limitations in the model structure are difficult to overcome, the model parameters are rarely directly constrained by atmospheric CO2, and aliasing of information into the wrong parameter is possible. In addition, the biosphere model often contains non-linearities that conflict with the inversion assumptions, as we will discuss in detail in Sect. 3.1.
The biosphere model optimization can be written as:
$$\mathrm{NEE}_{\mathrm{post}}(x,y,t) = R\big(\beta_{E_0}(e)\,E_0,\ \beta_{R_{10}}(e)\,R_{10};\ x,y,t\big) - \mathrm{GPP}\big(\beta_{V_{c\,max}}(e)\,V_{c\,max},\ \beta_{\alpha}(e)\,\alpha;\ x,y,t\big)$$
where we have selected to optimize 4 parameters for each land-use class: β_E0 is a scaling factor for the respiration activation energy E0, β_R10 is a scaling factor for the respiration rate at reference temperature, β_Vcmax scales the carboxylation capacity, and β_α scales the quantum yield for light limited assimilation. The first two parameters are used to adjust respiration while the latter two control photosynthesis (see next section).
The resulting six inverse methods described above will be referred to as the β NEE inversion, the βRG0.0 inversion (no GPP and R correlations), the βRG0.5 inversion (correlations of 0.5), the βRG1.0 inversion (fully correlated GPP and R), the βRGpixel inversion (estimates for each pixel), and the parameter inversion. Note that the βRG1.0 inversion is not the same as the β NEE inversion because the flux covariance is distributed differently in space and time.
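The three scaling-factor formulations described above can be illustrated with a minimal NumPy sketch. The function names, array shapes and numbers are assumptions made for illustration only; they are not taken from the modeling system used in this study.

```python
import numpy as np

def nee_beta_nee(nee_prior, landuse, beta):
    """NEE_post = beta(e) * NEE_prior, with one factor per land-use type e.

    nee_prior : array (nt, ny, nx); landuse : integer class map (ny, nx);
    beta : array (n_classes,).
    """
    return beta[landuse][None, :, :] * nee_prior

def nee_beta_rg(r_prior, gpp_prior, landuse, beta_resp, beta_gpp):
    """NEE_post = beta_resp(e) * R_prior - beta_gpp(e) * GPP_prior."""
    return beta_resp[landuse][None] * r_prior - beta_gpp[landuse][None] * gpp_prior

def nee_beta_rg_pixel(r_prior, gpp_prior, beta_resp_map, beta_gpp_map):
    """Same as nee_beta_rg, but with one factor per grid box (x, y)."""
    return beta_resp_map[None] * r_prior - beta_gpp_map[None] * gpp_prior

# Tiny synthetic example (arbitrary values, two land-use classes, 3 time steps).
landuse = np.array([[0, 1], [1, 0]])
r_prior = np.full((3, 2, 2), 2.0)
gpp_prior = np.full((3, 2, 2), 5.0)
nee_prior = r_prior - gpp_prior              # a sink of -3 everywhere

# Scaling respiration in class 0 by 3 turns that class into a source
# without any sign change in a scaling factor.
post = nee_beta_rg(r_prior, gpp_prior, landuse,
                   beta_resp=np.array([3.0, 1.0]),
                   beta_gpp=np.array([1.0, 1.0]))
```

With all factors at their a-priori value of 1.0, each function returns the prior NEE field unchanged. The gross-flux variants can change a sink into a source while all factors stay positive, which the single NEE multiplier cannot do.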
For each inversion method the degrees of freedom are estimated with the simple formula from Patil et al. (2001) that considers the number of significant eigenvalues in the correlation matrix (normalized covariance matrix). For each of these inversions, the d.o.f. is relatively small compared to the number of observations (336). The estimated number of degrees of freedom is given in Table 1.
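As an illustration of how a spatial correlation structure limits the degrees of freedom, the sketch below builds an exponentially decaying correlation matrix (L = 100 km) between grid boxes of the same land-use type and estimates an effective dimension from its eigenvalue spectrum. The exact formula used for Table 1 is not reproduced in this text; the participation-ratio form used here is an assumption, and the grid and class map are hypothetical.

```python
import numpy as np

def exponential_correlation(coords_km, classes, length_km=100.0):
    """Correlation exp(-d / L) between grid boxes of the same land-use type,
    zero between boxes of different types."""
    d = np.linalg.norm(coords_km[:, None, :] - coords_km[None, :, :], axis=-1)
    same_class = classes[:, None] == classes[None, :]
    return np.where(same_class, np.exp(-d / length_km), 0.0)

def effective_dof(corr):
    """Effective number of degrees of freedom from the eigenvalue spectrum.

    Assumed form: (sum of sqrt(eigenvalues))^2 / sum of eigenvalues.
    """
    eig = np.clip(np.linalg.eigvalsh(corr), 0.0, None)  # guard against round-off
    s = np.sqrt(eig)
    return s.sum() ** 2 / eig.sum()

# Hypothetical 5 x 5 patch of 10 km grid boxes with a single land-use class.
xs, ys = np.meshgrid(np.arange(5) * 10.0, np.arange(5) * 10.0)
coords = np.column_stack([xs.ravel(), ys.ravel()])
classes = np.zeros(25, dtype=int)
corr = exponential_correlation(coords, classes)
dof = effective_dof(corr)
```

In this form, uncorrelated unknowns give an effective dimension equal to their number, and fully correlated unknowns give 1. Strong spatial correlations therefore push the estimate well below the number of grid boxes.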

Modeling system, simulation period and domain
All inversions are based on simulations with the same model setup, in which the non-hydrostatic mesoscale model RAMS (Pielke et al., 1992) is used to simulate the atmospheric transport. The version used in this study is B-RAMS-3.2, including adaptations to ensure mass conservation (Meesters et al., 2008). The prior NEE flux estimates are calculated with the simple biosphere model 5PM (Groenendijk et al., 2011), in which photosynthesis is calculated following Farquhar et al. (1980) and respiration is calculated with the relationship by Lloyd and Taylor (1994). The input for the biosphere model is summarized in Table 2. Further details on the transport and biosphere modeling system can be found in Tolk et al. (2009).
The simulations are performed for an area of 400 × 400 km at a relatively high resolution of 10 km, centered in the Netherlands at 52.25° N and 5.2° E (Fig. 1).
The Corine 2000 land use maps are used (http://dataservice.eea.europa.eu). Most of the area consists of cultivated land, in which the most abundant land use type is "Agricultural areas with complex cultivation patterns", which is further referred to as "crops1". Second most abundant is grassland, referred to as "grass", and the third is "Agricultural land with significant areas of natural vegetation", referred to as "crops2". The simulation period 19 May 2008-2 June 2008 was selected to contain various weather types and thus flux regimes, including cloudy, rainy, and sunny days at the beginning of the growing season.

Control inversions
The simulations are performed in a pseudo-data environment, so that the "true" fluxes are known and the results of the different inversion methods can be evaluated against them. As a check on the inversion system, and as a reference for the performance of the inversions with a perfect or near perfect structure of the NEE flux pattern, we performed two control simulations. In the first, pseudo data were created as a multiplication of the prior NEE fluxes. This pattern is within the solution capacity of each inversion option and it could be retrieved by all inversions, confirming that the inversion system worked correctly. In addition, we created a flux field based on a simulation with 5PM with parameters that were a realization of the a-priori covariance of the parameters. The flux field is therefore fully within the statistical properties of each inversion method (see also Sect. 2.5), and the spatial structure is represented perfectly in all methods. This is the control inversion referred to in the rest of the text and serves in our setup as a reference for the performance of the inversions in absence of spatial biosphere model structure differences.
Table 2. Summary of the differences between the FACEM biosphere model ("true") and the 5PM biosphere model (a-priori). In our pseudo-data study these differences mimic the expected structural differences between real CO2 fluxes and those simulated with a biosphere model, and limit the performance of the inversions substantially.
Fig. 1. [...] Hengelman; east). The area displayed, shown as lat long on the x and y axes, is used in the RAMS meso-scale transport model and is resolved at 10 × 10 km spatial resolution.

True fluxes and pseudo observations
Usually, in pseudo-data studies true fluxes are chosen as some realization of the underlying biosphere model (Zupanski et al., 2007; Schuh et al., 2009), as done in our control inversions. This choice is not very realistic when the structure of the underlying biosphere model itself is part of the inverse problem, and different structures will work better with different optimization strategies. For example, a parameter inversion when the truth was created with perturbed parameters will perform better than a β NEE inversion against the same truth. To prevent this in our study, and also to make the pseudo-data study more realistic, we have used as "true" fluxes those from a different biosphere model. Hourly biogenic respiration and photosynthesis flux fields from the FACEM model (Pieterse et al., 2007) were used. These were calculated at 6 resolution and have a different underlying land-use description, different soil type and LAI map, and different meteorological driver data, as summarized in Table 2. As a result, all six test inversions have to overcome a difference in model structure that causes the simulated pseudo-CO2 time series to never perfectly match the true NEE distribution (Fig. 2a, b). This situation mimics reality, in which a biosphere model never grasps the full complexity or heterogeneity of the true NEE distribution, which can be an important source of error (Kaminsky et al., 2001; Gerbig et al., 2006; Carvalhais et al., 2008).
To create pseudo CO2 mixing ratio data, the true CO2 fluxes are coupled to the atmospheric model RAMS. The simulated 3-D atmospheric CO2 field is sampled at the locations where CO2 mixing ratio observations are also available in reality (Fig. 1), using the highest observation level at each tower (Cabauw, 200 m; Lutjewad, 60 m; Loobos, 24 m; Hengelman, 18 m). Real observations will be used in a companion paper to obtain a real flux estimate. The inversions use hourly CO2 mixing ratios sampled from the well mixed PBL between 11:00 and 16:00 UTC. In the control inversion we applied a model-data-mismatch of 0.2 ppm, and with the FACEM truth we assumed a standard deviation of 1.2 ppm to account for the possible differences in the biosphere structures. The observation selection and uncertainty are the same in all inversion methods. The total number of observations assimilated is 4 towers times 14 days times 6 h, equaling 336.
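The observation count and the diagonal model-data-mismatch matrix implied by this setup can be written out as a short sketch. The variable names are illustrative assumptions, and the assumption that observation errors are uncorrelated (a diagonal matrix) is ours, as the text does not state it.

```python
import numpy as np

# Highest sampling level at each tower, in meters (from the text).
towers = {"Cabauw": 200, "Lutjewad": 60, "Loobos": 24, "Hengelman": 18}
n_days = 14
hours_utc = list(range(11, 17))        # hourly samples, 11:00 to 16:00 UTC inclusive

n_obs = len(towers) * n_days * len(hours_utc)

# Assumed diagonal observation error covariance R, in ppm^2.
sigma_facem_truth = 1.2                # ppm, used with the FACEM truth
sigma_control = 0.2                    # ppm, used in the control inversion
r_matrix = np.eye(n_obs) * sigma_facem_truth ** 2

print(n_obs)                           # 336
```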

Prior flux covariance
A correct comparison between the different inversion options requires that the overall prior covariance of all options is equal. We require that the NEE, integrated over ecoregion and time, has the same ensemble-variance for all inversion options, and that the ratios between the variances of ecoregion-time integrated respiration and GPP are also the same for the inversions. The standard is the parameter inversion, for which mean and variances are prescribed based on our previous work (Tolk et al., 2009; their Table 2), which was in turn based on an eddy covariance study (Groenendijk et al., 2011). For the other inversions, the prior covariance is scaled such that the above-mentioned similarity between the inversions is satisfied (see Appendix A for details).
This approach is an important choice in our experiment design. It is interesting to note that equal variance of the time/ecoregion integrated NEE does not ensure the same uncertainty in the inversions at each point in space and time. Our choice of covariance treatment has ensured that (a) the inversions have equal covariance in the quantity that matters most to our CO2 observations (NEE), (b) the spatial gradients in NEE variance between land-use types are conserved, and (c) the time integral covariance over the inversion is conserved, suggesting that all inversions had an equal chance to find the mean NEE of the truth.
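For the simplest case, a single NEE multiplier per ecoregion, the variance matching described above reduces to one line of arithmetic: the variance of the integrated NEE is the multiplier variance times the square of the integrated prior NEE. The sketch below is an illustration under that assumption only. It does not reproduce Appendix A, and all numbers are hypothetical.

```python
import numpy as np

def beta_sigma_for_target(nee_prior_integral, target_var):
    """Standard deviation of one NEE multiplier that reproduces a requested
    variance of the ecoregion- and time-integrated NEE.

    Uses var(beta * I) = sigma_beta^2 * I^2 for a fixed prior integral I.
    """
    return np.sqrt(target_var) / abs(nee_prior_integral)

# Hypothetical reference ensemble of integrated NEE for one ecoregion,
# standing in for the parameter-inversion prior ensemble.
reference_ensemble = np.array([-6.0, -2.0, -5.0, -3.0])
target_var = reference_ensemble.var(ddof=1)

nee_prior_integral = -4.0               # hypothetical integrated prior NEE
sigma_beta = beta_sigma_for_target(nee_prior_integral, target_var)

# Variance of integrated NEE implied by the scaled multiplier.
implied_var = sigma_beta ** 2 * nee_prior_integral ** 2
```

Matching the integrated variance in this way says nothing about the variance at individual grid boxes or hours, which is the caveat raised in the paragraph above.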
The χ² metric compares the a-priori model performance to the specified error structure by dividing the squared forecast residuals (y − Hx)² by the total covariance (HPH^T + R) of fluxes and measurements. It is thus a measure of the balance between expected skill and achieved skill. An innovation χ² close to 1.0 indicates a correct balance, while smaller values suggest that the model performed better than specified in the covariance structure and hence that the inversion was conservative in its prescription of covariance (Michalak et al., 2004). The innovation χ² statistics indicate that flux and observation uncertainties were reasonably well balanced. The χ² values range from 0.34 to 0.78 (Table 1), indicating that the model skill was high relative to the assumed uncertainty. The inversions were thus conservative in their flux adjustments and not over-constrained.
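A minimal sketch of an innovation χ² computed from an ensemble of modeled observations is given below. It uses only the diagonal of HPH^T + R and averages over observations; the precise normalization used for Table 1 is not stated in the text, so this form is an assumption, as are the function and variable names.

```python
import numpy as np

def innovation_chi2(y, hx_ensemble, r_var):
    """Mean squared forecast residual divided by (HPH^T + R), diagonal only.

    y           : observations, shape (n_obs,)
    hx_ensemble : modeled observations per ensemble member, shape (n_members, n_obs)
    r_var       : model-data-mismatch variance (scalar or shape (n_obs,))
    """
    hx_mean = hx_ensemble.mean(axis=0)            # H x_b
    hph_diag = hx_ensemble.var(axis=0, ddof=1)    # diagonal of HPH^T from the ensemble
    return float(np.mean((y - hx_mean) ** 2 / (hph_diag + r_var)))

# Hand-checkable example: two members at -1 and +1 give an ensemble variance of 2.
hx = np.array([[-1.0, -1.0], [1.0, 1.0]])
balanced = innovation_chi2(np.array([2.0, 2.0]), hx, r_var=2.0)       # residual^2 = 4, total = 4
conservative = innovation_chi2(np.array([1.0, 1.0]), hx, r_var=2.0)   # residual^2 = 1, total = 4
```

In this example the first case returns 1.0 (residuals as large as expected) and the second returns 0.25 (residuals smaller than the prescribed uncertainty), matching the interpretation given above.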

Results
In this section we present our results according to the performance of all the different inversion methods against a set of different metrics that each highlight a particular aspect of the inverse results (Sects. 3.2-3.6). Before the overview of the performance of all methods is presented, the special behavior of one of the methods which is partly non-linear (the parameter inversion) is highlighted in Sect. 3.1. An overall assessment of each individual method is given in the discussion section.

Optimizations and non-linearity
Five out of six inversion systems used in this study are linear, in the sense that a Gaussian set of parameters will translate to a similarly Gaussian set of CO2 flux fields. The exception is the parameter inversion. Out of the four parameters chosen for optimization, only the reference rate for respiration (R10) is linear, while the other ones are nonlinear. In addition, the model follows the Farquhar et al. (1980) photosynthesis limitation principle, in which either light or carboxylation becomes limiting for photosynthesis. The transition from one regime to another presents an important nonlinear step in the simulation of NEE. Figure 3 shows the distribution of fluxes resulting from a chosen distribution of parameter values. It shows that the activation energy parameter (E0) in particular affects the fluxes in a nonlinear fashion when the chosen value approaches zero. The carboxylation capacity (Vcmax) and quantum yield (α) parameters are weakly nonlinear across the chosen range, and the reference respiration rate (R10) is fully linear.
The nonlinearity in the parameter inversion should in principle be dealt with in the ensemble system, as it implicitly linearizes over the full model (H). However, we could clearly see the effect of imperfect linearization in our results. When we fed the posterior parameter set back into our flux model, and subsequently propagated the solution through the RAMS transport model (H(x_a)), we did not obtain the distribution of CO2 mixing ratios that the linearized inverse solution (Hx_a) had predicted. Generally, the propagated mean was further away from the observations than the linearized mean, and the propagated standard deviation of the ensemble was larger than the linearized one.
We explored this further with an offline inverse system that had only three parameters, as is further shown in Appendix C. A simple equation in which we varied the degree of non-linearity related the parameters to observations. The parameters were optimized against a truth obtained with one realization of the parameters, and some additional noise. We observed in our simple non-linear system that the propagated posterior parameter spread always correctly included the true parameter value. Additionally, the simplified model showed that introducing a non-linear parameter does not affect the ability of the inversion to return the correct values for the other, linear parameters in the model. Nonetheless, this simplified model showed the same behaviour with a poorer match to observations, and a larger spread in the concentration when using the propagated posterior parameters instead of the linearized model. This degraded performance of the linearized model is caused by the tails of the a-priori parameter probability density function (PDF), which do not follow the linearized propagation of the mean, or values close to that mean. This results in the observed larger spread in the non-linearly propagated CO 2 , and also in the overconfidence of the posterior parameters (the linearized ensemble lacks spread). We found that this effect could be reduced in several ways: (1) by reducing the degree of non-linearity in the model, (2) by starting with a good a-priori parameter value around which the model is linearized, or (3) by limiting the uncertainty of the non-linear parameter to a space where the effect is mostly linear. These three strategies may be generally applicable in future studies attempting non-linear inversions.
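The difference between the linearized posterior prediction (Hx_a) and the propagated one (H(x_a)) can be reproduced with a one-parameter toy problem. The sketch below is not the three-parameter system of Appendix C. It assumes a serial ensemble square-root update for a single scalar observation, and the toy models, ensemble size and error values are hypothetical.

```python
import numpy as np

def ensrf_update(x, hx, y, r_var):
    """Ensemble square-root update for one scalar observation.

    x  : prior parameter ensemble, shape (n_members,)
    hx : modeled observation for each member, shape (n_members,)
    Returns the posterior parameters and the linearly updated modeled observations.
    """
    n = len(x)
    x_mean, hx_mean = x.mean(), hx.mean()
    xp, hxp = x - x_mean, hx - hx_mean        # deviations from the ensemble mean
    hph = hxp @ hxp / (n - 1)                 # HPH^T estimated from the ensemble
    pht = xp @ hxp / (n - 1)                  # PH^T estimated from the ensemble
    gain = pht / (hph + r_var)
    hgain = hph / (hph + r_var)
    alpha = 1.0 / (1.0 + np.sqrt(r_var / (hph + r_var)))
    x_post = x_mean + gain * (y - hx_mean) + xp - alpha * gain * hxp
    hx_lin = hx_mean + hgain * (y - hx_mean) + hxp - alpha * hgain * hxp
    return x_post, hx_lin

def compare(model, seed=0, n=200, prior_std=0.5, truth=0.8, r_var=0.01):
    """Return linearized (H x_a) and propagated (H(x_a)) posterior predictions."""
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, prior_std, n)         # prior centered away from the truth
    x_post, hx_lin = ensrf_update(x, model(x), model(truth), r_var)
    hx_prop = model(x_post)                   # posterior fed back through the full model
    return hx_lin, hx_prop

lin_a, prop_a = compare(lambda x: 3.0 * x)            # linear toy model
lin_b, prop_b = compare(lambda x: np.exp(2.0 * x))    # non-linear toy model
print("linear:     max |Hx_a - H(x_a)| =", np.max(np.abs(lin_a - prop_a)))
print("non-linear: max |Hx_a - H(x_a)| =", np.max(np.abs(lin_b - prop_b)))
```

For the linear model the two predictions coincide up to round-off, while for the exponential model they differ. The arguments `prior_std` and `truth` can be varied to explore strategies (2) and (3) listed above.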

Metric 1: domain integrated fluxes
One of the important metrics in evaluating the performance of the inversions is the ability to retrieve the NEE summed over time and space. This is a final goal of the inversions, but not the only one as more detail may be desired, and the summed values may hide opposing errors. These issues are addressed in the other metrics in the next sections. All except one of the inversions have managed to find an improved posterior time average flux for the whole domain (Fig. 4). The two results closest to the truth are from the βRG inversions without correlations (βRG0.0), or with partial correlations (βRG0.5; not shown), followed by the βRGpixel inversion and the parameter inversion. The β NEE inversion was the only inversion that had a worse posterior time mean flux than prior time mean flux for the whole domain.
If we consider the root-mean-square-difference (RMSD) of the true and estimated domain average flux over time, a similar picture emerges: all inversions show an improvement (Table 3, first column). This agreement was expected under the current design of the study, in which many observations were available to constrain the hourly NEE. Again, the βRGpixel and βRG0.0/βRG0.5 perform best, but also the parameter inversion has captured the temporal structure of the domain total flux better than the prior. The β NEE inversion struggles not only to find an improved time mean flux, but also provides a poor match to the hourly domain averaged time series.
The degree of RMSD improvement is rather small, especially considering the large number of observations available. Further investigation showed that the main limitation to improving the flux estimate is the different structure of the a-priori model (5PM) and the truth (FACEM). In the control inversion, where this model structure difference was not present, all methods (including the β NEE inversion) gave much better time mean fluxes, and RMSD of the time series (Table 3). This suggests that synthetic studies using the same land surface model to generate and retrieve fluxes may overemphasize the ability of the model.
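The RMSD of the domain averaged flux, and the way domain averaging can hide opposing errors, can be illustrated with a short sketch. The array layout (time, y, x) and the synthetic values are assumptions made for illustration.

```python
import numpy as np

def rmsd(estimate, truth):
    """Root-mean-square difference between two arrays of equal shape."""
    diff = np.asarray(estimate, dtype=float) - np.asarray(truth, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

def domain_mean_series(flux):
    """Domain averaged flux time series from a (nt, ny, nx) field."""
    return flux.mean(axis=(1, 2))

# Synthetic case: the truth is zero everywhere, the estimate has opposing
# errors of +1 and -1 in two grid boxes at every time step.
truth = np.zeros((4, 1, 2))
estimate = np.tile(np.array([[[1.0, -1.0]]]), (4, 1, 1))

rmsd_domain = rmsd(domain_mean_series(estimate), domain_mean_series(truth))
rmsd_gridbox = rmsd(estimate, truth)
```

Here the domain averaged RMSD is 0 while the grid-box RMSD is 1, which is why the domain integrated metric is complemented by the per-land-use metrics in the following sections.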

Metric 2: fluxes per land-use type
We can further separate the results above into each individual land-use class considered. Figure 4 shows these results. The most striking feature is the lack of improvement in mean NEE for most land-use classes in most inversions. This suggests that overall, the inversions have failed to find a correct distribution of time-mean NEE within the domain (see also Fig. 2). The discrepancy is relatively weak for the parameter inversion and the βRG1.0 inversion, which show improved posterior mean βNEE in 3 out of 6 land-use classes. The two inversions that did best on the total domain average NEE (βRG0.0 and βRG0.5) now reveal a lack of improvement within the dominant land-use classes.
Assessing the RMSD of true and estimated flux time series per land-use type gives a picture consistent with the domain total picture of the previous paragraph: the parameter inversion and βRGpixel inversion generally show relatively low posterior RMSD and improve in 4 out of 6 land-use classes (Table 3). The other inversions improve substantially in RMSD in only 2 classes. An example of posterior flux performance for the largest land-use class (Crops1) is shown in Fig. 5. It shows that poor RMSD is caused mostly by an inability to capture the differences in the daytime signal between days, and that nighttime NEE is poorly simulated in most of the inversions.
Considering that the instantaneous fluxes per land-use class are most closely connected to the CO2 mole fraction observations, the performance of the inversions is disappointing and also alarming. In this metric too, the performance of the same inversion methods against the control fluxes is much better, and agrees with the expectation that domain total and land-use fluxes improve in time (both flux average and RMSD; Fig. 4 and Table 3).
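The two flux metrics used above (time-mean difference and RMSD of the time series per land-use class) can be illustrated with a minimal Python sketch. The function name `flux_metrics`, the equal weighting of pixels, and the synthetic fluxes are assumptions made for illustration and are not taken from the study. The example is constructed so that biases of opposite sign in two classes cancel in the domain mean.

```python
import numpy as np

def flux_metrics(true_flux, est_flux, land_use):
    """Mean bias and RMSD of class-averaged flux time series.

    true_flux, est_flux : arrays of shape (n_hours, n_pixels)
    land_use            : array of shape (n_pixels,) with class labels
    Pixels are weighted equally (an assumption of this sketch).
    """
    metrics = {}
    for cls in np.unique(land_use):
        mask = land_use == cls
        true_series = true_flux[:, mask].mean(axis=1)
        est_series = est_flux[:, mask].mean(axis=1)
        metrics[str(cls)] = {
            "mean_bias": float(est_series.mean() - true_series.mean()),
            "rmsd": float(np.sqrt(np.mean((est_series - true_series) ** 2))),
        }
    return metrics

# Synthetic example: +1 bias in one class, -1 bias in the other.
rng = np.random.default_rng(0)
land_use = np.array(["crops"] * 4 + ["forest"] * 4)
true_flux = rng.normal(0.0, 1.0, size=(48, 8))
est_flux = true_flux.copy()
est_flux[:, land_use == "crops"] += 1.0
est_flux[:, land_use == "forest"] -= 1.0

per_class = flux_metrics(true_flux, est_flux, land_use)
domain_bias = float(est_flux.mean() - true_flux.mean())
print(per_class, domain_bias)
```

In this constructed case the domain-mean bias vanishes although each class is biased, which is the kind of error cancellation that a domain-total metric alone cannot reveal.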

Metric 3: uncertainty estimate
The contrasting performance of most inversions for individual land-use classes (bad) versus the domain integral (reasonable) suggests significant spatial correlations between classes, with canceling flux errors. Here we investigate the posterior uncertainties. The inversions based on multiplication factors per land-use class (βNEE and βRG) all produce posterior uncertainties that are too small: at most locations they do not encompass the true flux within one or two standard deviations (Fig. 6). This might have been expected since the degrees of freedom in the inversion are much smaller (∼6-62) than the number of observations assimilated (∼336). However, the χ² of innovations suggests a fair balance between CO2 residuals (y − H(x_b)) and prescribed uncertainty (HPHᵀ + R), and the posterior uncertainty does not scale directly with the degrees of freedom in each inversion. Thus, the relatively large number of observations does not seem to be the main cause of the too-low posterior uncertainty.
In fact, when we reduced the number of assimilated observations to 1 per day we saw only minor effects on the posterior variances. Only if we reduced the number of observations and additionally increased the model-data mismatch did we see an increase in posterior variances for the individual land-use classes. But even then the domain total variance remained much too small to accommodate the large difference in flux mean. The small posterior uncertainty is thus not simply an artifact of the inverse method setup.
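As an illustration of the χ² of innovations discussed above, the following minimal Python sketch computes one common form of this diagnostic: the squared prior residuals, normalized by the diagonal of HPHᵀ + R and averaged over all observations. The function name, the diagonal normalization, and the synthetic numbers are assumptions made for illustration; the exact form used in this study is not given in the text. A value near 1 indicates residuals that are statistically consistent with the prescribed uncertainty.

```python
import numpy as np

def chi2_innovations(y, hx_prior, hph_t, r):
    """Mean squared innovation divided by its prescribed variance.

    y        : observed mole fractions, shape (m,)
    hx_prior : simulated prior mole fractions H(x_b), shape (m,)
    hph_t    : prior covariance in observation space, shape (m, m)
    r        : observation error covariance, shape (m, m)
    """
    innovation = y - hx_prior
    prescribed_var = np.diag(hph_t + r)
    return float(np.mean(innovation ** 2 / prescribed_var))

# Synthetic check: residuals drawn with exactly the prescribed variance.
rng = np.random.default_rng(0)
m = 336                      # number of assimilated observations
hph_t = 4.0 * np.eye(m)      # hypothetical flux-related variance (ppm^2)
r = 1.0 * np.eye(m)          # hypothetical model-data mismatch (ppm^2)
y = rng.normal(0.0, np.sqrt(5.0), size=m)
chi2 = chi2_innovations(y, np.zeros(m), hph_t, r)
print(chi2)
```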
If we consider the control inversion, the posterior flux estimates are much better for all inversions, and 4 out of 6 inversions report ±1 sigma uncertainties that include the truth (Fig. 4). This again points at an important role of model structure in determining the outcome of an inversion, in this case leading to overconfidence that the true value is retrieved.
The parameter and the βRGpixel inversions generally produce error estimates that are conservative and more realistic compared to the other inversions, in the sense that the posterior estimate (Fig. 6) is in large parts of the domain within 2 sigma of the truth and they encompass the truth within ±2.5 sigma in 5 out of 6 land use classes. Note that in the 6th land use class (rest) also the prior uncertainty is too low, and a realistic posterior uncertainty could therefore not be expected for this land use class.
Posterior covariances are large in all inversions. In the βRG inversions the covariances occur between β parameters for R and GPP. The covariances reveal an inability to separate the effect of photosynthesis and respiration based on CO2 alone. This was demonstrated in earlier work too (Ahmadov et al., 2009; Schuh et al., 2009). This type of covariance is more pronounced than that between β's for different land-use classes.
Atmos. Chem. Phys., 11, 10349-10365, 2011 www.atmos-chem-phys.net/11/10349/2011/

Metric 4: match to CO2 mole fractions
All six inverse methods reduce the mismatch between pseudo-observations and simulated CO2 mole fractions for those observations that were assimilated. This is expected from an inverse calculation. Not all methods obtain equal RMSD though, as each inversion adjusts fluxes differently. Table 4 shows that the posterior RMSD in CO2 mole fractions is largest for the βNEE optimization and smallest for the βRGpixel optimization. The χ² of innovations (Table 1), which indicates the balance between a-priori mismatch and assumed uncertainties, is smaller for the βNEE and βRG1.0 inversions than for all other optimizations per ecoregion. This is because these two inversions had a much larger uncertainty in hourly NEE than the other inversions (necessary to maintain the same uncertainty over the full period, see Sect. 2.5), translating to more simulated uncertainty in CO2 mole fractions.
More interesting is the comparison to CO2 mole fractions that were not assimilated, as they show the performance against independent data. The second column of Table 4 shows that only three inversions perform better when assessed against the full CO2 time series, while for three others it deteriorates. This is an important result that suggests that the estimated NEE field reflects only a limited part of the CO2 mole fraction time series, as a result of the daytime-only subsampling we used.
Moreover, the βNEE and βRG1.0 inversions, which improve the RMSD of non-assimilated CO2, show the opposite result in the control inversion, suggesting that the improvement in the real inversion was fortuitous. This leaves the parameter inversion as the only one that improves the RMSD of both assimilated and independent CO2 observations, in the control inversion as well as in the real inversion.
For the βRG inversions the good performance on the temporal structure of NEE (Tables 3 and 4) contrasts with the poor RMSD of the non-assimilated CO2 observations. The reason is that the posterior nighttime flux is simulated poorly, which does not affect the RMSD of fluxes much (the prior was also rather poor at night, and the nighttime NEE is relatively small compared to the daytime NEE), but is strongly amplified in CO2 concentration space due to the shallow nighttime boundary layer. Now, the posterior variances in βresp and βGPP mentioned in the previous paragraph really come to expression: they give an acceptable aggregated flux (NEE) but at the expense of incorrect nighttime CO2 mixing ratios. The propagation of this incorrect nighttime CO2 signal through the domain will likely result in compensating signals further downwind, or later in time. These effects are not visible yet in our short experiment though.
The results above suggest: (1) that nighttime CO2 observations are needed to separate respiration from photosynthesis fluxes, and (2) that interpretation of these observations depends critically on the adequate simulation of the nocturnal boundary layer.
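The amplification of nighttime flux errors in concentration space can be made concrete with a simple well-mixed box model, sketched below in Python. The box model itself, the molar density of air, and the flux and mixing-height values are assumptions chosen for illustration; they do not come from the RAMS simulations used in this study.

```python
# Well-mixed column: the same surface flux changes the CO2 mole fraction
# faster when it is mixed over a shallower layer.

MOLAR_DENSITY_AIR = 41.6  # mol m-3, assumed (about 1013 hPa and 20 degC)

def mole_fraction_tendency(flux_umol_m2_s, mixing_height_m):
    """CO2 tendency in ppm per hour for a flux in umol m-2 s-1."""
    if mixing_height_m <= 0.0:
        raise ValueError("mixing height must be positive")
    moles_air_per_m2 = mixing_height_m * MOLAR_DENSITY_AIR
    return flux_umol_m2_s * 3600.0 / moles_air_per_m2

# Hypothetical flux error of 2 umol m-2 s-1 in both regimes.
night = mole_fraction_tendency(2.0, 100.0)    # shallow nocturnal layer
day = mole_fraction_tendency(2.0, 1000.0)     # deep daytime boundary layer
print(f"night: {night:.2f} ppm/h, day: {day:.2f} ppm/h")
```

With these assumed heights, an identical flux error produces a tenfold larger mole fraction signal at night, which is why a modest nighttime flux error can dominate the RMSD against the full CO2 time series.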

Limitations of the inversions
The βNEE inversion is the simplest inversion method tested in this study; with 6, its number of degrees of freedom (d.o.f.) is the smallest of all inversion methods (Table 1). The results of this inversion did not fit the true fluxes and were overconfident. This inversion appeared to lack the flexibility to capture the correct fluxes. In the other inversions tested in this study the flexibility is increased in a number of ways, which indicates where the limitations of the inversion lie. In the βRG inversions the flexibility is increased compared to the βNEE inversion, as the respiration and GPP can be optimized separately. These βRG inversions provided a better estimate of the domain total fluxes than the βNEE inversion (Fig. 4). The different covariance strengths between βresp and βGPP that were applied resulted in different d.o.f. The performance of the inversions with 9 and 11 d.o.f. was comparable.
In the control inversions, where the truth has the same structure as the biosphere model used in the inversion, the βRG inversion performed correctly, improving the fluxes and the RMSE for both the domain total fluxes and the fluxes per ecoregion, with a realistic uncertainty estimate (e.g. Fig. 4a). However, when the structures of the truth and the prior differed, the flux distribution over space and time was not well captured. The flexibility of the system in time and/or space appeared to be limiting for the inversion. Below, the temporal limitation of the βRG inversion is discussed first, based on an analysis of the results of the βRG inversion and the parameter inversion, which is more flexible in time. The spatial limitations of the inversion are then discussed based on the results of the pixel inversion.
The βRG inversion performed badly when looking at the CO2 mixing ratio RMSE for all hours (Table 4). This is caused by a difficulty in capturing the nighttime fluxes (Fig. 5). The CO2 mixing ratios during the day, which are used to constrain the fluxes, contained a mixed signal of the respiration and GPP fluxes. The results suggest that the difference between the temporal structure of the truth and the prior caused the inversion to alter the respiration fluxes in order to overcome the difference in the flux structure during daytime, thereby worsening the nighttime flux estimate (Fig. 5). In an additional test, in which the inversions were performed for three days only instead of a 15-day period, the effect on the nighttime fluxes was smaller. In these three days the weather was steadily anti-cyclonic. This suggests that the structure mismatch of the fluxes between the prior and the truth over the 15-day period, due to differences in the response patterns to cyclonic and anti-cyclonic weather, influences the estimate of the fluxes, next to the influence of the different diurnal pattern of the fluxes between the prior and the truth.
In the parameter inversion, the flexibility to change the fluxes in time is increased. In this inversion four parameters, instead of one (in the βNEE inversion) or two (in the βRG inversion), can be altered by the inversion for each ecoregion, which means a d.o.f. of 22. The additional temporal flexibility improved the results and avoided the incorrect change in the nocturnal fluxes that was seen in the βRG inversions (Fig. 5, Table 4).
In the inversions where the structure of the truth did not fit the structure of the prior, the βRG inversions showed a lack of improvement within the dominant land-use classes. This could be caused by a mismatch in the variability of the fluxes within one land-use type. Therefore, in the βRGpixel inversion, the flexibility of the system to alter the fluxes in space was increased. Instead of one adjustable βresp and βGPP per land-use type, these parameters could be optimized for every pixel, with a correlation length scale of 100 km, which increased the d.o.f. to 62. Despite the increased flexibility, the inversion still could not capture the fluxes per land-use type. Nonetheless, the βRGpixel inversion provided acceptable results because it was realistic in its uncertainty estimate, only changing the fluxes in a more limited part of the domain.
The fact that increasing the temporal flexibility in the parameter inversion and the spatial flexibility in the pixel inversion does not fully solve the problems faced in the inversions suggests that both temporal and spatial structure differences between the truth and the prior limited the ability to obtain results at scales smaller than a few hundred kilometers. For the coarser scale, which in this inversion is the full domain, the inversions performed well despite these limitations. The structure mismatch is thus limiting for the detailed analysis, and not so much for the aggregated results.

Discussion
With the steady expansion of (continuous) CO2 observation sites comes a tendency to estimate carbon exchange patterns at increasingly higher resolutions. To that end, inversion methods are equipped with state-of-the-art meso-scale transport models and detailed ecosystem models. The methodology to optimize carbon exchange at regional scales is often transferred from existing global systems, thereby inheriting their known strengths and weaknesses. At these regional scales however, other considerations come into play that potentially turn weaknesses into strengths and vice versa. An example from this work is the scaling of the diurnal cycle amplitude in the NEE inversion. This was a rather robust way to maintain a balance between respiration and photosynthesis over long time periods in global inversions, but seems to fail on the smaller scales. We suspect that this is caused by the dominance of the CO2 diurnal cycle as a source of information for the inversion, and the sensitivity of the inversions to changes in the response to short-term weather influences, while in continental-scale inversions the average CO2 mixing ratios were more controlling.
In contrast, we see that at the regional scale the shortcomings of biosphere model structure are expressed more strongly than at continental scales. The potential to alias CO2 signals onto the wrong parameter because they are simply not reproducible with the prescribed structure was demonstrated and discussed convincingly in Carvalhais et al. (2008). Our study corroborates their conclusions, and shows that spatial mismatches in model structure can lead to incorrect mean flux estimates, with error bars that are overconfident. Since there is currently no metric, nor a place in the inversion for this type of error to be included, we suggest that model structure is assessed critically when optimizing biosphere model parameters or precalculated flux patterns on regional scales. Comparison of resulting fluxes and uncertainties against independent data (i.e. not used in the inversion) is one way to detect model structure errors for inversions using real data (e.g. Lauvaux et al., 2009).
In our assessment of the inversions based on biosphere model parameter optimizations we have seen an important role of non-linearity in the model equations, similar to previous studies (Trudinger et al., 2007; Scholze et al., 2007; Rüdiger et al., 2010). In this regional optimization based on CO2 mixing ratio observations, we noticed that the nonlinear model parameters in particular were difficult to constrain. To use them correctly, they required a good first guess, a small uncertainty, or a full non-linear model propagation of the solution (rather than a linearized one). These parameters were often also the ones least constrained by daytime atmospheric CO2, and thus likely suffered from the specific setup of our experiment. Since the estimation of nonlinear parameters did not affect the retrieval of the linear ones, we suspect that a different setup (other observations, such as water, energy, and CO2 fluxes and isotopes, more temporal constraints) might perform better.
In contrast, estimates of bias scaling factors on photosynthesis and respiration remain linear, depend less on model structure, and have more freedom to use the diurnal cycle information on regional scales. In our study this approach also proves a good alternative to the NEE scaling and the biosphere model parameter optimization. Here too, the daytime CO2 sampling scheme used makes it difficult to confidently separate the two processes. Simply including nighttime CO2 mixing ratio observations is however not an option, because of the limited skill of transport models to simulate the stable boundary layer (Tolk et al., 2009; Gerbig et al., 2008; Law et al., 2008; Steeneveld et al., 2008). Also, there is a large potential for erroneous photosynthesis and respiration bias scaling factors to propagate in time, and destabilize the inversion after a few weeks. The short time window used in this study does not incorporate this complication.
Perhaps the most important result from our pseudo-data tests is that despite the relatively large number of observations, the high resolution of the (perfect) transport model, and the increased freedom to fit spatiotemporal flux patterns, we still have not achieved a correct estimate of carbon exchange at the local scale. Similar to previous studies (Carouge et al., 2010;Schuh et al., 2009;Gerbig et al., 2006;Ahmadov et al., 2009), we find that significant aggregation of results is needed to come to robust numbers. The aggregation scale is on the order of 100 × 100 km as in previous work. This suggests that the simple translation of methods from the large scale to the small scale might not be sufficient. A re-evaluation of inversion methods might be needed, with an eye for nonlinear behavior, model structure, and multiple constraints. In that respect, recent work where model mean structure is relaxed in favor of extensive covariance structure based on multiple auxiliary datasets (Michalak et al., 2004;Gourdji et al., 2010;Yadav et al., 2010) is of great interest.

Conclusions
We started this paper asking which of six inversion approaches is the most suited for a regional inversion, and what the pitfalls are of each method. From our analysis we have learned that:
-With prior fluxes that have the same structure as the true fluxes, all inversion methods improved the estimate of the NEE, both for the domain total fluxes and for the fluxes per ecoregion.
-When the structure of the priors differed from that of the truth, the full domain estimates improved with all inversions except for the β NEE option, but all inversion approaches had difficulties in obtaining the fluxes per ecoregion.
-Model structure is therefore an important consideration for inverse estimates that can lead to incorrect spatiotemporal patterns of fluxes, and overconfidence in posterior results. An assessment of model structure error, and its inclusion in the quoted uncertainty would make any regional inversion more plausible.
-Inversions that scale NEE from prescribed spatiotemporal patterns are most susceptible to these errors (which include aggregation errors), and perform worst in the realistic tests presented. We do not recommend using this method for regional NEE estimates.
-Inversions that separately estimate photosynthesis and respiration perform better on NEE, at least on these short time scales, even though they cannot obtain realistic gross flux estimates, which might lead to problems later. We recommend using them only if the realism of the gross fluxes can be assessed after the inversion, or maintained by other means such as through nighttime observations of fluxes or CO2 mixing ratios.
-The results with the smallest deviations from the pseudo-truth over all metrics were obtained when the land-use class concept was applied least strictly by allowing spatial variations in bias corrections on gross fluxes (RGpixel), or when the bias parameter approach was abandoned altogether such as in the parameter inversion. Nonetheless, also these inversions had difficulties in estimating the specific fluxes per ecoregion.
-The parameter optimization approach has some appealing features. However, it can only be used if the nonlinear behavior of the system is dealt with.
-When optimizing non-linear parameters we recommend to (a) start from a good a-priori mean estimate, (b) keep the uncertainty on the parameter small, and (c) check posterior results carefully using the full nonlinear model.

Appendix A Ensemble Kalman filter method
All the inversions are performed with the Ensemble Kalman filter (EnKF). In this Bayesian approach the optimum value between the prior knowledge and the information in the observations is established by minimizing the cost function:

$$J(x) = (y - H(x))^T R^{-1} (y - H(x)) + (x - x_{prior})^T P^{-1} (x - x_{prior}) \qquad \text{(A1)}$$

in which x denotes the state vector (the biospheric parameters or the β's in the inversions), y the observation vector (CO2 mixing ratios), P the error covariance matrix of $x_{prior}$, and R the error covariance matrix of the observations. H is the observation operator, which contains the influence of the variables in the state vector (x) on the CO2 mixing ratio at the observation locations. The optimum posterior state vector that minimizes the cost function, and its error covariance matrix, are:

$$x_{post} = x_{prior} + K (y - H(x_{prior})) \qquad \text{(A2)}$$

$$P_{post} = (I - KH) P \qquad \text{(A3)}$$

where K is the Kalman gain matrix:

$$K = P H^T (H P H^T + R)^{-1} \qquad \text{(A4)}$$

In the Ensemble Kalman Filter method the information in the error covariance matrix P, and its projections in observational space $PH^T$ and $HPH^T$, are not calculated based on independently determined H and P matrices. Instead of the full calculation, an ensemble of state vectors that represents the statistical properties of $x_{prior}$ and $P_{prior}$ is used. Normally this is done to reduce the size of the matrices, which may become very large if the number of unknowns is large. Here, the Ensemble Kalman Filter is applied because of another advantage: this method of directly calculating $PH^T$ and $HPH^T$ from x and H(x) allows the use of a non-linear relation between the parameters (x) and the CO2 mixing ratios (H(x)), as is the case in the parameter inversion.
The ensemble of state vectors, with N ensemble members, was created such that the normalized ensemble of deviations defines the columns of matrix X (Peters et al., 2005):

$$X = \frac{1}{\sqrt{N-1}} \left( x_1 - \bar{x},\; x_2 - \bar{x},\; \ldots,\; x_N - \bar{x} \right) \qquad \text{(A5)}$$

with $\bar{x}$ the ensemble mean. X is the square root of the covariance matrix:

$$P = X X^T \qquad \text{(A6)}$$

In the ensemble members, x contains the parameter values of the biosphere model in the case of the parameter inversion, the multiplication factors βresp and βGPP in the βRG inversions, or the multiplication factor βNEE in the βNEE inversion. All inversions are performed with ensembles containing 100 ensemble members. For each ensemble member the corresponding CO2 mixing ratios at the observation locations were calculated. This is done in the coupled biosphere-atmosphere model (5PM+ coupled to B-RAMS3.2). Thus an ensemble of CO2 mixing ratios was created:

$$H(X) = \frac{1}{\sqrt{N-1}} \left( H(x_1) - H(\bar{x}),\; \ldots,\; H(x_N) - H(\bar{x}) \right) \qquad \text{(A7)}$$

From the ensemble of state vectors (X) and the ensemble of corresponding CO2 mixing ratios (H(X)), the Kalman gain matrix and the posterior optimized values including their uncertainty were calculated with Eqs. (A2-A4) combined with (Peters et al., 2005):

$$H P H^T \approx H(X) \, [H(X)]^T \qquad \text{(A8)}$$

and

$$P H^T \approx X \, [H(X)]^T \qquad \text{(A9)}$$

Due to the use of an ensemble instead of the full KF, small spurious covariances may be created. The impact of these small artificial covariances on the inversion result can be diminished by localization. In this study we used the localization method and values established in Zupanski et al. (2007); a threshold value of 1.05 was applied for the ratio between the prior uncertainty and the posterior uncertainty.
In the control inversions in this study the observation vector (y) was created based on one realization of the parameter state vector by selecting one of the columns of ensemble X. In the other inversions the observation vector (y) was based on the hourly NEE flux fields created by the biosphere model FACEM (Pieterse et al., 2007). In both cases the corresponding CO2 mixing ratios were calculated with the atmospheric model RAMS in which the sea, the fossil fuel, and the boundary fluxes of CO2 were kept constant.
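The ensemble update described in this appendix can be sketched in a few lines of Python. This is a minimal matrix-based illustration under stated assumptions, not the code used in the study: the function name `enkf_update`, the use of the ensemble-mean of the simulated observations in place of a separate mean run, the omission of localization, and the small linear test problem are all hypothetical choices.

```python
import numpy as np

def enkf_update(ensemble, forward, y, r):
    """One matrix-based ensemble Kalman update (no localization).

    ensemble : prior state ensemble, shape (n_state, n_members)
    forward  : function mapping a state vector to simulated observations
    y        : observation vector, shape (m,)
    r        : observation error covariance, shape (m, m)
    Returns the posterior mean and posterior covariance.
    """
    n_members = ensemble.shape[1]
    hx = np.column_stack([forward(ensemble[:, i]) for i in range(n_members)])
    x_mean = ensemble.mean(axis=1)
    hx_mean = hx.mean(axis=1)  # assumption: stands in for H of the mean state
    # Normalized deviations: square roots of P and of its projection.
    x_dev = (ensemble - x_mean[:, None]) / np.sqrt(n_members - 1)
    hx_dev = (hx - hx_mean[:, None]) / np.sqrt(n_members - 1)
    hph_t = hx_dev @ hx_dev.T
    ph_t = x_dev @ hx_dev.T
    gain = ph_t @ np.linalg.inv(hph_t + r)
    x_post = x_mean + gain @ (y - hx_mean)
    p_post = x_dev @ x_dev.T - gain @ ph_t.T  # P - K (H P)
    return x_post, p_post

# Hypothetical linear test: three scaling factors, 20 pseudo-observations.
rng = np.random.default_rng(0)
truth = np.array([1.3, 0.7, 1.1])
h_matrix = rng.normal(size=(20, 3))
obs_sd = 0.1
y = h_matrix @ truth + rng.normal(0.0, obs_sd, 20)
prior_ens = 1.0 + 0.5 * rng.standard_normal((3, 100))  # mean one, 100 members
x_post, p_post = enkf_update(
    prior_ens, lambda x: h_matrix @ x, y, obs_sd**2 * np.eye(20)
)
print(x_post, np.sqrt(np.diag(p_post)))
```

Because the forward operator is only ever evaluated member by member, the same function accepts a nonlinear `forward`, which is the property the text gives as the reason for choosing the ensemble approach.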
The size of the state vector differed between the inversion methods: in the parameter inversion it had a dimension of 24 (4 parameters times 6 land-use classes), in the βRG inversions 12 (twice the number of land-use classes), in the βNEE inversion 6 (once the number of land-use classes), and in the βRGpixel inversion 2218 (twice the number of land pixels). The dimension of the observation vector was 336 in all inversions.
For the parameter, βRG0.0 and βNEE inversions all off-diagonals in $P_{prior}$ were zero. Additional inversions were performed with a correlation between βresp and βGPP of 0.5 and 1.0. In this case all off-diagonals were zero except the ones denoting the correlations between βresp and βGPP of the same land-use type. In the pixel inversions too, only correlations within one land-use type are applied, with correlations calculated based on distance (D) with a length scale (L) of 100 km. Incidentally, a comparable correlation length (130 km) was found for the prior-truth residuals.
The means and variances for the parameter inversion are prescribed based on Tolk et al. (2009). The other inversions use an ensemble of β's with mean one and variances that are scaled to achieve the required similarity between the inversions (Sect. 2.5). First, the uncertainty related to the respiration fluxes and the uncertainty related to the GPP fluxes were separated. This was done by running the biosphere model with two different ensembles: (1) containing only variations in the parameters determining respiration and (2) containing only variations in the parameters determining GPP. To convert this to the variance related to the β factors, each ensemble member is scaled with the flux per ecoregion, separately for respiration and GPP in the βRG inversions and for NEE as a whole in the βNEE inversion. This ensures that the ratio between the uncertainty in respiration and GPP per ecoregion is the same in all inversion options. Where applicable, correlations were added to P. The new variances were subsequently scaled with a multiplication factor per ecoregion, with the same multiplication factor for βresp and βGPP. These multiplication factors were chosen such that the uncertainty in NEE integrated over ecoregion and time became the same in all inversion options, taking into account the correlations between respiration and GPP in the βRG0.5 and βRG1.0 options and the reduced correlation within one ecoregion in the pixel inversion.
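The construction of a distance-dependent prior covariance for the pixel inversion can be sketched as follows. The exponential decay with distance is an assumption of this sketch (the functional form is not recoverable from the text here), as are the function name, the coordinates, and the standard deviations; only the 100 km length scale and the restriction to pixels of the same land-use type follow the description above.

```python
import numpy as np

def pixel_prior_covariance(coords_km, land_use, sigma, length_km=100.0):
    """Prior covariance for per-pixel scaling factors.

    coords_km : pixel coordinates in km, shape (n, 2)
    land_use  : land-use label per pixel, shape (n,)
    sigma     : prior standard deviation per pixel, shape (n,)
    Correlations decay as exp(-D / L) (assumed form) and are set to zero
    between pixels of different land-use types.
    """
    coords = np.asarray(coords_km, dtype=float)
    labels = np.asarray(land_use)
    sigma = np.asarray(sigma, dtype=float)
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    corr = np.exp(-dist / length_km)
    same_class = labels[:, None] == labels[None, :]
    corr = np.where(same_class, corr, 0.0)
    return corr * np.outer(sigma, sigma)

# Hypothetical three-pixel example: two crop pixels 100 km apart, one forest pixel.
p_prior = pixel_prior_covariance(
    coords_km=[[0.0, 0.0], [100.0, 0.0], [0.0, 10.0]],
    land_use=["crops", "crops", "forest"],
    sigma=[0.5, 0.5, 0.5],
)
print(p_prior)
```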

Appendix B Biosphere model 5PM
The biosphere model used in this study to calculate the prior NEE fluxes is 5PM (Groenendijk et al., 2011), extended with the use of LAI to upscale the fluxes from the leaf to the canopy scale. In this model the photosynthesis is calculated based on the Farquhar model (Farquhar et al., 1980) and heterotrophic respiration is based on the relationship by Lloyd and Taylor (1994).
In the Farquhar approach assimilation of CO2 by the vegetation is either limited by the amount of radiation or by the availability of the enzyme Rubisco, which is involved in the conversion of CO2 into glucose and oxygen. Photosynthesis is formulated as the minimum of the light-limited (w_j) or enzyme-limited assimilation rate (w_c), corrected for the maintenance respiration of the vegetation (R_d):

$$A = \min(w_c, w_j) - R_d$$

The assimilation rate depends on the CO2 concentration inside the leaf available for photosynthesis (C_i), the internal oxygen concentration (O), the compensation point for CO2 (Γ*) and the Michaelis-Menten parameters for CO2 (K_c) and O2 (K_o). The latter are temperature dependent. The first option, Rubisco-limited assimilation, is calculated as:

$$w_c = \frac{V_{c\,max} \, (C_i - \Gamma^*)}{C_i + K_c \, (1 + O / K_o)}$$

where V_cmax is the maximum carboxylation capacity (µmol m−2 s−1). The second option, light-limited assimilation, is calculated from the electron yield J, which is specified in terms of the photosynthetic active radiation I_PAR (µmol photons m−2 s−1), the maximum potential electron transport rate J_m (µmol m−2 s−1) and the quantum yield α (mol mol−1). It is assumed that the plants aim for an optimum between the energy allocated to the potential electron transport rate and to the carboxylation capacity, and J_m is linked to V_m following Collatz et al. (1991). Leaf internal CO2 is estimated with the method described in Arneth et al. (2002), in which the value for lambda was kept constant at 700 mol mol−1. The atmospheric CO2 mixing ratio is assumed to be 380 ppm during photosynthesis. Integration of the photosynthetic flux to the full canopy is based on MODIS leaf area index (LAI) observations (Sellers et al., 1996), where A is the assimilation, subscript n0 refers to leaves at the top of the canopy, subscript "c" refers to the total canopy and k is the PAR extinction coefficient.
Respiration is calculated with the temperature-dependent relationship by Lloyd and Taylor (1994):

$$R = R_{10} \exp\left[ E_0 \left( \frac{1}{283.15 - T_0} - \frac{1}{T - T_0} \right) \right]$$

where R_10 is the respiration rate at a reference temperature of 10 °C, E_0 is the activation energy divided by the universal gas constant, T_0 is a constant of 227.13 K and T is soil temperature (in K).
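The structure of the photosynthesis scheme described above can be sketched in Python using widely used textbook forms of the Farquhar-type rates. Several elements are assumptions of this sketch and may differ from the 5PM implementation: the factors 4 and 2 in the light-limited rate, the saturating form chosen for the electron yield J, the exponential canopy integration with extinction coefficient k, and all numerical parameter values, which are hypothetical. The link between J_m and V_m is not reproduced here, so J_m is simply prescribed.

```python
import math

def rubisco_limited(ci, gamma_star, vcmax, kc, ko, o2):
    """Enzyme-limited rate w_c; kc in the units of ci, ko in the units of o2."""
    return vcmax * (ci - gamma_star) / (ci + kc * (1.0 + o2 / ko))

def electron_transport(i_par, alpha, jm):
    """Electron yield J, saturating at jm (assumed functional form)."""
    return alpha * i_par * jm / math.sqrt(jm**2 + (alpha * i_par) ** 2)

def light_limited(ci, gamma_star, j):
    """Light-limited rate w_j (assumed constants 4 and 2)."""
    return j * (ci - gamma_star) / (4.0 * (ci + 2.0 * gamma_star))

def net_leaf_assimilation(wc, wj, rd):
    """Minimum of both limitations, corrected for maintenance respiration."""
    return min(wc, wj) - rd

def canopy_scaling(a_top, lai, k=0.5):
    """Scale top-of-canopy assimilation to the canopy (assumed exponential form)."""
    return a_top * (1.0 - math.exp(-k * lai)) / k

# Hypothetical values: ci, gamma_star, kc in umol mol-1; o2, ko in mmol mol-1.
wc = rubisco_limited(ci=266.0, gamma_star=40.0, vcmax=60.0, kc=400.0, ko=280.0, o2=210.0)
j = electron_transport(i_par=1000.0, alpha=0.3, jm=120.0)
wj = light_limited(ci=266.0, gamma_star=40.0, j=j)
a_leaf = net_leaf_assimilation(wc, wj, rd=1.0)
a_canopy = canopy_scaling(a_leaf, lai=3.0)
print(wc, wj, a_leaf, a_canopy)
```

With these assumed values the Rubisco-limited rate is the smaller of the two, so the leaf-level flux responds to V_cmax but not to α, which illustrates why different parameters are constrained under different light conditions.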
In previous studies (Tolk et al., 2009; Groenendijk et al., 2011) the main parameters of this model (V_cmax, α, R_10 and E_0) were optimized for the full canopy based on a large number of Fluxnet observations (Baldocchi et al., 2001). To determine our prior fluxes, we applied parameter values optimized for the temperate zone, for the period of May-July for all years (Table B1).
The relationship between the parameters in the biosphere model and the NEE fluxes is non-linear for most parameters. The strength of this non-linearity was estimated for each parameter by running the biosphere model with an ensemble of Gaussian-distributed parameter values and showing the accompanying distribution of the NEE fluxes (Fig. 3). For all parameters except R_10 the resulting NEE distribution was non-Gaussian, which affects the inversions if the prior deviates strongly from the true values.
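The ensemble test of non-linearity described above can be illustrated for the respiration term alone. The sketch below propagates Gaussian ensembles of R_10 and E_0 through the Lloyd and Taylor (1994) relationship and compares the skewness of the resulting flux distributions. The ensemble means and spreads, the soil temperature, and the use of skewness as a measure of non-Gaussianity are assumptions for illustration; they are not the settings of the study.

```python
import numpy as np

def lloyd_taylor(t_soil_k, r10, e0, t0=227.13, t_ref=283.15):
    """Respiration at soil temperature t_soil_k (K); equals r10 at 10 degC."""
    return r10 * np.exp(e0 * (1.0 / (t_ref - t0) - 1.0 / (t_soil_k - t0)))

def skewness(sample):
    """Sample skewness; close to zero for a Gaussian distribution."""
    centered = sample - sample.mean()
    return float(np.mean(centered**3) / np.mean(centered**2) ** 1.5)

rng = np.random.default_rng(42)
n_members = 100_000
t_soil = 293.15  # K, hypothetical soil temperature of 20 degC

# Hypothetical Gaussian parameter ensembles, varied one at a time.
r10_ens = rng.normal(2.0, 0.5, n_members)
e0_ens = rng.normal(300.0, 100.0, n_members)

resp_from_r10 = lloyd_taylor(t_soil, r10_ens, 300.0)  # linear in R10
resp_from_e0 = lloyd_taylor(t_soil, 2.0, e0_ens)      # exponential in E0

skew_linear = skewness(resp_from_r10)
skew_nonlinear = skewness(resp_from_e0)
print(skew_linear, skew_nonlinear)
```

The flux distribution stays symmetric when only R_10 is varied, because the flux is proportional to R_10, while varying E_0 produces a clearly skewed distribution, consistent with the statement that R_10 is the only parameter with a Gaussian flux response.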

Simple non-linear model
To test the impact of non-linearity on the inversions we applied simple forward models that calculate a series of observations from a triplet (a, b, c) of arbitrary parameters. Subsequently, several inversion methods were used to estimate (a, b, c) from the observations: a regular minimum least-squares without priors, a full Bayesian solution with a Kalman filter, a serial ensemble KF, and a matrix based ensemble KF. In the Bayesian solutions, the three parameters had prior values that varied between realistic and unrealistic relative to the truth. Also, we varied the degree of nonlinearity in the forecast model (from fully linear to strongly nonlinear).
We applied all methods to a fully linear problem first, and confirmed that each estimation method gave the same (correct) result as long as enough observations were available. Prior values that were reasonably chosen (i.e. with enough uncertainty to accommodate the truth) were thus correctly modified. The statistics of the posterior solution were also as expected: uncertainty on all 3 parameters was reduced in accordance with the specified noise, and the propagated posterior solution satisfied the observations to within the specified uncertainty.
Next, we applied the ensemble KF (the only system to handle nonlinear problems) to the nonlinear function f(x) = a + sin(b)x + cx². We noticed here first that the mean of the linear parameters a and c was estimated correctly, but the mean of the nonlinear one (b) was not. Uncertainty on the nonlinear parameter was also miscalculated: the truth was far outside the posterior error. Moreover, we noticed that if we placed the posterior parameter values back in the nonlinear model, the match to the observations was much worse than the statistics of the filter suggested. This was because the linearization that is contained in the ensemble method was not able to overcome the nonlinearity and therefore mispredicted the mixing ratios and their spread given a set of parameters.
In Fig. C1 we show this feature for the typical nonlinear estimation problem as above. The yellow line is the true function, which we "observe" at the 20 black dots and to which we then add a little bit of noise. The noisy observations are fed to the ensemble KF to estimate the 3 parameters (a, b, c) that underlie the yellow curve. The red line (±1 standard deviation) is the match to the observations that the system thinks it will achieve given the ensemble statistics and its posterior estimate. This coincides with the true curve in yellow. But when the posterior values (a, b, c) are fed into the function f(x), the blue curve is the actual result. This is much less accurate than was predicted by the filter, and actually outside the specified uncertainty range.
The above problem could be remedied through the same solutions we suggest in the main text: making the problem more linear, starting from a better prior, or reducing the uncertainty on the nonlinear parameter. The figure moreover led us to suggest always using the full nonlinear model to check the accuracy of the result, rather than relying on the filter statistics.
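The behaviour described in this appendix can be reproduced with a short Python sketch of a matrix-based ensemble KF applied to f(x) = a + sin(b)x + cx². The true parameter values, the prior means and spreads, the noise level, the observation locations, and the ensemble size are all hypothetical choices made for illustration and are not the settings behind Fig. C1. The sketch compares the fit that the filter statistics imply with the fit obtained when the posterior parameters are placed back in the nonlinear function.

```python
import numpy as np

def f(x, a, b, c):
    """Toy forward model: linear in a and c, nonlinear in b."""
    return a + np.sin(b) * x + c * x**2

rng = np.random.default_rng(1)
x_obs = np.linspace(0.0, 2.0, 20)          # 20 observation locations
truth = np.array([1.0, 2.5, 0.5])          # hypothetical true (a, b, c)
noise_sd = 0.05
y = f(x_obs, *truth) + rng.normal(0.0, noise_sd, x_obs.size)

# Prior ensemble; the prior on b is far from the truth.
n_members = 200
prior_mean = np.array([0.5, 0.2, 1.0])
ens = prior_mean[:, None] + rng.standard_normal((3, n_members))

# Ensemble statistics in state and observation space.
hx = np.stack([f(x_obs, *ens[:, i]) for i in range(n_members)], axis=1)
x_mean, hx_mean = ens.mean(axis=1), hx.mean(axis=1)
x_dev = (ens - x_mean[:, None]) / np.sqrt(n_members - 1)
hx_dev = (hx - hx_mean[:, None]) / np.sqrt(n_members - 1)
hph_t = hx_dev @ hx_dev.T
r = noise_sd**2 * np.eye(x_obs.size)
innovation = y - hx_mean

# Kalman update of the parameters.
post_mean = x_mean + x_dev @ hx_dev.T @ np.linalg.solve(hph_t + r, innovation)

# Fit implied by the filter (linear propagation) versus the actual fit.
predicted = hx_mean + hph_t @ np.linalg.solve(hph_t + r, innovation)
actual = f(x_obs, *post_mean)
rmsd_predicted = float(np.sqrt(np.mean((predicted - y) ** 2)))
rmsd_actual = float(np.sqrt(np.mean((actual - y) ** 2)))
print(post_mean, rmsd_predicted, rmsd_actual)
```

In this setup the linear parameters a and c are retrieved close to their true values, while b is not, and the mismatch of the full nonlinear model is clearly larger than the filter statistics suggest. Checking `rmsd_actual` against `rmsd_predicted` is one simple form of the nonlinear verification recommended above.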