Atmospheric Chemistry and Physics Vertical Mixing in Atmospheric Tracer Transport Models: Error Characterization and Propagation

Imperfect representation of vertical mixing near the surface in atmospheric transport models leads to uncertainties in modelled tracer mixing ratios. When using the atmosphere as an integrator to derive surface-atmosphere exchange from mixing ratio observations made in the atmospheric boundary layer, this uncertainty has to be quantified and taken into account. A comparison between radiosonde-derived mixing heights and mixing heights derived from ECMWF meteorological data during May–June 2005 in Eu-rope revealed random discrepancies of about 40% for the daytime with insignificant bias errors, and much larger values approaching 100% for nocturnal mixing layers with bias errors also exceeding 50%. The Stochastic Time Inverted Lagrangian Transport (STILT) model was used to propagate this uncertainty into CO 2 mixing ratio uncertainties, accounting for spatial and temporal error covariance. Average values of 3 ppm were found for the 2 month period, indicating that this represents a large fraction of the overall uncertainty. A pseudo data experiment shows that the error propagation with STILT avoids biases in flux retrievals when applied in inversions. The results indicate that flux inversions employing transport models based on current generation meteorological products have misrepresented an important part of the model error structure likely leading to biases in the estimated mean and uncertainties. We strongly recommend including the solution presented in this work: better, higher resolution atmospheric models, a proper description of correlated random errors, and a modification of the overall sampling strategy .


Introduction
Exchange of CO 2 and other greenhouse gases between the surface and the atmosphere leaves atmospheric signatures behind that can be used to retrieve information about the surface fluxes.On regional scales, at which climate anomalies (droughts, anomalies in rainfall, temperature, etc.) as well as human intervention (land use change) influence biosphereatmosphere exchange, such information is valuable for investigating biosphere-atmosphere feedback processes.Further, regional scale quantification of greenhouse gas budgets is a requirement for any carbon trading, such as is currently being implemented under the Kyoto protocol.Regional scale budgets have therefore become a research focus (Dolman et al., 2006;Lin et al., 2006;Wofsy and Harriss, 2002).
For CO 2 , biosphere-atmosphere fluxes can be assessed by a range of methods -each covering specific spatial scales, including eddy-covariance measurements (Baldocchi et al., 2001) with flux-footprints extending over ∼1 km 2 , remote sensing driven diagnostic light use efficiency (LUE) models (Lin et al., 2006;Running et al., 2004) with resolutions of several hundred meters, more process based biosphere models covering multiple scales (Moorcroft et al., 2001;Running and Hunt Jr., 1993), but also atmospheric inversions of measured trace gas mixing ratios (the so called top-down method).Inversions of background stations using global transport models are assumed to constrain fluxes on scales of several thousands of km (Gurney et al., 2002).Measurements from tall towers over continents are claimed to represent areas of roughly about 10 6 km 2 (Gloor et al., 2001), but this strongly depends on the spatial and temporal scales of the flux information to be retrieved (Gerbig et al., 2006).However, when combined with prior flux information for example from eddy flux measurements and remote sensing, regional scale inversions start to become feasible (Matross et al., 2006).
Published by Copernicus Publications on behalf of the European Geosciences Union.Measurements of atmospheric mixing ratios can provide strong constraints, but this puts strong demands on the accuracy of atmospheric transport modelling.In case of CO 2 one is for example interested in small imbalances of otherwise large fluxes of opposing sign, ecosystem respiration R and photosynthesis GEE.During growing season the net ecosystem exchange NEE (=R+GEE), i.e. the biosphereatmosphere flux, has a diurnal amplitude that is already about an order of magnitude larger than the diurnal average, and at least 2 orders of magnitude larger than the annual to decadal imbalances that contain the most relevant information on the processes involved in climate change (Goulden et al., 1996).Thus the transport model needs to provide a very tight relationship between fluxes and concentrations, with resulting biases over the relevant timescales (annual) of only a few percent or less of the corresponding fluxes.Similarly, strong spatial variability in biospheric fluxes causes strong variability in mixing ratios (Gerbig et al., 2003a;Lin et al., 2004a).This variability has to be represented in transport models with very little bias in order to utilize the information contained in atmospheric point measurements.
This requires uncertainties in atmospheric transport to be investigated quantitatively.So far, uncertainties in transport have been investigated mostly through model intercomparison studies such as Transcom 3 for global scale transport (Gurney et al., 2003).However, the tempting assumption that the spread of the model-ensemble represents the true uncertainty in the transport is false: there are many common sources of uncertainty in different models due to the large similarity in spatial discretisation and subgrid parameterizations (IPCC, 2001).Furthermore, there is the natural tendency in modelling communities to "improve" models that are outliers rather than those close to the average, which reduces the range of the ensemble.Other methods to derive uncertainties in modelled transport use a direct comparison of modelled mixing ratios with measurements (Mahowald et al., 1997), however residuals between modelled and measured values are also caused by uncertainties in the surface-atmosphere fluxes themselves.
Attempts to separate the influence from transport uncertainty as short time scale variability in the residuals from the longer time scale residuals that are assumed to be caused by uncertainties in the targeted fluxes are also questionable: covariance of the uncertainty in transport with variability of the fluxes can lead to biases when aggregating to longer time scales.A famous example for this is the so called "diurnal rectifier effect" (Denning et al., 1996), where the overestimation of mixing height during night, when CO 2 is released due to respiration, combined with little or no overestimation of daytime mixing heights, when CO 2 is taken up by photosynthesis, leads to strongly biased (underestimated) 24 h averaged near-surface concentrations.
First attempts to directly quantify transport uncertainties based on uncertainties in winds have been made by Lin and Gerbig (Lin and Gerbig, 2005), where wind errors including their spatial and temporal correlations, derived from model -radiosonde comparisons, have been propagated using the STILT model (Lin et al., 2003) to yield uncertainties in simulated CO 2 .The approach utilized the ability of the Lagrangian particle dispersion model STILT to model ensembles that correspond to not just turbulence, but also to wind errors.These uncertainties due to advection errors amounted to 5 ppm during a summer period with active vegetation, largely exceeding measurement uncertainties, which currently are targeted at 0.1 ppm.However, this is by far not the only uncertainty in transport modelling.As mentioned above in reference to the rectifier effect, the imperfect representation of vertical mixing processes near the surface can cause significant biases.Mixing within the planetary boundary layer (PBL) vertically redistributes the influence from surface fluxes to an atmospheric column, whose thickness is described as mixing height z i .Uncertainties in this scale proportionally affect the transport from a source at the surface to a measurement site located within the PBL.
This paper addresses uncertainties in transport related to this uncertainty in vertical mixing.Recently a comparison with airborne CO 2 measurements revealed that all models used for global scale inversions, at least the ones investigated, misrepresent vertical mixing, since none was able to simultaneously reproduce the annual average and the seasonal cycle of measured vertical gradients (Stephens et al., 2007).An important part of this misrepresentation is probably the depth of the mixing layer.It has been widely discussed that the determination of mixing heights is associated with significant uncertainty, see for example the review by Seibert et al. (2000).
Here, in an approach similar to Lin and Gerbig (Lin and Gerbig, 2005), we use comparisons of model and radiosonde derived mixing heights to investigate uncertainties and their spatial and temporal covariances.This information is then used to propagate the uncertainty into mixing ratio uncertainties using STILT, and a pseudo data experiment is set up to test the usefulness of the approach for atmospheric inversions.For the experiment the STILT model is set up for a domain covering most of Europe and for the time period May-June 2005 that covers the CarboEurope Regional Experiment Strategy (CERES) (Dolman et al., 2006).
The outline of this paper is as follows: the methodology is presented in the next chapter, starting with the analysis of mixing height uncertainties (Sect.2.1) and spatio-temporal covariances (2.2), followed by the error propagation using STILT (2.3).Results as well as the application to atmospheric inversions in form of a pseudo data experiment are presented in Sect.3, and are discussed in Sect. 4 with some recommendations for dealing with the uncertainty in mixing.

Analysis of uncertainties in mixing heights
Offline transport simulations use profiles of temperature, humidity, and horizontal winds from forecasted or analyzed meteorological fields to determine the profile of turbulent mixing within the boundary layer, or simply to determine the mixing height, which can be regarded as the altitude up to which surface fluxes are mixed on short (hourly) timescales.For a convective boundary layer this mixing height is equivalent to the mixed layer height, which can be readily diagnosed from temperature and humidity profiles.Under stable conditions, mixing is often incomplete, making it more difficult to derive mixing heights.A method that has been suggested for both, stable and convective boundary layers is the bulk Richardson number method (Vogelezang and Holtslag, 1996).In order to assess the quality of mixing heights as they are use in offline transport models, we compare mixing heights diagnosed from analyzed meteorological fields with mixing heights diagnosed from radio soundings.Here we analyze mixing heights z i (RS) derived from radiosonde data from May and June 2005 for temperature, humidity, and winds (http://raob.fsl.noaa.gov/)from 98 radiosonding stations using the bulk Richardson number method with a critical Richardson number of Ri c =0.25.This radiosonde based estimate is compared to mixing heights z i (ECMWF) derived from short-term forecasted data from the ECMWF (12 h and 24 h, hybrid-level output), fields which are used in many transport simulations in order to simulate global and regional transport.The same method was applied to both datasets (radio soundings and ECMWF fields) to avoid any methodological bias.This largely isolates the effect of a chosen approach to derive mixing heights (which can have large biases e.g. during night time stable conditions) from effects due to differences in the meteorological profiles themselves (which will cause differences in vertical mixing in offline models, no matter how good the diagnostic method is).The ECMWF profiles are interpolated to the location of the radio soundings.When the bulk Richardson number method did not find a stable layer starting from the surface upward, the observation was not used.Thus only easily identifiable situations were compared.
The general patterns of mixing heights show agreement between radiosonde and ECMWF derived values for daytime (between 11:00 and 17:00 GMT, mostly at 12:00 GMT), with low mixing heights over oceans and locations with oceanic influence, and high mixing heights for dry and hot regions (Fig. 1).However, a closer look at the differences (Fig. 2) shows both bias and random differences between the two datasets.Relative biases, i.e. the mean of the differences between mixing heights based on ECMWF fields and those based on radio soundings normalized by the mean radiosonde derived mixing heights, z i (ECMWF)-z i (RS) / z i (RS) , calculated for each station, are in the range of ±20% for most stations in central and western Europe, except for a few stations mostly located at coastal sites.Relative random errors stdev(z i (ECMWF)-z i (RS))/ z i (RS) are in the general range from 25% to 50%, with some stations in coastal areas showing random errors of 80% and more.Overall statistics of the differences indicate large differences also for nighttime data (Table 1), with biases of 50%, and standard deviations of the residuals nearly approaching 100%.Excluding coastal sites from the analysis has only a minor impact (see Table 1, "selected").
We need to exclude trivial causes for the observed differences, namely insufficient vertical resolution in both, radiosonde profiles and ECMWF fields.Most radiosonde profiles contain more than 30 levels below 3 km, i.e. the error due to regridding can be estimated to less than 50 m.To properly test if vertical resolution of the radiosonde data poses a problem, we compared z i (RS) based on standard radiosonde data to z i (RS hr ) based to high vertical resolution (∼10 mm) radiosonde data obtained from UKMO via the  1) indicate that vertical resolution in the radiosonde data can be excluded as a dominant cause for the discrepancy between radiosonde and ECMWF data derived mixing heights.Somewhat of an exception are nighttime random differences, where resolution can explain about half of the observed discrepancy.This relates to the low nighttime mixing heights, in comparison to which the vertical spacing and thus resolution becomes more important.Vertical resolution of the ECMWF fields used for the analysis can similarly be excluded as a major cause for the discrepancy, since the profiles have about 17 levels below 3 km, with increasing density near the surface providing a spacing that starts at less than 50 m.
What remains as a possible cause is that the way the weather prediction model assimilates temperatures, humidity and winds measured by radiosondes does not ensure the same shape in the vertical profile.For example, the level at which an inversion is found in the assimilated data (and thus also in forecasts) is not necessarily the level at which an inversion was observed.Specific reasons are probably imperfection of the boundary layer scheme within the ECMWF model, land surface model, or also soil moisture fields.
For the rest of this paper, we regard the random component of the residuals between the more model based z i (ECMWF) and the measurement based z i (RS) as an estimate of the relative error in mixing heights, i.e. σ Zi,rel =stdev(z i (ECMWF)z i (RS))/ z i (RS) .Further, we neglect the bias component due to its relatively small size compared to the random part.

Spatial and temporal covariances
For atmospheric transport of tracers it is not only important to quantify the error in mixing heights at a specific location and time, but it matters how these errors are spatially and temporally correlated.A long correlation would cause bias errors in the source-receptor relationship for larger regions or for longer time periods.In order to assess the temporal and spatial scale over which the random error in mixing height is correlated, a variogram analysis was performed, similar to the analysis of wind errors in (Lin and Gerbig, 2005).A variogram is the variance of the difference of a spatial variable, i.e. var(R(x)-R(x+h)) for the variable R, as a function of the distance h (Cressie, 1993).The variogram and the covariance differ only in sign and a constant, namely the variance of the variable itself.Here we use the variogram of the residuals R=z i (ECMWF)-z i (RS).Due to the large difference in mixing heights a separation into day and night time was necessary.As expected, the difference in residuals increases with increasing distance for both, day and night (Fig. 3), with smaller variogram values indicating spatial correlation for short distances.In order to estimate the associated correlation length scale, an exponential variogram model was fitted.The exponential variogram is the one of several possible variogram models that provides the best fit to the data.For distances larger than about 1300 km the variogram estimates start increasing, which is probably related to the fact that the meteorological profiles are located within a different synoptic system.These distances were therefore excluded from the fit to allow an estimation of the local correlation scale.The correlation scale for daytime is about 100 km, which is in the range of the smallest distances of the radiosonde network.This scale is thus not very well constrained; however, it is obvious that the error in mixing height is not just local.This is supported by the enhanced variogram values at larger distances.The correlation length scale during nighttime of 230 km is somewhat better constrained, again indicating that it is not just a local effect.Given that both scales are significantly larger than the horizontal resolution of the ECMWF fields with about 35 km, we attribute the larger part of this difference to the ECMWF model rather than to a representation error caused by small scale variability in the mixing heights.
A similar analysis was performed to derive the temporal covariance scale of z i errors -i.e., the covariance of errors over time at a particular location.The temporal covariance scale was found to be 10 h for daytime, and 16 h for nighttime data.Again, these scales are not well constrained due to the lack of high frequency data within the radiosonde network; standard sites have 2 soundings per day.
In general, the correlation scales indicate that there is no regional coherence of several hundreds of kilometers, or a temporal coherence of several days scale, but it is also not a spatially and temporally local effect.

Propagating uncertainties in mixing heights into mixing ratios
To propagate the uncertainty in mixing heights to derive uncertainties in CO 2 mixing ratios, we use the STILT (stochastic time inverted Lagrangian transport) framework described in (Gerbig et al., 2003b) and in (Lin et al., 2004b).STILT was set up for a domain covering most of Europe (see Fig. 4) and run for the Bialystok tall tower in eastern Poland as a receptor over the period May and June 2005.The tower, located at 53 • 14 N and 23 • 01 E at an altitude of 180 m is an instrumented, 300 m tall tower close to the city of Bialystok, and has been operated by the Max Planck Institute for Biogeochemistry since 2005 for continuous measurement of several biogeochemical trace gases.As meteorological input for STILT we used the ECMWF fields, where the 00:00 and 12:00 UTC analysis fields are combined with short term forecasts to provide 3-hourly fields.STILT trajectory ensembles are coupled to surface fluxes on high spatial resolution (a Cartesian grid with 1/12×1/8 grid (lat.x lon.), corresponding to about 10×10 km 2 ), with biosphere-atmosphere exchange as the dominant surface flux represented with the Greatly Simplified Biosphere model (GSB (Gerbig et al., 2003b)) as temperature and radiation response keyed to different vegetation types, using the SYNMAP land cover product at 1 km resolution (Jung et al., 2006).For simplicity only the dominant vegetation classes forest and crop (Fig. 4) are used similar to the approach in Gerbig et al. (2006), and the nonlinear part of the light response is neglected.Thus the only parameters (the elements of the state vector λ) within the GSB used here are four scaling factors to adjust the light response of photosynthesis and the temperature response of respiration for the two vegetation classes.The initial light and temperature response and their uncertainty was taken from Gerbig et al. (2006), which resulted from a fit to eddy flux data.
Since the largest effect is expected for signals from surfaceatmosphere exchange in the near-field, i.e. near the measurement site, lateral boundary conditions are neglected and only anomalies due to regional fluxes are considered.
The approach to propagate the transport error is similar to the one used in (Lin and Gerbig, 2005).Here we give only give a brief description, and the reader is referred to Lin and Gerbig (2005).We use the stochastic nature of STILT to implement errors in mixing heights as a stochastic process within the transport model run.Standard runs of STILT provide a distribution of mixing ratios for an ensemble of trajectories, in which the different members differ in their realization of turbulent winds, i.e. turbulence is the only stochastic process.The width σ CO 2 ,turb of the CO 2 mixing ratio distribution then reflects the combined effect of turbulence modifying the path of each trajectory and spatial variability of CO 2 surface fluxes.We then run the model a second time with an additional stochastic process to describe the effect from errors in mixing heights: for each trajectory and each time step a random number is drawn from a Gaussian distribution with a mean of one and a standard deviation corresponding to the relative error in daytime mixing height σ Zi,rel , estimated from stdev(z i (ECMWF)-z i (RS))/ z i (RS) as 40% (Table 1, selected data not including coastal stations).Unlikely cases of values below zero are set to zero.This random number is then used to rescale the footprint (local sensitivity of mixing ratio to surface fluxes).Temporal and spatial correlations are taken into account by decorrelating the random numbers exponentially using the spatial and temporal variogram models, with a timescale of 12 h and spatial scale of 100 km as derived from the daytime mixing height residuals.This second STILT run then provides a distribution of CO 2 mixing ratios with an increased width σ CO 2 ,turb+Zi , that reflects the effect of turbulence plus the effect of the error in mixing heights.The mixing ratio error due to mixing height uncertainty can then be calculated from the broadening of the mixing ratio distribution, assuming statistical independence: This uncertainty σ CO 2 ,Zi is computed for every measurement time, providing time dependent uncertainties that can be used quantitatively in atmospheric inversions.
To a first order, this captures the effect of a changed mixing height on mixing ratios within the boundary layer.Not included with this method are secondary effects such as changes in advection, which are expected with different mixing heights and thus different turbulence profiles com-bined with different wind shear.However, we regard the modified dilution of surface fluxes into a mixing layer column with different top as the dominant effect, which provides a lower bound for the overall error.Footprints very close to the measurement site matter most (see e.g.Fig. 5 of Gerbig et al., 2003), during the first day the spatially integrated footprint values drop by about 30%.In this near-field the footprint simply scales with 1/z i (1-D case), with deeper vertical mixing causing smaller atmospheric signals given the same surface fluxes.Here our implementation of the uncertainty is fully appropriate.At upstream locations, one or several days before the measurement time, the plume of influence can be separated into two classes: a "PBL-plume" of particles that contribute to the signal from surface fluxes ( within the mixing layer, with the chance to be mixed into the surface influence zone), and particles in the residual layer or in the free troposphere, that do not contribute (the "FT-plume").The PBL-plume will be diluted, thus will get less surface influence, when the mixing height is increased.This part is correctly represented in our approach of rescaling footprints.The FT-plume will be entrained and contribute to surface flux signals when the mixing height is increased.This leads to an increase of footprint values, which is not represented in our approach.However, usually these two classes of plumes follow different paths due to windshear at the top of the mixing layer.Taking a 5m/s wind shear near the PBL top, after 6 hours the PBL-plume and the FT plume are separated by more than 100 km, the decorrelation scale for mixing height error.Thus these opposing effects on the "FT-plume" and the "PBL-plume" can not really compensate each other.Thus our simplification just neglected one additional error term (the entrainment of formerly FT-particles), thus further underestimating the final uncertainty in the modelled mixing ratio.In the real world this is slightly more complicated due to the strong diurnal variation of mixing height, but here we argue that we can reasonably only treat uncertainties in daytime mixing layer.Uncertainties in night-time mixing heights are by far larger, and more difficult to properly consider due to the much larger biases.However, this makes our estimate of uncertainty in modelled mixing ratios due to uncertainties in mixing heights even more a conservative one.

Results and application to inversions
Uncertainties for CO 2 mixing ratios are calculated using the above mentioned framework for the May-June 2005 period (Fig. 5).σ CO 2 ,Zi was on average 3.5 ppm, or 30% of the simulated CO 2 from biospheric fluxes within the near-field (up to 5 days prior to the measurements, or limited by the regional model domain).This is expected for a relative uncertainty in mixing heights (σ Zi,rel ) of 40% and a decorrelation scale of 100 km that is somewhat smaller than the footprint area (see Fig. 4), allowing the effective uncertainty in mixing ratios to become smaller due to the aggregation over the footprint area.In order to test the application of transport errors due to uncertainties in vertical mixing to atmospheric inversions, we make the following steps: first pseudo data are generated based on a "true" mixing height field different from the one assumed in the standard model, then these pseudo data are used for an inversion to retrieve the state vector λ within the GSB (i.e. the light and temperature responses of the biospheric fluxes).The inversion is done for two cases: case 1), where the state vector is retrieved without consideration of the propagated transport uncertainties, which corresponds to the standard case applied in other inversions, and case 2), where we take the transport uncertainty into account.Finally, we compare the retrievals from both cases with the known truth.

Atmos
Pseudo data are generated using a realization of relative errors in mixing heights that is consistent with the spatial and temporal covariances found in the statistical analysis above.This was used to create the "true" mixing height field, and these fields were used by STILT to calculate pseudo data for CO 2 mixing ratios, following the equation with y ps as the pseudo data, K true as the "true" Jacobian (sensitivities of measurements y with respect to the biospheric parameters λ), and the state vector λ true .K true is the combination of the "true" transport operator and the operator relating biospheric parameters to fluxes (i.e.radiation and temperature).As "true" fluxes we use the GSB model with all scaling parameters set to 1. Resulting pseudo data are shown in Fig. 5 as time series, with typical synoptic variations of about 20 ppm.The result of the "forward" model, using a Jacobian K with unmodified mixing heights, and using a priori scaling factors (state vector λ prior ) that are different from the "truth", is also shown in Fig. 5.As a priori values for λ we use for each element a random number taken from a Gaussian distribution with the prior uncertainty as the width.The forward model is strongly correlated with the true signal, but it is different due to the prior uncertainty (here a case of a stronger biospheric fluxes) as well as due to the transport uncertainty.The prior error is calculated as a projection in measurement space (the product of Jacobian K with the prior uncertainty in state space) and shown in Fig. 5.It is obvious that during times with larger differences between the forward model and the truth, the uncertainties are large, while small uncertainties are usually only found for periods with small differences.The different components of K show different temporal patterns (Fig. 5b): signals due to fluxes from crop areas usually dominate over forest signals, photosynthesis signals dominate over respiration signals.There is a high degree of correlation between all four signals, indicating a strong common influence through the modulation by transport.However, there are also significant differences left that allow separation of the different components in an inversion.Now we retrieve the state vector λ based on the pseudo observations, which are related through the measurement equation with ε y accounting for errors.Although ε y is often referred to as "measurement error", it can incorporate errors not related to instrument errors, but in the model representation (e.g., uncertainties in z i ).The optimal estimate (Rodgers, 2000) is with S ε as the error covariance corresponding to ε y , and the prior error covariance S prior .The posterior uncertainty of the retrieved state vector λ is calculated from . For case 1 that does not take into account any transport error, we use a diagonal matrix with 2 ppm uncertainty as error covariance S ε to account for uncertainties such as the insufficient grid resolution to resolve heterogeneity in surface fluxes ("representation error").In case 2 that takes into account the uncertainties in mixing height, we add to S ε a transport error with diagonal elements based on Eq. ( 1) and with off-diagonal elements that correspond to a 12 h temporal covariance scale.
The inversion of the pseudo data was conducted on a weekly time basis, allowing the state vector to adjust weekly to measurements.This reflects the fact that the biosphere model only accounts for responses to light and temperature, with constant light and temperature response.In the real world the light and temperature response changes for example due to soil moisture or to changes in the phenology, which vary on synoptic and longer timescales.
Retrieved time series of the state vector components with their uncertainties (Fig. 6) during the May-June 2005 period show that there is a strong reduction in uncertainty for both cases (with and without consideration of the transport error), with posterior uncertainties that are about an order of magnitude smaller than the corresponding prior uncertainties (also shown in Fig. 6).Although for generation of the pseudo data no temporal variation in the state vector was imposed, the retrievals show variations and differences from the truth.The retrieved scaling factors for case 1 show significant differences from the truth, indicated by the fact that the range given by the posterior uncertainty around the retrieved state (grey bands in Fig. 6) in most cases does not include the truth value (a scaling factor of one).Thus the retrieval for case 1 is biased.In contrast, case 2 has less deviation from the truth, and also the truth is included in the range of posterior uncertainties (Fig. 6, light blue bands) so that differences between retrieved state and truth are not statistically significant.This is achieved by an uncertainty for case 2 that is much larger (by about a factor two) than for case 1), but this is the price to pay in order to get a retrieval that is consistent with the truth.
It is important to note that the bias in the retrieved state for case 1 depends on the combination of spatial and temporal decorrelation scales in the mixing height uncertainty and temporal scale of the retrieved parameters that one is interested in.When aggregating parameters or fluxes to temporal scales long compared to the decorrelation scale, the resulting bias will diminish.However, given that the mixing height uncertainty that was used in STILT did not account for the much larger night time error, which included not only a random part, but also a significant bias, the estimation of the retrieval bias is on the low side.Inclusion of the bias error for nocturnal mixing heights would have shown the diurnal rectification effect (Denning et al., 1996).

Discussion and outlook
Since the uncertainties in mixing heights found in this study are quite large, it is appropriate to spend some time in discussing potential reasons as well as to examine potential approaches to deal with them.As shown in this analysis, uncertainties in mixing heights as used in atmospheric inverse studies pose a considerable problem when interpreting measurements made in the continental boundary layer.An average uncertainty of about 40% for daytime mixing heights results in a corresponding uncertainty in mixing ratios, which in case of CO 2 amounts to several ppm during the growing season, or to 30% of the regional biosphere-atmosphere signals.Together with the uncertainty in advection due to wind errors (Lin and Gerbig, 2005), this is the dominant source of uncertainty in any inverse modelling system targeted at continental measurements (Table 2).Other sources of uncertainty such as the pure measurement uncertainty are generally small compared to these transport model uncertainties.It is important to keep in mind that this analysis does not address any methodological biases that arise when the mixing height applied in the model is different from the true mixing height even when perfect knowledge of meteorological profiles is used.It only addresses the effect of the use of assimilated or forecasted meteorological profiles for offline transport simulations.Especially during night-time stable conditions methodological bias errors can be large and are an open research question (Seibert et al., 2000).
The variogram analysis of mixing height errors (Sect.2.2, also Fig. 3) shows that although the error is not uncorrelated in space, there is a significant random component that is spatially uncorrelated.Spatial variations in mixing heights respond in some degree to spatial variations of surface properties such as albedo and wetness, depending on wind speed and on scale of the surface heterogeneity (Mahrt, 2000).Scales of a few tens of km are favourable for the formation of mesoscale patterns (Chen and Avissar, 1994), so it is not really surprising that the ECMWF model at a resolution of around 35 km does not capture the full spatial variability.Small scale variability in radiosonde derived mixing heights can also be caused by broken clouds in that in-cloud and clear air profiles of temperature etc. are different.On average, this is represented in the ECMWF model, but since clouds remain unresolved there is a significant variability that is not represented in modelled fields.
There are in general three approaches that have the potential to mitigate these problems in an inversion system: 1) to allow for additional uncertainty by quantifying and propagating the error, 2) to improve the transport model, or 3) to use an approach that is less sensitive to the transport error.In the following these approaches and their benefits and limitations are discussed.
Approach 1), as has been shown in this paper, can provide unbiased inversion results.It can further be implemented for other regions and times given that the statistical analysis to quantify the error covariance for the uncertainty in mixing heights has been extended to cover these times and locations.However, it requires running a stochastic model such as STILT in order to propagate the mixing height error into a mixing ratio uncertainty.Further, significant care has to be taken to ensure that the different spatial and temporal scales are appropriately characterized.The simple exponential decorrelation assumed for the error covariance in this study might not be true, and it has impact on the corresponding scales in the retrieved state vector, especially when solving for spatially explicit fluxes.Retrievals might thus still be biased on given scales due to remaining biases in the transport representation.
The second approach (2) means that a transport model has to be applied that significantly better reproduces boundary layer mixing.This could partly be achieved by using more sophisticated boundary layer schemes and a more sophisticated land surface model.However, it is probably inevitable to use a higher spatial resolution to better represent the atmospheric circulation in the vicinity of the measurements (van der Molen and Dolman, 2007).Such approaches using mesoscale transport models embedded in inversion systems are being implemented by different groups now, and first exploratory applications were performed in the CarboEurope Regional Experiment Strategy (CERES) (Ahmadov et al., 2007;Dolman et al., 2006;Sarrat et al., 2007aSarrat et al., , 2007b)).Since there will never be a perfect model, it is obvious that also for this approach, that seeks to reduce uncertainties by improving the model, it is obvious that a detailed validation and analysis of the remaining uncertainties is required based on extensive comparisons with measurements.
Approach 3) could be realized by replacing the point measurements from tall towers by column observations.Column amounts are not as sensitive to the exact height of the mixing www.atmos-chem-phys.net/8/591/2008/Atmos.Chem.Phys., 8, 591-602, 2008 layer, since to first order a difference between true and modelled mixing height just reshuffles air between the free troposphere (FT) and the PBL, leaving the column amount constant.Such column observations can be made by aircrafts or with remote sensing techniques.A slight drawback of using column amounts is that the signature of surface fluxes (change in tracer mixing ratio) is diluted over the atmospheric column.Hence the temporal and spatial variability of column amounts is not as large, and thus the method is less sensitive to surface fluxes.Profile information would be better in this regard, since then a separation of PBL and FT remains possible and the measured vertical distribution can be adjusted by reshuffling between PBL and FT (with constant column amount) to match the thickness of the PBL in the model.The main drawback is that column or profile measurements are still limited in accuracy and precision in case of remote sensing techniques, or are still quite expensive in case of regular aircraft measurements.Satellite remote sensing techniques are being developed further, but observations will only become available within the next years, first with limited quality (Crisp et al., 2004).An increasing network of ground based FTIR measurement stations, primarily intended for validation of the upcoming satellite instrument OCO, can also be utilized for inversions, but there is significant work to be done in improving retrieval algorithms in order to provide accurate measurements (Washenfelder et al., 2006).
Another way to realize approach 3) is to use additional tracer measurements such as 222 Rn or SF 6 that are subject to the same transport, but for which fluxes are known.This would allow "calibrating" the transport, and reduce errors this way (Schmidt et al., 1996).However, this requires the fluxes to be known at least as well as we already have knowledge about the fluxes targeted in the inversion.Given the prior knowledge of CO 2 fluxes and their spatial and temporal patterns, and given the problems in properly simulating the emanation rate of 222 Rn (Olivié et al., 2004), we see only limited potential in this tracer.For SF 6 , the location and strength of local emissions are often dominating variability of measurements in the continental boundary layer (Rivier et al., 2006).A recent study also found significant uncertainties in the emission database (Hurst et al., 2006).Since detailed and accurate information on these emissions is hard to come by, this tracer is also not too promising to reduce transport errors.
The real solution to this problem is probably a combination of the above mentioned approaches.This means that the inversion system or data assimilation framework targeted at surface-atmosphere exchange would combine improved and higher spatial resolution models with quantitative information on uncertainties.This framework would then use both, tall tower measurements with good temporal coverage as well as profile or column measurements with more limited temporal but better vertical coverage.
What we regard as an indispensable addition for the observational system are devices to determine the mixing height that are collocated with tall towers and aircraft profiling sites.Such measurements could be made by remote sensing systems such as lidars, sodars, RASS, or wind profiling radars (Clifford et al., 1994).A relatively cheap and operationally feasible method is the use of ceilometers, where the backscatter profile can be used to retrieve mixing heights under stable and convective conditions (Eresmaa et al., 2006).These methods give continuous mixing height information, which is very useful to better constrain the temporal covariances discussed in Sect.2.2.Further, these data could be used in the data assimilation framework to improve the representation of atmospheric mixing where it is most relevant, at the measurement site.

CFig. 1 .
Fig. 1.Daytime mixing heights derived from radio soundings (left) and from ECMWF short term forecasts (right), averaged over the May-June 2005 period.Sounding locations are indicated by open circles, with filled circles showing the location of high resolution sounding data.Crosses indicate stations near coasts not selected for further statistical analysis.Interpolation for the contour plot was done with inverse distance weighting.

Fig. 2 .
Fig. 2. Left: temporally averaged daytime residuals between ECMWF based and radio sounding based mixing heights, normalized by the radio sounding based estimate.Right: standard deviation of daytime residuals, normalized by the radio sounding based estimate.Symbols are the same as in Fig. 1.
Fig. 3. Variogram of mixing height residuals z i (ECMWF)-z i (RS) for day (left) and night (right).The lines show a fit with an exponential variogram, for which distances larger then 1300 km where excluded.

Fig. 4 .
Fig. 4. Example of a footprint for the Bialystok tall Tower, calculated using STILT driven by ECMWF meteorology.Inserts show the fractional land coverage with forests (left) and croplands (right).
Fig. 5. (a) Time series of the CO 2 signal due to biosphere-atmosphere exchange within the model domain ("biospheric signal") for a 2 week period in May 2005.Pseudo data are shown as blue line ("Obs").Simulated values are shown in black ("Model") on top of the grey band indicating the propagated transport error ("TransErr").Also the prior errors are shown as green band ("PriorErr").(b) Time series of CO 2 signals due to different flux components from the different vegetation classes.Abbreviations: R denotes respiration, GEE denotes gross ecosystem exchange.

Fig. 6 .
Fig. 6.Retrieved weekly scaling factors for respiration (left) and photosynthesis (right) plotted against time for forest (top row) and crop (bottom).Abbreviations as in Fig. 5.The "true" scaling factor is one, plotted as red line, prior values are indicated by the dotted line, and prior uncertainties shown as error bars plotted at the left side of each graph.Retrievals without consideration of the transport error are plotted as black lines, with posterior uncertainty as grey band; retrievals taking into account the propagated transport error are plotted as blue lines, with posterior uncertainty as light blue band.

Table 1 .
Average values for relative bias and relative standard deviation between radiosonde and ECMWF derived mixing heights, separated by day/night and by data selection (all data vs. selected data, excluding coastal locations).UK Meteorological Office, 2006)as well as from Météo-France (Joel Noilhan, personnel communication 2006) for a small subset of stations.The results (also shown in Table

Table 2 .
Typical uncertainties for boundary layer CO 2 mixing ratios in an inverse modeling study.