Interpreting the variability of space-borne CO2 column-averaged volume mixing ratios over North America using a chemistry transport model

We use the GEOS-Chem chemistry transport model to interpret the sources and sinks of CO 2 that determine variability of column-averaged volume mixing ratios (CVMRs), as observed by the SCIAMACHY satellite instrument, during the 2003 North American growing season. GEOS-Chem generally reproduces the magnitude and seasonal cycle of observed CO 2 surface VMRs across North America and is quantitatively consistent with column VMRs in later years. However, it cannot reproduce the magnitude or variability of FSI-WFM-DOAS SCIAMACHY CVMRs. We use model tagged tracers to show that local fluxes largely determine CVMR variability over North America, with the largest individual CVMR contributions (1.1%) from the land biosphere. Fuel sources are relatively constant while biomass burning makes a significant contribution only during midsummer. We also show that non-local sources contribute significantly to total CVMRs over North America, with the boreal Asian land biosphere contributing close to 1% in midsummer at high latitudes. We used the monthly-mean Jacobian matrix for North America to illustrate that: 1) North American CVMRs represent a superposition of many weak flux signatures, but differences in flux distributions should permit independent flux estimation; and 2) the atmospheric


Introduction
The importance of the natural carbon cycle in understanding climate is well established (IPCC, 2007). A better quantitative understanding of natural sources and sinks of carbon dioxide (CO 2 ), in particular, is crucial if CO 2 mitigation and sequestration activities relying on these natural fluxes are to work effectively. Estimation of sources and sinks of CO 2 using inverted atmospheric transport models to interpret atmospheric concentration data has been generally effective but has had varied success in the tropics where there is relatively little data (Gurney et al., 2002). Previous inversion studies have used surface concentration data (Bousquet et al., 1999), representative of spatial scales of the order of 1000 km by virtue of their location; aircraft concentration data (Palmer et al., 2006; Stephens et al., 2007), representative of spatial scales of the order of 10s-100s of km, and generally only available during intensive campaign periods; and concentrations from tall towers (Chen et al., 2007), representative of spatial scales of the order of <1 to 10s of km. New CO 2 column data from low-Earth orbit space-borne sensors (e.g., the Scanning Imaging Absorption Spectrometer, SCIAMACHY (Bovensmann et al., 1999); the Orbiting Carbon Observatory, OCO (Crisp et al., 2004; Miller et al., 2007); and the Greenhouse Observing SATellite, GOSAT (Hamazaki et al., 2004)), measuring shortwave near-infrared wavelengths (SWIR), are sensitive to changes in CO 2 in the lower troposphere and therefore provide potentially useful data with which to estimate surface fluxes of CO 2 (Chevallier et al., 2007). One of the main advantages of spaceborne sensors is their repeated global coverage, facilitating measurements, for example, over remote tropical ecosystems that are currently poorly characterized by in situ data.
Published by Copernicus Publications on behalf of the European Geosciences Union. P. I. Palmer et al.: Interpreting column CO 2 data
SCIAMACHY CO 2 data, in particular, are representative of a 60 km×30 km spatial footprint, comparable with the horizontal resolution of current generation atmospheric transport models; upcoming instruments will have better horizontal resolution. At the time of writing, SCIAMACHY is the only space-borne sensor in orbit that measures CO 2 columns sensitive to the lower troposphere. To date there have been very few model studies of SCIAMACHY CO 2 column data, and these have provided only qualitative comparisons (Buchwitz et al., 2005, 2007; Barkley et al., 2006c). In this paper, we use the GEOS-Chem global 3-D chemistry transport model (CTM) to interpret the sources and sinks of CO 2 that determine the variability in CO 2 columns as observed by SCIAMACHY over North America during the 2003 growing season. We focus on North America because of the extensive multi-platform measurement programme which can be used to help evaluate SCIAMACHY via the CTM, and on 2003 because at the time of writing that represented the only full year of fitted CO 2 columns using the Full Spectral Initiation (FSI) (Barkley et al., 2006a) Weighting Function Modified Differential Optical Absorption Spectroscopy (WFM-DOAS) (Buchwitz et al., 2000).
A number of studies have illustrated that the precision and accuracy of measured CO 2 columns are critical to their success in better quantifying the carbon cycle. The temporal and spatial variations in column data are much less than those in surface concentration measurements (Olsen and Randerson, 2004). Inversions of synthetic data have shown that CO 2 columns have to be retrieved with a precision of less than 1% over an 8°×10° grid if they are to improve upon the existing ground-based network used for source/sink estimation (Rayner and O'Brien, 2001). Consequently, uncharacterized systematic biases will compromise this ability (Miller et al., 2007). Use of column CO 2 has the benefit of effectively reducing the potential model bias introduced by inaccurate descriptions of vertical mixing (Olsen and Randerson, 2004). Nonetheless, recent work has highlighted the requirement of using accurate, synoptic-scale atmospheric transport to interpret CO 2 column data in order to minimize errors associated with spatial sampling, particularly over geographical regions with active weather systems (Corbin et al., 2008). The GEOS-4 meteorology we use here has skill in capturing synoptic scale transport (e.g., Duncan et al., 2008; Zhang et al., 2008) but, as with all global 3-D CTMs, convection remains one of the biggest weaknesses in model transport (Kiley et al., 2003). The vertically integrated CO 2 column abundance represents the sum of an age-spectrum of airmasses. Young airmasses (defined in this paper as <3 months), still bearing the signatures of surface fluxes, are subject to atmospheric dilution processes that eventually render these signatures indistinguishable from the global background whose variability is determined by atmospheric transport.
We show that variability in space-borne CO 2 columns over one region is determined by both national and international surface flux signatures (local biosphere fluxes that reach 1.1% of the column-averaged volume mixing ratio (CVMR) generally represent the largest signals) that can be used to estimate flux strengths via inverse model calculations. We also emphasize that accounting for the vertical sensitivity of the satellite instrument can, in some instances, enhance surface flux signatures.
Section 2 briefly describes the SCIAMACHY retrievals of CO 2 used in this work and presents CO 2 CVMR distributions over North America. Section 3 describes the GEOS-Chem CTM used for this study and presents a comparison between model and SCIAMACHY CVMR distributions; Appendix A presents a more detailed model evaluation of CO 2 over North America during 2003 using surface VMR, ground-based column VMR, and aircraft concentration measurements. In Sect. 4 we use the model to estimate which land-based fluxes determine the continental-scale variability of CVMRs over North America during the growing season, and look in detail at two contrasting sites over North America. In Sect. 5 we discuss how CVMR data could be used to infer source and sink distributions. We conclude the paper in Sect. 6.

SCIAMACHY CO 2 data
SCIAMACHY is a nadir and limb-viewing UV/Vis/SWIR solar backscatter instrument aboard the ENVISAT satellite, launched in 2002 (Bovensmann et al., 1999). It measures from 240 to 2380 nm, with a resolution of 0.2-1.4 nm depending on the channel. ENVISAT is in a near-polar sun-synchronous orbit crossing the equator at about 10:00 local solar time in the descending node, achieving full longitudinal global coverage at the equator within six days. SCIAMACHY makes measurements in an alternating nadir and limb sequence. We use the nadir measurements that have a horizontal resolution of 60×30 km² (across × along track).
We include here only a short description of the retrieval of SCIAMACHY CO 2 and refer the reader to dedicated retrieval studies (Buchwitz et al., 2000; Barkley et al., 2006a). CO 2 columns are retrieved in the 1561.03-1585.39 nm wavelength window using the FSI WFM-DOAS approach (Barkley et al., 2006a; Buchwitz et al., 2000). The mean fitting uncertainty of these columns is typically 1-4% (0.8-3.2×10²⁰ molec cm⁻² based on a fitted column of 8×10²¹ molec cm⁻²), which is largely attributable to poor characterization of the atmospheric state (e.g., aerosols, cirrus clouds) (Barkley et al., 2006a). Cloudy scenes are diagnosed from the SCIAMACHY polarization measurement devices using a cloud algorithm developed by Krijger et al. (2005), as described by Barkley et al. (2006a), and excluded from subsequent analyses. We also exclude back scans, observations with solar zenith angles >75° (Barkley et al., 2006a), and observations over ocean due to very low surface albedo. To remove artefacts introduced by surface elevation we normalize retrieved CO 2 columns using the nearest 6-hourly 1.125°×1.125° ECMWF model surface pressure (Barkley et al., 2006c) to derive a CVMR. We use only column observations with a retrieval error of <5% and with values that correspond to CVMRs within a range of 340-400 ppmv to remove anomalous results from undetected clouds or from aerosol scattering (Barkley et al., 2006a). Previous studies have extensively evaluated FSI CO 2 data against independent measurements over the Northern Hemisphere. Comparisons between SCIAMACHY CO 2 and ground-based Fourier Transform Spectrometers (FTS) and a CTM show a negative bias of 2-4% in the absolute CVMR magnitudes.
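The normalization of a retrieved column by surface pressure, and the screening thresholds quoted above, can be illustrated with a minimal Python sketch. It assumes a dry, hydrostatic atmosphere with constant gravity, which is a simplification and not necessarily the exact procedure used in the retrieval; the function names are hypothetical.

```python
AVOGADRO = 6.02214076e23   # molecules per mole
GRAVITY = 9.80665          # m s^-2, assumed constant with height
M_DRY_AIR = 0.0289644      # kg per mole of dry air (water vapour ignored)


def air_column(surface_pressure_hpa):
    """Total air column (molec cm^-2) above a surface at the given pressure."""
    pressure_pa = surface_pressure_hpa * 100.0
    molec_per_m2 = pressure_pa * AVOGADRO / (GRAVITY * M_DRY_AIR)
    return molec_per_m2 / 1.0e4  # convert m^-2 to cm^-2


def column_to_cvmr(co2_column, surface_pressure_hpa):
    """Column-averaged volume mixing ratio (ppmv) from a CO2 column (molec cm^-2)."""
    return 1.0e6 * co2_column / air_column(surface_pressure_hpa)


def passes_filters(cvmr_ppmv, retrieval_error_pct, sza_deg):
    """Screening thresholds quoted in the text: error <5%, SZA <=75 deg, 340-400 ppmv."""
    return (retrieval_error_pct < 5.0
            and sza_deg <= 75.0
            and 340.0 <= cvmr_ppmv <= 400.0)


cvmr = column_to_cvmr(8.0e21, 1013.25)  # the 8x10^21 molec cm^-2 column quoted above
```

Under these assumptions a column of 8×10²¹ molec cm⁻² at standard sea-level pressure corresponds to a CVMR of roughly 372 ppmv, which falls inside the accepted 340-400 ppmv window.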
Strong correlations between SCIAMACHY CO 2 anomalies and aircraft and ground-based data imply that SCIAMACHY can track lower troposphere variability on at least monthly timescales, and has the potential to monitor changes in CO 2 (Barkley et al., 2006b, c, 2007). At this time, several retrieval issues (e.g., aerosol contamination) need to be resolved before the data are characterized sufficiently well for inverse modelling. Figure 1 shows monthly mean SCIAMACHY CO 2 CVMRs (ppmv) over North America during the 2003 growing season (here defined as April-September) averaged over the GEOS-Chem 2°×2.5° grid (Sect. 3). The average number of individual scenes that fall into a North American 2°×2.5° grid box is between 25 and 50, depending on month; this effectively reduces the random error by approximately an order of magnitude. Outside the growing season spatial coverage at high latitudes is reduced by seasonally varying solar zenith angle and persistent cloud cover. Values range from 350 to 390 ppmv with a 15-20 ppmv peak-to-peak seasonal cycle over regions with a strong biospheric signal, which is much larger than the 10 ppmv peak-to-peak seasonal cycle observed at Park Falls (Yang et al., 2007).
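The gridding and averaging step can be sketched as follows. This is an illustration only: the grid-box edges used here (boxes starting at 90°S and 180°W) and all function names are assumptions, and the error estimate holds only if scene errors are independent and of equal size.

```python
import math
from collections import defaultdict


def grid_cell(lat, lon, dlat=2.0, dlon=2.5):
    """Index of the grid box containing (lat, lon); the box edges are an assumption."""
    return (math.floor((lat + 90.0) / dlat), math.floor((lon + 180.0) / dlon))


def gridded_means(scenes):
    """scenes: iterable of (lat, lon, cvmr_ppmv). Returns {cell: (mean, count)}."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for lat, lon, cvmr in scenes:
        cell = grid_cell(lat, lon)
        sums[cell] += cvmr
        counts[cell] += 1
    return {cell: (sums[cell] / counts[cell], counts[cell]) for cell in sums}


def random_error_of_mean(single_scene_error, n_scenes):
    """Standard error of the mean for n independent, equally noisy scenes."""
    return single_scene_error / math.sqrt(n_scenes)


# Hypothetical scenes: two fall in the same 2 x 2.5 degree box, one elsewhere.
scenes = [(45.1, -90.2, 370.0), (45.9, -90.1, 374.0), (30.0, -100.0, 380.0)]
means = gridded_means(scenes)
```

For strictly independent errors, averaging 25 to 50 scenes reduces the random error by a factor of about 5 to 7.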

The GEOS-Chem forward model of CO 2
We use the GEOS-Chem global 3-D chemistry transport model (v7-03-06) to calculate column concentrations of CO 2 from prescribed surface CO 2 fluxes described in this section. We used the model with a horizontal resolution of 2°×2.5°, with 30 vertical levels (derived from the native 48 levels) ranging from the surface to the mesosphere, 20 of which are below 12 km. The model is driven by GEOS-4 assimilated meteorology data from the Global Modeling and Assimilation Office Global Circulation Model based at NASA Goddard. The 3-D meteorological data are updated every six hours, and the mixing depths and surface fields are updated every three hours. The CO 2 simulation is based on Suntharalingam et al. (2004) and Palmer et al. (2006); here, we provide a description of modifications to these previous studies.
A general evaluation of model CO 2 distributions over North America during 2003 is shown in Appendix A.
3.1 CO 2 flux inventories
Table 1 reports the regional monthly mean estimates of CO 2 fluxes from fuel combustion (sum of fossil fuel and biofuel), biomass burning, and the land biosphere used in GEOS-Chem. Gridded fossil fuel emission distributions are representative of 1995 (Suntharalingam et al., 2004), which we have scaled to 2003 values using regional budget estimates for the top 20 emitting countries in 2003 from the Carbon Dioxide Information Analysis Center (Marland et al., 2007), including sources from fossil fuel burning, gas flaring, and cement production. On a global scale the sum of these sources has increased by 14% relative to 1995 values. Biofuel emission estimates, taken from Yevich and Logan (2003), represent climatological values. This source of CO 2 is generally less than 1% of the total fuel source for North America and western Europe but represents up to 18% of the total fuel source for Asia. In many regions, particularly Asia, the distributions of fossil and biofuel emissions overlap significantly so we lump these fuel sources together (FL). Monthly biomass burning (BB) emission estimates are taken from the second version of the Global Fire Emission Database (GFEDv2) for 2003 (van der Werf et al., 2006). These data are derived from ground-based and satellite observations and should describe well the burning distributions. Monthly mean air-sea fluxes of CO 2 are taken from Takahashi et al. (1999). As we show later, the observed variability in SCIAMACHY data is determined largely by continental fluxes so we do not discuss further the role of ocean exchange in this study. We use daily mean land biosphere (BS) fluxes from the CASA model for 2001, in the absence of corresponding fluxes for 2003. Year-to-year variability of CASA monthly mean land biosphere CO 2 fluxes is small (<10%) so our approach should not introduce significant error.
We do not explicitly account for the contribution of fuel combustion CO 2 from the oxidation of reduced carbon species (Suntharalingam et al., 2005) as they make only a small contribution to the CO 2 column.

Model initialization
CO 2 concentrations for January 2002 were initialized from a previously evaluated model run (Palmer et al., 2006), which we integrate forward to January 2003. We include an additional initialization to correct for the model bias introduced by not accounting for the net uptake of CO 2 from the terrestrial biosphere. We make this downward correction by comparing the difference between GLOBALVIEW CO 2 data (GLOBALVIEW-CO 2 , 2006) and model concentrations over the Pacific during January 2003. Differences range from 1 to 4 ppmv with a median of 3.5 ppmv, and we subtract this value globally, following Suntharalingam et al. (2004).
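The downward correction described above amounts to subtracting a single median offset from the model field. A minimal sketch, using made-up station values chosen only so that the differences span 1 to 4 ppmv with a median of 3.5 ppmv (the function names and numbers are hypothetical):

```python
import statistics


def median_offset(model_ppmv, observed_ppmv):
    """Median of model-minus-observation differences at matched sites."""
    differences = [m - o for m, o in zip(model_ppmv, observed_ppmv)]
    return statistics.median(differences)


def apply_offset(field_ppmv, offset_ppmv):
    """Subtract the same offset from every element of a model field."""
    return [value - offset_ppmv for value in field_ppmv]


# Hypothetical model and observed concentrations at four Pacific sites.
model = [376.0, 377.5, 378.5, 379.0]
observed = [375.0, 374.0, 375.0, 375.0]

offset = median_offset(model, observed)
corrected = apply_offset(model, offset)
```

Using the median rather than the mean makes the offset less sensitive to an individual site with an unusually large mismatch.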
Fig. 1 (caption; beginning lost in extraction). [...].5° grid. The model is sampled at the time and location of the observed scenes, and using the SCIAMACHY averaging kernel as outlined in the main text. The RHS panels show scatterplots of the monthly mean data, with the total number of (black + red) data points n and associated correlation coefficient r and the model bias inset. Red data denote columns over the region defined by latitudes >50°N and longitudes >100°W (as shown in top LHS panel). We exclude 1) cloudy scenes, identified by instrument polarization devices, 2) scenes with solar zenith angles >75°, 3) scenes with a retrieval error of ≥5%, and 4) scenes that correspond to CVMRs outside of the range 340-400 ppmv. The nearest ECMWF (1.125°×1.125°) and GEOS-4 (1°×1.125°) surface pressure data are used to convert from observed and model columns to CVMRs, respectively.
From January 2003 the total CO 2 tracer becomes the "background" CO 2 concentration and is only subject to atmospheric transport. At that time, we also introduce additional model tracers, initialized with a uniform value (for numerical reasons and which is subtracted in subsequent analyses), that account for the monthly production and loss of CO 2 originating from specific geographical regions and surface processes ("tagged" tracers). The linear sum of these monthly tagged tracers (and the "background") is equivalent to the total CO 2 . Figure 2 shows the tagged geographical regions for these experiments: North America (NA), Europe (EU), Asia (AS), Boreal Asia (BA), and the rest of the World (ROW). We separately account for CO 2 contributions from fossil fuel emissions (FF), biofuel emissions (BF), biomass burning (BB), the land biosphere (BS), the ocean biosphere (OC), and the inert initial conditions from January 2003. As mentioned above, FL describes the sum of FF and BF. We find the ocean flux contribution to atmospheric CO 2 columns is diffuse and is difficult to distinguish from the initial conditions and is consequently lumped with the ROW.
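The tagged-tracer bookkeeping described above relies on the linearity of transport: the total field is the background plus each tagged tracer after its uniform initial value is removed. A minimal sketch with NumPy follows; the offset value, tracer names, and numbers are hypothetical and chosen only for illustration.

```python
import numpy as np

UNIFORM_OFFSET = 100.0  # hypothetical uniform initial value given to every tagged tracer


def total_from_tagged(background, tagged, offset=UNIFORM_OFFSET):
    """Background plus the sum of tagged tracers, each with its offset removed."""
    total = np.array(background, dtype=float)  # copy so the input is not modified
    for field in tagged.values():
        total += np.asarray(field, dtype=float) - offset
    return total


# Two hypothetical grid boxes (ppmv).
background = [372.0, 373.0]
tagged = {
    "BS_NA": [97.0, 101.0],   # land biosphere, North America: -3 and +1 ppmv
    "FL_NA": [101.5, 102.0],  # fuel, North America: +1.5 and +2 ppmv
}
total = total_from_tagged(background, tagged)
```

Because the decomposition is linear, the contribution of any single region and process can be read directly from its tracer without rerunning the model.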

Modelling CO 2 columns and CVMRs from SCIAMACHY
Global 3-D model CO 2 distributions are sampled at the time and location of the SCIAMACHY scenes. We take into account the vertical sensitivity of SCIAMACHY to changes in CO 2 by using the instrument averaging kernel, A. The averaging kernel formally describes the sensitivity of retrieved CO 2 columns to changes in CO 2 throughout the column, and is a reflection of atmospheric radiative transfer at SWIR wavelengths. Figure 3 shows that the mean SCIAMACHY averaging kernel, averaged over solar zenith angles ranging from 0° to 70°, increases in sensitivity throughout the troposphere with only a small fall-off in the last 1 km due to numerical error (Barkley et al., 2006c). As noted above, not taking A into account compromises subsequent interpretation of observed columns. Model SCIAMACHY CO 2 columns, $\hat{\Omega}$, are given by (Rodgers, 2000):

$$\hat{\Omega} = \Omega_a + \mathbf{a}\,\left(H(\mathbf{x}) - \mathbf{x}_a\right),$$

where $H(\mathbf{x})$ is the GEOS-Chem forward model driven by a priori surface fluxes of CO 2 ($\mathbf{x}$), $\mathbf{x}_a$ is the a priori CO 2 concentration profile taken from climatology and also used in the SCIAMACHY retrievals (Remedios et al., 2006) and $\Omega_a$ is the associated column. The column averaging kernel $\mathbf{a}$ is given by $\mathbf{t}^T \mathbf{A}$, where $\mathbf{t}$ is the column integration operator that integrates a vertical profile to a column and the superscript T denotes the matrix transpose operation. The tagged column contributions to the total CO 2 columns, corresponding to geographical regions in Fig. 2 [...]
Fig. 2 (caption fragment). [...] Table 1 for latitude and longitude region definitions and associated flux estimates.
Model CO 2 CVMRs are determined by scaling each model column by its nearest GEOS-4 surface pressure value, taking into account unit changes. We used 1°×1.125° GEOS-4 surface pressure data to be consistent with a) the horizontal resolution of the ECMWF surface pressure data used in the SCIAMACHY retrieval (Sect. 2), and b) the 2°×2.5° GEOS-4 meteorology used in the GEOS-Chem model.
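Applying a column averaging kernel to a model profile can be sketched as below. The sketch assumes the standard form in which the smoothed column equals the a priori column plus the column averaging kernel applied to the difference between the model and a priori profiles, with profiles expressed as partial columns per layer so that the column operator t is a vector of ones. All names and numbers are hypothetical.

```python
import numpy as np


def smoothed_column(model_profile, apriori_profile, column_operator, averaging_kernel):
    """Model column as the instrument would see it, given an averaging kernel matrix A."""
    x_model = np.asarray(model_profile, dtype=float)
    x_a = np.asarray(apriori_profile, dtype=float)
    t = np.asarray(column_operator, dtype=float)
    A = np.asarray(averaging_kernel, dtype=float)

    a = t @ A            # column averaging kernel, a = t^T A
    omega_a = t @ x_a    # a priori column
    return omega_a + a @ (x_model - x_a)


# Three hypothetical layers; profiles are partial columns in arbitrary units.
t = [1.0, 1.0, 1.0]
x_a = [2.0, 2.0, 2.0]
x_model = [3.0, 2.0, 2.0]           # model differs from the prior in the first layer only
A = np.diag([0.5, 1.0, 1.0])        # reduced sensitivity in the first layer

column = smoothed_column(x_model, x_a, t, A)
```

With an identity averaging kernel the result reduces to the plain vertical integral of the model profile, and with a zero kernel it reduces to the a priori column; these two limits are a quick check on any implementation.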

Comparison of model and observed CO 2 CVMRs
Model CO 2 columns (not shown) are generally within 3% of SCIAMACHY columns, consistent with Barkley et al. (2006c), and describe more than 80% of the observed variability. Column distributions are largely determined by changes in surface topography, and are consequently a reflection of the surface pressure fields. Figure 1 shows the model and observed CVMRs (ppmv). Observed CVMRs generally show a larger East-West gradient (10-15 ppmv) than the model (5-6 ppmv), inconsistent with analyzed CO 2 distributions constrained by in situ measurements (Schneising et al., 2008). Model CVMRs generally have a narrower dynamic range compared with the observations, largely confined between 360 and 390 ppmv. Monthly measurement minus model CVMR differences are approximately Gaussian with a mean offset that varies from −6 to −13 ppmv, depending on month. Unlike for SCIAMACHY CO, we find no significant correlation between model and data differences and the spectral fitting uncertainty (de Laat et al., 2007). On average over North America for each month studied, the model is within 3% of the observed CVMRs, but this reflects the model being higher than SCIAMACHY at latitudes >50°N and longitudes >100°W (where SCIAMACHY CVMRs are typically <370 ppm) and lower than SCIAMACHY elsewhere over North America. Model bias, used throughout this paper, is defined as: [...] where o is the observed column, m is the model column, and n is the number of observations. The large positive bias is likely due to the model underestimating columns over the eastern US and at higher latitudes (denoted by red data points in Fig. 1 scatterplot), where vegetation is predominant, and also to an estimated 2-5% measurement accuracy (Barkley et al., 2006b, 2007). On a continental scale, the model cannot reproduce SCIAMACHY CVMR distributions, determined mainly by the dipole in CO 2 column oriented NW-SE, characteristic of the seasonal biospheric uptake (Barkley et al., 2006b, c; Buchwitz et al., 2007). Similarly, the model cannot reproduce SCIAMACHY CVMR data at individual GLOBALVIEW stations but is generally consistent with the magnitude and seasonal cycle of surface CO 2 concentrations over North America during 2003 (Appendix A). Previous work has shown that SCIAMACHY captures the broad monthly variability of CO 2 on a 5°×5° spatial scale (Barkley et al., 2007). Model CVMRs agree with previous measurement studies of the timing and magnitude of measured values at Park Falls in later years (Appendix A), suffering from a weak biospheric drawdown during peak summer months (Yang et al., 2007). Model CO 2 concentration profiles are in broad agreement with high-frequency aircraft observations from the CO 2 Budget and Regional Airborne Study (COBRA) over North America during summer 2003 (http://geomon-wg.ipsl.jussieu.fr/sections/aircraftcampaigns/cobra) but suffer from a 2±3.5 ppmv bias in the boundary layer during periods of intense biospheric drawdown and a positive bias of 2 ppmv in the upper troposphere above 7 km that may be due to error in model stratosphere-troposphere exchange (Appendix A). We conclude that the model has reasonable skill at reproducing observed distributions of CO 2 over North America at a spatial resolution of 2°×2.5°, and the discrepancy with SCIAMACHY reflects not only model error but a significant retrieval error, consistent with Barkley et al. (2006b, 2007).
Fig. 3 (caption; beginning lost in extraction). [...] for the retrieval of CO 2 from SCIAMACHY SWIR measurements (Barkley et al., 2006c) and applied to the GEOS-Chem model. Individual averaging kernels, representative of a particular SZA, have been generated brute-force by perturbing the US standard atmosphere by 10 ppmv at 1 km intervals below 10 km and at 5 km intervals above 10 km. The brute force method uses the formula $AK = (V_{rp} - V_{ru})/(V_{tp} - V_{tu})$, where $V_{rp}$ is the retrieved perturbed vertical column density, $V_{tp}$ is the true perturbed column, $V_{tu}$ is the true unperturbed column, and $V_{ru}$ is the retrieved unperturbed column (numerically equal to $V_{tu}$) (Barkley et al., 2006b).
Figure caption fragment. [...] Table 1 for regional CO 2 flux estimates.

What surface fluxes determine model CO 2 CVMR variability over North America?
4.1 Continental-scale distributions
Figure 4 shows the land-based contributions to model CO 2 CVMRs over North America (Fig. 1 and Table 1). Many source and sink terms show large seasonal cycles in their CVMR contributions. Background CO 2 CVMRs (January 2003 initial conditions in our calculations, Sect. 3) are typically greater than 350 ppmv (not shown). CO 2 CVMRs over North America are determined largely by local sources and sinks, as expected. The North American land biosphere (BS NA) represents the single largest contribution to total CO 2 , with a minimum and maximum of −8 ppmv and 3 ppmv, respectively, corresponding to a maximum of 1.1% of the total CVMR. This contribution, here determined by the CASA model (Sect. 3), is a source of CO 2 until late May, after which it becomes a sink peaking in July. During periods of uptake this contribution is characterized by a dipole with uptake over the North and East and a source over the arid southwestern states (Barkley et al., 2006b, c); a similar pattern is evident in model and observed CVMRs (Fig. 1). We also find during periods of BS NA drawdown that CVMRs and surface VMRs converge (not shown), as a result of the SCIAMACHY averaging kernel peaking in the lower troposphere. Fuel sources from North America (FL NA) are relatively constant in magnitude throughout the year (Table 1), with the largest CVMR contributions over the East coast (up to 2 ppmv). The North American biomass burning (BB NA) season starts in Canada in June reaching a peak in August with partial monthly mean columns of 1 ppmv; this contribution, in particular, is likely to be much larger on submonthly timescales and finer spatial scales.
Fig. 5 (caption fragment). [...] The grey CVMR data represent the individual pixel retrievals of CO 2 from SCIAMACHY and the blue data are the associated 30-point running mean values. The horizontal dashed lines superimposed on the CVMR comparison over Park Falls represent the observed peak-to-peak range of CVMRs (Yang et al., 2007), which we add to our model mean of CO 2 CVMR at this site over 2003.
We also show that CO 2 columns over North America are significantly influenced by Boreal Asia and mainland Asia and that in some months these column contributions are comparable in magnitude to North American fluxes. Column contributions from Boreal Asian fuel sources (FL BA) are largest over Alaska and northern Canada, reflecting the latitude of Boreal Asia and subsequent atmospheric transport. Similar spatial distributions are shown for biomass burning and the land-biosphere from Boreal Asia (BS BA), with the contribution from biomass burning peaking in mid-summer. The land-biosphere is most positive during April (1.2 ppmv) and is most negative during July (−5 ppmv). The seasonal cycle of BS BA is similar to that of the North American biosphere (BS NA), which may compromise the ability of column observations to independently estimate fluxes from the North American and Boreal Asian biospheres, despite their different spatial distributions in column space; this needs to be confirmed by rigorous inversion analyses. The largest mainland Asian fuel and biomass burning contributions (FL AS, BB AS) to North American CO 2 occur in March (not shown) and April over the west Coast, consistent with current understanding of the temporal continental outflow from that region (Liu et al., 2003). The biospheric signal from mainland Asia (BS AS) is delayed relative to North America with a negative peak in August. European column contributions from fuel, biomass burning, and the land biosphere (FL EU, BB EU, BS EU) are qualitatively similar to Boreal Asia, reflecting similar high latitude atmospheric transport, but they are an order of magnitude smaller.
Many of these sources and sinks will be much higher on sub-monthly temporal scales and on finer spatial scales but our results reiterate previous studies that emphasize the importance of sub-1% precision column measurements if physically meaningful surface flux distributions of CO 2 are to be estimated. Figure 5 shows the CO 2 flux signatures that determine the variability of CO 2 at two measurement sites: the WLEF television tower, 12 km east of Park Falls in Wisconsin, and Wendover in Utah. In Appendix A we show that GEOS-Chem has skill in reproducing the contrasting seasonal cycle of CO 2 at these sites, but predicts premature uptake of CO 2 at the Park Falls site. We sample the model at the location of the two ground-based sites and at the 10:00-12:00 local SCIAMACHY overpass time. The WLEF site shows an observed seasonal cycle of CO 2 CVMR with a peak-to-peak range of 13 ppmv (denoted by the horizontal dashed lines), which is captured reasonably well by GEOS-Chem. The corresponding model CO 2 columns vary by 3×10²⁰ molec cm⁻² (not shown), representing a change of order 4% in the column. SCIAMACHY reproduces the broad-scale seasonal cycle observed at the surface (and the tower data at this site (Barkley et al., 2007)) but because of noise, due to the retrieval and the relatively coarse spatial colocation (Barkley et al., 2007), it is difficult to assess whether SCIAMACHY reproduces the later onset of the uptake observed by surface measurements. We use a 30-point running mean to effectively reduce random noise. The resulting smoothed observed columns, even after accounting for the bias, show a larger drawdown of CO 2 during midsummer. Model and observed CVMRs show greater discrepancy during midsummer months. Figure 5 shows the seasonal contributions of the different monthly sources and sinks that contribute >0.5 ppmv to model CVMRs at some time during the year.
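A 30-point running mean of the kind mentioned above can be computed with a simple box-car convolution. This is a minimal sketch with NumPy; the treatment of the series ends (only full windows are kept) is an assumption, since the text does not say how the edges were handled.

```python
import numpy as np


def running_mean(values, window=30):
    """Box-car running mean; returns only points where the full window fits."""
    values = np.asarray(values, dtype=float)
    if window < 1 or window > values.size:
        raise ValueError("window must be between 1 and the series length")
    kernel = np.ones(window) / window
    return np.convolve(values, kernel, mode="valid")


# Small worked example with a 3-point window.
smoothed = running_mean([1.0, 2.0, 3.0, 4.0, 5.0], window=3)
```

For a series of length N the output has N − window + 1 points, so a 30-point window applied to 100 retrievals yields 71 smoothed values.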
Fuel combustion contributions from North America, Europe and mainland Asia increase throughout the year, as expected, with a mean gradient of 1.5 ppmv/year. The North American biosphere at this site makes a significant contribution to the total CO 2 CVMR, with smaller but significant contributions from Boreal Asia, Europe and mainland Asia. The different continental biosphere signals peak at different times, due to differences in seasonal cycles and atmospheric transport. Biomass burning from Boreal Asia plays only a small role in determining CO 2 CVMRs at this site, peaking in the Spring. Based on this calculation it is difficult to attribute differences between model and observed CO 2 CVMRs to bias in the magnitude or timing of different continental biosphere fluxes. However, as we discuss in the next section, these subtle differences may help to spatially disaggregate CO 2 fluxes using formal inverse models. Figure 5 also shows model and observed columns and CVMRs at Wendover, Utah. The nearest model grid location to this site also includes emissions from Salt Lake City. The observed seasonal cycle at this site is weaker than at WLEF, with a peak-to-peak range of 5 ppmv. SCIAMACHY (smoothed) columns have a negative bias similar in magnitude to observed columns at the WLEF site. Model and observed CVMRs are generally much noisier than at WLEF, reflecting rapid variations in relatively small values of GEOS-4 surface pressure (790-840 hPa compared with 960-990 hPa at WLEF). Apparent drawdown of observed and model CO 2 columns and CVMRs at this site is much weaker than at the WLEF site. Figure 5 shows the seasonal contributions of the different monthly sources and sinks that contribute >0.5 ppmv to model CVMRs at some time during the year. As at WLEF there is a strong fuel signature originating from North America, Europe, and mainland Asia with a similar gradient through the year.
From our analysis the weak seasonal cycle is determined by biospheric signals from North America, Boreal and mainland Asia, which is not obvious from interpreting variation in total CVMRs.

Implications for surface flux estimation
The ultimate goal of space-borne CO 2 data is to locate and quantify natural sources and sinks of CO 2 so that more detailed studies can assess their durability with changes in climate. Generally, an inverse model is required for that purpose. While such a study is outside the scope of this paper, and will be the subject of forthcoming work, we calculate the monthly mean Jacobian matrix corresponding to our forward model calculations to illustrate the ability of these column data to infer individual sources and sinks of CO 2 . In general the Jacobian matrix, describing the sensitivity of total CO 2 columns to changes in surface sources and sinks, attributes differences between forward model (GEOS-Chem) and observed quantities to specific surface sources and sinks.
For illustration only, Fig. 6 shows the monthly mean columns of the Jacobian matrix for North America, based on Fig. 4 and Table 1. The e-folding lifetime of individual flux contributions is typically 3 to 4 months, with e-folding lifetimes exceeding 6 months for Asian sources, consistent with Bruhwiler et al. (2005). All sensitivities converge to a background sensitivity (20×10⁻⁵ ppmv/Tg CO 2 ) beyond which individual source and sink signatures are well mixed. The North America and Boreal Asia land biosphere signals are among the strongest signals that can potentially be retrieved independently, with maximum contributions of 3-4 ppmv to the column. New satellite instruments such as OCO and GOSAT (with a target column precision close to 1%) should be able to measure these biosphere signals when they are 3 or 4 months old and the signal strength is 1/e of its emitted value. Fossil fuel column CO 2 signals, however, contribute a maximum of 2 ppmv and consequently will fall below the instrument detection limit soon after a month.
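The e-folding argument above can be sketched numerically. This is a minimal illustration that assumes a pure exponential decay of each flux signature, which ignores the convergence to a well-mixed background sensitivity. The function names and the threshold argument are hypothetical, and no particular instrument detection limit is implied.

```python
import math


def signal_after(initial_ppmv, age_months, efold_months):
    """Column signal (ppmv) remaining after age_months of exponential decay."""
    return initial_ppmv * math.exp(-age_months / efold_months)


def months_until_below(initial_ppmv, threshold_ppmv, efold_months):
    """Age (months) at which the signal decays to threshold_ppmv.

    Returns 0.0 if the signal starts at or below the threshold.
    """
    if initial_ppmv <= threshold_ppmv:
        return 0.0
    return efold_months * math.log(initial_ppmv / threshold_ppmv)


# A 3.5 ppmv biosphere signal with a 3.5 month e-folding lifetime,
# evaluated when it is 3.5 months old (i.e. at 1/e of its initial value).
print(round(signal_after(3.5, 3.5, 3.5), 2))
```

For a 3.5 ppmv signal with a 3.5 month lifetime, the value at 3.5 months is about 1.3 ppmv. The threshold passed to `months_until_below` is left to the user, since it depends on instrument precision and on how many soundings are averaged.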
While the initial goal of inversions of space-based CO 2 data may be to estimate total fluxes on a continental scale, it is clear from our analysis there are a number of individual source/sink signatures that are above the 1% level and therefore retrievable by OCO and GOSAT. Our calculations suggest that satellite observations of CO 2 CVMR provide greater constraints on land-biosphere fluxes than on fossil fuel emission. Our calculations also imply that regional biases on spatial scales of 100s-1000 km, which will not necessarily be identified using spatially sparse, dedicated calibration-validation efforts (Miller et al., 2007), could potentially be identified using cross-validation techniques. Systematically inferring surface fluxes using successive subsamples of all available clear-sky data (e.g., over and downwind of continents) could potentially identify the extent of the bias, subject to limitations due to model transport error.

Fig. 6. Monthly mean columns of the Jacobian matrix (ppmv/Tg CO 2 ), scaled by 10⁵ for presentation, calculated using a priori flux estimates (Table 1) and the corresponding GEOS-Chem CVMR contributions, averaged on a 2°×2.5° grid over North America during 2003 (Fig. 4). Colours denote specific months. Each point represents the monthly mean sensitivity of North American CO 2 columns to specific continental sources and sinks. Lines connecting the points have no physical significance.
As we discussed earlier and show in Fig. 4, the distributions of many of the dominant flux signatures are sufficiently separated in space and time to permit independent estimation of individual fluxes. This needs to be confirmed by a rigorous inversion analysis of the signals, based on observed distributions of CO 2 columns. Certainly, many of the sources and sinks of CO 2 shown here will have much stronger signatures on finer temporal scales than studied in this work (e.g., at the onset and decline of the growing season), and these scales should therefore be considered.

Conclusions
We have used the GEOS-Chem global 3-D CTM, driven by a priori sources and sinks of CO 2 , to interpret the sources and sinks of CO 2 that determine variability of CVMRs, as observed by the SCIAMACHY satellite instrument, during the 2003 North American growing season, accounting for the instrument averaging kernel. We have shown that GEOS-Chem has some skill in reproducing observed distributions of surface VMR at sites over North America during 2003 and is consistent with ground-based FTS CVMR measurements from later years. However, the model cannot reproduce the magnitude or variability of SCIAMACHY CVMRs, which is likely due to uncharacterized retrieval error and model error.
We have used a tagged approach to interpret variability of CVMRs in terms of individual source and sink terms. In general, we find local sources provide the largest contributions to CVMR variability, with the North American land biosphere representing more than 1% during peak growing season. Fuel sources are relatively constant, while biomass burning makes a significant contribution only in mid-burning season. Our calculations show that surface fluxes from Boreal Asia, mainland Asia and Europe also represent significant contributions to CVMR variability over North America, with, for instance, the Boreal Asia land biosphere responsible for almost 1% of the total CVMR in mid-summer. While there are significant overlaps in the CVMR distributions from local and non-local fluxes, there is also sufficient separation of these contributions in time and space which, with careful analysis, should permit independent flux estimation. Analysis of data from individual sites within the US provided further insight into the superposition of flux signatures. At the WLEF GLOBALVIEW site near Park Falls, Wisconsin, we showed that the seasonal cycle (observed peak-to-peak CVMR of 13 ppmv, Yang et al., 2007) was driven by North American biospheric uptake (−4 ppmv peak) but also by biospheric uptake signatures from Boreal Asia, Europe and, to a lesser extent, mainland Asia. In contrast, the site at Wendover, Utah, with a smaller (model) peak-to-peak CVMR seasonal cycle of 5 ppmv, had large contributions from biospheric uptake signatures originating from Boreal Asia and mainland Asia, both peaking in late summer with CVMRs of −2 ppmv.
Using the monthly mean Jacobian matrix we show that the e-folding lifetimes of individual CO 2 flux signatures are typically 3-4 months. Given the a) magnitude of these signatures and b) precision of new space-borne CO 2 instruments, biospheric signals <3-4 months old should still be measurable, while fuel signatures fall below the instrument detection limit soon after a month.
CO 2 flux estimation relies partly on quantifying the difference between model and observed CO 2 quantities. Prescribed error covariance matrices describe only the random error associated with the model and observations. Uncharacterized systematic error could be mis-attributed to surface sources and sinks. Attempting to directly estimate systematic bias of satellite measurements of CO 2 with a model is of little value because our current quantitative understanding of the carbon cycle is incomplete. Dedicated calibration-validation efforts are underway for upcoming spaceborne missions. A particular focus, owing to the spatial nature of the column data, is the estimation of regional biases (on spatial scales of 100 km), a length scale lying between undetectable effects due to noise and large-scale biases detectable with precise and accurate ground-based FTS (Miller et al., 2007). Unfortunately, no such measurements were available during 2003. Recent studies have shown that SCIAMACHY CO 2 column VMRs during 2004 are within 2% of the ground-based FTS column measurements at Park Falls, Wisconsin, capturing only the monthly mean variability (Barkley et al., 2007). This suggests that CO 2 CVMR anomalies might be more effective than CO 2 CVMRs as the measurement vector.

Figure A1 presents a comparison of model and GLOBALVIEW values (GLOBALVIEW-CO 2 , 2006) of surface CO 2 concentrations over North America during 2003. GLOBALVIEW concentrations generally represent smoothed values extracted from a curve fitted to measurement data that have been selected for conditions where the sampled air is thought to be representative of large well-mixed air parcels, which should be reproducible by global CTMs. Here, we have chosen measurement sites that have contrasting seasonal cycles and reasonable coverage by SCIAMACHY. For example, there is no data in early 2003 over Canada because of persistent cloud.
In general, the 2°×2.5° model has some skill in reproducing the in situ surface concentration data but there are some notable exceptions where the model overestimates observed concentrations by nearly 10 ppmv during periods of CO 2 uptake (Fraserdale and Harvard Forest) and mistimes the land biosphere uptake by a few weeks (Park Falls). As we show in Sect. 4, these examples of model error are not necessarily explained only by local North American fluxes but also by other continental fluxes. Figure A1 also compares model and SCIAMACHY CO 2 CVMRs over the same surface stations. The SCIAMACHY values have been smoothed using a 30-point running mean that significantly reduces the random noise on this measurement. Model CVMRs are generally higher than the smoothed SCIAMACHY values, consistent with the continental distributions shown in Fig. 1. The seasonal cycle of CVMRs is essentially a damped version of the surface concentration at each site, consistent with previous studies (Olsen and Randerson, 2004). The seasonal cycle observed by SCIAMACHY, even accounting for the estimated 2-5% bias (Barkley et al., 2006b, 2007), captures only the broad features, consistent with previous work (Barkley et al., 2007). Several studies have examined column CO 2 concentrations at the Park Falls site in years later than 2003 (e.g., Washenfelder et al., 2006; Yang et al., 2007), the peak-to-peak values of which are denoted by dashed lines. Model values are in agreement with previous studies that have shown that models generally underestimate the summer drawdown at this site by 20% due to too weak vertical mixing. We also compared the model with SCIAMACHY over Egbert, Canada, where there are FTS measurements of CO 2 during 2003. Our results (not shown) are consistent with those reported by Barkley et al. (2006c), with the model lying below the larger, noisier FTS CVMR measurements and approximately 10 ppmv above the smaller, noisier SCIAMACHY CVMRs.
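The 30-point running mean applied to the SCIAMACHY values can be sketched as below. This is a minimal illustration rather than the processing actually used: it assumes an unweighted mean over consecutive retrievals and returns values only for complete windows, and the function name is hypothetical.

```python
def running_mean(values, window=30):
    """Unweighted running mean over every full window of consecutive points.

    Returns a list of length len(values) - window + 1, or an empty list
    if there are fewer points than the window length.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    means = []
    total = 0.0
    for i, value in enumerate(values):
        total += value
        if i >= window:
            total -= values[i - window]  # drop the point leaving the window
        if i >= window - 1:
            means.append(total / window)
    return means


# Hypothetical usage with 40 synthetic CVMR values (ppmv).
cvmrs = [370.0 + 0.1 * i for i in range(40)]
smoothed = running_mean(cvmrs, window=30)
print(len(smoothed))
```

For independent random errors, averaging 30 points reduces the noise standard deviation by a factor of about 5.5 (the square root of 30), though correlated retrieval errors would not be reduced in this way.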

Evaluation of GEOS-Chem North American surface CO 2 concentrations and CVMRs
[Figure caption, beginning truncated] ... 3), have been sampled at the 10:00-12:00 local overpass time of SCIAMACHY. The grey CVMR data represent the individual pixel retrievals of CO 2 from SCIAMACHY and the blue data are the associated 30-point running mean values. The horizontal dashed lines superimposed on the CVMR comparison over Park Falls represent the observed peak-to-peak range of CVMRs (Yang et al., 2007), which we add to the model mean of CO 2 CVMR at this site over 2003.

We also compared model concentrations with aircraft measurements from the CO 2 Budget and Regional Airborne Study (COBRA, http://geomon-wg.ipsl.jussieu.fr/sections/aircraftcampaigns/cobra) over North America during summer 2003 (not shown). The model was sampled along the aircraft flight tracks and resulting model and observed CO 2 concentrations were binned at 2 km intervals, where the number of measurements n was typically >2000. Over the entire campaign, we found that the model generally underestimated boundary layer drawdown by 2±3.5 ppmv (n=5389), a likely reflection of too weak model vertical mixing. There was a relatively small model bias in the free troposphere, ranging from −0.5±2 ppmv (2-4 km, n=4447) to 0.0±2 ppmv (4-6 km, n=1789), a vertical region that generally has only weak surface flux signatures. Above 6 km, the model progressively showed a positive bias from 1.2±1.4 ppmv (6-8 km, n=1767) to 2.3±1.8 ppmv (8-10 km, n=6463), which may reflect model error in describing stratosphere-troposphere exchange (Shia et al., 2006).
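The altitude-binned comparison described above can be sketched as follows. This is a minimal illustration under stated assumptions: the bias is taken as model minus observed, the quoted ± values are assumed to be sample standard deviations, and the function name and input values are hypothetical.

```python
from statistics import mean, stdev


def bias_by_altitude(altitudes_km, model_ppmv, observed_ppmv, bin_width_km=2.0):
    """Group model-minus-observed differences into altitude bins.

    Returns a dict mapping each bin's lower edge (km) to a tuple of
    (mean bias, standard deviation of the bias, number of points).
    """
    bins = {}
    for z, modelled, observed in zip(altitudes_km, model_ppmv, observed_ppmv):
        lower_edge = int(z // bin_width_km) * bin_width_km
        bins.setdefault(lower_edge, []).append(modelled - observed)
    return {
        lower_edge: (mean(d), stdev(d) if len(d) > 1 else 0.0, len(d))
        for lower_edge, d in sorted(bins.items())
    }


# Hypothetical flight-track samples (not COBRA data).
altitudes = [0.5, 1.5, 2.5, 3.0]
model = [372.0, 374.0, 371.0, 372.0]
observed = [370.0, 370.0, 371.5, 372.5]
for lower_edge, (bias, spread, n) in bias_by_altitude(altitudes, model, observed).items():
    print(f"{lower_edge:.0f}-{lower_edge + 2:.0f} km: {bias:+.1f} ± {spread:.1f} ppmv (n={n})")
```

With this sign convention a positive bias means the model is higher than the aircraft observations, as in the boundary layer and above 6 km in the comparison reported here.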