Conclusions

This work presents global tropospheric formaldehyde columns retrieved from near-UV radiance measurements performed by the GOME instrument onboard ERS-2 since 1995, and by SCIAMACHY, in operation on ENVISAT since the end of 2002. A special effort has been made to ensure the coherence and quality of the CH2O dataset covering the period 1996–2007. Optimised DOAS settings are proposed in order to reduce the impact of two important sources of error in the derivation of slant columns, namely, the polarisation anomaly affecting the SCIAMACHY spectra around 350 nm, and a major absorption band of the O4 collision complex centred near 360 nm. The air mass factors are determined from scattering weights generated using radiative transfer calculations taking into account the cloud fraction, the cloud height and the ground albedo. Vertical profile shapes of CH2O are provided by the global CTM IMAGES based on an up-to-date representation of emissions, atmospheric transport and photochemistry. A comprehensive error analysis is presented. This includes errors on the slant columns retrieval and errors on the air mass factors which are mainly due to uncertainties in the a priori profile and in the cloud properties. The major features of the retrieved formaldehyde column distribution are discussed and compared with previous CH2O datasets over the major emission regions.


Introduction
Despite its short lifetime of about 1.5 h, formaldehyde (CH2O) is one of the most abundant hydrocarbons in the atmosphere and is an important indicator of non-methane volatile organic compound (NMVOC) emissions and photochemical activity. CH2O is a primary emission product from biomass burning and from fossil fuel combustion, but its principal source in the atmosphere is the photochemical oxidation of methane and non-methane hydrocarbons. Besides dry and wet deposition, the main removal processes during the day are two photolysis pathways (yielding CO+H2 and CO+2HO2) and oxidation by OH radicals (yielding CO+HO2+H2O). Globally, methane oxidation represents the main CH2O source, accounting for more than half the global CH2O production, the remainder being produced by NMVOC oxidation. Over the continents, due to its short lifetime, formaldehyde is a good indicator for the emissions of short-lived NMVOCs, like isoprene and the pyrogenic NMVOCs. Therefore, satellite measurements of CH2O can be used to constrain NMVOC emissions in current state-of-the-art chemical transport models (Abbot et al., 2003; Palmer et al., 2003; Palmer et al., 2006; Fu et al., 2007; Shim et al., 2005).
Correspondence to: I. De Smedt (isabelle.desmedt@aeronomie.be)
The CH2O data presented in this study are derived from observations made by the GOME and SCIAMACHY instruments. Previous retrievals of formaldehyde have been obtained from GOME measurements by application of the Differential Optical Absorption Spectroscopy (DOAS) method, using three absorption bands of CH2O between 337 and 359 nm (Thomas et al., 1998; Chance et al., 2000; Wittrock et al., 2000; Marbach et al., 2004; De Smedt et al., 2006). These studies demonstrated the usefulness of satellite observations of CH2O columns as a means for providing constraints on the emissions of NMVOCs. For example, GOME CH2O columns have been used to derive the seasonal and interannual variability of isoprene emissions over North America (Abbot et al., 2003; Palmer et al., 2006), over Southeast Asia (Fu et al., 2007) and on the global scale (Shim et al., 2005). SCIAMACHY provides nadir UV-visible radiance measurements suitable to extend GOME observations. However, until now, only one study reports its use to retrieve CH2O columns. This is mainly due to the existence of a strong polarisation anomaly affecting the SCIAMACHY spectra around 350 nm, preventing the straightforward application of the CH2O retrieval settings derived from GOME. One way to sidestep this problem is to fit CH2O at shorter UV wavelengths, hence avoiding the unwanted polarisation signature. In this paper, we propose a new fitting window applicable to both SCIAMACHY and GOME, and we use it to generate a 12-year-long data set of CH2O columns that consistently combines the two instruments.
4948 I. De Smedt: 12-years of formaldehyde observation by GOME and SCIAMACHY
The slant columns have been taken as 10^19 molec/cm^2 for O3, 5×10^16 molec/cm^2 for NO2, 10^43 molec^2/cm^5 for O2-O2, 10^14 molec/cm^2 for BrO and 10^16 molec/cm^2 for CH2O. The vertical lines indicate the fitting intervals discussed in the text: 328.5-346 nm (green) and 337.5-359 nm (red).
This CH2O product has been developed in the framework of the TEMIS (http://www.temis.nl) and PROMOTE (http://www.gse-promote.org) projects and is available on their websites. After a brief description of the satellite instruments in Sect. 2, Sect. 3 presents the retrieval method, more specifically, the slant column fitting and the determination of the air mass factors. Section 4 provides a detailed description of the error budget. Section 5 presents the global distribution and temporal evolution of the CH2O vertical columns derived in this study, as well as a brief comparison with previously retrieved datasets.

GOME and SCIAMACHY spectrometers
GOME (onboard the ERS-2 satellite launched in April 1995) and SCIAMACHY (onboard the ENVISAT satellite launched in March 2002) are absorption spectrometers measuring sunlight back-scattered and reflected at Earth's atmosphere and surface as well as the direct solar irradiance spectrum. Both instruments fly on polar sun-synchronous orbits. GOME measures in a wavelength range covering the UV, visible and near-infrared from 240 nm up to 790 nm, at the moderate resolution of 0.15 to 0.4 nm. In descending node, it crosses the equator at 10:30 h local time. The full width of a normal GOME scanning swath is 960 km, divided into three ground pixels. The scan measures 40 km in the direction of flight. Global coverage is obtained in three days at the equator (Burrows et al., 1999b). GOME has been recording spectra with global coverage until June 2003. It is still in operation but with limited Earth coverage over the Northern Hemisphere and part of Antarctica. The spectra of SCIAMACHY are recorded continuously between 240 and 1700 nm and in selected regions of the short wave infrared between 1900 and 2400 nm (resolution between 0.2 nm and 1.5 nm). SCIAMACHY has three different viewing geometries: nadir, limb, and sun/moon occultation. In its nadir view, the SCIAMACHY swath width is equivalent to GOME, but, in the nominal mode of observation in the CH2O retrieval window, it is divided into 16 ground pixels of 60 km by 30 km. The ground resolution is therefore seven times higher than in the case of GOME but with an equivalently lower signal to noise ratio due to the shorter integration time. As a consequence of the alternating nadir and limb views, global coverage is only achieved in six days. The crossing time at the equator is 10:00 h local time (Bovensmann et al., 1999).
Both instruments have demonstrated their capability to observe atmospheric trace gas species like O3, NO2, BrO, SO2, OClO and CH2O (Richter et al., 1998; Chance et al., 2000; Richter and Burrows, 2002; Boersma et al., 2004; Kühl et al., 2004; Martin et al., 2004; Liu et al., 2005; Khokhar et al., 2005; Richter et al., 2005a) and also, in the case of SCIAMACHY, infrared greenhouse gases like CH4, CO and CO2 (Frankenberg et al., 2005; Buchwitz et al., 2005).

Slant column retrieval
Owing to the sharp absorption bands of CH2O in the UV, the differential optical absorption spectroscopy technique (DOAS) can be used to derive CH2O from satellite observations. This method consists of two independent steps. Firstly, the species concentration integrated along the viewing path of the satellite, also called slant column density, is derived from the satellite radiance using the Beer-Lambert law applied in an optically thin atmosphere (e.g. Platt, 1994):

$$\ln\frac{I(\lambda)}{I_0(\lambda)} = -\sum_i N_{s,i}\,\sigma_i(\lambda) + \sum_p a_p\,\lambda^p \qquad (1)$$
In this expression, $I$ and $I_0$ are the measured earthshine radiance and solar irradiance spectra, respectively. $N_{s,i}$ and $\sigma_i$ are respectively the slant column and the absorption cross section of each molecule and $a_p$ are the coefficients of the polynomial function of order $p$ accounting for terms varying slowly with the wavelength (Rayleigh and Mie scattering). The slant column densities are deduced by solving this system of linear equations. In practice, additional non-linear effects have to be taken into account, for example wavelength shifts between $I$ and $I_0$ due to the Doppler effect or induced by thermal instabilities in the instrument, or the possible contamination of measured radiances by spectral straylight, which requires the introduction of an intensity offset parameter. The Ring effect (Grainger and Ring, 1962), which arises due to inelastic scattering processes (Vountas et al., 1998), has a strong impact on spectroscopic measurements using the DOAS method, particularly for minor absorbers whose weak absorption features can be completely masked by its structures. In DOAS, the Ring effect is usually accounted for by means of a pseudo absorber also included in the fit.
Fig. 2. Comparison of GOME CH2O slant column densities (SC) averaged over 1997-2002 retrieved using (a) the 337.5-359 nm fitting window and (b) the 328.5-346 nm interval introduced in this study. The absolute differences between the slant columns shown in panels (a) and (b) are displayed in panel (c). Panel (d) shows the relative differences between the fitting residuals obtained using the two retrieval windows.
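The linear part of the fit described above can be sketched as an ordinary least-squares problem. The following is a minimal illustration and not the retrieval code used in this work: the function name `doas_fit`, the array layout and the column rescaling are assumptions, and the non-linear terms mentioned in the text (wavelength shift, intensity offset, Ring pseudo absorber) are left out.

```python
import numpy as np


def doas_fit(wavelengths, radiance, irradiance, cross_sections, poly_order=5):
    """Linear least-squares DOAS fit of ln(I0/I) (illustrative sketch).

    cross_sections has shape (n_species, n_wavelengths).
    Returns (slant_columns, polynomial_coefficients, rms_residual).
    """
    optical_depth = np.log(irradiance / radiance)

    # Rescale each cross section to order unity so that its column is not
    # lost to round-off next to the polynomial columns of the design matrix.
    scale = np.abs(cross_sections).max(axis=1)
    absorbers = (cross_sections / scale[:, None]).T

    # Low-order polynomial in a normalised wavelength coordinate, standing
    # in for the slowly varying Rayleigh and Mie terms.
    span = wavelengths.max() - wavelengths.min()
    x = (wavelengths - wavelengths.mean()) / span
    closure = np.vander(x, poly_order + 1, increasing=True)

    design = np.hstack([absorbers, closure])
    coeffs, *_ = np.linalg.lstsq(design, optical_depth, rcond=None)
    residual = optical_depth - design @ coeffs

    n_species = cross_sections.shape[0]
    slant_columns = coeffs[:n_species] / scale  # undo the rescaling
    rms = float(np.sqrt(np.mean(residual ** 2)))
    return slant_columns, coeffs[n_species:], rms
```

The rescaling step matters in practice: cross sections of order 10^-20 cm^2 sit far below the polynomial columns in magnitude, and an unscaled solver may treat them as numerically zero.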
Despite the relatively large abundance of CH2O in the atmosphere (on the order of 10^16 molec/cm^2) and its well-defined absorption bands, the fitting of CH2O slant columns in earthshine radiances is a challenge because of the low optical density of CH2O compared to other UV absorbers. As displayed in Fig. 1, the typical CH2O optical density is about 2.5 times smaller than for BrO and up to 50 times smaller than for NO2. Therefore, the detection of CH2O is limited by the signal to noise ratio of the measured radiance and by possible spectral interferences due to other molecules absorbing in the same fitting interval, mainly ozone. In this work, most DOAS settings are common to both GOME and SCIAMACHY retrievals. The CH2O absorption cross sections used in the DOAS fit are those of Cantrell et al. (1990), convolved to the resolution of the instrument. The fitting procedure includes reference spectra for other interfering species (O3, NO2, BrO, OClO and the O2-O2 (or O4) collision complex). The Ring effect is corrected according to Chance and Spurr (1997). A linear intensity offset correction is further applied as well as a polynomial closure term of order 5. In order to minimise the impact of the GOME diffuser plate related artefact (Richter and Wagner, 2001) and to minimise fitting residuals, daily radiance spectra are used as the $I_0$ reference. These are selected in a region of the equatorial Pacific Ocean where the formaldehyde column is assumed to be low and stable in time.
Fig. 3. In red are plotted the fitted CH2O slant columns (SC) averaged above the Sahara region as a function of the upper limit chosen for the fitting interval (the lower limit is fixed to 337.5 nm). These slant columns have been retrieved for latitudes between 15 and 30° N, for the GOME orbit number 40585 of the 24th of January 2003 (20-30° E). In blue is shown the absorption cross section of the O2-O2 collision complex in the same wavelength region.
Taking into account the spectral and instrumental constraints, the fitting window has been chosen in order to (1) maximize the sensitivity to formaldehyde, (2) minimize the fitting residuals and the dispersion of the retrieved CH2O slant columns and (3) minimize the interferences with other absorbers. From inspection of the CH2O absorption cross sections in Fig. 1, it can be seen that a maximum of five characteristic absorption bands are available in the spectral interval from 325 to 360 nm. A compromise has to be found when defining the DOAS settings, since the fitting accuracy is limited at the shorter wavelengths by the increased O3 absorption and the resulting interference with the formaldehyde fit, and at longer wavelengths by the interference with the SCIAMACHY polarisation feature around 350 nm (Fig. 1). Several fitting intervals have been tested as to their ability to produce consistent and stable retrievals from both GOME and SCIAMACHY. The best compromise, which maximizes the number of CH2O absorption bands and minimizes fitting errors while avoiding the SCIAMACHY polarisation peak around 350 nm, was found to lie in the 328.5-346 nm wavelength region. Despite the stronger interference with O3 absorption at these wavelengths, the use of the 328.5-346 nm window improves the GOME slant columns compared to retrievals commonly performed in the 337.5-359 nm region. Previous studies using this latter window exhibited anomalous features, for example excessively low (sometimes negative) columns above desert regions like the Sahara and Australia (Wittrock et al., 2000; De Smedt et al., 2006), or systematic structures above oceans in apparent contradiction with tropospheric models (Abbot et al., 2003;  2006). The use of the newly proposed 328.5-346 nm interval leads to a reduction of the noise over oceans and brings the slant column values above desert regions to the level of the background (Fig. 2a and b).
The CH2O values retrieved using the two fitting windows are generally consistent within 2×10^15 molec/cm^2, in particular above regions of enhanced emissions. The largest differences are found above desert regions, mainly in Africa, the Middle East and Australia, where they range between 2×10^15 and 10×10^15 molec/cm^2. Fitting residuals are lower by 15% in the Tropics, but they increase more rapidly with solar zenith angle and exceed the fitting residuals of the 337.5-359 nm window poleward of 50° because of a stronger interference with O3 (Fig. 2c and d). Fitting tests have been performed, showing that the slant column underestimation above desert regions is not observed as long as the upper limit of the fitting window is kept under 350 nm. Furthermore, the magnitude of the negative bias above the Sahara, represented as a function of the upper limit chosen for the fitting window (Fig. 3), correlates well with the O4 absorption band peaking at 360 nm. This suggests that the low CH2O columns observed above deserts are most likely due to an O4-related fitting artefact rather than a real particularity of the formaldehyde distribution. Desert regions are generally cloud-free and therefore, they correspond to the highest O4 columns observed globally. The known uncertainties of the O4 cross sections are expected to significantly impact the quality of the CH2O retrievals in these regions, especially when the strong absorption band of O4 at 360 nm is included in the fit, even partially. SCIAMACHY CH2O columns are found to be consistent with GOME ones when retrieved in the same window. Among other fitting windows tested for SCIAMACHY, the 328.5-346 nm interval displays the lowest differences with GOME, as verified over the first six months of 2003, when the two satellite measurements globally overlap.
This wavelength interval also provides the lowest dispersion in the SCIAMACHY results. Examples of CH2O optical density fits are shown in Fig. 4 for a GOME pixel in September 1997 and a SCIAMACHY pixel in September 2006, both located over Indonesia during fire episodes. From photon statistical consideration, when the instruments perform at their best, the ratio between the standard deviations should be about 0.38 because the SCIAMACHY ground pixels are 7 times smaller compared to GOME. We find a ratio of 0.4, the SCIAMACHY RMS being 60% higher than that of GOME. This demonstrates that the noise in the retrieved columns is consistent with the signal to noise ratio of the instruments. Figure 5 shows the slant columns for two overlapping orbits of GOME and SCIAMACHY over Eastern China, on the 14th of April 2003. The mean CH2O values agree very well and, in this case, the standard deviation of the SCIAMACHY slant columns exceeds GOME by only 30%, as a result of the long term degradation of the GOME instrument from 1996 until 2003 (see Sect. 4.2).
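The expected noise ratio quoted above follows from simple arithmetic on the pixel sizes given in Sect. 2. The short sketch below assumes purely shot-noise-limited measurements whose collected signal is proportional to the ground pixel area; the function name and the 320 km GOME across-track size (the 960 km swath divided by three) are illustrative choices.

```python
import math


def shot_noise_ratio(gome_pixel_km=(320.0, 40.0), scia_pixel_km=(60.0, 30.0)):
    """Expected GOME-to-SCIAMACHY noise ratio if noise scales as 1/sqrt(area)."""
    gome_area = gome_pixel_km[0] * gome_pixel_km[1]  # 12800 km^2
    scia_area = scia_pixel_km[0] * scia_pixel_km[1]  # 1800 km^2
    return math.sqrt(scia_area / gome_area)


ratio = shot_noise_ratio()  # 0.375, i.e. about 1/sqrt(7)
```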

Reference sector correction
Because a daily radiance spectrum is used as control spectrum for the DOAS fit, the retrieved slant columns actually represent the difference in slant column with respect to the slant column contained in the control spectrum. Hence the quantity of formaldehyde present in the reference spectrum must be estimated on a daily basis, based on suitable external information. Furthermore, an analysis of the raw slant column measurements reveals obvious zonally and seasonally dependent artefacts. Indeed, over the remote Pacific Ocean, the formaldehyde columns are often found to differ from the background levels attributed to methane oxidation by more than 1×10^16 molec/cm^2. This is especially clear at latitudes higher than 60° and in March. This behaviour is caused by unresolved spectral interferences with ozone and BrO absorptions, which are very important at high solar zenith angles in spring. Fortunately, these interferences are small in regions where the sources of formaldehyde are significant. However, to reduce the impact of these artefacts, an absolute normalisation is applied on a daily basis using the reference sector method (Khokhar et al., 2005). The reference sector is chosen in the central Pacific Ocean (140-160° W), where the only significant source of CH2O is the CH4 oxidation. The latitudinal dependency of the CH2O slant columns in the reference sector is modelled by a polynomial, subtracted from the slant columns and replaced by the CH2O background taken from the tropospheric 3-D-CTM IMAGES (Müller and Stavrakou, 2005) in the same region. A similar normalisation approach has been applied in previous studies (e.g. Abbot et al., 2003).
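The normalisation described above can be sketched in a few lines. This is an illustration under stated assumptions rather than the operational procedure: the polynomial order (3 here) is not given in the text, the function names are hypothetical, longitudes are taken in degrees east (so 140-160° W becomes -160 to -140), and the model background is added after division by the air mass factor.

```python
import numpy as np
from numpy.polynomial import polynomial as P


def reference_sector_offset(lat, lon, slant_columns,
                            lon_range=(-160.0, -140.0), poly_order=3):
    """Polynomial in latitude fitted to slant columns inside the sector."""
    in_sector = (lon >= lon_range[0]) & (lon <= lon_range[1])
    x = lat / 90.0  # normalised latitude keeps the fit well conditioned
    coeffs = P.polyfit(x[in_sector], slant_columns[in_sector], poly_order)
    return P.polyval(x, coeffs)  # offset evaluated for every pixel


def normalised_vertical_columns(slant_columns, offset, amf, background):
    """Subtract the sector offset, convert with the AMF, add the model background."""
    return (slant_columns - offset) / amf + background
```

Here `background` stands for the model CH2O column at the latitude of each pixel, which would have to be supplied from the chemical transport model output.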

Air mass factors determination
The second step in the retrieval of tropospheric CH2O total columns is the calculation of the air mass factors (AMF) necessary to convert slant columns into corresponding vertical columns. The AMF is defined by the ratio of the slant column to the vertical column. In the troposphere, scattering by air molecules and variable amounts of clouds and aerosols reduces the penetration of the UV radiation, leading to complex altitude-dependent enhancement factors. Full multiple scattering calculations are therefore required for the determination of the column-averaged AMF, and the vertical distribution of the absorber has to be known a priori. Nevertheless, in the case of weak absorbers like CH2O, the calculation can be simplified by introducing the concept of scattering weights, which are independent of the trace gas concentration profile. The total column air mass factor can be obtained from the formula:

$$M = \int_{\mathrm{atm}} W(z, \mu_0, \mu, C_f, C_h, a_s, h_s)\, S(z)\, dz \qquad (2)$$

where the scattering weight $W$ represents the sensitivity of the satellite measurements to the molecule concentration as a function of the altitude $z$. $W$ depends on the observation geometry ($\mu_0$ and $\mu$ are the cosines of the solar and viewing zenith angles, respectively) and on the scattering properties of the atmosphere, the cloud fraction and cloud height ($C_f$ and $C_h$), the surface albedo ($a_s$) and the surface altitude ($h_s$). In this work, scattering weights have been evaluated from radiative transfer calculations performed with a pseudo-spherical version of the DISORT code (Kylling et al., 1995). The scattering properties of the atmosphere have been modelled at 340 nm for a number of representative viewing geometries, surface albedos and ground altitudes, and stored in a look-up table. A cloud correction is applied based on the independent pixel approximation.
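As a minimal sketch of the calculation described above, the AMF can be evaluated as the integral of the scattering weights multiplied by the normalised profile shape. The cloud correction shown here is the usual radiance-weighted form of the independent pixel approximation, which is an assumption since the text does not spell out the expression; all function and argument names are illustrative.

```python
import numpy as np


def _trapezoid(y, z):
    """Trapezoidal integral of y over the altitude grid z."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(z)))


def air_mass_factor(z, scattering_weights, concentration):
    """AMF as the integral of W(z) times the normalised shape factor S(z)."""
    shape_factor = concentration / _trapezoid(concentration, z)
    return _trapezoid(scattering_weights * shape_factor, z)


def cloud_corrected_amf(amf_clear, amf_cloudy, cloud_fraction,
                        radiance_clear, radiance_cloudy):
    """Independent pixel approximation with radiance-weighted cloud fraction."""
    cloudy = cloud_fraction * radiance_cloudy
    clear = (1.0 - cloud_fraction) * radiance_clear
    weight = cloudy / (cloudy + clear)
    return weight * amf_cloudy + (1.0 - weight) * amf_clear
```

With altitude-independent scattering weights the result no longer depends on the profile, which is a quick sanity check of the normalisation of the shape factor.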
The cloud fraction, cloud height and cloud albedo ($a_s^{cloud}$) are obtained from the FRESCO v5 cloud product, which provides Lambertian-reflecting cloud heights (Koelemeijer et al., 2002). FRESCO is delivered on the TEMIS project website (www.temis.nl). The distribution of surface albedo is taken from the climatology of Koelemeijer et al. (2003), which provides monthly Lambert-equivalent reflectivity at 335 nm. No explicit correction is applied for aerosols, but their impact is partially accounted for by the cloud correction scheme, as further explained in Sect. 4.4. In Eq. (2), the shape factor $S$ represents the normalised profile of the absorbing molecule. Because of the short lifetime of CH2O, there is no transport in the stratosphere and the shape factors vanish at lower pressures. In this study, the monthly output of an updated version of the IMAGES chemical transport model (Müller and Stavrakou, 2005) has been used to specify the vertical profile of the CH2O distribution. IMAGESv2 provides the global distribution of 68 chemical compounds at a resolution of 5 degrees with 40 vertical sigma-pressure levels between the surface and the pressure of 45 hPa. The chemical mechanism of the model has been optimized with respect to CH2O production from isoprene and from pyrogenic NMVOCs based on a detailed box model study using the Master Chemical Mechanism (MCM, Saunders et al., 2003) and an updated speciation profile of pyrogenic emissions (Andreae and Merlet, 2001; Andreae, personal communication, 2007), as described in Stavrakou et al. (2008). The advection is driven by monthly wind fields from ECMWF analyses. The impact of wind variability at short timescales is represented using horizontal diffusion coefficients estimated from the ECMWF wind variances. The parameterization of convective transport uses the ERA40 updraft mass fluxes until 2001, and a climatological mean beyond this year.
Anthropogenic emissions are taken from the EDGAR v3.3 inventory for 1997 (Olivier et al., 2001). The biomass burning emissions are based on the GFEDv1 inventory, or alternatively the GFEDv2 inventory for burnt biomass (van der Werf et al., 2004 and 2006). A diurnal cycle of biomass burning emissions based on Giglio (2007) is applied in the diurnal cycle calculations. Biogenic isoprene emissions are taken from the GEIA inventory (Guenther et al., 1995), or alternatively from an inventory based on the MEGAN emission model driven by ECMWF meteorological fields, as described in Muller et al. (2007). In this work, the GFEDv1 inventory and the GEIA inventory were used to run the IMAGES model. Vertical CH2O profile shapes are provided on a monthly basis between 1997 and 2006 and interpolated for each satellite geolocation. The IMAGES profiles have been validated with various aircraft measurements like the INTEX-A campaign over Northern America in July-August 2004. In Fig. 6, the IMAGESv2 profiles are compared with the mean observed vertical distribution during this campaign. The modelled mixing ratios at altitudes higher than 1.5 km essentially lie between the values defined by the two datasets, whereas over the continental boundary layer they are more than 10% higher than the NCAR measurements.
Averaging kernels (A) are particularly useful when comparing measured columns with e.g. model simulations, because they allow removing the effect of the a-priori profile shape information used in the retrieval (see e.g. Rodgers, 2000;Boersma et al., 2004). They have been calculated following the definition for column observations of optically thin absorbers (Eskes and Boersma, 2003b). They are provided for each satellite pixel, together with an error budget (see next section).
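A hedged sketch of how such kernels are typically built and used follows. It assumes the common definition for optically thin absorbers in which the kernel at each layer is the scattering weight divided by the total AMF; the layer discretisation and function names are illustrative and not taken from the data product.

```python
import numpy as np


def averaging_kernel(scattering_weights, amf):
    """Column averaging kernel per layer, assuming A = W / M."""
    return np.asarray(scattering_weights, dtype=float) / amf


def smoothed_model_column(kernel, model_partial_columns):
    """Model column as the satellite would see it, free of the a priori shape."""
    return float(np.sum(kernel * np.asarray(model_partial_columns, dtype=float)))
```

When the model profile has the same shape as the a priori, the smoothed column equals the plain model column, so any difference between the two isolates the influence of the profile shape.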

Error analysis
Expression of the total error on the vertical column
A formulation of the error can be derived analytically starting from the equation of the vertical column, which directly results from the different steps detailed in Sect. 3:

$$N_v = \frac{\Delta N_s}{M} + N_{v0}^{CTM} = \frac{N_s - N_{s0}}{M} + N_{v0}^{CTM} \qquad (3)$$

In this expression, $\Delta N_s$ represents the difference between the fitted slant column $N_s$ and the mean slant column in the reference sector $N_{s0}$. $N_{v0}^{CTM}$ is the IMAGES background in the same region. $M$ is the air mass factor. As these terms are determined independently, they are assumed to be uncorrelated and the total error on the tropospheric vertical column can be expressed as:

$$\sigma_{N_v}^2 = \left(\frac{\partial N_v}{\partial N_s}\,\sigma_{N_s}\right)^2 + \left(\frac{\partial N_v}{\partial M}\,\sigma_M\right)^2 + \left(\frac{\partial N_v}{\partial N_{v0}^{CTM}}\,\sigma_{N_{v0}^{CTM}}\right)^2 \qquad (4)$$

In our case, it results from Eq. (3) that the total error can be derived from the following expression:

$$\sigma_{N_v}^2 = \frac{1}{N}\left(\frac{\sigma_{N_s}^{rand}}{M}\right)^2 + \left(\frac{\sigma_{N_s}^{syst}}{M}\right)^2 + \left(\frac{\Delta N_s\,\sigma_M}{M^2}\right)^2 + \sigma_{N_{v0}^{CTM}}^2 \qquad (5)$$

$\sigma_{N_s}^{rand}$ and $\sigma_{N_s}^{syst}$ are the random and systematic parts of the error on the slant columns. The random error is reduced when the number of measurements increases. Therefore, in case of averaging, $\sigma_{N_s}^{rand}$ can be divided by the square root of the number of satellite pixels taken into the mean ($N$). $\sigma_M$ is the error on the air mass factor evaluation and $\sigma_{N_{v0}^{CTM}}$ the error on the reference sector correction. These two latter sources of uncertainty have systematic but also random components that may average out in space or in time. However, these components can hardly be separated in practice and we will consider these uncertainties as systematic. The total error calculated with Eq. (5) is therefore an upper limit of the real error on the vertical columns.
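The error budget of this section can be evaluated numerically with a few lines of code. The sketch below assumes the quadrature form implied by the text: a random slant column term divided by the AMF and reduced by the number of averaged pixels, a systematic slant column term divided by the AMF, an AMF term scaled by the differential slant column over the squared AMF, and the background term. The function and argument names are illustrative.

```python
import math


def vertical_column_error(delta_slant, amf, sigma_rand, sigma_syst,
                          sigma_amf, sigma_background, n_pixels=1):
    """Total error on the vertical column from uncorrelated contributions."""
    random_term = (sigma_rand / amf) ** 2 / n_pixels  # averages down with N
    systematic_term = (sigma_syst / amf) ** 2
    amf_term = (delta_slant * sigma_amf / amf ** 2) ** 2
    background_term = sigma_background ** 2
    return math.sqrt(random_term + systematic_term + amf_term + background_term)
```

For a single pixel with a random slant column error of 4×10^15 molec/cm^2 and all other terms set to zero, averaging four pixels halves the resulting error, as expected from the square-root dependence.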

Error on slant columns
In regions of weak variability in the CH2O column, the random error on the slant columns can be expressed by the standard deviation of measured columns around the mean. On the other hand, and provided that fitting residuals are dominated by shot noise in the detection system, the random error on the slant columns can be estimated from the properly weighted root mean square (RMS) of the fitting residuals (Taylor et al., 1982). We use RMS values as an estimate of the random error. Due to the degradation of the GOME instrument throughout the years and the resulting lower signal to noise ratio of the spectra, $\sigma_{N_s}^{rand}$ has been found to increase from 4×10^15 molec/cm^2 in 1996 up to 6×10^15 molec/cm^2 in 2003. For SCIAMACHY, $\sigma_{N_s}^{rand}$ reaches 1×10^16 molec/cm^2 because of the poorer signal to noise ratio associated with the shorter integration time of this instrument. For single pixels, the random error on the slant columns is the most important source of error on the total vertical column. It can be reduced by averaging, but at the expense of a loss in time and/or spatial resolution.
The systematic errors on the slant columns are largely connected to uncertainties on the absorption cross sections included in the fit, although other sources of systematic errors also contribute to the error budget, like uncertainties related to instrumental effects (wavelength calibration, spectral stray light, slit function, etc.), or systematic misfit effects when, for example, the absorption of the ozone molecule becomes so strong that the DOAS approximation of an optically thin atmosphere is no longer fully satisfied. Systematic slant column errors due to absorption cross section uncertainties and their correlations can be estimated using the Rodgers formalism (Rodgers, 2000) for a linear over-constrained problem like the DOAS inversion (Theys et al., 2007):

$$\left(\sigma_{N_s}^{syst}\right)^2 = \sum_{j=1}^{m} N_{sj}^2 \left[\, G\, S_{bj}\, G^T \right]_{CH_2O} \qquad (6)$$

In this equation, the species index $j$ (=1,...,m) runs over all molecules taken into account in the fit and the subscript CH2O denotes the diagonal element associated with formaldehyde. $G$ is the contribution matrix of the DOAS retrieval, constructed as $G = (K^T K)^{-1} K^T$, where $K$ (n×m, with n the number of wavelengths in the fitting interval) is a matrix formed by the absorption cross sections. $N_{sj}$ is the fitted slant column of the $j$th molecule and $S_{bj}$ (n×n) is the cross section error covariance matrix. For CH2O, the absorption cross section of Cantrell et al. (1990), recommended in the HITRAN database, has been used. This data set presents a resolution four times better in comparison to the alternative data set of Meller and Moortgat (2000). Recently, however, the use of the Meller and Moortgat (2000) cross section has been recommended by Gratien et al. (2007) based on a study of the consistency between UV and infrared absorption spectra of CH2O. We find that at GOME resolution and in the wavelength range of interest for this study, the Cantrell et al. (1990) and the Meller and Moortgat (2000) data sets differ by less than 12%, and we adopted this difference as an estimate of the systematic error on the CH2O differential cross section used in our work. For the other absorbers included in the fit, only errors affecting the shape of the differential cross sections will have a significant impact on the formaldehyde column retrieval. Estimates of such uncertainties have been taken from the literature and are summarized in Table 1. Alternative cross section datasets have been selected for each molecule and sensitivity tests have been performed in order to validate the results obtained with this method. Note that the two methods (sensitivity analysis versus matrix analysis) were found to lead to consistent error estimates. However, although this formalism accounts for correlation effects between the absorption cross sections, it does not include sources of systematic error besides the cross section uncertainties. Therefore an additional contribution taken as 12% of the slant columns has been included to account for other sources of error. This 12% value has been derived from retrieval tests, where the fitting windows (328.5-346 nm and 337.5-359 nm), the calibration options (e.g. the type of the slit function), and the polynomial order for the stray light correction were varied in turn. The total systematic error budget on the SCIAMACHY CH2O slant columns, resolved into its various contributions, is represented in Fig. 7.
www.atmos-chem-phys.net/8/4947/2008/
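The matrix formalism described above lends itself to a compact numerical sketch. It assumes that the variance on the target slant column is the sum over species of the squared slant column times the corresponding diagonal element of G S_bj G^T, with G the contribution matrix built from the cross section matrix K; this form, the function name and the argument layout are assumptions made for illustration.

```python
import numpy as np


def systematic_slant_error(K, slant_columns, cross_section_covariances, target=0):
    """Propagate cross section uncertainties to one fitted slant column.

    K: (n_wavelengths, n_species) matrix of absorption cross sections.
    cross_section_covariances: one (n, n) covariance matrix per species.
    target: column index of the species of interest (CH2O in the text).
    """
    # Contribution matrix G = (K^T K)^-1 K^T, computed without an explicit inverse.
    G = np.linalg.solve(K.T @ K, K.T)
    variance = 0.0
    for column, covariance in zip(slant_columns, cross_section_covariances):
        variance += column ** 2 * (G @ covariance @ G.T)[target, target]
    return float(np.sqrt(variance))
```

A useful check of the sketch: if only the target cross section carries an error, and that error is a pure scaling of relative size r, the propagated slant column error reduces to r times the slant column.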

Error on the reference sector correction
The last term of Eq. (5) represents the uncertainty due to the reference sector correction. It has been evaluated as the monthly averaged differences between the IMAGES columns and the columns estimated using another tropospheric model, namely the TM model (Eskes et al., 2003a), above the reference sector from 1997 to 2002. The errors range between 0.5 and 2×10^15 molec/cm^2. Therefore, $\sigma_{N_{v0}^{CTM}}$ is small compared to other error sources (see Fig. 10).

Error on the air mass factor
Following the definitions in Sect. 3.3, the error on the column-averaged air mass factor depends on input parameter uncertainties and on the sensitivity of the air mass factor to each of them:

$$\sigma_M^2 = \left(\frac{\partial M}{\partial a_s}\,\sigma_{a_s}\right)^2 + \left(\frac{\partial M}{\partial C_f}\,\sigma_{C_f}\right)^2 + \left(\frac{\partial M}{\partial C_h}\,\sigma_{C_h}\right)^2 + \left(\frac{\partial M}{\partial S}\,\sigma_S\right)^2 \qquad (7)$$

where $\sigma_{a_s}$, $\sigma_{C_f}$, $\sigma_{C_h}$ and $\sigma_S$ represent respectively the uncertainties on the surface albedo, the cloud fraction, the cloud height and the profile shape, all taken from the literature or derived from comparisons with independent data, as summarized in Table 2. The errors on the solar angles, the viewing angles and the ground altitude are here assumed to be negligible. In this expression, we have not considered the impact of possible correlations between uncertainties on parameters like the surface albedo and the cloud top height. Hence we implicitly make the assumption that such uncertainties are still random in their relative behaviour. The sensitivities, i.e., the air mass factor derivatives with respect to the different input parameters, have been evaluated for different representative conditions of observation and for different profile shapes. Figure 8a shows the air mass factor dependence on the ground albedo for two different formaldehyde profile shapes (in blue: profile typical of remote area, in red: typical of emission regions). The sensitivity, i.e. the slope, is almost constant with albedo, being only slightly higher for low albedo values. As expected, the air mass factor sensitivity to albedo is found to be higher for an emission profile peaking near the surface than for a background profile more spread in altitude. As illustrated in Fig. 8b, the sensitivity to the cloud fraction is significant only when the cloud lies below the CH2O layer. Indeed, for this figure, the cloud altitude (1 km) is located above the formaldehyde maximum in the polluted case (in red) but below the main formaldehyde layer in the background case (in blue). Figure 8c shows the air mass factor variation with cloud altitude.
The slope is large when the cloud height is located below or at the  (Millet et al., 2006; Singh et al., 2006) or TRACE-A in the Tropics in 1992 (Fishman et al., 1996; Emmons et al., 2000). The contribution of each parameter to the total air mass factor error depends on the observation conditions. Figure 9 displays the total error on the air mass factor as a function of the cloud fraction in two cases: when the cloud altitude is (a) at 8 km or (b) at 1 km. The contribution from each parameter to the total error is also displayed. For clear sky conditions, the total air mass factor error is estimated to be 18%, with equal contributions from the albedo and the profile shape uncertainties. For a cloud fraction of 0.5, the total error reaches 30% for a high cloud and 50% for a low cloud. The most important error sources are uncertainties on the cloud altitude and on the profile shape for low clouds (<2 km), particularly over emission regions. Other sources, like the albedo and the cloud fraction uncertainties, represent smaller contributions to the total error on the air mass factor.
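The error propagation described in this section, with correlations between parameters neglected, amounts to a sum in quadrature of each parameter uncertainty weighted by the corresponding sensitivity. The following is a sketch of that form under stated assumptions: the symbol M for the air mass factor and the symbols a s , C f , C h and S for the albedo, cloud fraction, cloud height and profile shape are chosen here to match the σ terms defined above, and the expression is not a verbatim copy of the numbered equation of this paper.

```latex
\sigma_M^2 =
    \left(\frac{\partial M}{\partial a_s}\,\sigma_{a_s}\right)^2
  + \left(\frac{\partial M}{\partial C_f}\,\sigma_{C_f}\right)^2
  + \left(\frac{\partial M}{\partial C_h}\,\sigma_{C_h}\right)^2
  + \left(\frac{\partial M}{\partial S}\,\sigma_{S}\right)^2
```

In this form, a parameter contributes strongly to the total error only where both its uncertainty and the local slope of the air mass factor are large, which is consistent with the dominance of the cloud altitude and profile shape terms for low clouds.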
In this study, we have not explicitly considered the effect of aerosols on the air mass factors. To a large extent, however, the effect of the non-absorbing part of the aerosol extinction is implicitly included in the cloud correction. Indeed, in the presence of aerosols, the cloud detection algorithm FRESCO is expected to overestimate the cloud fraction. Since non-absorbing aerosols and clouds have similar effects on the radiation in the UV-visible range, the omission of aerosols is partly compensated by the overestimation of the cloud fraction, and the resulting error on the air mass factor is small, typically below 16% (Millet et al., 2006). In some cases, however, the effect of clouds and aerosols will not be the same. For example, when the cloud height is significantly above the aerosol layer, the FRESCO cloud will have a shielding effect, whereas the aerosol layer amplifies the signal through multiple scattering. This will result in an underestimation of the AMF. Absorbing aerosols also have a different effect on the air mass factors, since they tend to decrease the sensitivity to the CH 2 O concentration. In this case, the resulting error on the air mass factor can be as high as 30% (Martin et al., 2003). This may, for example, significantly affect our derivation of CH 2 O columns in regions dominated by biomass burning as well as over heavily industrialized regions.

Total error on the CH 2 O vertical column
From Eq. (5), the total error on the vertical column can be calculated for each satellite pixel. Figure 10 shows the total error on the monthly and zonally averaged vertical columns, as a function of latitude, calculated for March 2003. The number of SCIAMACHY pixels involved in these averages being generally large, the contribution from slant column random errors is below 2%. Note that the total error is generally much larger (about 70%) and dominated by the random error when considering individual pixels. The error due to the reference sector correction ranges from 5 to 12%. At low and mid latitudes, the contributions from the slant columns and from the air mass factor uncertainties are equivalent, with values between 10 and 20%. At higher latitudes, the error due to slant column uncertainties is larger (15-40%) mainly due to the typically larger ozone concentrations. The total error on CH 2 O monthly means generally lies between 20 and 40% for both GOME and SCIAMACHY data. This estimation is in agreement with the value of 25-30% deduced by Millet et al. (2006) who compared air mass factors calculated with the GEOS-Chem model and air mass factors derived from aircraft measurements over North America in summer 2004.
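The reduction of the random error by averaging, and the combination of the remaining contributions, can be illustrated with a short Python sketch. It assumes that the random errors of individual pixels are uncorrelated (so that the error of a mean scales as 1/sqrt(N)) and that the different contributions are independent and add in quadrature. The function names are hypothetical, and the numbers in the usage note are only the orders of magnitude quoted in the text, not values taken from Fig. 10.

```python
import math


def averaged_random_error(single_pixel_error, n_pixels):
    """Random error on the mean of n_pixels measurements.

    Assumes uncorrelated pixel errors of equal size, so the error of
    the mean is the single-pixel error divided by sqrt(n_pixels).
    """
    if n_pixels < 1:
        raise ValueError("n_pixels must be at least 1")
    return single_pixel_error / math.sqrt(n_pixels)


def pixels_needed(single_pixel_error, target_error):
    """Smallest number of pixels for which the random error of the
    mean is at or below target_error (same units as the input error)."""
    return math.ceil((single_pixel_error / target_error) ** 2)


def quadrature_sum(*contributions):
    """Combine independent error contributions in quadrature."""
    return math.sqrt(sum(c * c for c in contributions))
```

For example, `pixels_needed(70.0, 2.0)` gives the number of pixels required to bring a 70% single-pixel random error down to 2%, and `quadrature_sum(2.0, 12.0, 20.0, 20.0)` combines illustrative random, reference sector, slant column and air mass factor contributions (in percent) into a total that falls within the 20-40% range quoted above.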

GOME and SCIAMACHY tropospheric vertical columns
The global distributions of the yearly averaged CH 2 O columns from GOME and SCIAMACHY between 1997 and 2007 are displayed in Fig. 11 and Fig. 12. Both instruments capture the regions of elevated NMVOC emission, which are mainly located in the Tropics, in particular over Amazonia, Africa and Indonesia. The Eastern United States and Southeastern Asia are also major regions of emission. The better resolution of SCIAMACHY allows more localized hotspots to be observed, for example the highly industrialised Highveld region in South Africa (Wenig et al., 2003). In South America, the impact of the South Atlantic Anomaly (SAA), i.e., the larger scatter in CH 2 O columns which results from exposure of the satellite instrument to enhanced radiation and high energy particles, is more pronounced in the case of SCIAMACHY. The reason for the difference in behaviour between GOME and SCIAMACHY is currently not fully understood. It might be related to differences in the design of the detectors used in the two instruments (mainly the electronic gain and the instrument throughput for photons). Alternatively, differences in the shielding of the instruments could explain the different sensitivities to the SAA. Regional monthly means of CH 2 O columns over regions characterized by large NMVOC emissions are presented in Fig. 13 (for the American and African continents) and Fig. 14 (for Asia, Europe and Australia). The region boundaries are displayed in Fig. 15. Only pixels having a cloud fraction lower than 40% have been selected. The main features of the seasonal cycle of formaldehyde columns are well reproduced during the whole period, suggesting that the SCIAMACHY data consistently extend the GOME time series despite their larger random errors. In South America, Africa or tropical Asia, the agreement between the GOME and SCIAMACHY columns averaged over the whole period of measurement is better than 5%. 
In mid-latitude regions, the winter CH 2 O columns of SCIAMACHY present an offset compared to GOME, the maximum offsets being found in Europe, Eastern United States, Asia or Australia (around 20%). The reason for this offset is currently unknown, but it is below the estimated errors on the vertical columns, which are around 30% in these regions, considering that the AMF uncertainty is common to the two retrievals. The distribution of the formaldehyde columns is detailed in the next subsections for each region of interest. A comprehensive comparison between this dataset and the formaldehyde columns simulated with the IMAGES global chemical transport model over regions dominated by biomass burning and biogenic sources is presented in Stavrakou et al. (2008).
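The construction of regional monthly means from cloud-screened pixels can be sketched as follows. This is a minimal illustration, not the processing chain used for Fig. 13 and Fig. 14: the pixel representation (a dictionary with `region`, `month`, `cloud_fraction` and `column` keys) and the function name are assumptions made for the example, and an unweighted mean is used. Only the 40% cloud fraction threshold comes from the text.

```python
from collections import defaultdict


def regional_monthly_means(pixels, max_cloud_fraction=0.4):
    """Average vertical columns per (region, month).

    pixels: iterable of dicts with the hypothetical keys 'region',
    'month', 'cloud_fraction' (0 to 1) and 'column' (molec/cm2).
    Pixels with a cloud fraction at or above max_cloud_fraction are
    discarded before averaging.
    """
    sums = defaultdict(lambda: [0.0, 0])  # (region, month) -> [sum, count]
    for pixel in pixels:
        if pixel["cloud_fraction"] < max_cloud_fraction:
            acc = sums[(pixel["region"], pixel["month"])]
            acc[0] += pixel["column"]
            acc[1] += 1
    return {key: total / count for key, (total, count) in sums.items()}
```

A region and month for which every pixel is rejected by the cloud screening is simply absent from the returned dictionary, which avoids reporting a mean based on no data.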

America
Over the South Eastern United States, the formaldehyde columns show a strong seasonal variation primarily related to the oxidation of biogenic VOCs (mainly isoprene) emitted during the summer season (Abbot et al., 2003; Stavrakou et al., 2008). The monthly means range from 3×10 15 molec/cm 2 in winter to a maximum of about 13×10 15 molec/cm 2 in summer (Fig. 13a). The GOME and SCIAMACHY CH 2 O summer columns agree remarkably well, to better than 1.5×10 13 molec/cm 2 , but the SCIAMACHY winter values present a positive offset of about 0.9×10 15 molec/cm 2 compared to the GOME values. These results agree qualitatively, if not quantitatively, with the series of investigations conducted by the modelling team of Harvard University on the basis of another GOME retrieval (Abbot et al., 2003; Martin et al., 2004; Palmer et al., 2006). The amplitude of the seasonal variation in the CH 2 O columns derived in our study over North America in 1997 is smaller than in the Harvard retrieval (Chance et al., 2000), with values about 2×10 15 molec/cm 2 higher in winter and 4×10 15 molec/cm 2 lower in summer. The interannual variability of the summer columns is very consistent in both datasets between 1996 and 2001, except for the 20% to 30% lower values in our data. It is worth noting that aircraft measurements performed over Texas in the summer of 2000 suggested an overestimation of the Harvard GOME data of about 5.5×10 15 molec/cm 2 (Martin et al., 2004). The GOME-derived isoprene fluxes over North America have been found to be roughly consistent with the flux estimations of the MEGAN model, although with GOME emissions typically 25% higher or lower at the beginning or the end of the growing season, respectively (Palmer et al., 2006). In contrast with this result, the modelling study of Stavrakou et al. (2008) using the IMAGES model and the formaldehyde dataset presented in this study concludes that the MEGAN isoprene fluxes might be largely overestimated in this region. 
Although the main reason for this discrepancy is the lower CH 2 O columns derived in the present study, it is also partly due to model differences, in particular regarding the chemical mechanism and the formaldehyde yield in the oxidation of isoprene by OH, which is about 20-30% higher in IMAGES than in the GEOS-Chem model.
Over Tropical America (Fig. 13b and Fig. 13c), and more generally over tropical forests and savannas, biomass burning emissions contribute significantly, although sporadically, to CH 2 O abundances.
The spectacular enhancement of the monthly averaged vertical column over Guatemala in May 1998 is an extreme case, since the formaldehyde column at this time (27×10 15 molec/cm 2 ) was a factor of 5 higher than the usual wet season values (5×10 15 molec/cm 2 between November and February), and about a factor of 2 higher than the dry season peaks of other years. Over Amazonia (Fig. 13c), the CH 2 O columns lie between 8×10 15 molec/cm 2 during the wet season and 20×10 15 molec/cm 2 during the fire season, which generally extends from August to November. In these two regions, no significant offset is found between the GOME and SCIAMACHY datasets. Generally, in South America (except in the regions influenced by the South Atlantic Anomaly), the retrieved formaldehyde columns show high correlation coefficients (∼0.9) with the simulations of the IMAGES CTM when the recent MEGAN ECMWF biogenic emission inventory and the GFEDv1 fire emissions (van der Werf et al., 2004) are used in the model (Stavrakou et al., 2007 and Müller et al., 2008). Our retrieved columns in these regions agree within 10% with the SCIAMACHY CH 2 O columns derived at Univ. of Bremen for the year 2005, except for the absence in our dataset of CH 2 O enhancements above the sea, for example around Mexico (Fig. 11). However, our columns are about 30% lower than the corresponding values of the Harvard dataset used by Shim et al. (2005) to constrain isoprene emissions at the global scale. Shim et al. (2005) deduced a posteriori isoprene emissions increased by 34% compared to their prior values.
Fig. 13. Regional monthly means of CH 2 O columns over the American (a, b, c) and African (d, e, f) continents. GOME data are in blue and SCIAMACHY data are in red. The monthly errors on the vertical columns are displayed in light colour.

Africa
The seasonal variations of CH 2 O abundances in Northern Africa (Fig. 13d) and in Southern Africa (Fig. 13f) are quite similar, except for a time shift of 6 months. The amplitude of the seasonal cycle is larger in the Southern region, where the values increase by a factor of two between the dry season and the wet season. As in other Tropical regions, biomass burning is very probably the main cause of the dry season maximum. Over Tropical forests, however, biogenic isoprene emissions are also expected to peak during the dry season and could therefore contribute to the observed dry season enhancement of the formaldehyde columns. In Equatorial Africa (Fig. 13e), the CH 2 O columns present two maxima every year, in February and in July-August, corresponding to an equatorial local climate with two dry seasons and two wet seasons. The interannual variability is weak, the columns always lying between 10 and 20×10 15 molec/cm 2 . In the Southern African region, a small offset between the GOME and SCIAMACHY columns is found during the wet season (0.7×10 15 molec/cm 2 ). Our CH 2 O data over Africa are about 25% lower than in the GOME Harvard dataset between September 1996 and August 1998 (Shim et al., 2005) but 15% higher than the SCIAMACHY Bremen values in 2005. Above the Sahara, our CH 2 O values are at the same level as the oceanic background, in contrast with previous studies which found lower values above deserts (Shim et al., 2005; Wittrock et al., 2006). This difference is seemingly due to differences in the fitting window used in the retrievals (see Sect. 3.1). This behaviour is more in agreement with simulations of tropospheric models like TM, GEOS-Chem (Shim et al., 2005) or IMAGES.

Asia
As over the Eastern United States, the CH 2 O columns over Southern China (Fig. 14h) show a strong seasonal variation mainly associated with biogenic VOC emissions, with values ranging from 5×10 15 molec/cm 2 in winter to 12×10 15 molec/cm 2 in summer. In this region, our data are about 20% lower than the GOME Harvard dataset in summer (Fu et al., 2007). An offset of the order of 1.3×10 15 molec/cm 2 is also noted between the GOME and SCIAMACHY winter values. Furthermore, an apparent trend in the GOME winter minima is found between 1996 and 2000, corresponding to an increase of 16% of the columns during that period. It would be tempting to attribute this increase to anthropogenic emissions, in a country where NOx emission trends could be detected from space-borne NO 2 measurements (Richter et al., 2005b). However, as seen in Fig. 14, similar positive trends of formaldehyde columns are also found over Europe and Eastern US (where anthropogenic emissions are not expected to increase), suggesting that such trends are probably artefacts, possibly due to instrumental degradation effects. The impact of the long-term degradation of GOME might be amplified over these mid-latitude regions, especially during winter, due to the high solar zenith angles encountered by the satellite and the corresponding increase in measurement noise at low photon rates.
Over Thailand (Fig. 14i), the formaldehyde maxima occurring between February and April each year most often range between 15 and 20×10 15 molec/cm 2 , more than a factor of two above the levels found during the rest of the year. Over Indonesia (Fig. 14j), the CH 2 O levels show an unusually strong interannual variability related to the massive VOC emission associated with forest fires during El Niño events like in 1997-1998, 2002 and 2006. An offset of 0.9×10 15 molec/cm 2 is found between GOME and SCIAMACHY. In spite of an excellent agreement with the IMAGES model results regarding the timing of the maxima (correlation coefficient of 0.9, see Stavrakou et al., 2007), the retrieved columns during the large Indonesian fires in 1997 do not match the high level produced by the model (about 30×10 15 molec/cm 2 ). This discrepancy could result from an overestimation of the emissions in the model calculations. It is also possibly related to the effect of absorbing aerosols neglected in the air mass factor calculation, which could have a significant impact on the retrieved total vertical columns in the case of strong fires.
In India (Fig. 14g), maximum values around 12×10 15 molec/cm 2 are found in the period May-July. They are attributed to biomass burning (Fu et al., 2007). A second maximum of smaller amplitude (10×10 15 molec/cm 2 ) is found in September-October, and is attributed to biogenic emissions (Fu et al., 2007). The minimum values are of the order of 6.5×10 15 molec/cm 2 . Our yearly averaged columns over India for 2005, about 8-9×10 15 molec/cm 2 , are well above the values (<6×10 15 molec/cm 2 ) determined by Wittrock et al. (2006). Our CH 2 O values agree with the Harvard dataset (Fu et al., 2007) during winter, but they are again 20% lower in summer.

Australia
As shown in Fig. 14l, this region presents summertime maxima (November-February) typical of biogenic emissions. Their values, around 9×10 15 molec/cm 2 , are somewhat lower than in other regions dominated by biogenic emissions. A systematic offset of about 1×10 15 molec/cm 2 is found between the GOME and SCIAMACHY winter values. In this region, we find an important negative bias (70%) with respect to the Harvard dataset and a good agreement with the Bremen results (within 15%), except that, as previously noted in the case of Mexico, the land/sea contrast observed by Wittrock et al. (2006), with higher values over sea, is not found in our dataset.

Europe
Over Europe, the columns range typically from 3.5×10 15 molec/cm 2 in winter to 7×10 15 molec/cm 2 in summer during the GOME measurement period. The SCIAMACHY data are noisier and do not present a clearly defined seasonal cycle. Furthermore, they display an offset of 1.3×10 15 molec/cm 2 compared to the GOME data. In general, the amplitude of the offset found between the winter columns of GOME and SCIAMACHY increases with latitude. The value of this offset over Europe is still well below the estimated errors on the vertical columns. The SCIAMACHY data are possibly of lower quality at higher solar zenith angles because of the poorer signal to noise ratio. Although there seems to be a positive trend in the GOME columns, it is not present in the SCIAMACHY data.

Conclusions
A twelve-year dataset of CH 2 O vertical columns has been developed on the basis of GOME and SCIAMACHY radiance measurements. The retrieval of slant columns relies on the DOAS technique. The measured radiances are fitted in a wavelength window (328.5-346 nm) shifted to shorter wavelengths compared to previously used retrieval settings. Although this window is possibly less appropriate in the case of high ozone absorption (i.e. at high latitudes), it effectively reduces the impact of the polarisation anomaly affecting the SCIAMACHY spectra and it moves the fit away from an O 4 absorption band which appears to be a significant source of bias in CH 2 O retrievals. The improved quality of the resulting CH 2 O retrieval at low latitudes is demonstrated by a reduction of the fitting errors, as well as by the disappearance of disturbing features found in previous retrievals, like very low or negative columns over deserts.
The CH 2 O vertical columns are provided with their averaging kernels and an error estimate based on a detailed analysis. For individual satellite measurements, the random error on the slant column is the largest source of uncertainty. However, for those applications where spatial and time resolution can be compromised, averaging in time and over larger regions reduces this contribution to a negligible value. At high latitudes, the systematic error due to the interference with ozone is dominant. At lower latitudes, the total error is mainly due to uncertainties on the formaldehyde absorption cross sections and to the impact of clouds, aerosols and profile shape uncertainties on the air mass factors. Pixels with cloud fractions above 40% and low cloud altitudes are characterized by very large errors (>50%) and should not be considered for quantitative analysis. The effect of aerosols on the error remains to be better quantified since it may have an important impact on the retrieved CH 2 O columns in the case of strong biomass burning events. This dataset has been compared with previous retrievals over the major CH 2 O source regions. Over the Eastern United States and over Southern China during summertime, as well as over most Tropical forests, the columns retrieved in this study are found to be 20-30% lower than in the dataset of Chance et al. (2000). The consequences of this finding for the global budget of reactive organic compounds will be explored in further inverse modelling studies based on the dataset presented in this study.
The differences between the existing satellite datasets stress the need for a detailed comparison of the formaldehyde retrievals in order to ascertain the reliability of the inverse modelling results. Furthermore, validation using independent correlative data sets is needed. Although adequate large scale validation means are currently lacking, ground-based measurements obtained with FTIR and MAX-DOAS instruments are becoming increasingly available (see e.g. Heckel et al., 2005; Jones et al., 2008) and will be used in future studies to validate the satellite results.