Assimilation of IASI satellite CO fields into a global chemistry transport model for validation against aircraft measurements

This work evaluates the IASI CO product against independent in situ aircraft data from the MOZAIC program and the POLARCAT aircraft campaign. The validation is carried out by analysing the impact of assimilating eight months of IASI CO columns, retrieved for the period May to December 2008, into the global chemistry transport model LMDz-INCA. A modelling system based on a sub-optimal Kalman filter was developed, and a specific treatment that takes into account the representativeness of observations at the scale of the model grid is applied to the IASI CO columns and associated errors before their assimilation in the model. Comparisons of the assimilated CO profiles with in situ CO measurements indicate that the assimilation leads to a considerable improvement of the model simulations in the middle troposphere as compared with a control run with no assimilation. Model biases in the simulation of background values are reduced, and the simulation of very high concentrations is improved. The improvement is due to the transport by the model of the information present in the IASI CO retrievals. Our analysis also shows the impact of assimilation of CO on the representation of transport into the Arctic region during the POLARCAT summer campaign. A considerable increase in CO mixing ratios over the Asian source region was observed when assimilation was used, leading to much higher values of CO during the cross-pole transport episode. These higher values are in good agreement with data from the POLARCAT flights that sampled this plume.


Introduction
Carbon monoxide (CO) is a good tracer of atmospheric pollution as its lifetime allows plumes to be transported over long distances. It is emitted by the incomplete combustion of fossil fuel and biomass, and is also produced by the oxidation of methane and biogenic non-methane hydrocarbons. The latter production pathway contributes about half of the background CO (Duncan et al., 2007). CO affects the global oxidising capacity of the troposphere. It is an important precursor of the photochemical production of ozone in polluted air, and it is an important sink of the hydroxyl (OH) radical in low-NOx environments (Logan et al., 1981).
Published by Copernicus Publications on behalf of the European Geosciences Union.
In the last decades, several airborne and space-borne instruments have provided information on the distribution of chemical species in the troposphere, leading to an improved understanding of chemical and transport processes as well as emissions. In particular, tropospheric CO columns and profiles have been obtained from polar-orbiting satellites (e.g. Gloudemans et al., 2006; Luo et al., 2007; Clerbaux et al., 2008; George et al., 2009; Deeter et al., 2010; McMillan et al., 2005) and from aircraft data, either routinely available (Cammas et al., 2009) or obtained during specific campaigns (e.g. McMillan et al., 2008; Paris et al., 2009; Warneke et al., 2009). Moreover, a large multi-year database of in situ CO surface measurements has been obtained from ground-station networks (e.g. Novelli et al., 2003; Zander et al., 2008). Numerous modelling studies have been compared with the available observations, providing additional insight into the processes that control the distribution and transport of chemical species (e.g. Klonecki et al., 2003; Shindell et al., 2008).
Data from routine and mission aircraft flights, as well as data from ground stations, suffer from limited spatial coverage. Only satellite missions can provide global coverage; however, even for satellite-based instruments, data availability is limited to the times of the satellite overpasses and to periods with geophysical conditions favourable for the retrievals (e.g. clear-sky conditions). As previous studies have shown, the joint use of satellite and aircraft data allows a better understanding of pollution transport pathways (e.g. Pommier et al., 2010). Models bring additional insight, either through direct model simulations (e.g. Sodemann et al., 2011) or through data assimilation (e.g. Lamarque et al., 2004).
Assimilation techniques offer a powerful tool to propagate in space and time the information provided by the satellites. The resulting product is global and continuous, with a spatial resolution corresponding to that of the model. The problem of the lack of coincident data in space and time (e.g. Pommier et al., 2010) is thus removed. In particular, the availability of assimilated vertical profiles facilitates the comparison with in situ aircraft measurements taken at various altitudes (comparing directly with satellite products often requires a vertical integration of incomplete in situ profiles). Besides providing global concentrations, assimilation techniques are also used to constrain surface emissions of chemical tracers and to optimise parameters in physical process-based emission models (e.g. Elbern and Schmidt, 1999; Fortems-Cheiney et al., 2009).
Various techniques exist to assimilate chemical tracers in chemistry transport models (CTMs). In this work we use a sequential Kalman filter approach (Khattatov et al., 2000) that was previously used to assimilate MOPITT (Measurements Of Pollution In The Troposphere) and IMG (Interferometric Monitor for Greenhouse gases) satellite CO data into the MOZART (Model of Ozone and Related chemical Tracers) CTM (Lamarque et al., 2004; Clerbaux et al., 2001) and the MOCAGE (MOdèle de Chimie Atmosphérique à Grande Echelle) CTM (Pradier et al., 2006). Here, the CO fields derived from IASI measurements are assimilated into the LMDz-INCA CTM (Hauglustaine et al., 2004). The resulting fields are compared with independent aircraft observations from the MOZAIC (Measurements of OZone and water vapour by AIrbus in-service airCraft) regular data and the POLARCAT (Polar Study using Aircraft, Remote Sensing, Surface Measurements and Models, of Climate, Chemistry, Aerosols, and Transport) Arctic campaigns.
The manuscript is organized as follows: Sect. 2 describes the CO data retrieved from the IASI observations. Section 3 details the assimilation routines that were implemented in the LMDz-INCA model. Section 4 reports on the impact of assimilation on the CO column, while Sect. 5 focuses on the validation of the retrieved product through a comparison of the assimilated product against available aircraft data. Section 6 contains a summary.

IASI measurements and CO retrievals
The IASI (Infrared Atmospheric Sounding Interferometer) instrument was launched on 19 October 2006 aboard the polar-orbiting MetOp-A platform. IASI is a nadir-looking Fourier Transform Spectrometer (FTS) with across-track scanning capability (swath width of 2200 km) that ensures near-global coverage twice per 24-hour period, with morning and evening measurements (the local time of the equator crossing is 9:30 AM for the descending orbit and 9:30 PM for the ascending orbit). IASI was designed to record the thermal infrared (IR) radiation emitted by the Earth-atmosphere system at the top of the atmosphere. From the IASI spectra (645 to 2760 cm−1, with an apodised resolution of 0.5 cm−1), a range of molecules can be retrieved (Clerbaux et al., 2009; Clarisse et al., 2011).
The CO retrievals used in this study were performed with the FORLI-CO (Fast Optimal Retrievals on Layers for IASI) retrieval code described in George et al. (2009) and Hurtmans et al. (2012). This operational software, based on the optimal estimation method, can process all IASI spectra, i.e. 1.3 million observations per day, in near real time thanks to a fast radiative transfer model running on a PC cluster.
The prior information on CO and the associated variance-covariance matrix were constructed using a database of observations that includes aircraft profiles from the MOZAIC program (Nédélec et al., 2003) and ACE-FTS satellite observations in the upper troposphere and above (Clerbaux et al., 2008) [...] matrix representative of both background and polluted conditions (Turquety et al., 2009; Hurtmans et al., 2012).
The retrieval code provides 19 partial columns of CO defined on a vertical grid with 1 km resolution from the surface up to 18 km, the last layer corresponding to the atmosphere from 18 to 60 km. For each retrieved profile, the retrieval method provides an estimate of the error and the averaging kernel for each layer. As shown in George et al. (2009), the vertical information content is coarse and generally much lower than the number of retrieved layers. It varies between a single total column and two independent partial columns, and depends strongly on atmospheric conditions and surface type. For the thermal IR CO band used in the retrievals, the sensitivity of the measurements is generally highest for mid-tropospheric layers (see Fig. 1), and the boundary layer is visible only when there is thermal contrast between the effective radiative temperature of the surface (which takes surface emissivity into account) and the temperature of the overlying air. For the morning orbit and for measurements over land, the contribution from the boundary layer is, on average, higher than for the evening orbit (Clerbaux et al., 2009).

CO columns used in the assimilation
Due to the relatively low vertical information content of the retrieved CO products, we have chosen not to assimilate the 19 distinct layers present in the retrieval product. Instead, the total vertical CO column is assimilated along with the vertically integrated averaging kernel. This solution increases the robustness of the assimilated product and leads to considerably faster execution of the assimilation package. On the other hand, for cases where the retrieved product contains more than one piece of information on the vertical distribution of CO, this information is no longer contained in the vertically integrated product. For this study we accumu- [...]
In order to keep only the most reliable data in the assimilation, two filters were applied: one over high latitudes and one over bright surfaces. As discussed in Pommier et al. (2010), the information content of the retrieved CO product is particularly low over the polar regions (degrees of freedom for signal, DOFS, of only 0.6 to 1.2 in April) due to the low surface temperatures and thus weak signal-to-noise ratio. The retrieved profiles over these regions show some anomalous characteristics, such as strongly negative averaging kernel values for the levels situated near the surface. For this reason, the data to be assimilated were filtered to exclude the polar regions (latitudes south of 60° S and north of 75° N).
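Collapsing the 19-layer retrieval to a total column can be illustrated with a short sketch. It is not the FORLI processing chain: it assumes the averaging kernel matrix is expressed in partial-column units, with element [i, j] giving the sensitivity of retrieved layer i to true layer j, so that the total-column kernel is the sum over the retrieved layers. The DOFS quoted above is the trace of the same matrix. The function names are hypothetical.

```python
import numpy as np


def column_averaging_kernel(ak):
    """Total-column averaging kernel from a layer averaging kernel matrix.

    ak[i, j] is assumed to be d(retrieved partial column i) / d(true partial
    column j). Summing over the retrieved layers (axis 0) gives the
    sensitivity of the retrieved total column to each true layer.
    """
    ak = np.asarray(ak, dtype=float)
    return ak.sum(axis=0)


def degrees_of_freedom_for_signal(ak):
    """DOFS: the trace of the averaging kernel matrix."""
    return float(np.trace(np.asarray(ak, dtype=float)))
```

For a toy kernel in which five mid-tropospheric layers share a single piece of information (a 5 × 5 block filled with 0.2), the DOFS is 1 and the column kernel is close to 1 in those layers, which is the situation where assimilating one column loses little.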
As independent emissivity data were not available in 2008, the errors in the retrieved columns were pronounced in certain areas, notably unrealistically elevated columns for the morning orbits above deserts (Clerbaux et al., 2009). To filter these data, we used a simple test based on the surface emissivity extracted from the MODIS/Terra climatology (Wan, 2008) and collocated with the IASI observations. Data for regions where this surface emissivity was below 0.94 are not used in the assimilation. Similarly, data over ice-covered regions with a surface emissivity above 0.98 were not used.
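The two selection filters can be written as a single predicate. This is a minimal sketch under stated assumptions: the observation record and its field names are hypothetical, the MODIS emissivity is assumed to be already collocated with each IASI pixel, and, lacking a separate ice mask, an emissivity above 0.98 is used directly as the proxy for ice-covered surfaces.

```python
from dataclasses import dataclass


@dataclass
class IasiObs:
    """Hypothetical record for one IASI CO retrieval."""
    lat: float          # degrees north
    lon: float          # degrees east
    column: float       # total CO column (molec cm-2)
    error: float        # retrieval error on the column (molec cm-2)
    emissivity: float   # collocated MODIS/Terra surface emissivity


def keep_for_assimilation(obs: IasiObs) -> bool:
    """Return True if the observation passes both filters."""
    # Polar filter: drop latitudes south of 60 S and north of 75 N.
    if obs.lat < -60.0 or obs.lat > 75.0:
        return False
    # Bright-surface filter: low emissivity (e.g. deserts) is rejected.
    if obs.emissivity < 0.94:
        return False
    # Very high emissivity is treated here as ice cover and rejected.
    if obs.emissivity > 0.98:
        return False
    return True


def filter_observations(observations):
    """Keep only the observations that pass both filters."""
    return [o for o in observations if keep_for_assimilation(o)]
```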
Figure 2 shows the retained IASI total CO column averaged at the model resolution for July 2008.The month of July is shown as this period coincides with the POLARCAT summer campaign (see Sect.

Superobservations
The retrieved CO columns characterise the CO content in a cone with a spatial extent corresponding to the size of one IASI pixel (about 12 km at nadir). As the model grid boxes are considerably larger (3.75° in longitude × 1.89° in latitude), and as there is often more than one measurement per model grid box in a given assimilation window, the data had to be pre-processed prior to assimilation. We used an approach in which all the observations taken during a predefined observation window and falling into a given model grid box are grouped into a single observation, hereafter called a superobservation. We chose an assimilation window length of 30 min. This approach is commonly used in assimilation studies in which the model grid size exceeds the measurement footprint and several measurements are available per sampled model grid box (e.g. Lamarque et al., 1999; Clerbaux et al., 2001). Figure 3a shows the number of retrieved CO observations over a 12-h period that fall into each model grid box.
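The grouping step that precedes the superobservation calculation can be sketched as a dictionary keyed by (time window, longitude index, latitude index). As an assumption, the model grid is approximated here by regular boxes of 3.75° in longitude and 180/95 ≈ 1.89° in latitude; the real LMDz grid staggering may differ, and the function names are hypothetical.

```python
from collections import defaultdict

DLON = 3.75            # degrees, 96 boxes in longitude
NLAT = 95              # boxes in latitude
DLAT = 180.0 / NLAT    # about 1.89 degrees
WINDOW_S = 30 * 60     # 30-min assimilation window, in seconds


def grid_index(lat, lon):
    """(i, j) indices of the regular box containing a point."""
    i = int(((lon + 180.0) % 360.0) // DLON)
    j = min(int((lat + 90.0) // DLAT), NLAT - 1)  # keep lat = 90 in range
    return i, j


def group_by_box_and_window(observations):
    """Group (time_s, lat, lon, column, error) tuples by window and box.

    Each resulting group is the set of observations from which one
    superobservation is built.
    """
    groups = defaultdict(list)
    for t, lat, lon, column, error in observations:
        key = (int(t // WINDOW_S),) + grid_index(lat, lon)
        groups[key].append((column, error))
    return dict(groups)
```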
Even though the superobservation approach is commonly used, its implementation can depend on the case studied. In this work, due to the frequent presence of strong intra-grid spatial CO gradients, the representativeness of the observations is explicitly addressed. Sharp horizontal gradients in the retrieved IASI CO columns arise, for example, from pronounced CO plumes from forest fires. When both elevated and background CO columns fall in the same model grid box, the total error on the superobservation should reflect not only the individual observation errors but also this high variability. Figure 3b shows the variability calculated from all the individual observations that fall within each model grid box on 6 July 2008. The variability can be high and can exceed 30 % of the mean total column. Figure 3b also illustrates the retrieval problems in the Antarctic and certain desert regions (especially the Sahara), where the high spatial variability in the CO column is due to the high variability in emissivity (as mentioned in Sect. 2.2, these data are not assimilated).
One way to define a superobservation is to calculate a mean weighted by the errors on the individual measurements (Eq. 1), where y_i is the individual retrieved CO column and e_i is the associated error. N_obs is the number of observations used to generate a superobservation. Equation (1) reduces to a simple arithmetic mean if the ratio e_i/y_i is the same for all the measurements that fall into a given model grid box. Similarly, the error on the superobservation can be calculated with Eq. (2). This definition corresponds to an average error calculated by taking into account the weights of the individual measurements.
If e_i/y_i is the same for all measurements, Eq. (2) reduces to y_mean·(e_i/y_i).
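Equations (1) and (2) are not reproduced here, but the two limiting cases stated above constrain their form. One weighting that satisfies both (an arithmetic mean, and an error of y_mean·(e_i/y_i), whenever e_i/y_i is constant) is w_i = (y_i/e_i)^2, the inverse squared relative error. The sketch below uses that weighting as an assumption; it is not necessarily the authors' exact formulation.

```python
def superobservation(columns, errors):
    """Weighted mean column and weighted mean error for one grid box.

    Assumed weights: w_i = (y_i / e_i) ** 2. When e_i / y_i is the same
    for every observation, all weights are equal, so the mean is the
    arithmetic mean and the error equals y_mean * (e_i / y_i).
    """
    if not columns or len(columns) != len(errors):
        raise ValueError("need matching, non-empty column and error lists")
    weights = [(y / e) ** 2 for y, e in zip(columns, errors)]
    w_sum = sum(weights)
    y_mean = sum(w * y for w, y in zip(weights, columns)) / w_sum
    e_mean = sum(w * e for w, e in zip(weights, errors)) / w_sum
    return y_mean, e_mean
```

With unequal relative errors, the more precise retrievals pull the mean towards their values, while the error stays of the same order as the individual errors. This is the behaviour criticised in the list that follows.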
The above definitions of the superobservation and its error have several limitations:
1. Equation (2) does not take into account the representativeness error of the superobservation: the information on the variability of the observations falling into a given model grid box is ignored. Two sets of observations, one with low and one with high variability, can thus have similar superobservation errors if the above formula is used. Since it is desirable to assign a higher error in the latter case, a different formulation is needed to account for the representativity of the superobservation. The superobservation with low variability can generally be considered as more representative of the area covered by the model grid box.
2. With Eq. (2), a superobservation composed of a large number of observations can have an error similar to that of a superobservation composed of only one observation. Clearly, having more observations distributed throughout the grid box increases the representativity of the superobservation at the scale of the model grid. In addition, in regions with little variability, having several measurements can reduce the overall error of the mean. The exact reduction will depend on the relative contributions of systematic and random errors in the individual retrievals.
3. Finally, the above definition can lead in certain regions to a strong inconsistency between the variability of the superobservations (both spatial and temporal) and the error on these values. The errors on superobservations obtained with Eq. (2) are of the same order as the errors on the individual retrieved columns. However, in certain regions of the globe far from emission sources, such as most of the Southern Hemisphere, the variability (spatial and temporal) of the superobservations is very weak because a relatively high number of observations is averaged. The differences between neighbouring superobservations (both in space, for nearby grid boxes, and in time, for superobservations over a few days) are then much smaller than the estimated error. Clearly, the above definition does not yield superobservations whose spread is consistent with a Gaussian distribution of the specified error.
In order to address the above issues, a new definition of the error on the superobservation is proposed in this work (Eq. 3), where e_superobservation is the error calculated with Eq. (2) and V is the variance of the individual observations in a model grid box for a given assimilation window. The first term on the right-hand side accounts for the reduction of the error when more than one measurement is present in a given grid box during the predefined assimilation window, while the second term accounts for the variance. In this definition, for sufficiently high N_Obs, the second term dominates.
As there is generally a high number of measurements in each grid box, we use the variance of these measurements, instead of the error on the individual measurements, to estimate the effective error of the superobservation.
To obtain superobservations that are coherent with this definition of the error, we define them as in Eq. (4), where y_superobservation is calculated with Eq. (1) and Rand is a random number drawn from a Gaussian distribution with zero mean and unit standard deviation. In regions with little spatio-temporal variability in the retrieved values, the second term gives the superobservations the same variability as the individual observations (V). In order not to introduce too much variability on the superobservations in regions with high variability among individual measurements (such as regions close to CO sources), the value of V in Eq. (4) is limited to (2.5×10^17 molec cm−2)^2. This value was chosen as it separates well the low- and high-variability regions (figure not shown). In the proposed implementation, superobservations are calculated only if N_Obs is greater than or equal to 4.
The averaging kernel for each superobservation is calculated using a standard averaging formalism. For a vertical level k, the definition given in Eq. (5) was applied to average all N_Obs averaging kernels for each model grid box:
$$A(k) = \frac{1}{N_{\mathrm{Obs}}} \sum_{i=1}^{N_{\mathrm{Obs}}} A(k, i) \qquad (5)$$
where A(k) is the value of the resulting averaging kernel at level k and A(k, i) are the values at level k for each individual profile i.
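The remaining steps (error, perturbed value, mean averaging kernel and the N_Obs ≥ 4 rule) can be combined into one routine. Equations (3) and (4) are not reproduced in this text, so their forms here are assumptions that match the verbal description: the error is taken as the quadrature sum of a term that shrinks with N_Obs and the variability term V, and the superobservation is perturbed by a Gaussian draw scaled by the square root of V capped at (2.5×10^17)^2. V is computed as the population variance, and the kernel average is read as a plain mean. All names are hypothetical.

```python
import math
import random
import statistics

V_MAX = (2.5e17) ** 2   # cap on V in the perturbation term, (molec cm-2)^2
N_MIN = 4               # superobservations need at least 4 observations


def finalize_superobservation(columns, y_superobs, e_superobs, kernels, rng):
    """Return (perturbed superobservation, error, mean kernel) or None.

    columns    : individual retrieved columns in the grid box and window
    y_superobs : weighted mean from Eq. (1)
    e_superobs : weighted error from Eq. (2)
    kernels    : one averaging-kernel list per observation (same length)
    rng        : random.Random instance supplying Gaussian draws
    """
    n = len(columns)
    if n < N_MIN:
        return None
    var = statistics.pvariance(columns)  # V
    # Assumed form of Eq. (3): shrinking error term plus variability term.
    error = math.sqrt(e_superobs ** 2 / n + var)
    # Assumed form of Eq. (4): Gaussian perturbation with capped variance.
    perturbed = y_superobs + rng.gauss(0.0, 1.0) * math.sqrt(min(var, V_MAX))
    # Eq. (5) read as a plain average of the kernels, level by level.
    nlev = len(kernels[0])
    kernel = [sum(k[lev] for k in kernels) / n for lev in range(nlev)]
    return perturbed, error, kernel
```

Passing an explicit `random.Random` instance keeps the perturbations reproducible between runs. When all columns in a box are identical, V is zero, the perturbation vanishes and the error reduces to e_superobs/√N_Obs.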

The LMDz-INCA model
In this work we used the Laboratoire de Météorologie Dynamique (LMDz) General Circulation Model (Hourdin et al., 2006) coupled with the INteraction with Chemistry and Aerosols (INCA) model (Hauglustaine et al., 2004; Folberth et al., 2006). The chemical scheme used in this study takes into account the CH4-NOx-CO-O3 chemistry of the background troposphere as well as a detailed non-methane hydrocarbon scheme. We ran the model with a resolution of 96 grid points in longitude and 95 in latitude (3.75 [...] at 70 m) and between 1 and 2 km in the remaining parts of the troposphere. The model was run in nudged mode, with winds relaxed towards the ECMWF reanalysis. The relaxation of the GCM winds towards the ECMWF meteorology is performed by applying a correction term to the horizontal wind components (u, v) calculated by the GCM, corresponding to a relaxation time of 2.5 h over the whole model domain. The ECMWF fields are provided every 6 h and interpolated onto the LMDz grid. For the results reported here we use the winds for 2008, corresponding to the period of the POLARCAT campaign.
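The nudging described above is a Newtonian relaxation: at each step a term proportional to the difference between the analysed and modelled wind is added to the GCM tendency. The sketch shows only that relaxation term (the GCM's own dynamical tendency is omitted) and assumes linear interpolation in time between the 6-hourly ECMWF fields; LMDz's actual interpolation scheme may differ. Function names are hypothetical.

```python
TAU_S = 2.5 * 3600.0              # relaxation time of 2.5 h, in seconds
ANALYSIS_INTERVAL_S = 6 * 3600.0  # ECMWF fields every 6 h


def analysis_wind_at(t, t0, u_prev, u_next):
    """Linearly interpolate the analysed wind between t0 and t0 + 6 h."""
    w = (t - t0) / ANALYSIS_INTERVAL_S
    if not 0.0 <= w <= 1.0:
        raise ValueError("t must lie between the two analysis times")
    return (1.0 - w) * u_prev + w * u_next


def nudged_wind(u_model, u_analysis, dt):
    """Explicit step of the relaxation term du/dt = (u_analysis - u_model) / tau."""
    return u_model + dt * (u_analysis - u_model) / TAU_S
```

With τ = 2.5 h, a 15-min step removes about 10 % of the model-analysis wind difference, so the model winds follow the reanalysis closely while remaining dynamically consistent between analysis times.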
We used the anthropogenic emission inventories prepared by Lamarque et al. (2010) for the Coupled Model Intercomparison Project Phase 5 (CMIP5) in support of the Intergovernmental Panel on Climate Change (IPCC) Fifth Assessment Report (AR5). These inventories provide emissions of chemical precursors for climate simulations in order to study the long-term change of the atmospheric chemical composition. In this study we use the emissions corresponding to the year 2000. Grassland burning and forest fire emissions are prescribed according to the GFEDv2 (Global Fire Emissions Database version 2) inventory (van der Werf et al., 2006). GFEDv2 provides a range of information including burned area, fuel load, combustion completeness and fire emissions for a series of gases and aerosols at 1° × 1° spatial resolution. The fluxes used represent a mean for the 1997-2004 period. Van der Werf et al. (2006) state that this inventory represents the fire seasonality in boreal ecosystems and global wetlands well due to the improved modelling of fuel loads. Biogenic emissions from Lathière et al. (2006) are used for isoprene and monoterpenes as well as for other reactive chemical species such as methanol, acetone, aldehydes and organic acids.

Kalman filter
The assimilation scheme used here belongs to the class of optimal estimation methods, in which a solution that minimises a cost function is found. The cost function contains two terms that depend on the difference between the model and the observations and on the difference between the model and the background knowledge. Both terms are weighted by the error covariance matrices that express our confidence in the observations and in the background model information. The assimilation approach used in this paper is based on the sub-optimal Kalman filter described in Khattatov et al. (2000). In this sequential assimilation approach, a new analysis is calculated for each 30-min assimilation window using the following equation:
$$x_a = x_b + K\left(y - H(x_b)\right) \qquad (6)$$
where x_a and x_b are the model analysis and forecast, respectively, and y is the superobservation. Both x_a and x_b are defined on the model vertical levels (in units of volume mixing ratio) and are provided for the model grid boxes for which information is provided by the observations. K is the Kalman gain matrix and H is the observation operator that transforms the information from model space (CO profiles on the model horizontal and vertical grids) to observation space, allowing a comparison of the model CO profiles (x_b) with the superobservations of total CO columns retrieved from IASI data. The column retrieved with the optimal estimation algorithm contains contributions both from the real profile and from the a priori used in the retrieval (Eq. 7), where x_r is the real (unknown) profile, x_ap is the a priori profile used in the inversion of the IASI radiances, and dp^T is the transpose of the vector containing the pressure thicknesses of the 19 vertical layers. The product of dp^T and x_ap gives the a priori total CO column. A is the normalised averaging kernel vector defined on 19 levels; it characterises the contribution of these layers to the total retrieved column. A similar equation, in which x_r is replaced by x_b, is used to construct the corresponding model value (H(x_b) in Eq. 6). K in Eq. (6) is calculated as follows:
$$K = B H^{T}\left(H B H^{T} + O + R\right)^{-1} \qquad (8)$$
where B is the model error variance-covariance matrix, O is the observation error variance-covariance matrix, and R is the representativity error of the superobservation (term 4).
The Kalman filter also allows calculating the impact of assimilation on the model error (the model improves as a result of assimilation):
$$B_a = (I - K H) B \qquad (9)$$
where B_a is the variance-covariance matrix of the model error after assimilation and I is the identity matrix.
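For a single total-column superobservation, the analysis, gain and error-update equations of the Kalman filter reduce to vector operations, because H B H^T + O + R is a scalar. The NumPy sketch below is a minimal illustration, assuming a linear observation operator given as a row vector `h` of column weights. The a priori term of the retrieved column makes the real operator affine, but that constant offset only enters the innovation, not the gain. Error propagation between windows, where the sub-optimal simplifications of Khattatov et al. (2000) apply, is not shown.

```python
import numpy as np


def kalman_column_update(x_b, B, h, y, obs_var, repr_var):
    """Analysis step for one superobservation of the total CO column.

    x_b      : forecast profile on the model levels, shape (n,)
    B        : forecast error covariance matrix, shape (n, n)
    h        : linearised observation operator (column weights), shape (n,)
    y        : superobservation (total column)
    obs_var  : observation error variance O (scalar for one column)
    repr_var : representativeness error variance R (scalar)
    """
    x_b = np.asarray(x_b, dtype=float)
    B = np.asarray(B, dtype=float)
    h = np.asarray(h, dtype=float)
    innovation = y - h @ x_b                 # y - H(x_b)
    s = h @ B @ h + obs_var + repr_var       # H B H^T + O + R
    K = (B @ h) / s                          # Kalman gain, shape (n,)
    x_a = x_b + K * innovation               # analysis
    B_a = B - np.outer(K, h @ B)             # (I - K H) B
    return x_a, B_a
```

With B equal to the identity on two levels, unit column weights, x_b = (1, 1), y = 4 and O = R = 0, the innovation of 2 is split equally between the levels, giving x_a = (2, 2), and the analysis variance on each level halves.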

Model error
Besides characterising the error on the observations, it is also necessary to provide an estimate of the model error variance-covariance matrix. The relative values of the model and observational errors determine the impact of the observations on the model CO column. The vertical correlation and vertical distribution of the model error also determine which vertical levels of the model are affected by the assimilation of the CO columns.
The parameterisation of the model variance-covariance matrix (B matrix) follows Khattatov et al. (2000). Given the size of the B matrix, several simplifications are necessary: the cross-correlation terms in the matrix are based on correlation lengths and are calculated as needed using the parameterisations specified in [...] carried out to select optimal values for the remaining adjustable parameters: the error growth term (ε) and the vertical (L_z) and horizontal (L_xy) correlation lengths. As specified in Khattatov et al. (2000), the χ² test and the OmF (Observations minus Forecast) values were used to select the final values: the error growth term was set to 0.01, and the vertical (L_z) and horizontal (L_xy) correlation lengths were fixed at 0.4 × the atmospheric scale height (about 7 km), i.e. roughly 2.8 km, and 500 km, respectively.
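The role of the correlation lengths and of the χ² diagnostic can be illustrated with a short sketch. The exact functional form used by Khattatov et al. (2000) is not reproduced here: as an assumption, correlations decay as a Gaussian of the separation divided by L_z (vertically, in log-pressure height) or L_xy (horizontally). χ² is taken as the mean of the squared innovations divided by their predicted variance, H B H^T + O + R, which should be close to 1 when the error statistics are consistent. The error growth term ε is not modelled.

```python
import numpy as np

SCALE_HEIGHT_KM = 7.0            # approximate atmospheric scale height
L_Z_KM = 0.4 * SCALE_HEIGHT_KM   # vertical correlation length (~2.8 km)
L_XY_KM = 500.0                  # horizontal correlation length


def log_pressure_height_km(p_hpa, p0_hpa=1000.0):
    """Approximate altitude from pressure, z = -H ln(p / p0)."""
    return -SCALE_HEIGHT_KM * np.log(np.asarray(p_hpa, dtype=float) / p0_hpa)


def vertical_error_covariance(sigma, p_hpa):
    """Vertical block of B for one grid box: sigma_i sigma_j exp(-(dz / L_z)^2)."""
    sigma = np.asarray(sigma, dtype=float)
    z = log_pressure_height_km(p_hpa)
    dz = z[:, None] - z[None, :]
    return np.outer(sigma, sigma) * np.exp(-(dz / L_Z_KM) ** 2)


def horizontal_correlation(distance_km):
    """Correlation between two grid boxes separated by distance_km."""
    return np.exp(-(np.asarray(distance_km, dtype=float) / L_XY_KM) ** 2)


def chi2_statistic(innovations, innovation_variances):
    """Mean of (y - H x_b)^2 / (H B H^T + O + R); ~1 if errors are consistent."""
    d = np.asarray(innovations, dtype=float)
    s = np.asarray(innovation_variances, dtype=float)
    return float(np.mean(d ** 2 / s))
```

In a tuning loop of the kind described above, values well above 1 would suggest that B (or O + R) is too small, and values well below 1 that it is too large.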

Chi2 test
In order to test the coherence of the parameterisation of the model error (Sect.

Impact of assimilation on the simulated total CO columns
In this section we evaluate the impact of the assimilation by comparing the IASI CO products with the columns simulated with and without assimilation. Note that the model run time doubles when IASI data are assimilated. Figure 5 shows the CO total columns simulated by the model with assimilation (Fig. 5a) and for the control run in which observations were not assimilated (Fig. 5b). Figure 5c shows the impact of assimilation as the difference between the assimilation and control runs. The fields averaged over July 2008 are plotted. Over the Northern Hemisphere, assimilation leads to a consistent increase in the CO column. The strongest increase is over the North Pacific and the Canadian Arctic, where the difference reaches 6×10^17 molecules cm−2. Lower CO columns in the control run indicate a likely underestimation of CO sources from eastern Asia and possibly also from boreal forest fires over North America. Assimilation also increases the CO column over the North Atlantic and most of Eurasia. There are also two regions in the tropics, eastern equatorial Africa and tropical South America, where assimilation leads to a consistent decrease in the CO columns. The results shown in Fig. 5 are for column values; the implications for different levels of the profiles are discussed in Sect. 5.3.
Figure 6 shows the observations (both the individual observations and the superobservations) as well as the model results from the runs with and without assimilation for 7 July 2008, a day with a strong cross-pole transport event. The comparison of the two observation panels indicates that the superobservations capture well the patterns and range of values seen in the individual observations. The strong CO plumes are generally well reproduced in the assimilation run, even though the highest values in the plume centres are underestimated with respect to the observations. The difference with respect to the observations is considerably larger for the control run. For example, the fire plume that is visible in the IASI data and corresponds to the strong transport event into the high Arctic (Pommier et al., 2010) is better represented in the run with assimilation than in the control run. This transport episode is addressed further in comparisons with in situ data in Sect. 5.3.
The mean positive impact of assimilation on the comparison with the IASI CO columns is quantified for all months of the simulation in Table 2. These results indicate that assimilation leads to a considerable reduction of the mean absolute difference between the simulated values and the observations. The mean relative absolute difference between the observations (O) and the model (M) is reduced from about 14-17 % for the control run to about 5-8 % for the run with assimilation (N in the equation is the number of superobservations in each month, and i is the index of the superobservations and corresponding model values).
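The monthly statistic reported in Table 2 can be sketched as below. Because the equation itself is not reproduced in this text, the normalisation by the observed value O_i is an assumption; normalising by the model value or by the mean would give slightly different percentages.

```python
def mean_relative_abs_diff(obs, model):
    """Mean relative absolute difference in percent, normalised by the observations.

    obs, model : superobservations O_i and collocated model columns M_i for
    one month (same length N).
    """
    if not obs or len(obs) != len(model):
        raise ValueError("need matching, non-empty sequences")
    total = sum(abs(o - m) / o for o, m in zip(obs, model))
    return 100.0 * total / len(obs)
```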

Validation of the IASI CO product by comparing against independent data from aircraft campaigns
The IASI-derived CO data have been analysed in detail and validated using other satellite data (George et al., 2009), but as detailed in Pommier et al. (2010), direct validation of level 2 products using aircraft observations remains difficult. Assimilation can be seen as a tool that facilitates this task by removing the constraint of having both types of observations collocated in space and time. In addition, assimilated products allow comparison at the vertical levels at which the in situ data are available. To compare directly with satellite data, even if a vertical profile is retrieved, the averaging kernels must be applied to smooth the in situ profiles for the comparison to be meaningful (Pommier et al., 2010). On the other hand, the assimilated products contain a contribution from the model that can be important in certain cases.
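For contrast with the assimilation approach, the direct comparison mentioned above means computing the column the instrument would report for the aircraft profile. The sketch uses the generic optimal-estimation smoothing, column = a priori column + a·(x_insitu − x_ap), with all profiles in partial-column units on the 19 retrieval layers. As an assumption, layers not sampled by the aircraft are filled with the a priori, which is one common way of handling incomplete in situ profiles.

```python
import numpy as np


def smoothed_insitu_column(x_insitu, x_apriori, column_kernel):
    """Total column an instrument with this kernel would see for an in situ profile.

    x_insitu      : in situ partial columns on the 19 layers (NaN where unsampled)
    x_apriori     : retrieval a priori partial columns on the same layers
    column_kernel : total-column averaging kernel, shape (19,)
    """
    x_insitu = np.asarray(x_insitu, dtype=float)
    x_apriori = np.asarray(x_apriori, dtype=float)
    a = np.asarray(column_kernel, dtype=float)
    # Layers not sampled by the aircraft are filled with the a priori.
    filled = np.where(np.isnan(x_insitu), x_apriori, x_insitu)
    return float(x_apriori.sum() + a @ (filled - x_apriori))
```

A kernel of ones returns the (completed) in situ column unchanged, and a kernel of zeros returns the a priori column, which shows how much of the comparison depends on the a priori where the sensitivity is low.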

MOZAIC data
We first compare the model results with CO measurements obtained by the MOZAIC aircraft measurement program (Nédélec et al., 2003).Because these measurements were obtained by commercial aircraft, their geographical positions are limited to the usual flight corridors (Fig. 7) with most of the measurements available at the flight altitude of about 12 km.The atmospheric profiles are available only near the airports.Despite these limitations, and thanks to the high number of measurements, these data remain an exceptional dataset for validating model results.
Due to the large number of flights, the average of the MOZAIC data is representative of climatological conditions; however, the data also contain numerous outliers corresponding to aircraft flying through CO anomalies. Figure 7 shows that the CO values for May 2008 at cruise altitude are highly non-uniform. Several air masses with high CO were sampled, mostly over the North American continent. To compare the assimilated CO fields with aircraft observations, we sampled the model at the time and location of the measurements by interpolating the model results in time and space onto the positions of the measurements. This sampling was done during the execution of the chemistry transport model in order to benefit from the original temporal resolution of the model, which corresponds to the model physical time step. The comparison of the measured and modelled CO values averaged over three latitude bands and 50 mb pressure bins is shown for the month of May in Fig. 8. In order to evaluate the impact of assimilation, the results are shown for both the control and assimilation runs. The panels on the left side of the figure show that in the mid-troposphere the means and medians obtained with assimilation are considerably closer to the measured values for all latitude bands for which MOZAIC measurements are available. In the mid-troposphere of the Northern Hemisphere (30 to 45° N and 45 to 60° N), the original mean bias of about 10 to 30 ppbv present in the control simulation is considerably reduced when CO data are assimilated. The mean absolute differences in CO mixing ratios (centre panels of the figure) are reduced to about 10-15 ppbv from an original difference that is generally 5 to 20 ppbv higher. Near the surface, the positive impact of assimilation is less obvious, especially in the 30-45° N latitude band, where assimilation seems to increase the mean bias. As discussed before, the values near the surface are not well constrained by the total column. In the tropics, the original model bias in the mid-troposphere is smaller, and assimilation helps to reduce it further. For measurements situated near or above the tropopause, assimilation can lead to a slight degradation of the results, as shown for example for the 45-60° N latitude band; this is due to the relatively long vertical correlation lengths, which are independent of the pressure level (Sect. 3.3), and to the limited vertical resolution of the model.
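The binned statistics shown in Fig. 8 (means, medians, mean bias and mean absolute difference per latitude band and 50 mb pressure bin) can be sketched as follows, assuming the model has already been sampled at each measurement point. The three bands follow those discussed in the text; the half-open band and bin boundaries and the function name are assumptions.

```python
import statistics
from collections import defaultdict

# Latitude bands used for the MOZAIC comparison (degrees north).
LAT_BANDS = [(-30.0, 30.0), (30.0, 45.0), (45.0, 60.0)]
P_BIN = 50.0  # pressure bin width (mb = hPa)


def bin_statistics(samples):
    """samples: iterable of (lat, pressure, co_obs, co_model), CO in ppbv."""
    bins = defaultdict(list)
    for lat, p, obs, mod in samples:
        band = next((b for b in LAT_BANDS if b[0] <= lat < b[1]), None)
        if band is None:
            continue  # outside the three bands
        p_low = int(p // P_BIN) * P_BIN  # lower edge of the pressure bin
        bins[(band, p_low)].append((obs, mod))

    stats = {}
    for key, pairs in bins.items():
        obs = [o for o, _ in pairs]
        mod = [m for _, m in pairs]
        diffs = [m - o for o, m in pairs]
        stats[key] = {
            "n": len(pairs),
            "mean_obs": statistics.fmean(obs),
            "median_obs": statistics.median(obs),
            "mean_model": statistics.fmean(mod),
            "median_model": statistics.median(mod),
            "mean_bias": statistics.fmean(diffs),
            "mean_abs_diff": statistics.fmean(abs(d) for d in diffs),
        }
    return stats
```

Running the same function on the control and assimilation samples gives the two sets of statistics that are compared in the figure.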
In Fig. 9 we show the results for the 8 months, but only for the 45-60° N latitude band, which contains the highest number of measurements. With the exception of the surface values, this figure shows that assimilation helps to improve the simulated seasonal cycle of CO. The impact of assimilation is seen especially in the first three months of the simulation, for which the control run underestimates the CO mixing ratios on average, and near the end of the simulation, where these values are too high. Near the surface, the improvement is less obvious. Similar improvements are also observed for the other latitude bands: 30° S-30° N and 30-45° N.
Finally, in Table 3 the correlation coefficients between the MOZAIC data and the model simulations are tabulated for 4 pressure bins and 8 months. These results allow an analysis of the impact of assimilation on the correlation coefficients. They show that near the surface the impact of assimilation on the correlation coefficients is negligible. For the remaining pressure bins, the correlation coefficients increase for a large majority of the months. This indicates that assimilation also helps to reproduce the spatial and temporal variations seen in the observations, such as the presence of CO anomalies. This is confirmed by reproducing Fig. 8 only for measurements that are higher than the mean plus one standard deviation (figure not shown). With assimilation, the model is closer than the control run to the observations with high CO, indicating that the assimilation of the high-CO episodes had a positive impact on the model simulations.
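The per-bin correlation coefficients of Table 3 and the high-CO subset can be sketched as follows. This is a minimal illustration under assumptions of our own. The Pearson coefficient is computed directly. The pressure edges in the usage note (including the 1100 hPa upper bound) are hypothetical. The "mean plus one standard deviation" threshold is computed over whatever set of observations is passed in, using the population standard deviation.

```python
import math
from bisect import bisect_right


def pearson(x, y):
    """Pearson correlation coefficient; None if undefined (n < 2 or zero variance)."""
    n = len(x)
    if n < 2:
        return None
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    return sxy / math.sqrt(sxx * syy)


def correlation_by_pressure_bin(pres, obs, mod, p_edges):
    """Return {bin: (N, R)} for every pressure bin that contains data."""
    groups = {}
    for p, o, m in zip(pres, obs, mod):
        k = bisect_right(p_edges, p) - 1   # edges sorted in increasing order
        if 0 <= k < len(p_edges) - 1:
            o_list, m_list = groups.setdefault(k, ([], []))
            o_list.append(o)
            m_list.append(m)
    return {k: (len(o), pearson(o, m)) for k, (o, m) in groups.items()}


def high_co_indices(obs):
    """Indices of measurements above the mean plus one (population) standard deviation."""
    n = len(obs)
    mean = sum(obs) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in obs) / n)
    return [i for i, v in enumerate(obs) if v > mean + std]
```

With `p_edges = [300, 400, 600, 800, 1100]`, the four bins correspond to the 400-300, 600-400, 800-600 and >800 mb classes of Table 3. The function would be called separately for each month and for each model run.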

POLARCAT data
The results described in the previous section are valid for MOZAIC CO data measured by commercial aircraft in the vicinity of a small number of airports, the great majority of them situated over Western Europe and North America. To extend the comparison to other regions, we used data acquired during a recent aircraft campaign named POLARCAT. POLARCAT is an international program involving 18 countries within the International Polar Year (IPY). Several aircraft were deployed in the Arctic during two intensive periods in spring and summer 2008 (Pommier et al., 2010; Paris et al., 2009; Jacob et al., 2010; Adam de Villiers et al., 2010; Roiger et al., 2011; Brock et al., 2011). During POLARCAT, an effort was made to sample polluted plumes in air masses of different origins (mainly boreal forest fire plumes and anthropogenic pollution). The measurements are therefore often not representative of climatological conditions. Since the POLARCAT campaigns focused on the sources, transport pathways and climate impacts of Arctic pollution, the flights covered the mid and high latitudes of the Northern Hemisphere. In this article we use the CO measurements acquired during the summer campaign (July 2008), which overlaps with our simulation. The measurements provide transects at different altitudes. They also provide vertical profiles corresponding to descents and ascents of the aircraft, not only near the airports but also during the flights.
Five different aircraft were involved in the summer campaign. The American DC-8 and P-3B aircraft flew mostly over the Canadian Arctic as well as the North Pole. The French ATR-42 and the German Falcon-20 flew over and in the vicinity of Greenland, and the Russian Antonov-30 covered a relatively large area over eastern Siberia. The flight paths are shown in Fig. 10. Pommier et al. (2010) summarise the precision (1 to 5 ppbv) and accuracy (1 to 5 %) of the various instruments used for CO measurements, as well as the measurement techniques applied. The same paper also compares POLARCAT CO measurements with the IASI retrievals where collocation is possible.
We first present a comparison of CO time series obtained during two DC-8 flights that traversed pollution plumes. The first analysed flight took place on 9 July 2008 over Greenland and the Canadian Arctic, sampling a high-altitude plume of Siberian origin. This plume, which contains contributions from biomass burning and anthropogenic pollution, was observed by IASI, as shown in Fig. 6 and as described in Pommier et al. (2010) and Sodemann et al. (2011). The bottom left panel of Fig. 11 shows a significant improvement of the simulated CO levels when assimilation is used. The model with assimilation captures the presence of highly polluted air masses in the upper troposphere during the second half of the flight relatively well, with values reaching 200 ppbv in agreement with the observations. The control run with the monthly averaged GFEDv2 emissions is unable to reproduce these high CO signatures [...] less time to be diffused and also that were sampled by fewer IASI assimilation windows than the plumes present at high altitudes over Greenland on 8 and 9 July.
Figure 12 compares model results with all available POLARCAT CO measurements made by the five aircraft during the summer campaign in June and July. The panels in the left column show the mean CO profiles for the observations and for the model with and without assimilation. The graphs in the centre column show the mean absolute difference between the observations and each of the two versions of the model. Both columns also show the median values and the 25th and 75th percentiles. All data were averaged in 25 hPa bins. In most cases, the assimilation of IASI observations improves the agreement between in situ CO values and model results. For the control run, the model profiles are generally fairly constant in the vertical and are characterised by mean background concentrations between 80 and 100 ppbv. With assimilation of the CO plumes observed by IASI, the model compares well with the relatively high CO values measured in the middle troposphere. These high values result from the aircraft deliberately sampling polluted air masses. The position and intensity of these plumes are simulated better than with only the climatological emissions of the control run. For the ATR-42 and especially the Falcon-20 (flights over Greenland), the increase of CO in the upper part of the troposphere is due to intense sampling by these aircraft of plumes resulting from long-range transport from lower latitudes. This increase is particularly well simulated by the model when assimilation is used. The DC-8 and P-3B flights also covered regions much closer to the emission sources (e.g. North American forest fires). For these flights, assimilation leads to a considerable improvement in the upper part of the troposphere but is unable to simulate the high mean values present near the surface. As mentioned previously, the IASI data are only weakly sensitive to these near-surface plumes. However, the median values, which are comparable to the background values, are well simulated by the model with assimilation for the P-3B data and generally also for the DC-8 flights. For the Antonov-30 flights over Siberia, the aircraft sampled mostly background air, and only a limited number of fairly weak CO plumes were observed near the surface. The vertical shapes of the two simulated CO profiles (assimilated and control) are similar for the Siberian data, with the assimilated values consistently higher. In the middle troposphere this increase brings the assimilated values closer to the observations. This indicates that, on average, assimilating IASI CO data helps to reproduce the correct background CO values in this part of the atmosphere. In the lower troposphere the control run was generally in good agreement with the in situ data, and the increase in the assimilation run generally results in values that are too high.
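The profile statistics of Fig. 12 (mean, median, and 25th and 75th percentiles in 25 hPa bins) can be sketched in Python as follows. This is only an illustration. It assumes paired pressure and CO values along the flight tracks and uses linear interpolation between sorted values for the percentiles. The paper does not state which percentile convention was used. The function names are hypothetical.

```python
import math
from bisect import bisect_right
from statistics import mean, median


def percentile(sorted_vals, q):
    """Linear-interpolation percentile (q in [0, 100]) of an already sorted list."""
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def profile_stats(pres, values, p_edges):
    """Per pressure bin: (count, mean, median, 25th percentile, 75th percentile).

    `p_edges` must be sorted in increasing order; empty bins are omitted.
    """
    bins = {}
    for p, v in zip(pres, values):
        k = bisect_right(p_edges, p) - 1
        if 0 <= k < len(p_edges) - 1:
            bins.setdefault(k, []).append(v)
    out = {}
    for k, vals in sorted(bins.items()):
        s = sorted(vals)
        out[k] = (len(s), mean(s), median(s), percentile(s, 25), percentile(s, 75))
    return out
```

Calling `profile_stats` with `p_edges = list(range(200, 1026, 25))` on the observations and on each model run gives the left-column profiles. Passing the absolute model-minus-observation differences instead gives the statistics shown in the centre column.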

Assimilation also improves the spread of the simulated CO mixing ratios in the middle and upper troposphere. The spread, indicated by the 25th and 75th percentiles in Fig. 12, is generally too small in both versions of the model; however, the simulation with assimilation is considerably closer to the observations. The centre panels of Fig. 12 show the mean absolute difference between the model and the observations in the middle troposphere. It decreases from about 30 to 60 ppbv for the control run to generally between 10 and 20 ppbv with assimilation. The exception is the Antonov-30 data, for which the mean absolute difference of the control run is generally already below 20 ppbv.
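To make the notion of "spread" concrete, the following Python sketch compares the interquartile range (25th to 75th percentile) of the model values with that of the observations, together with the mean absolute difference. It is a hedged illustration, not the procedure used for Fig. 12. The ratio metric and the function names are our own assumptions. A ratio below 1 would correspond to the too-small model spread described above.

```python
import math


def quartiles(values):
    """25th and 75th percentiles, linearly interpolated between sorted values."""
    s = sorted(values)

    def pct(q):
        pos = (len(s) - 1) * q / 100.0
        lo = math.floor(pos)
        hi = min(lo + 1, len(s) - 1)
        return s[lo] + (s[hi] - s[lo]) * (pos - lo)

    return pct(25), pct(75)


def spread_ratio(obs, mod):
    """Model interquartile range divided by the observed one (< 1: spread too small)."""
    o25, o75 = quartiles(obs)
    m25, m75 = quartiles(mod)
    return (m75 - m25) / (o75 - o25)


def mean_abs_diff(obs, mod):
    """Mean absolute difference between paired observed and modelled values."""
    return sum(abs(m - o) for o, m in zip(obs, mod)) / len(obs)
```

Applied bin by bin to the observations and to each model run, these two numbers summarise the spread and the difference curves in a compact form.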

Discussion of the validation results
As shown in the two previous sections, the version of the model with assimilation considerably improves the model results when compared with in situ measurements from the POLARCAT campaign and the MOZAIC program. Assimilation improves both the background values and the elevated concentrations measured in the pollution plumes. The consistency of these improvements for both the MOZAIC and POLARCAT data contributes to the validation of the original IASI product.
For both datasets, owing to the long lifetime of CO, assimilation acts primarily to correct errors arising from the differences between the climatological surface emissions used in the simulations and the actual emissions during the analysed period. This difference can be particularly large for forest fire emissions, such as those responsible for the plumes sampled during POLARCAT, which are by nature highly episodic. In the case of boreal forest fires, high levels of CO can be injected directly into the middle troposphere, where IASI is most sensitive. The cross-pole transport event shown in Fig. 6 and sampled by the DC-8 flights on 8 and 9 July (Fig. 11) is a good demonstration of the positive impact of assimilation on CO levels in biomass burning plumes. The model seems to simulate the cross-polar transport patterns well. However, in the version with climatological sources and no assimilation, the CO concentrations in the transported plumes are too low because the background is too low in the region where the plume originated. [...] over Siberia and when this air is transported towards the pole (no assimilation is performed north of 75° N), CO is in better agreement with IASI and also with in situ observations. Assimilation also seems to modify the impact of sources in other regions, such as south-east Asia, leading to a considerable increase of the total CO column over the North Pacific, as shown in Fig. 5. Assimilation can also correct for other model-related deficiencies, for example in horizontal transport and especially in vertical transport out of the boundary layer.
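The following is a minimal Python sketch of a Kalman-type analysis step for a single IASI total-column observation. It illustrates how a column correction is spread over the model levels, and it is not the authors' sub-optimal filter. Several assumptions are ours:
- a scalar column observation;
- a model state expressed as partial columns on model levels;
- the common retrieval convention in which the simulated column equals the a priori column plus the averaging kernel applied to the departure from the a priori profile;
- a symmetric forecast error covariance B.

Following the text, observations north of 75° N are simply skipped. The names `column_analysis`, `a_k`, `x_prior` and `c_prior` are hypothetical. The function also returns the normalised innovation χ² = d²/(HBHᵀ + R). This is a standard consistency diagnostic of the kind plotted in Fig. 4 (the paper's exact definition may differ), and its average is expected to be close to 1 when B and R are well specified.

```python
import math


def column_analysis(x_f, B, y, r_var, a_k, x_prior, c_prior, lat, lat_max=75.0):
    """Kalman-type analysis for one total-column observation (hypothetical sketch).

    x_f      forecast partial columns on the n model levels
    B        forecast error covariance, n x n nested list (assumed symmetric)
    y        observed total column; r_var its error variance
    a_k      column averaging kernel on the model levels (assumed regridded)
    x_prior  retrieval a priori partial columns; c_prior a priori total column
    Returns (x_a, B_a, chi2); observations poleward of lat_max are not used.
    """
    n = len(x_f)
    if lat > lat_max:  # no assimilation north of lat_max
        return list(x_f), [row[:] for row in B], math.nan
    # Simulated column seen through the kernel: H(x) = c_prior + a_k . (x - x_prior)
    hx = c_prior + sum(a * (x - xp) for a, x, xp in zip(a_k, x_f, x_prior))
    d = y - hx                                                        # innovation
    bh = [sum(B[i][j] * a_k[j] for j in range(n)) for i in range(n)]  # B H^T
    s = sum(a * b for a, b in zip(a_k, bh)) + r_var                   # H B H^T + R
    k = [b / s for b in bh]                                           # Kalman gain
    x_a = [x + ki * d for x, ki in zip(x_f, k)]
    B_a = [[B[i][j] - k[i] * bh[j] for j in range(n)] for i in range(n)]  # (I - K H) B
    chi2 = d * d / s                                  # normalised innovation squared
    return x_a, B_a, chi2
```

In a sub-optimal filter, B would typically be modelled from prescribed error variances and correlation lengths rather than fully propagated, which keeps the cost of each analysis step small.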
As the IASI CO products are mostly sensitive to the CO column in the middle troposphere, most of the improvement is found in this part of the atmosphere. Relatively little improvement is observed near the surface. An example can be seen in the POLARCAT data over Siberia sampled by the Antonov-30 aircraft. Over Siberia, the CO emissions from fires and other diffuse sources have lower emission heights than other boreal forest fires. The model with assimilation did not reproduce the few near-surface CO-rich episodes sampled by the aircraft. However, in agreement with the aircraft observations, assimilation increases the mixing ratios in the middle troposphere. Similarly, when model results are compared with in situ surface data (not shown), little or no positive impact of assimilation is seen. A clear improvement is observed only in remote polar regions of the Southern Hemisphere. Those regions are far from strong emission sources, and the vertical profile of CO varies little there. Removing the bias in the mid-troposphere therefore also corrects the lower levels over time, as vertical mixing carries the information downwards. For other remote locations, the version with assimilation generally improves the background surface values but shows episodic CO peaks that are not present in the observations. These peaks correspond to CO plumes in the higher levels of the troposphere and may indicate excessively strong vertical mixing in the model.

Summary
In this paper we present work carried out to contribute to the validation of the CO columns retrieved from IASI measurements. Eight months of IASI CO columns, from May until December 2008, were assimilated into the LMDz-INCA global chemistry transport model. The assimilated fields were compared with in situ measurements from the MOZAIC program and from the POLARCAT summer campaign. One advantage of using the assimilated values in the comparison is that assimilation provides global and continuous maps of the CO column that are not limited to the satellite overpasses. In addition, the assimilated product contains information on the vertical distribution of CO. Both characteristics considerably simplify the comparison with in situ data. Such data are rarely collocated with the IASI measurements and rarely have enough vertical extent for a comparison with column-like satellite observations. We found that in the mid-troposphere assimilation consistently reduces the differences between the model values and the in situ aircraft data measured during the MOZAIC and POLARCAT flights. This improvement is observed despite the relatively limited vertical resolution of the IASI CO product, which smooths out any features with fine vertical structure. Evaluated against in situ aircraft data in the mid-troposphere, the model with assimilation generally improves on the control simulation, and this is an important contribution to the validation of the IASI CO product.
Near the surface the impact of assimilation is much less beneficial. The lowest troposphere contributes less to the IASI CO product, so the assimilation does not constrain the simulated values there well. [...] concentrations inside the plumes reaching the Arctic. This is particularly true for elevated plumes representing transport from remote sources. An example is the cross-pole transport event of 8-12 July: the control run was not able to simulate the concentrations measured by aircraft around Greenland because the concentrations in the source region over Asia were too low. Assimilation of IASI CO data led to a considerable increase in CO over Asia. This corrected for the deficiency of the climatological emissions and led to stronger, more realistic transport of pollution into the Arctic (no assimilation was performed north of 75° N).
In a parallel experiment related to the POLARCAT campaign, the version of the model with assimilation was run in forecast mode using forecast winds and initial conditions updated with the most recent observations. This version can be a very useful tool for planning aircraft flight paths during field missions. The model transports the plumes seen in the most recent IASI CO data forward in time, thus allowing identification of regions where the pollution concentration is likely to be elevated. The a posteriori simulations performed for the POLARCAT flights show that the assimilated IASI CO plumes present in the model initial conditions are indeed transported in good agreement with the IASI observations obtained over the following day.
Assimilation of IASI total CO columns, as used in this work, is now also performed with 4DVAR methods in the framework of the Monitoring Atmospheric Composition and Climate (MACC-II) project. MACC-II is the current pre-operational atmospheric service of the European GMES programme and is led by ECMWF (Hollingsworth et al., 2008). The near-real-time assimilation of IASI data (less than 3 h after the observation) makes it possible to provide CO forecasts for the next three days (for maps see http://www.gmes-atmosphere.eu/d/services/gac/nrt/nrt fields). The work presented in this paper served as the first experience with the assimilation of IASI total CO columns.

Fig. 1. Monthly mean altitude of the maximum of the IASI CO averaging kernel (in kilometres) for July 2008. The white areas indicate regions that were filtered out (see text in Sect. 2.2).

Fig. 2. Monthly mean IASI CO columns averaged over model grid boxes for the morning and evening orbits (in molecules cm⁻²) for July 2008. The white areas indicate regions that were filtered out (see text in Sect. 2.2).
Fig. 3. (a) Number of individual measurements of the CO column per model grid box during a 12 h period. (b) Ratio of the standard deviation of the individual CO column measurements inside each model grid box to the mean value of the CO column (in %). Both panels are generated for data from 00:00 to 12:00 UTC on 6 July 2008. White areas indicate model grid boxes with no observations during this period.
$$b_{i,j} = \sqrt{b_{i,i}\, b_{j,j}}\; \exp\!\left(-\frac{\mathrm{Dist}(i,j)}{L_{xy}}\right)$$

Fig. 4. χ² parameter averaged over all model grid boxes for each of the 30 min assimilation windows. χ² is plotted as a function of the number of assimilation windows available for July 2008.

Fig. 5. Monthly averaged CO columns for July 2008 simulated by the model with assimilation (a), by the model without assimilation (control run) (b), and the difference between these two simulations (impact of assimilation = simulation with assimilation minus control run) (c). The CO columns are calculated by applying the averaging kernels supplied with each observation to the model analysis.

Fig. 8. Left column: solid lines indicate monthly averaged CO for MOZAIC (black curve), LMDz-INCA with assimilation of IASI CO products (green curve) and the LMDz-INCA control run (red curve). Centre column: mean absolute differences between MOZAIC data and the assimilation run (green curve) and between MOZAIC data and the control run (red curve). The horizontal bars in the left and centre column panels indicate the 25th and 75th percentiles, and the symbol x indicates the median. The right column indicates the number of MOZAIC measurements; the number of measurements in the 200-300 mb bin is off the scale of the figure. The results are shown for May 2008 for data in three latitude bands, 30° S-30° N (top), 30-45° N (centre) and 45-60° N (bottom), and 100 hPa pressure bins.

Fig. 9. Monthly averaged CO for MOZAIC (black curve), LMDz-INCA with assimilation of IASI CO products (green curve) and the LMDz-INCA control run (red curve). The plotted data represent the means and standard deviations (error bars) for one latitude band (45-60° N) from May to December 2008, for four pressure bins: below 800 mb (upper left), between 800 and 600 mb (upper right), between 600 and 400 mb (centre left) and between 400 and 300 mb (centre right). The lowest panel shows the number of measurements in each pressure bin as a function of month.

Fig. 10. Flight paths of the five aircraft during the POLARCAT summer campaign (June and July 2008).

Fig. 11. Left figures: 9 July over the Arctic; right figures: 5 July over Canada. The top panels show the flight pressure; the lower panels show the CO mixing ratio from in situ measurements (black) and from the simulations with assimilation (green) and without assimilation (red).

Fig. 12. Left column: solid lines show mean CO profiles for the simulation with assimilation (green), the control run (red) and in situ measurements (black). Centre column: mean absolute difference between POLARCAT CO observations and the model with assimilation (green), and between the observations and the control run (red). The horizontal bars in the left and centre column panels indicate the 25th and 75th percentiles, and the symbol x indicates the median. The right column indicates the number of POLARCAT measurements for each aircraft. Data are plotted for July 2008 for the ATR-42 (a), the Falcon-20 (b), the DC-8 (c), the P-3B (d) and the Antonov-30 (e).

Equation (9) is applied only to calculate the diagonal terms. The initial model error in each model grid box (f) was set to 50 % of the CO concentrations at the beginning of the simulation. Several simulations were [...]
Atmos. Chem. Phys., 12, 4493-4512, 2012, www.atmos-chem-phys.net/12/4493/2012/

Table 1. List of adjustable parameters that define the variance-covariance error matrix B. CO is the mixing ratio of CO (in ppbv), $p_i$ and $p_j$ are the pressures (in Pa) of two vertical model levels, Dist(i, j) is the horizontal distance between the centres of model grid boxes i and j, and $b_{i,j}$ is the calculated term of the model error covariance matrix.
$b_{i,i} = (f \cdot \mathrm{CO})^2$ for diagonal terms; f = 0.5 (no units); error growth term [...]

Table 2. Quantification of the impact of assimilating observations (IASI CO columns) on the difference between the observations ($O_i$) and the simulated model CO columns. The results are shown for both the model with assimilation ($M_{A,i}$) and the control run ($M_{C,i}$). The impact is calculated separately for each of the 8 months of simulation.

Table 3. Number of measurements (N) and correlation coefficients (R) between MOZAIC CO data and model results obtained with assimilation of IASI CO data and without assimilation (control). The results are shown for 4 pressure bins: pressures higher than 800 mb, between 800 and 600 mb, between 600 and 400 mb, and between 400 and 300 mb.