Simultaneous assimilation of satellite NO2, O3, CO, and HNO3 data for the analysis of tropospheric chemical composition and emissions

We have developed an advanced chemical data assimilation system to combine observations of chemical compounds from multiple satellites. NO2, O3, CO, and HNO3 measurements from the Ozone Monitoring Instrument (OMI), Tropospheric Emission Spectrometer (TES), Measurements of Pollution in the Troposphere (MOPITT), and Microwave Limb Sounder (MLS) satellite instruments are assimilated into the global chemical transport model CHASER for the years 2006–2007. The CHASER data assimilation system (CHASER-DAS), based on the local ensemble transform Kalman filter technique, simultaneously optimizes the chemical species, as well as the emissions of O3 precursors, while taking their chemical feedbacks into account. With the available datasets, an improved description of the chemical feedbacks can be obtained, especially those related to the NOx-CO-OH-O3 set of chemical reactions. Comparisons against independent satellite, aircraft, and ozonesonde data show that the data assimilation results in substantial improvements for various chemical compounds. These improvements include a reduced negative tropospheric NO2 column bias (by 40–85 %), a reduced negative CO bias in the Northern Hemisphere (by 40–90 %), and a reduced positive O3 bias in the middle and upper troposphere (from 30–40 % to within 10 %). These changes are related to an increase in tropospheric OH concentrations of 5–15 % in the tropics and the Southern Hemisphere in July. Observing System Experiments (OSEs) have been conducted to quantify the relative importance of each data set in constraining the emissions and concentrations. The OSEs confirm that the assimilation of individual data sets strongly influences both assimilated and non-assimilated species through the inter-species error correlation and the chemical coupling described by the model. The simultaneous adjustment of the emissions and concentrations is a powerful approach to correcting the tropospheric ozone budget and profile analyses.


Introduction
Tropospheric ozone (O3) is an important chemical species for air quality and climate (IPCC, 2007). It is an atmospheric pollutant in the lower troposphere and an effective greenhouse gas in the upper troposphere. Surface emissions of carbon monoxide (CO) and nitrogen oxides (NOx) play an important role in determining tropospheric O3 abundances. CO is an important precursor of tropospheric O3 under high NOx conditions. The concentration of CO is strongly related to the oxidising capacity of the atmosphere since it reacts primarily with OH (e.g. Logan et al., 1981; Daniel and Solomon, 1998; Thompson, 1992). In the middle and upper troposphere, O3 can be generated efficiently through lightning NOx sources (e.g. Pickering et al., 1998; Jenkins and Ryu, 2004; Martin et al., 2007). The abundance of tropospheric CO and NO2 influences the atmospheric lifetime of the important greenhouse gases methane (CH4), O3 (Shindell et al., 2009), and also CO2 (Folberth et al., 2005).
NOx and CO have both anthropogenic and natural sources. Anthropogenic sources include fossil fuel and biofuel combustion. Natural sources include biomass burning, soil, and also lightning emissions for NOx. CO is produced from the oxidation of hydrocarbons, by the incomplete combustion of fossil fuels and biofuels, and during biomass burning events (Holloway et al., 2000). Knowledge about variations in surface emissions is important, but currently available bottom-up emissions inventories have large uncertainties (e.g., Jaeglé et al., 2005; Zhao et al., 2011). These inventories use statistical data, which generally have coarse resolution and large uncertainties. The extent of emission-related activities and emission factors are sources of error. For instance, Zhao et al. (2011) estimated the uncertainties of a bottom-up inventory of Chinese anthropogenic NOx emissions to be −13 % to +37 %. In addition, temporal (e.g. diurnal, weekly, seasonal, inter-annual) variations in emissions are generally poorly represented in the inventories. For instance, rapid economic growth in industrialized Asia has led to a rapid increase in the concentrations of O3 precursors, such as NO2, CO (Richter et al., 2005; Stavrakou and Müller, 2006; van der A et al., 2008), and Volatile Organic Compounds (VOCs) (Fu et al., 2007), but these may not be captured well by most of the inventories (Lamsal et al., 2011).
In the past decade, top-down inverse modelling approaches have been proposed to estimate emission variations in CO (e.g. Kasibhatla et al., 2002; Arellano et al., 2004; Stavrakou and Müller, 2006; Kopacz et al., 2009; Hooghiemstra et al., 2011) and in NOx (e.g. Martin et al., 2003; Boersma et al., 2008b; Zhao and Wang, 2009; Lamsal et al., 2010; Miyazaki et al., 2012). The inversion adjusts the emissions in order to minimize the discrepancy between the model predictions and observations, while taking the observation errors into account. The estimated regional emissions show large discrepancies among different estimates, reflecting differences in inversion frameworks, atmospheric models (e.g. Arellano and Hess, 2006), and datasets (e.g. Miyazaki et al., 2012) employed in the analyses. Since the relationship between surface emissions and atmospheric abundances is assumed to be predicted well by the model in the inversions, it is important to represent the chemical processes in a realistic way when estimating the emissions. The CO-OH-NOx-non-methane VOC (NMVOC) chemical interactions may have large impacts on the inversion of NOx and CO emissions (Müller and Stavrakou, 2005). For instance, neglecting the chemical feedback of changes in surface emissions on the abundance of OH could introduce biases in the a posteriori estimates of the CO sources (Jones et al., 2009).
Data assimilation is a technique to combine different observational data sets with a model (e.g., Kalnay, 2003). Data assimilation systems for tropospheric chemistry have been developed in the past decade for mapping the global distribution of chemical species, including O3 and its precursors. In the past decade, advanced techniques involving the variational approach (Elbern and Schmidt, 2001; Errera et al., 2008; Flemming et al., 2009; Elguindi et al., 2010) and Kalman filters (Khattatov et al., 2000; Eskes and Boersma, 2003; Grassi et al., 2004; Hanea et al., 2004; Segers et al., 2005; Parrington et al., 2008) have been applied to atmospheric chemistry. Recently, the ensemble Kalman filter (EnKF) technique has been applied for tropospheric chemical data assimilation (van Loon et al., 2000; Arellano et al., 2007; Constantinescu et al., 2007; Coman et al., 2012). The EnKF uses an ensemble forecast to estimate the background error covariance matrix. The advantage of the EnKF is its easy implementation for complicated systems; i.e. it does not require the development of an adjoint code.
The use of data assimilation for atmospheric chemistry, especially for short-lived chemical species, is still challenging, as discussed by Lahoz et al. (2007) and Sandu and Chai (2011). Short-lived species concentrations vary on timescales from less than a minute to one day, and detailed treatment of various chemical processes is required to simulate the variability. A large part of the atmospheric chemical system is not sensitive to the initial conditions because of the chemical equilibrium, which is different from the chaotic system involved in numerical weather prediction (Constantinescu et al., 2007; Lahoz et al., 2007), but is sensitive to the model parameters (e.g. emission, chemical reaction rate, and deposition velocity) and processes (e.g. chemical reaction equation, wet and dry deposition, and atmospheric transport). Although the errors in simulated tropospheric composition are caused by many factors, they are largely affected by highly uncertain emissions (e.g. Mallet and Sportisse, 2005). Thus, the simultaneous adjustment of emissions and concentrations is a powerful framework in tropospheric chemical data assimilation. However, most recent satellite data assimilation systems optimize either the concentration of a very limited number of chemical species (e.g. Mallet and Sportisse, 2005; Parrington et al., 2008; Flemming et al., 2009; Elguindi et al., 2010) or emissions (e.g. Müller and Stavrakou, 2005; Kopacz et al., 2010; Hooghiemstra et al., 2011). Only a few advanced studies (Hanea et al., 2004; Elbern et al., 2007) have demonstrated that the simultaneous optimization of multiple chemical states including emissions is an effective way to improve air quality near the surface using surface in-situ observations.
In this study, an advanced EnKF data assimilation system is presented to simultaneously optimize the chemical concentrations and emissions in the troposphere. Satellite observations of O3, CO, NO2, and HNO3 obtained from TES, MOPITT, OMI, and MLS are assimilated into the global chemical transport model (CTM) "Chemical AGCM for study of atmospheric environment and radiative forcing" (CHASER). TES has the potential to efficiently constrain tropospheric O3 profiles (Foret et al., 2009). MOPITT is suitable for global CO emission estimates because of its good global coverage. MLS is expected to provide important constraints on the background concentrations of O3, HNO3, and other O3 precursors in the UTLS together with lightning NOx sources. The high temporal and spatial resolutions of OMI are useful to optimize NOx emissions on a daily basis. The assimilation results are validated against independent data obtained from five satellite instruments: MLS/OMI (tropospheric O3 column, TOC), TES (CO), and GOME-2 and SCIAMACHY (tropospheric NO2 column). Global ozonesonde data and aircraft observations obtained during the INTEX-B campaign (Singh et al., 2009) are also used for the validation of the vertical profiles. To the authors' best knowledge, this is the first advanced data assimilation system that simultaneously optimizes the concentrations and emissions of multiple tropospheric trace gases, based on multiple satellite sensor/species data sets. The structure of this paper is as follows. Section 2 describes the data. Section 3 introduces the data assimilation system. Section 4 presents Observing System Experiment (OSE) results to identify the relative contribution of each assimilated data set. Section 5 presents the data assimilation results including the estimated emissions, the validation, and the properties of the assimilated fields. Section 6 concludes this study. Section 7 discusses future challenges.

Observations
This section introduces the observations used for the data assimilation (Sect. 2.1 and Table 1) and validation (Sect. 2.2 and Table 2). The data assimilation requires a non-linear observation operator, H, for each satellite retrieval. The model fields, x, are first interpolated to the horizontal location of each observation and the height of each of the vertical layers using the spatial interpolation operator, S. Then the averaging kernel, A, and the a priori profile, x_a, of each observation are applied to obtain the model fields in the observation space, y^b,
$$ y^b = H(x) = x_a + A\,(S(x) - x_a). $$
The averaging kernel matrix is used to define the sensitivity of the estimated state to changes to the true state, while the trace of the averaging kernel matrix gives a measure of the number of independent pieces of information, i.e. the degrees of freedom for signal (DOFs) (Rodgers, 2000). In this approach, the satellite-model difference (y^o − y^b) is not, or only weakly, biased by the a priori profile x_a (Eskes and Boersma, 2003; Rodgers and Connor, 2003),
$$ y^o - y^b = A\,(x_{true} - S(x)) + \epsilon, $$
where the observational error ε is the sum of the measurement error and the representativeness error (both random and systematic), and x_true represents the true atmosphere profile. The same observation operator has also been applied for validating the model profile against retrievals in order to remove the influence of the smoothing error and the retrieval error arising from the a priori profile. For plotting the global distribution, both the retrieved and simulated concentrations are mapped onto the same resolution of 2.5° × 2.5° (1.25° × 1° for MLS/OMI TOC only).
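As a minimal sketch of how the averaging kernel and the a priori profile enter the comparison, the following Python snippet applies the standard retrieval relation y_b = x_a + A (x_model − x_a) to a model profile that is assumed to have already been interpolated to the retrieval levels (the role of S in the text). The function name, the three-level profiles, and all numerical values are illustrative assumptions and are not taken from any of the retrieval products. Retrievals defined in the logarithm of the mixing ratio would need the corresponding transformation, which this sketch ignores.

```python
def apply_averaging_kernel(model_profile, a_priori, averaging_kernel):
    """Return y_b = x_a + A (x_model - x_a) for a single retrieval.

    All inputs are plain Python lists on the same vertical levels;
    averaging_kernel is a list of rows (one row per retrieval level).
    """
    diff = [m - a for m, a in zip(model_profile, a_priori)]
    smoothed = []
    for row, x_a in zip(averaging_kernel, a_priori):
        smoothed.append(x_a + sum(a_ij * d for a_ij, d in zip(row, diff)))
    return smoothed


# Hypothetical three-level example (arbitrary concentration units).
x_model = [50.0, 60.0, 80.0]
x_apriori = [40.0, 55.0, 90.0]
A = [[0.2, 0.1, 0.0],
     [0.1, 0.5, 0.1],
     [0.0, 0.1, 0.3]]

y_b = apply_averaging_kernel(x_model, x_apriori, A)

# The trace of A gives the degrees of freedom for signal (DOFs).
dofs = sum(A[i][i] for i in range(len(A)))
```

With an identity averaging kernel the function returns the model profile unchanged, and with a zero kernel it returns the a priori profile, which illustrates why low-sensitivity retrievals carry little information about the model state.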

OMI tropospheric NO2 column
The Dutch-Finnish OMI instrument, which was launched aboard the Aura satellite in July 2004, is a nadir-viewing imaging spectrograph (Levelt et al., 2006). Aura traces a sun-synchronous, polar orbit with a period of 100 min. OMI provides measurements of both direct and atmosphere-backscattered sunlight in the ultraviolet-visible range from 270 to 500 nm. OMI pixels are 13 × 24 km at nadir, increasing in size to 24 × 135 km for the largest viewing angles. OMI tropospheric NO2 column retrievals, with their daily global coverage, are effective for constraining global NOx emissions on a daily basis, unlike GOME-2 and SCIAMACHY retrievals, which have poorer spatial and temporal resolutions and less global coverage (Richter and Burrows, 2002; Boersma et al., 2008b). The overpass time of OMI (about 13:40 LT) is more suitable for the estimation of lightning NOx sources than that of GOME-2 and SCIAMACHY (both in the morning). The Dutch OMI tropospheric NO2 data product DOMINO version 2 (Boersma et al., 2011) is used in this study. The error in OMI NO2 retrievals for individual pixels can be approximated as 1.0 × 10^15 molec cm^-2 + 25 % (Boersma et al., 2011). Details of the retrieval and error estimates are described by Boersma et al. (2004, 2007, 2011). Only observations with a radiance reflectance from clouds of less than 50 % (i.e. cloud fraction less than about 20 %) and surface albedo of less than 0.3 with quality flag = 0 (meaningful tropospheric retrievals) are used, as recommended by the product specification document (Boersma et al., 2011).
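The screening criteria and the approximate pixel error quoted above can be written down compactly. The following Python sketch is only an illustration: the function and argument names are hypothetical and do not correspond to field names in the DOMINO product files, and applying the 25 % term to the absolute value of the column is an assumption made here to keep the error positive for slightly negative retrievals.

```python
def omi_pixel_error(column):
    """Approximate per-pixel error in molec cm^-2.

    Implements 1.0e15 molec cm^-2 + 25 % of the column. Using abs() is
    an assumption of this sketch, not something stated in the text.
    """
    return 1.0e15 + 0.25 * abs(column)


def passes_omi_screening(cloud_radiance_fraction, surface_albedo, quality_flag):
    """Return True if a pixel satisfies the selection criteria in the text.

    cloud_radiance_fraction and surface_albedo are fractions in [0, 1];
    quality_flag == 0 marks a meaningful tropospheric retrieval.
    """
    return (cloud_radiance_fraction < 0.5
            and surface_albedo < 0.3
            and quality_flag == 0)


# Hypothetical pixel with a tropospheric column of 4.0e15 molec cm^-2.
example_error = omi_pixel_error(4.0e15)
example_ok = passes_omi_screening(0.3, 0.1, 0)
```

For this hypothetical pixel the absolute and relative terms contribute equally, which shows how the relative term dominates over polluted regions while the absolute term dominates over clean background areas.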
The averaging kernel is used to create modeled tropospheric NO2 columns from the observation operator, which removes the contribution of the retrieval error due to the a priori profile assumed (Eskes and Boersma, 2003), as described by Miyazaki et al. (2012). The spatial resolution of the OMI data is much finer than that of the model used in this study (2.8°, about 300 km at the equator). Thus, there are large representativeness errors in the model because of unresolved small-scale variations. To fill the spatial scale gaps and to obtain more representative data, a super-observation approach has been developed and applied to the OMI data, as described by Miyazaki et al. (2012). The super-observation error covariance matrix includes contributions from the measurement error and the representation error.
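The idea behind a super-observation can be sketched as follows. This is not the scheme of Miyazaki et al. (2012), whose details are not given here. It is a simplified illustration under stated assumptions: pixels are grouped into regular 2.8° cells, the super-observation is an unweighted mean, measurement errors are treated as uncorrelated between pixels, and the representation error is approximated by the variance of the pixel values within the cell. All names are hypothetical.

```python
import math
from collections import defaultdict


def super_observations(pixels, cell_deg=2.8):
    """Aggregate pixels into grid-cell super-observations.

    pixels: iterable of (lat, lon, value, error) tuples.
    Returns a dict mapping (lat_index, lon_index) -> (mean, total_error).
    """
    cells = defaultdict(list)
    for lat, lon, value, error in pixels:
        key = (math.floor((lat + 90.0) / cell_deg),
               math.floor((lon % 360.0) / cell_deg))
        cells[key].append((value, error))

    result = {}
    for key, members in cells.items():
        n = len(members)
        mean = sum(v for v, _ in members) / n
        # Measurement error of the mean, assuming uncorrelated pixel errors.
        meas_var = sum(e * e for _, e in members) / (n * n)
        # Representation error proxy: variance of the values in the cell.
        rep_var = sum((v - mean) ** 2 for v, _ in members) / n if n > 1 else 0.0
        result[key] = (mean, math.sqrt(meas_var + rep_var))
    return result


# Hypothetical pixels: two fall in the same cell, one in another cell.
example = super_observations([
    (10.0, 20.0, 2.0, 1.0),
    (10.5, 20.5, 4.0, 1.0),
    (-40.0, 100.0, 5.0, 0.5),
])
```

Treating pixel errors as uncorrelated makes the measurement term shrink with the number of pixels; a real implementation would likely account for error correlations between neighboring pixels, so the resulting error here should be read as a lower bound.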

TES O3 profile
TES onboard the Aura satellite was designed to measure the global, vertical distribution of tropospheric O3 and its precursors (Beer, 2006; Bowman et al., 2009). TES is an infrared Fourier transform spectrometer (FTS) with high spectral resolution (0.1 cm^-1) and a wide spectral range from 650 to 3250 cm^-1. The version 4 level 2 nadir data obtained from the global survey mode are used in this study. This product consists of 16 daily orbits of nadir-viewing measurements with a spatial resolution of 5 × 8 km spaced 1.6° apart along the orbit track every other day. The TES algorithm is described by Bowman et al. (2002), Worden et al. (2004), and Bowman et al. (2006). The vertical resolution of TES O3 profile retrievals is typically 6 km in the tropics and in the summer hemisphere for cloud-free conditions (Worden et al., 2004). The peaks of the TES O3 averaging kernel matrix are generally in the middle troposphere, while its sensitivity is reduced greatly in the lower troposphere. On average, there are fewer than 2 DOFs for the tropospheric profile in the tropics (Jourdain et al., 2007).
The observation operator is applied to account for the vertical smoothing of the retrievals as reflected by the averaging kernel and for the TES a priori profile. This removes the influence of the a priori profile in the data assimilation, as performed by Jones et al. (2003). The observation error includes the smoothing error, the systematic error, and the measurement error. Vertical correlations due to the smoothing influence of the TES retrievals are accounted for in the forecast error covariance matrix through the influence of the averaging kernel. The TES data used in the data assimilation are filtered following the TES L2 Data Users Guide (Osterman et al., 2009). The C-Curve flag and the emission layer quality flag were used to exclude low-quality data. We excluded data poleward of 70°, where satellite sensitivities are low because of the low brightness temperature. TES O3 profiles are positively biased by less than 15 % from the surface to the upper troposphere (to 100 hPa) and negatively biased by less than 20 % from the upper troposphere to the lower stratosphere (100 to 30 hPa) compared to ozonesonde data (Worden et al., 2007; Nassar et al., 2008; Boxe et al., 2010). We will investigate the effect of the bias in TES O3 data on the data assimilation in Sect. 5.1.3.

MOPITT CO profile
The MOPITT instrument was launched onboard EOS Terra in December 1999. MOPITT measures thermal emission in the 4.7 µm and 2.2–2.4 µm absorption bands. The equator crossing time is 10:30 LT/22:30 LT with global coverage every 3 days. The data employed are the version 5 level 2 TIR data (Deeter, 2011). The MOPITT instrument is mainly sensitive to free tropospheric CO, especially in the middle troposphere, but it also provides boundary layer information (Deeter et al., 2003, 2007, 2010). DOF is typically much larger than 0.5, indicating that most of the information comes from the measurement as opposed to the a priori (Kopacz et al., 2010). Maximum zonal mean DOF values of approximately 1.5 occur in daytime overpasses over land in the tropics.
The retrieval error represents the cumulative error from the smoothing error, model parameter error, forward model error, geophysical noise, and instrument error. These are accounted for in the observation error covariance. We exclude MOPITT data in polar regions (>65° latitude), where the quality deteriorates because of potential problems related to cloud detection and icy surfaces. Also, the retrievals in these regions have low information content related to poor thermal contrast conditions. Daytime conditions typically provide better thermal contrast conditions for TIR-based retrievals than nighttime conditions over land, whereas nighttime observations have not been validated and appear subject to larger bias (Heald et al., 2004). We thus exclude the nighttime MOPITT data using a filter based on solar zenith angle. The super-observation approach is applied to the MOPITT data in the same manner as for the OMI data. The representativeness error for the MOPITT super-observations, derived from the variability of the observed concentrations in a super-observation grid cell, is typically much smaller (less than 5 %) than that for OMI tropospheric NO2 columns. Validation results based on in situ profiles exhibit a bias of less than 1 % at the surface, 700 hPa, and 100 hPa, and nearly −6 % at 400 hPa for version 4 data (Deeter et al., 2010). No bias correction is applied to MOPITT data in this study, which may lead to a slight bias in the estimated CO emissions. The MOPITT data on the 9 pressure levels (900, 800, 700, 600, 500, 400, 300, 200, and 100 hPa) and at the surface are used in the data assimilation, while only the data at 700 hPa are used for the CO emission optimization.

MLS O3 and HNO3 profile
The MLS instrument was launched in August 2004 onboard the Aura satellite. Vertical profiles of several atmospheric parameters are retrieved from the millimeter and sub-millimeter thermal emissions measured in the atmospheric limb (Waters et al., 2006). The vertical resolution for the standard O3 product is up to 2.5 km in the uppermost troposphere and stratosphere. We use the version 3.3 level 2 MLS O3 and HNO3 products. A detailed validation and comparison with other data sets is available in Livesey et al. (2011).
We used data with good quality flags, with quality fields greater than 0.6 (1.0), odd status fields, and convergence fields less than 1.18 (1.6) for O3 (HNO3), following the recommendations in Livesey et al. (2011). In the UTLS, the MLS version 3.3 retrieval provides data at 6 levels: 316, 261, 215, 150, 100, and 68 hPa. Since further evaluations are still required for data for pressures higher than 261 hPa, we use only data for pressures lower than 215 hPa. For HNO3, data for pressures lower than 150 hPa are used because of large systematic uncertainties at 215 hPa (±30 %). Detailed instructions for screening tropical-cloud-induced outliers in the HNO3 and O3 products given in the version 3.3 data quality document (Livesey et al., 2011) were applied before data assimilation. Because the instrument's vertical resolution is reasonably comparable to the model grid, the averaging kernel is neglected. The measurement error is used as the diagonal element of the observation error covariance matrix, while the vertical correlation is neglected.

SCIAMACHY tropospheric NO2 column
SCIAMACHY, which was launched in March 2002 on board ENVISAT (Bovensmann et al., 1999), is a passive remote sensing spectrometer observing backscattered, reflected, transmitted and emitted radiation from the atmosphere and the Earth's surface, in the wavelength range between 240 nm and 2380 nm and with a spectral resolution of 0.25 nm in the UV and 0.4 nm in the visible. We use the version 2 tropospheric NO2 data from the KNMI retrieval algorithm (Boersma et al., 2004, 2011). The ground pixel of the nadir mode is generally 60 × 30 km, but depends on the solar zenith angle, with global coverage approximately once every six days. The local overpass time is 10:00 LT. The approach adopted to calculate the AMF is the same as that for DOMINO version 2 data. Errors in the slant column fitting, the stratospheric corrections, and in the AMFs lead to an overall error in the SCIAMACHY retrieval, as described in Boersma et al. (2004). The error for individual pixels can be approximated as 0.7 × 10^15 molec cm^-2 + 25 % (Boersma et al., 2011). Only observations with a cloud radiance fraction of less than 50 % and quality flag = 0 are used for the comparison.

GOME-2 tropospheric NO2 column
GOME-2, which is an improved version of the GOME instrument, is a nadir UV-visible spectrometer (Callies et al., 2000). GOME-2 covers the spectral range between 240 nm and 790 nm and has a spectral resolution between 0.25 nm and 0.5 nm. The ground pixel size of GOME-2 tropospheric NO2 retrievals is 80 × 40 km, with global coverage within 1.5 days. The equatorial overpass time is at 09:30 LT in the descending node. This study employs the version 2 tropospheric NO2 data from the KNMI retrieval algorithm (Boersma et al., 2004, 2011). The error for individual pixels can be approximated as 0.7 × 10^15 molec cm^-2 + 25 % (Boersma et al., 2011). Only observations with a radiance reflectance of less than 50 % from clouds with quality flag = 0 were used.

TES CO
Version 4 CO profiles retrieved from TES measurements are used for the validation. The TES CO retrievals are sensitive primarily to CO in the troposphere, with a DOF between 1 and 1.5 for the tropospheric profile. The maximum sensitivity appears in the lower troposphere, below 500 hPa (Parrington et al., 2008).
Global patterns of CO as measured by TES are in good qualitative agreement with those seen by MOPITT. The mean difference between column abundances of CO from TES and MOPITT was less than 5 %. TES CO agrees within the estimated uncertainty of the aircraft instruments, including both errors and the variability of CO itself (Luo et al., 2007; Ho et al., 2009). The TES and MOPITT retrievals both have a maximum sensitivity mainly from 300 to 800 hPa.

OMI/MLS tropospheric O3 column (TOC)
Several approaches have been developed to derive global TOC from satellite measurements that involve subtracting the stratospheric O3 column measured in the limb from the total O3 column measured independently in the nadir (Ziemke et al., 2006; Schoeberl et al., 2007). The monthly mean TOC data derived using the OMI total columns and the MLS profiles from Ziemke et al. (2006) with a horizontal resolution of 1° × 1.25° are used for the validation. Ziemke et al. (2006) produced TOCs at the MLS measurement locations in daylight, where OMI retrievals are available, and where it is not excessively cloudy. Note that the quality of the derived TOC can be very sensitive to the choice of the tropopause definition in this approach (Stajner et al., 2008). Outside the tropics, the large and rapid tropospheric O3 variability complicates determining tropospheric O3, as it requires individual observations to be of sufficient accuracy.

Ozonesonde
Ozonesonde observations are taken from the World Ozone and Ultraviolet radiation Data Center (WOUDC) and the Southern Hemisphere Additional Ozonesondes project (SHADOZ) database. The accuracy of the ozonesonde measurement is about ±5 % in the troposphere (Smit and Kley, 1998). The observation sites considered for the validation are listed in Table 3. We use data from 39 locations for a total number of 99 (89) observations in January (July) 2007.
To compare ozonesonde measurements with the simulation and the data assimilation, all ozonesonde profiles have been interpolated to a common vertical pressure grid, with a bin of 25 hPa. Then, for each interpolated observed profile, the co-located model profile is computed by linear space/time interpolation of the nearest neighbor grid point data. The averaged profile is computed globally and for three latitudinal bands: the Northern Hemisphere (30–90° N), the tropics (30° S–30° N), and the Southern Hemisphere (90–30° S). The standard deviations of the normalized differences are computed over these regions.
Aircraft observations obtained during the INTEX-B campaign (Singh et al., 2009) are also used for the validation. Thornton et al. (2003), Bucsela et al. (2008), and Hains et al. (2010) provide a detailed description and discuss the performance of the measurements. In the comparison between model and assimilation results, the data were binned on a pressure grid with an interval of 30 hPa, while the model output was interpolated to the time and space of each sample. Data collected over highly polluted areas (over Mexico City and Houston) have been removed from the comparison, since they can cause a serious representativeness error in the comparison (Hains et al., 2010). The comparisons were made for March 2006.
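The pressure binning used for the ozonesonde (25 hPa) and aircraft (30 hPa) comparisons can be sketched in a few lines of Python. The text does not specify whether samples are interpolated or averaged within each bin, so simple bin averaging is an assumption of this sketch, and the function name and sample values are hypothetical.

```python
def bin_profile(pressures_hpa, values, bin_width_hpa=25.0):
    """Average samples in fixed-width pressure bins.

    Returns a dict mapping the bin-centre pressure (hPa) to the mean of
    the samples that fall in that bin. Empty bins are simply absent.
    """
    sums = {}
    for p, v in zip(pressures_hpa, values):
        index = int(p // bin_width_hpa)
        total, count = sums.get(index, (0.0, 0))
        sums[index] = (total + v, count + 1)
    return {(i + 0.5) * bin_width_hpa: total / count
            for i, (total, count) in sorted(sums.items())}


# Hypothetical sounding: pressure in hPa, O3 in ppbv.
binned = bin_profile([1000.0, 990.0, 980.0, 510.0],
                     [30.0, 40.0, 50.0, 60.0])
```

Passing `bin_width_hpa=30.0` gives the grid used for the aircraft comparison. Applying the same binning to the co-located model values keeps the observed and simulated profiles on identical levels before differences are normalized.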

Data assimilation system
The CHASER data assimilation system (CHASER-DAS) is developed based on an ensemble Kalman filter approach.
This section introduces the forecast model, the data assimilation approach, and the experimental settings.

The global chemical transport model CHASER
The forecast model used in the data assimilation system is the global CTM CHASER (Sudo et al., 2002) (Kanamitsu et al., 2002) at each time step (i.e. every 20 min) in order to reproduce past meteorological conditions. As described by Miyazaki et al. (2012), the anthropogenic emissions are based on a yearly mean inventory of national emissions obtained from the Emission Database for Global Atmospheric Research (EDGAR) version 3.2 (Olivier et al., 2005). The Global Fire Emissions Database (GFED) version 2.1 (Randerson et al., 2007), estimated on a monthly basis, is employed for emissions from biomass burning. The monthly biogenic emissions from vegetation, obtained via the Global Emissions Inventory Activity (GEIA) inventory (Guenther et al., 1995), are considered for isoprene, terpenes, and other non-methane VOCs. NOx emissions from soils are based on the monthly mean GEIA inventory (Graedel et al., 1993). The emissions over Asia were obtained from the Regional Emission inventory in Asia (REAS) version 1.1 (Ohara et al., 2007). The emissions for the simulation years 2006–2007 are obtained by extrapolating the emissions inventories from the years 1995 and 2000. Emissions of lightning NOx are linked to convective cloud top height following the parameterization of Price and Rind (1992). The lightning NOx production is calculated at each time step of CHASER using the convection scheme in the AGCM. The total aircraft NOx emission is 0.55 Tg N yr^-1, which is obtained from the EDGAR inventory. We apply a diurnal variability scheme to the surface NOx emissions depending on the dominant category for each area: anthropogenic, biogenic, and soil emissions, as in Miyazaki et al. (2012).

Ensemble Kalman filter data assimilation
The data assimilation technique used in this study is a local ensemble transform Kalman filter (LETKF) (Hunt et al., 2007). The implementation is the same as in Miyazaki et al. (2012). The LETKF has conceptual and computational advantages over the original EnKF (e.g. Ott et al., 2004; Hunt et al., 2007). The LETKF performs the analysis locally in space and time, reducing sampling errors caused by limited ensemble size. It also reduces the computational cost by performing most calculations in parallel (Miyoshi and Yamane, 2007). Because of the large state vector size and the large number of grid cells in a global CTM, the computational advantages of the LETKF over the original EnKF are important for global tropospheric chemistry data assimilation.
The LETKF transforms a background ensemble (x^b_i; i = 1, ..., k) into an analysis ensemble (x^a_i; i = 1, ..., k) and updates the analysis mean, where x represents the model variable; b the background state; a the analysis state; and k the ensemble size. In the forecast step, a background ensemble, x^b_i, is globally obtained from the evolution of each ensemble model simulation. The background ensemble mean, x̄^b, and its perturbations (spread), X^b, are thus estimated from the ensemble forecast,
$$ \bar{x}^b = \frac{1}{k}\sum_{i=1}^{k} x^b_i, \qquad X^b_i = x^b_i - \bar{x}^b. $$
The ensemble and its perturbations are N × k matrices, where N indicates the system dimension and k indicates the ensemble size. The background error covariance (P^b = X^b (X^b)^T) tends to underestimate the true background error covariance because of model errors and sampling errors (Houtekamer and Mitchell, 1998). To prevent the covariance underestimation, the covariance inflation technique (with a covariance inflation parameter of 5 %) is applied at each analysis step, as in Miyazaki et al. (2012).
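The ensemble mean, the perturbation matrix, and the effect of covariance inflation can be illustrated with a short NumPy sketch. It assumes that the 5 % inflation is applied multiplicatively to the covariance, so the perturbations are scaled by the square root of 1.05; the text does not state exactly where the inflation enters, and the function name and the tiny two-variable, three-member ensemble are hypothetical.

```python
import numpy as np


def background_statistics(ensemble, inflation=0.05):
    """Return the ensemble mean and the inflated perturbation matrix.

    ensemble: array of shape (N, k), one column per ensemble member.
    inflation: fractional covariance inflation (0.05 means 5 %).
    """
    ensemble = np.asarray(ensemble, dtype=float)
    mean = ensemble.mean(axis=1)
    perturbations = ensemble - mean[:, np.newaxis]
    # Scaling perturbations by sqrt(1 + inflation) inflates X X^T by (1 + inflation).
    inflated = np.sqrt(1.0 + inflation) * perturbations
    return mean, inflated


# Hypothetical ensemble: N = 2 state variables, k = 3 members.
members = np.array([[1.0, 2.0, 3.0],
                    [4.0, 4.0, 4.0]])
x_mean, X_b = background_statistics(members)

# Background error covariance as written in the text, P^b = X^b (X^b)^T.
P_b = X_b @ X_b.T
```

The second state variable has no spread across members, so its variance stays zero even after inflation. This is the situation multiplicative inflation cannot repair, and one reason why ensemble spread has to be maintained by other means, such as perturbed emissions.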
In the analysis step, an ensemble of background observation vectors in the observation space, y^b_i = H(x^b_i), is estimated using the non-linear observational operator H. An ensemble of background perturbations, Y^b_i = y^b_i − ȳ^b, is also computed. The ensemble mean is then updated by
$$ \bar{x}^a = \bar{x}^b + X^b \tilde{P}^a (Y^b)^T R^{-1} (y^o - \bar{y}^b), $$
where y^o is the observation vector, R is the p × p observation error covariance, and P̃^a is the local analysis error covariance in the ensemble space. The new analysis ensemble perturbation matrix in the model space, X^a, is simultaneously obtained by transforming the background ensemble X^b. An ensemble simulation with the new analysis ensemble is then used to predict the new background error covariance X^b in the next forecast step. Further details are described in Hunt et al. (2007) and Miyazaki et al. (2012).
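The analysis step can be sketched with NumPy as follows. This is a minimal, non-localized illustration that follows the general formulation of Hunt et al. (2007), in which the background covariance carries a 1/(k − 1) normalization and the inflation enters through the ensemble-space covariance; it is not the CHASER-DAS code. The function name, argument names, and the one-variable example are assumptions made for illustration.

```python
import numpy as np


def letkf_analysis(x_mean, X_b, y_mean, Y_b, y_obs, R, inflation=0.05):
    """One ensemble transform Kalman filter analysis step.

    x_mean: (N,) background mean; X_b: (N, k) background perturbations.
    y_mean: (p,) mean of H(x_i); Y_b: (p, k) perturbations of H(x_i).
    y_obs: (p,) observations; R: (p, p) observation error covariance.
    Returns the analysis mean (N,) and analysis perturbations (N, k).
    """
    k = X_b.shape[1]
    C = Y_b.T @ np.linalg.inv(R)                       # (k, p)
    # Analysis error covariance in ensemble space.
    P_tilde = np.linalg.inv((k - 1) / (1.0 + inflation) * np.eye(k) + C @ Y_b)
    # Mean update: x_a = x_b + X_b P_tilde Y_b^T R^-1 (y_o - y_b).
    x_a_mean = x_mean + X_b @ (P_tilde @ C @ (y_obs - y_mean))
    # Symmetric square root of (k - 1) P_tilde transforms the perturbations.
    evals, evecs = np.linalg.eigh((k - 1) * P_tilde)
    W = evecs @ np.diag(np.sqrt(evals)) @ evecs.T
    return x_a_mean, X_b @ W


# Hypothetical scalar example: the state is observed directly (H = identity),
# background variance 1 (before inflation), observation variance 1.
X_b = np.array([[-1.0, 0.0, 1.0]])
x_mean = np.array([0.0])
Y_b = X_b.copy()
y_mean = np.array([0.0])
y_obs = np.array([1.0])
R = np.array([[1.0]])

x_a_mean, X_a = letkf_analysis(x_mean, X_b, y_mean, Y_b, y_obs, R)
```

In this scalar case the result can be checked against the textbook Kalman gain: with an inflated background variance of 1.05 and an observation variance of 1, the gain is 1.05/2.05, so the analysis mean moves slightly more than halfway toward the observation. In the LETKF the same computation is repeated independently for each grid point using only nearby observations.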
[Figure 1 panel labels: Atmospheric concentrations; Ensemble CTM simulations; Data assimilation cycle]
Fig. 1. Schematic diagram of the data assimilation system. The ensemble model simulation with a priori emissions is used to provide the background error covariance information ($X^b$). The data assimilation is performed using the background error information and the observation information ($y^o$). The data assimilation then provides a posteriori estimates of surface NOx emissions, surface CO emissions, lightning NOx emissions, and 3-D distributions of the chemical species ($X^a$). Assimilation of MLS O3 and HNO3 data affects the concentrations only above 260 and 220 hPa, respectively. Ox is the sum of O3 and O(1D), and NOx is the sum of NO, NO2, and NO3. See Sect. 3 for details.

EnKF approaches always suffer from spurious long-distance correlations because of imperfect sampling of the probability distribution with a limited ensemble size (Houtekamer and Mitchell, 2001). In complex chemical data assimilation systems, a realistic estimation of the background error distribution is very important (Singh et al., 2011; Massart et al., 2012). Boynard et al. (2011) demonstrated that the spatial correlations estimated from ensemble simulations are overestimated in the chemical model error covariance fields, and suggested the need for special attention to avoid overly large correlations between fields distant from the location of the observation. A covariance localization technique is used to avoid possible degradation caused by undersampling. We assumed that observations located far from the analysis point have larger errors and that those observations have less effect on the analysis (Miyoshi and Yamane, 2007). A correct choice of ensemble size and correlation lengths is important to improve the data assimilation performance, as will be discussed in Sect. 3.3.4.
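
The idea that distant observations are treated as having larger errors can be illustrated as follows. This is a sketch under stated assumptions: a Gaussian weight in distance is assumed, and the function name `localize_obs_error` and the cutoff factor of 3.65 localization lengths are hypothetical choices for the example, not values taken from this study.

```python
import numpy as np

def localize_obs_error(obs_var, distance_km, length_km, cutoff_factor=3.65):
    """Inflate observation error variances with distance from the analysis point.

    obs_var     : observation error variances
    distance_km : distances between each observation and the analysis grid point
    length_km   : localization length (e.g. 450 or 600 km)
    cutoff_factor is a hypothetical parameter: observations farther away than
    cutoff_factor * length_km are not used at all.
    """
    obs_var = np.asarray(obs_var, dtype=float)
    d = np.asarray(distance_km, dtype=float)
    weight = np.exp(-0.5 * (d / length_km) ** 2)   # 1 at the analysis point
    use = d <= cutoff_factor * length_km           # drop very distant observations
    return obs_var[use] / weight[use], use
```

Dividing the error variance by a weight that decays with distance reduces the influence of remote observations smoothly, so spurious long-distance ensemble correlations contribute little to the local analysis.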

Experimental setting
Three series of one-month data assimilation experiments have been conducted, starting from 1 March 2006, 1 January 2007, and 1 July 2007. The March 2006 experiment was used for validation against the INTEX-B aircraft data, while the January and July 2007 experiments were used to compare the seasonal difference in the data assimilation performance. The data assimilation cycle is 100 min, i.e., approximately one orbit cycle of the polar-orbiting satellites. This setting is useful to reduce the time discrepancy (sampling errors) between the observations and the model in the data assimilation, given the distinct diurnal variation in tropospheric chemistry (Miyazaki et al., 2012). Figure 1 shows a schematic diagram of the data assimilation process.
In addition to the full assimilation run (with all the data), we have also conducted a control run (without any assimilation), five observing system experiments (see Sect. 4), an emission inversion run, and a fixed-emission assimilation run. In the emission inversion run, only surface emissions are optimized (i.e., without concentrations in the state vector). In the fixed-emission assimilation run, only concentrations are updated from the data assimilation (i.e., without emissions in the state vector). The emission inversion run and the fixed-emission assimilation run have been compared with the full assimilation run, in which both the concentrations and emissions are updated. The comparison allows us to understand the relative importance of the emission optimization and the direct concentration adjustment in the simultaneous assimilation (see Sect. 5.3). Further, we have conducted an idealized data assimilation experiment in which synthetic observations are derived from a perturbed model run. The results obtained from the idealized experiment confirmed that the data assimilation system is properly implemented, and that the simultaneous optimization of the O3 concentration and its precursor emissions is a powerful framework for the tropospheric chemistry analysis (see Appendix A).

State vector
The state vector is chosen to include the uncertain model aspects whose optimization most effectively improves the tropospheric chemical system. First, emissions are a major source of uncertainty in CTM simulations. The solution of a tropospheric chemical model is only weakly influenced by the initial conditions, because of the strong stiffness of tropospheric chemical processes (Constantinescu et al., 2007; Lahoz et al., 2007). An improvement could be achieved by an ensemble obtained by perturbing various parameters of the model (emissions, reaction rates, etc.). The EnKF can be extended to include such parameters in the data assimilation process. A state vector which includes both the concentrations and the emissions makes it possible to find the optimal values for the emissions, which are linked to the concentrations by the CTM. In the EnKF system, the background error covariance, estimated from the ensemble CTM simulations, varies with time and space, reflecting dominant atmospheric processes. The local analysis increment for emissions thus reflects the complex indirect relationship between concentrations and emissions of related species.
The surface emissions of NOx, e(NOx), the surface emissions of CO, e(CO), the lightning sources of NOx, e(LNOx), and the concentrations of all the predicted (i.e., transported; 35 in total) chemical species, c, are optimized at all model grid cells for each data assimilation cycle. The concentrations of radicals and members of family species are not included in the state vector. The data assimilation influences their concentrations through the chemical coupling during the forecast. The background ensemble can be represented as follows,

$$x^b_i = \left[\, e(\mathrm{NO_x})_i,\; e(\mathrm{CO})_i,\; e(\mathrm{LNO_x})_i,\; c_i \,\right]^{T}, \quad i = 1, \ldots, k,$$

where k is the ensemble size.
Although the data assimilation system simultaneously updates emissions of NOx and CO, we treat the data independently and do not include the NOx-CO emission covariance in the background error matrix. This is to avoid the effects of spurious multivariate correlations in the background error covariance, which may develop because of the limited ensemble size and errors in both the model and the observations. However, the forecasted atmospheric concentrations of NO2 and CO are coupled chemically through their effect on the tropospheric chemistry.
Based on sensitivity experiment results (see Sect. 4), we have also applied variable localization to improve the analysis. This means that the covariance among non- or weakly related variables is set to zero. This technique allows us to neglect the correlations among variables that may suffer significantly from spurious correlations. The optimization of the variable localization was based on a comparison against satellite data. If the data assimilation significantly deteriorated the agreement with at least one of the data sets used for the data assimilation and the validation, variable localization was applied to reduce the deterioration by considering dominant chemical processes, as will be further described in Sect. 4.2. The state vector structure used is summarized in Fig. 2.
With this technique, lightning NOx sources are optimized using TES O3, OMI NO2, and MLS O3 and HNO3 observations, whereas the covariance between the CO concentration and lightning NOx sources was set to zero, since their error correlation is not expected to contain meaningful information. Similarly, OMI tropospheric NO2 column data are used to update the concentrations of NOy (= NOx + HNO3 + HNO4 + PAN + MPAN + N2O5) species only, since the ensemble may not contain meaningful information on the profile of other chemical species. For the same reason, and because of their poor quality, MLS HNO3 data are only allowed to influence the NOy species in the analysis. Similarly, MOPITT CO data affect the concentrations of CO, hydrocarbons, and formaldehyde only. CO emissions are optimized using MOPITT CO data only. The variable localization is found to significantly improve the analysis (see Sect. 4.2).
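
Variable localization can be thought of as a lookup that decides which observation types are allowed to update each group of state variables. The sketch below encodes only the pairings stated explicitly in the paragraph above; it is deliberately partial (the full structure is given in Fig. 2 and is not reproduced here), and the names `VARIABLE_LOCALIZATION` and `observations_for` are hypothetical.

```python
# Partial table built only from pairings stated in the text above; the
# complete state vector structure of Fig. 2 is NOT reproduced here.
VARIABLE_LOCALIZATION = {
    "MOPITT_CO": {"CO", "hydrocarbons", "CH2O", "e(CO)"},
    "OMI_NO2": {"NOy", "e(LNOx)"},
    "MLS_HNO3": {"NOy", "e(LNOx)"},
}

def observations_for(variable, obs_types, table=VARIABLE_LOCALIZATION):
    """Return the observation types allowed to update a state-variable group.

    Covariances with all other observation types are treated as zero,
    i.e. those observations are simply skipped for this variable.
    """
    return [obs for obs in obs_types if variable in table.get(obs, set())]

all_obs = ["OMI_NO2", "MOPITT_CO", "MLS_HNO3"]
co_emission_obs = observations_for("e(CO)", all_obs)      # MOPITT CO only
lightning_obs = observations_for("e(LNOx)", all_obs)      # excludes MOPITT CO
```

Skipping an observation type for a given variable is equivalent to setting the corresponding block of the background error covariance to zero, which is how spurious inter-species correlations are prevented from entering the analysis.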

Parameter estimation
A diurnal variability is implemented for the NOx emissions as in Miyazaki et al. (2012), depending on the dominant source category for each area. The lightning NOx sources vary in time and space, reflecting the variability in meteorological fields. However, because a model error term is not implemented during the forecast step, the background error covariance can be continuously deflated and underestimated during the data assimilation. To prevent covariance underestimation during the data assimilation, we have applied a covariance inflation to the analyzed emissions as in Miyazaki et al. (2012). The analyzed standard deviation (i.e., background error) is artificially inflated to a minimum predefined value at each analysis step. This minimum value is chosen as 30 % of the initial standard deviation, based on sensitivity experiments. Because of the absence of any forecast model (i.e., of model bias) for the emissions, and because of the use of the background covariance inflation, an initial bias in the a priori emissions can be reduced gradually through the data assimilation cycle using the state-augmentation approach, as discussed by Lin et al. (2008).
The initial error is set to 40 % of the a priori emissions for surface emissions of NOx and CO. For lightning NOx sources, the initial error is set to 60 %, considering the large discrepancies among different estimates (Schumann and Huntrieser, 2007). For the concentrations, it is set to 10 %. The optimized emissions (i.e., the analysis mean) and the uncertainty (i.e., the analysis spread) are not strongly sensitive to the choice of the initial error after some assimilation cycles (e.g. several weeks), because the analysis is applied to both the mean and spread fields and because of the use of the inflation. Nevertheless, convergence is generally attained faster with larger initial uncertainties.
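
The inflation of the analysis spread to a minimum value can be sketched as below. This is an illustration under assumptions, not the implementation used in the study: the function name `inflate_to_min_spread`, the array layout, and the choice to rescale the perturbations about an unchanged ensemble mean are all hypothetical.

```python
import numpy as np

def inflate_to_min_spread(ens, initial_std, min_fraction=0.3):
    """Rescale ensemble perturbations so the spread never falls below a floor.

    ens          : (n, k) analysis ensemble of emission parameters
    initial_std  : (n,) or scalar initial standard deviation
    min_fraction : floor as a fraction of initial_std (30 % in the text)
    """
    mean = ens.mean(axis=1, keepdims=True)
    pert = ens - mean
    std = pert.std(axis=1, ddof=1)
    target = min_fraction * np.broadcast_to(
        np.asarray(initial_std, dtype=float), std.shape
    )
    scale = np.ones_like(std)
    low = (std < target) & (std > 0.0)       # only inflate collapsed spreads
    scale[low] = target[low] / std[low]
    return mean + pert * scale[:, None]      # ensemble mean is unchanged
```

With an initial error of 40 % for surface emissions, this floor corresponds to a spread of 12 % of the a priori emissions, so observations can continue to correct the emissions after many cycles.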

Observation error
The observation error covariance matrix contains the measurement error provided by each retrieval. The representativeness error is also considered for the OMI NO2 and MOPITT CO super-observations as in Miyazaki et al. (2012). The off-diagonal components are neglected for MLS data; the observation error of one measurement is assumed to be independent of the observation error of other measurements. For TES O3 and MOPITT CO data, the full error covariance is used, including correlations between vertical layers. We also account for the influence of the averaging kernels from the instruments, which capture the vertical sensitivity profiles of the retrievals. The horizontal correlation in the observation error covariance matrix is neglected. We do not attempt to remove possible biases from the observations before assimilation, mainly because of the difficulty in estimating the true bias structure; this will be further discussed in Sect. 5.1.2.
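
The role of the averaging kernel can be illustrated with the commonly used linear form of a retrieval observation operator. This is a generic sketch, assuming the model profile has already been interpolated to the retrieval levels; the exact operator used for each instrument (for example, whether it is applied to mixing ratios or their logarithms) is not specified here, and the function name `apply_averaging_kernel` is hypothetical.

```python
import numpy as np

def apply_averaging_kernel(x_model, x_apriori, A):
    """Map a model profile to what the retrieval would see.

    x_model   : (m,) model profile on the retrieval levels
    x_apriori : (m,) retrieval a priori profile
    A         : (m, m) averaging kernel matrix
    Returns x_apriori + A (x_model - x_apriori).
    """
    x_model = np.asarray(x_model, dtype=float)
    x_apriori = np.asarray(x_apriori, dtype=float)
    return x_apriori + np.asarray(A, dtype=float) @ (x_model - x_apriori)

# Toy three-level profiles (arbitrary units)
x_mod = np.array([60.0, 80.0, 120.0])
x_apr = np.array([50.0, 70.0, 100.0])
y_full = apply_averaging_kernel(x_mod, x_apr, np.eye(3))         # full sensitivity
y_none = apply_averaging_kernel(x_mod, x_apr, np.zeros((3, 3)))  # no sensitivity
```

When the kernel is close to zero, the simulated observation collapses onto the a priori profile regardless of the model state, which is why levels with very low retrieval sensitivity carry little information for the analysis.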

Assimilation parameters
Since the number of degrees of freedom (DOF) of the state vector employed in this study is large (∼O(10^6)), a large ensemble size is essential to capture the proper background error covariance structure, but at the expense of an increased computational cost. We have optimized the data assimilation parameters based on sensitivity experiments. The observation-minus-forecast (OmF) analysis (see Sect. 5.1.1) was used to choose the best values of the ensemble size and localization length, as summarized in Table 4. The sensitivity experiments showed that the analysis is improved significantly by increasing the ensemble size from 16 to 32 and is somewhat further improved by increasing it from 32 to 48, as seen in the OmF reduction in the comparison with, for instance, MOPITT CO, MLS O3, and MLS HNO3 data. In contrast, the impact of increasing it from 48 to 64 was much less significant. The ensemble size is accordingly set to 48. The sensitivity experiments also show that the analysis results are sensitive to the horizontal localization length. The inclusion of spatial correlations with appropriately chosen correlation lengths leads to improvements. From the sensitivity experiments, the horizontal localization length was set to 450 km for NOx emissions and 600 km for CO emissions, lightning NOx, and the concentrations. A localization length that is too short (i.e. half the size) increases the OmF error, for instance, for MOPITT and MLS data, because the influence of remote observation information is neglected. Although a larger localization length (i.e. double the size) somewhat reduces the OmF in some cases, we use the above-mentioned setting to avoid possibly serious spurious correlations. The physical vertical localization length was set to ln P [hPa] = 0.2 based on sensitivity experiments (results not shown). The optimal length, however, may depend on the location, season, species, and model resolution (Pajot et al., 2011), reflecting the chemical lifetime of the species and atmospheric wind patterns.

Observing system experiments
Observing system experiments (OSEs) are used to study how each individual observational data set improves the overall performance. We have conducted five OSEs by separately assimilating OMI NO2, TES O3, MOPITT CO, MLS O3, and MLS HNO3 data, and the results are compared with the control run (without any assimilation) and the full assimilation run (with all the data).

Background error covariance
The background error covariance estimated from ensemble simulations allows unobserved species to be constrained by observed species. Inter-species adjustment can be expected when observed and unobserved species chemically interact on a time scale of the order of the assimilation cycle. The background error covariance follows from the assumption that the background ensemble perturbations $X^b$ sample the forecast errors,

$$P^b = \frac{1}{k-1} X^b (X^b)^T,$$

where k is the ensemble size. Figure 3 shows the simulated global mean background error covariance structure, $P^b$. The covariance analysis shows tight correlations between variations in surface emissions and low-level concentrations of chemically related species. NOx emissions show strong positive correlations with low-level (950 hPa) concentrations of NOx (r = 0.66), Ox (r = 0.60), N2O5 (r = 0.69), HNO3 (r = 0.62), and HNO4 (r = 0.59), whereas their relation to upper-level (500 hPa) concentrations is much less significant. CO emissions have a significant correlation with the lower tropospheric CO concentration (r = 0.74), but show no obvious relation to other species. Because of the time delays associated with vertical mixing, the middle tropospheric CO is generally delayed in phase, with less variability associated with the CO emission variability. Positive correlations are found between lightning NOx sources and concentrations of Ox (r = 0.18) and NOy species (e.g. r = 0.30 for NOx) in the middle troposphere, demonstrating the potential to constrain lightning NOx sources from those observations. Note that correlations with lightning NOx sources are more robust in the tropics (r = 0.30 for Ox and r = 0.36 for NOx between 25° S and 25° N) than in the global mean. Negative correlations are also found between reactive species. For instance, r = −0.63 between Ox and C2H4 (ethene) at the surface results from the removal of C2H4 by the fast reaction with OH and O3 (Sawada and Totsuka, 1986). The background error covariance shows significant correlations among the concentrations of related chemical species, reflecting the complex tropospheric chemical processes. For instance, Ox shows large correlations with NOy species, CO, CH2O, SO4, and PAN at low levels (with r > 0.30). NOx shows a similar covariance structure, reflecting strong chemical links between Ox and NOx (with r = 0.41) both in the lower and middle troposphere. There are large correlations among the hydrocarbons throughout the troposphere.
The background error structure strongly depends on the model characteristics, and it may have a critical effect on the data assimilation performance. In complex chemical data assimilation systems, a realistic estimation of the background error distribution is very important, given the noisy observations along with imperfect model predictions, as suggested by Singh et al. (2011).
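
Correlation coefficients such as those quoted above can be obtained from an ensemble by forming the sample covariance of the perturbations and normalizing it. The following is a minimal sketch with a hypothetical function name and a toy three-variable ensemble; the 1/(k−1) normalization follows the convention of Hunt et al. (2007).

```python
import numpy as np

def ensemble_covariance(X_ens):
    """Sample covariance and correlation from an (n, k) ensemble.

    Rows are state variables (e.g. an emission and several concentrations
    at one grid point); columns are ensemble members.
    """
    k = X_ens.shape[1]
    Xb = X_ens - X_ens.mean(axis=1, keepdims=True)   # perturbations X^b
    Pb = Xb @ Xb.T / (k - 1)                         # P^b = X^b (X^b)^T / (k-1)
    std = np.sqrt(np.diag(Pb))
    corr = Pb / np.outer(std, std)                   # correlation coefficients r
    return Pb, corr

# Toy ensemble: row 1 varies with row 0, row 2 varies against row 0
toy = np.array([[1.0, 2.0, 3.0, 4.0],
                [2.0, 4.0, 6.0, 8.0],
                [4.0, 3.0, 2.0, 1.0]])
Pb_toy, corr_toy = ensemble_covariance(toy)
```

Off-diagonal entries of the correlation matrix between an observed and an unobserved variable are what allow an observation of one species to adjust another in the analysis.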

Results
The OSEs confirm that the assimilation of each species data set has a strong influence on both assimilated and non-assimilated species through the use of the inter-species error correlation and through the chemical coupling provided by the model forecast. The assimilation of OMI NO2 data provides some changes in O3 and CO concentrations, whereas the assimilation of TES O3 data has some effects on NO2 fields, as will be shown in Sect. 5.1.2. These changes are tightly associated with the changes in OH because of the chemical interactions in the CO-OH-NOx system, as depicted in Fig.

Fig. 5. The differences in the global spatial correlation, the global mean bias, and the global RMSE between the data assimilation runs and the control run for 16-30 (7-30 for the ozonesondes only) January (left) and July (right) 2007. These scores are first estimated from the comparison against the observations listed at the bottom (assimilated data in black and independent data in blue), and then compared with the control run. For the spatial correlation, the difference (the data assimilation runs minus the control run) is positive (negative) when the spatial correlation is higher (lower) in the data assimilation runs than in the control run. For the bias and the RMSE, the error reduction rate defined by Eq. (7) is plotted; a positive (negative) value indicates that the error is smaller (larger) in the data assimilation runs than in the control run. A reduction rate of 100 % indicates that the error in the model is completely removed by the data assimilation. The results are shown for six different data assimilation runs (the full assimilation run and the five OSEs). The number shown in the bottom list represents the approximate altitude level in hPa.
gradient in OH concentration in the free troposphere is reduced in July.
The OSEs quantify the improvement due to the assimilation of each individual species data set in comparison with the assimilation of all data sets and the control run without assimilation. The global spatial correlation, root-mean-square error (RMSE), and mean bias for 15-day (from the 16th to the 30th of each month) mean fields were estimated for the control run and the OSEs. The improvement rate due to each data set was then estimated by comparing scores between the control run and the OSE, as shown in Fig. 5. For the RMSE and bias, the error reduction rate is estimated by comparing these statistics between the control run ($E_{\mathrm{cntl}}$) and the OSE ($E_{\mathrm{OSE}}$) as follows,

$$\text{error reduction rate} = \left(1 - \frac{E_{\mathrm{OSE}}}{E_{\mathrm{cntl}}}\right) \times 100\ \%. \qquad (7)$$

When the global mean model bias of the control run is nearly zero, the error reduction rate is not meaningful and is set to zero. This is done for the comparisons with TES O3 data at 700 hPa in January, TES CO data at 700 hPa in July, and ozonesonde data between 450 and 200 hPa in July. The nearly zero bias compared to TES O3 and CO data at 700 hPa can be largely attributed to the very small sensitivity of the retrievals at these levels, and does not reflect the true model bias, which may be large.
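
The error reduction rate described in this paragraph can be computed as in the sketch below. Two details are assumptions of the example rather than statements from the text: absolute values are taken so that the sign of a bias does not matter, and the threshold `tol` that decides when the control-run error counts as nearly zero is a hypothetical parameter.

```python
def error_reduction_rate(e_cntl, e_ose, tol=1e-12):
    """Error reduction rate (%) of an assimilation run relative to the control run.

    e_cntl : bias or RMSE of the control run
    e_ose  : bias or RMSE of the assimilation run (OSE or full assimilation)
    tol    : hypothetical threshold below which the control error is treated
             as nearly zero and the rate is set to zero
    """
    if abs(e_cntl) < tol:
        return 0.0
    return (1.0 - abs(e_ose) / abs(e_cntl)) * 100.0

# Example values (arbitrary units): a bias of -10 reduced to -2
rate_improved = error_reduction_rate(-10.0, -2.0)
rate_degraded = error_reduction_rate(5.0, 6.0)
rate_undefined = error_reduction_rate(0.0, 1.0)
```

A positive rate means the assimilation run is closer to the observations than the control run, a negative rate means it is further away, and 100 % means the error has been removed entirely.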
The comparison demonstrates significant improvements in the scores obtained by the assimilation. Improvements in the non-assimilated chemical species show that the ensemble simulation is capable of correctly representing inter-species error correlations and propagating observation information through the assimilation cycles. For instance, the assimilation of MLS O3 and HNO3 data leads to an improved agreement with OMI NO2, as shown by the large reduction of the bias. Furthermore, all the assimilated data sets improve the agreement with O3 profiles obtained from ozonesondes in July, as will be further discussed in Sect. 5.1.2.
Note that the effect of the data assimilation on non-observed species is not always positive. Consideration of inter-species error correlations sometimes causes the error to grow. Optimization of the state vector structure is thus conducted to minimize the error based on the OSEs, by neglecting the inter-species correlations that result in serious error growth; the optimized state vector is depicted in Fig. 2. Serious negative effects arose, for example, from the TES O3 data assimilation on CO fields, the MLS HNO3 data assimilation on CO fields, the OMI NO2 data assimilation on O3 fields, and the MOPITT CO data assimilation on O3 fields (figure not shown). This is primarily because a limited ensemble size can cause spurious error correlations among chemical species, especially for species having insignificant chemical links. For instance, because of its relatively long chemical lifetime (∼2 months), CO may not have significant correlations with chemically active species such as O3 in the lower troposphere on a time scale of the order of the data assimilation cycle. Similarly, the OMI NO2 tropospheric columns may not have enough information to directly constrain the vertical profile of O3 because of their smooth averaging kernel profile and large observation error. Since we applied the variable localization to avoid these negative influences (see Sect. 3.3.1), the full assimilation run provides the best performance among the data assimilation runs in most cases (see Fig. 5). Note that all the observation data can affect all the chemical fields throughout the forecast.

Data assimilation results

Self-consistency tests
An important test for the quality of data assimilation is whether the differences between the model forecast and observations (the innovations) are consistent with the covariance matrices for the model forecast and observations (e.g. Segers et al., 2005; Lahoz et al., 2007). The background covariance matrix is important in reaching an appropriate balance between the background and the observations. A quantitative criterion for the choice of the background error is the chi-square (χ²) diagnostic (e.g. Ménard and Chang, 2000). χ² should approach 1 if the background error covariances are properly specified, while a value higher (lower) than 1 indicates an underestimation (overestimation) of the background error covariance matrices. The χ² determined for each assimilated data set is shown in Fig. 6. The χ² is greater than 1 for the MLS O3 and HNO3 data assimilation, indicating too much confidence in the model. The model overconfidence is associated with the limited lower stratospheric variations in the ensemble, which are strongly constrained by the fixed upper boundary conditions in CHASER.
For the MOPITT CO and TES O3 data assimilation, χ² is lower than 1, which indicates a possible overestimation of the background errors and may result in too much correction of the model fields.
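
The χ² diagnostic compares the squared innovations with the error covariance expected from the ensemble and the observation errors. The sketch below uses the standard normalized form, in which the innovation covariance is approximated by the ensemble spread in observation space plus R; the function name `chi_square` and the array layout are assumptions of the example, and the exact implementation in the study may differ.

```python
import numpy as np

def chi_square(yo, Yb_ens, R):
    """Normalized chi-square of the innovations.

    yo     : (p,)   observation vector
    Yb_ens : (p, k) background ensemble in observation space
    R      : (p, p) observation error covariance
    Returns d^T (H P^b H^T + R)^{-1} d / p with d = yo - mean(Yb_ens).
    """
    k = Yb_ens.shape[1]
    yb_mean = Yb_ens.mean(axis=1)
    Yb = Yb_ens - yb_mean[:, None]
    S = Yb @ Yb.T / (k - 1) + R          # expected innovation covariance
    d = yo - yb_mean                     # innovation (OmF)
    return float(d @ np.linalg.solve(S, d)) / d.size

# One observation, two members: ensemble variance 2, observation variance 2
Yb_toy = np.array([[-1.0, 1.0]])
R_toy = np.array([[2.0]])
chi2_consistent = chi_square(np.array([2.0]), Yb_toy, R_toy)
chi2_overconfident = chi_square(np.array([4.0]), Yb_toy, R_toy)
```

In the first toy case the innovation matches the expected spread and χ² equals 1; in the second the innovation is larger than the specified errors allow, giving χ² above 1, the situation described for the MLS data.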
Figure 7 shows the latitude dependence of the bias and root-mean-square (RMS) innovation of the OmF computed in the observation space. The innovation between forecast and assimilated data is a sum of three contributions: the observation error, the forecast error, and the representativeness error caused by mismatches between the satellite ground pixel and the model grid cell (Eskes et al., 2003; Lahoz et al., 2007). A persistent model bias is found in the underestimation of tropospheric NO2 columns compared to OMI NO2 data, the overestimation of the middle tropospheric CO in the extratropics compared to MOPITT CO data, and the overestimation of the middle and upper tropospheric O3 compared to TES and MLS O3 data. The data assimilation removes most of the OmF bias. The large reduction of the O3 OmF bias for TES O3 data in the middle and upper troposphere, which reflects the reduction of the OmA bias, implies that TES O3 has meaningful information for constraining the O3 fields at these altitudes, as similarly reported by Parrington et al. (2008). In contrast, the bias reduction is not obvious in the lower troposphere (800-500 hPa), especially at high latitudes. This is because the DOFs of the TES retrieval in the troposphere are generally smaller than 1 poleward of 45° and TES has little sensitivity to the lower tropospheric O3 (Worden et al., 2004; Osterman et al., 2008). The near-zero OmF bias for MLS O3 in the data assimilation reflects the good coverage and high quality of MLS O3 data. The long lifetime of O3 in the UTLS also helps to accumulate the observation information. The observation-minus-analysis (OmA) histogram shows a more pronounced peak than that for OmF (closer to a Gaussian curve, figure not shown) in many cases, as the analysis is closer to the assimilated observations than the forecast is, as shown by Miyazaki et al. (2012).
The standard deviation about the mean of the OmF was found to be mostly equal to the observation error, indicating that the data assimilation captures the observed variability well and satisfies the data assimilation assumptions. A substantial part of the RMS of the OmF has been removed by the data assimilation for MOPITT CO, MLS O3, and TES O3 in the middle and upper troposphere. The reduction of the RMS is less pronounced for OMI NO2, for TES O3 in the lower troposphere, and for MLS HNO3. These are, respectively, related to rapid spatiotemporal variations and large errors in the observed NO2, small sensitivities to the true profile (i.e. small averaging kernel), and large observation errors.

Comparison with satellite data
The data assimilation results are validated against independent data, as listed in Table 5 and shown in Fig. 5. The tropospheric NO2 columns are compared with GOME-2, SCIAMACHY, and OMI data. Differences among the retrievals mainly reflect diurnal variations of chemical processes and emissions, because a very similar algorithm is used for the retrieval of these data. The difference in viewing pixel size is not expected to affect the comparison results strongly, since these retrievals are gridded to the same resolution (2.5° × 2.5°), using weighting factors for the surface overlap between the satellite pixel and grid cell. The data assimilation largely improves the agreement with these data. The improvement is most pronounced in January. The data assimilation increases the spatial correlation by about 0.15-0.21, decreases the bias by about 85 % (except for SCIAMACHY), and decreases the RMSE by about 30 %. These improvements are mainly attributed to the increased NO2 columns over East China and Central Africa and the decreased columns over Europe (Fig. 8). The OSEs confirm that these improvements are mainly due to the OMI NO2 data assimilation (Fig. 5). The assimilation of MLS O3 and HNO3 data also contributes significantly to the reduction of the negative NO2 column bias compared to OMI and GOME-2 (in January only) data, by increasing the upper tropospheric NO2 concentration. In contrast, the bias compared to independent GOME-2 (in July only) and SCIAMACHY data is increased by the data assimilation. The errors in the simulated diurnal NO2 variations, along with a bias between OMI and these retrievals, may cause the bias to increase. The diurnal variations are especially important in the warmer seasons (e.g. in July in the Northern Hemisphere), when chemistry is sufficiently fast to make a difference between morning and early afternoon NO2 columns (Boersma et al., 2009).
The global mean negative bias and large RMSE in the model simulation against the MOPITT CO data are mostly (by 40-90 %) removed by the data assimilation, while a very high spatial correlation (0.92-0.97) is maintained. The reduced negative bias is primarily due to the enhanced concentrations over East Asia, North America, and the northern Eurasian continent (Fig. 10). Because of the long lifetime of CO, the data assimilation system is able to capture the observed CO variability. This improvement is mostly achieved by the MOPITT CO data assimilation, while the assimilation of other data sets slightly (typically by less than 5 %) affected the comparison through their influence on the OH fields.
The data assimilation also improves the spatial correlation with the independent TES CO data at both 700 and 300 hPa, reflecting enhanced concentrations over China, India, Central Africa, and North America, and reduced concentrations over South America (Fig. 10). However, the global mean bias and RMSE mostly increase due to the data assimilation, primarily reflecting too high concentrations at high latitudes. The bias increase is possibly due to a systematic bias between MOPITT and TES. Luo et al. (2007) showed that MOPITT CO version 3 data have higher values than TES CO version 2 data, with a mean difference of −4.8 % at 150 hPa and −0.2 % at 700 hPa. This result is consistent with our data assimilation result, but with different data versions. The assimilation of other species (i.e., NO2, O3, and HNO3) data contributed slightly to improving the agreement.
The data assimilation greatly improves the agreement with the TES O3 data in the upper troposphere (300 hPa), with a bias reduction of up to 65 % and an RMSE reduction of about 40-50 %. The improvements are mainly due to increased concentrations in the southern extratropics (Fig. 9), which are achieved by the TES O3 and MLS O3 data assimilation. The assimilation of OMI NO2 data acts to reduce the bias compared to TES O3 data at 700 hPa in July, implying the importance of optimizing the O3 precursor fields. However, the improvement is not significant in the simultaneous data assimilation fields in the lower troposphere. The TES O3 sensitivity is greatly reduced in the lower troposphere, especially in the presence of clouds, which makes it difficult to improve the analysis. Validation or assimilation is virtually meaningless when the retrieval sensitivity is very low. Since we applied the averaging kernel and the a priori profile information in the comparison (Eq. 1), substantial adjustments in the assimilation or differences in the validation only occur when there is a meaningful signal (i.e. the retrieved profile minus the retrieval a priori).
The MLS O3 data assimilation is very effective in removing the positive O3 model bias in the UTLS because of its wide and dense coverage and good quality, as similarly shown by Jackson (2007) and Feng et al. (2008). The global RMSE against MLS O3 data is also reduced by the TES O3 data assimilation. However, the OSEs confirmed that the assimilated concentration becomes too high compared to MLS O3 data because of the TES O3 assimilation. The OSEs suggest that the TES O3 concentration is higher (lower) in the tropics (extratropics) than the MLS O3 concentration, with a mean difference of 20-40 ppb at altitudes between 200 and 80 hPa.
The bias and the RMSE compared with MLS HNO3 data have also been largely removed by the data assimilation. The improvement is primarily due to the MLS HNO3 data assimilation, but the MLS O3 data assimilation also contributes to the improvement, as seen in the reduced HNO3 bias. This indicates that MLS O3 data have meaningful information about the abundance of HNO3 in the UTLS, through atmospheric transport and the chemical link. In contrast, the decreased (increased) spatial correlation (RMSE) due to the MLS O3 data assimilation may be related to errors in the background error covariance or poor data quality in either MLS HNO3 or MLS O3 data, especially in the upper troposphere.
The improved agreement with TOC data obtained from the independent MLS/OMI data is mainly attributed to the assimilation of TES O3 data because of their strong sensitivity to tropospheric O3 in the tropics. For instance, the high columns over the Atlantic and in the southern subtropics (from South Africa to Australia) are better captured by the data assimilation (Fig. 9). However, the data assimilation still has difficulty in reproducing the observed features. For instance, longitudinal variations with a persistent wave-1 pattern in the southern tropics are larger in both the model and the assimilation than in the MLS/OMI product. This may indicate a difficulty in correcting processes responsible for the enhanced ozone in the Atlantic (e.g. via rapid convective updraft). At the same time, there is large uncertainty in the retrieved TOC. Measuring tropospheric O3 from space is challenging because of the large amount of stratospheric O3 in the total column, while the separation between the troposphere and stratosphere strongly depends on the tropopause definition (e.g., Bethan et al., 1996).

Comparison with ozonesonde data
Figure 11 shows the comparison against the ozonesonde data. Without assimilation, the global mean bias against the ozonesondes is large, up to 30 % in the free troposphere and 40 % in the lower stratosphere. The data assimilation removes most of the bias from the middle troposphere to the lower stratosphere, reducing it to within 10 %. It also reduces the RMSE by about 20 % in the middle troposphere and by 50 % in the UTLS. Significant positive biases in simulated O3 in the UTLS are also mostly removed by the data assimilation, whereas the simulated O3 profiles suffer from errors in stratosphere-to-troposphere exchange (STE). The great improvements in the UTLS reflect the long chemical lifetime of O3 and the fact that satellite retrievals capture the large-scale variations of O3 well. The effect of the data assimilation on the lower tropospheric O3 below about 850 hPa is not obvious on a global scale, implying that further constraints are needed on the near-surface O3 and its precursors (e.g. VOCs). Parrington et al. (2009) demonstrated that the changes in the O3 flux from the free troposphere into the planetary boundary layer (PBL) by the TES O3 assimilation indirectly reduce the positive bias in the PBL over North America. Although this effect is not confirmed by our global analysis, it is of interest to survey the detailed spatial distributions resulting from the data assimilation.
The OSEs demonstrate that these improvements are mainly due to the assimilation of TES data in the free troposphere (between 750 and 200 hPa) and of both TES and MLS O 3 data in the UTLS (between 200 and 90 hPa). TES data provide valuable constraints on the free tropospheric O 3. Although the MLS data do not extend down to altitudes below 260 hPa, the MLS assimilation influenced the ozone analysis even below this level through the vertical propagation of the observation signal, mainly via the extratropical downward motion. It is emphasized that all the assimilated datasets contribute to reducing the global mean bias between 750 and 450 hPa and between 200 and 90 hPa in July. This indicates that the simultaneous assimilation of multiple chemical observations is effective in improving tropospheric O 3, through their influence on the precursor emissions and chemical processes that affect the O 3 concentrations. In contrast, the improvement by the non-O 3 data assimilation is not obvious in January. This may reflect the seasonal difference in the chemical links between O 3 and other species. Since most ozonesonde sites are located in the Northern Hemisphere, the greater improvement in July may be related to active summertime chemical processes in the Northern Hemisphere. Much less ozone is produced from the precursors in winter than in summer (e.g. Liu et al., 1987).
The assimilated O 3 fields show a persistent positive bias compared to the ozonesonde data, with a global mean bias of up to 15 % below 300 (500) hPa in January (July). The OSEs demonstrate that the positive bias can be attributed to the assimilation of TES O 3 data. The positive bias in TES O 3 data compared to ozonesonde data is reported by Nassar et al. (2008) and Worden et al. (2009). A data assimilation experiment with a bias correction (a uniform 3.3 ppbv bias above 500 hPa and 6.5 ppbv below 500 hPa, according to Worden et al., 2009) reduces the positive bias in the data assimilation (Fig. 12), demonstrating the importance of bias correction before data assimilation. However, the effect of the bias correction is not always beneficial: it causes too low concentrations in the middle troposphere in both January and July. A more accurate estimation of the spatially varying bias is thus required to improve the analysis.
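The uniform bias correction described above can be sketched as a pressure-dependent offset removed from the retrievals before assimilation. The sketch below assumes that "above 500 hPa" means higher altitude (pressure lower than 500 hPa), that the bias is subtracted because TES is biased high, and that the 500 hPa level itself falls in the lower group; the function name and these conventions are assumptions, not the paper's implementation.

```python
import numpy as np

def correct_tes_o3_bias(o3_ppbv, pressure_hpa,
                        upper_bias=3.3, lower_bias=6.5, split_hpa=500.0):
    """Subtract a uniform positive bias (ppbv) from O3 retrievals by pressure range."""
    o3 = np.asarray(o3_ppbv, dtype=float)
    p = np.asarray(pressure_hpa, dtype=float)
    # Assumption: "above 500 hPa" = higher altitude, i.e. pressure < 500 hPa.
    bias = np.where(p < split_hpa, upper_bias, lower_bias)
    return o3 - bias

# Illustrative retrieval levels only.
pressure = np.array([300.0, 800.0])
retrieved = np.array([60.0, 60.0])
corrected = correct_tes_o3_bias(retrieved, pressure)
print(corrected)
```

A spatially varying correction, as the text recommends, would replace the two constants with a bias field on the retrieval grid.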

Comparison with aircraft data
Comparisons with aircraft measurements from the INTEX-B campaign allow us to look into the effect of data assimilation on various chemical fields (Fig. 13). The observed NO x concentrations show a decrease from the boundary layer to the free troposphere. Oxidation of NO x to HNO 3 and other minor products dominates NO x loss in the boundary layer, whereas conversions to HNO 3 and PAN dominate it in the free troposphere (e.g. Staudt et al., 2003). The increase in HNO 3 toward the surface is driven by chemical production of HNO 3 in polluted areas, while HNO 3 is depleted in the troposphere because of deposition processes. Compared to the observed profiles, the simulated NO 2 is slightly lower in the boundary layer and too low in the free troposphere, while HNO 3 is too high by 500 pptv in the boundary layer.
Observed PAN shows a maximum above the boundary layer and a minimum in the free troposphere, while the simulation overestimates (underestimates) it by 80 pptv (by 200 pptv) in the boundary layer (the upper troposphere). Observed O 3 shows a maximum near 900 hPa and decreases toward the lower free troposphere, while the simulation slightly underestimates it, except near the surface. Observed HO 2 and CH 2 O decrease with altitude, reflecting the decrease in water vapor (Heikes, 1992) and the boundary layer source from oxidation of isoprene (Millet et al., 2006), respectively. The simulation captures the observed features of CH 2 O well, but overestimates HO 2 by 10 pptv throughout the troposphere. The data assimilation improves the agreement with the aircraft observations for NO 2, O 3, and PAN. Underestimations of these species concentrations are generally reduced by the data assimilation. Chemical production of O 3 is strongly related to the abundance of NO x and OH. The NO x emissions tend to increase OH via the NO and HO 2 reaction and the O 3 (and excited oxygen atoms) and H 2 O reaction, while the CO emissions tend to decrease OH. Corresponding to the increased NO x emissions and decreased CO emissions at low latitudes, the data assimilation increases OH and O 3. Because of the low sensitivity of TES in the lower troposphere, the changes in near-surface O 3 are largely attributed to the change in NO x emissions, as will be further discussed in Sect. 5.3. The assimilated fields still underestimate the concentrations of NO and PAN in the free troposphere, and overestimate HO 2 throughout the troposphere. Martin et al. (2007) concluded that oxidation of lightning NO x explains nearly 80 % of the HNO 3 concentration in the tropical upper troposphere. Increasing the lightning NO x source also decreases HO 2 in the upper troposphere, while increasing OH (Hudman et al., 2007). The increased NO x results in a reduction of the HO 2 /OH ratio through the NO + HO 2 and NO 2 + HO 2 reactions, and also results in an increased loss of OH via production of HNO 3 (DeCaria et al., 2005). Thus, it is likely that more NO x sources in the free troposphere are required to reduce the negative bias of NO 2, NO, PAN, and HNO 3 and the positive bias of HO 2 in the free troposphere.
Table 6. The 15-day means (from the 16th to the 30th of each month) and the standard deviations (from the means) of the global and regional surface NO x emissions (e(NO x ), in Tg N yr −1 ), lightning NO x emissions (e(LNO x ), in Tg N yr −1 ), and surface CO emissions (e(CO), in Tg CO yr −1 ) obtained from the a priori emissions and the a posteriori emissions. GL is global (90° S-90° N); NH is the Northern Hemisphere (20° N-90° N); TR is the tropics (20° S-20° N); and SH is the Southern Hemisphere (90° S-20° S). The emissions optimized from the full assimilation run and the emission inversion run (in brackets) are presented.

The overestimated H 2 O may also contribute to the overestimation in the concentrations of OH and other HO x species through its reaction with excited oxygen atoms. The data assimilation tends to increase the overestimation in HNO 3 concentration in the boundary layer, corresponding to the increased NO 2 concentration. Simultaneous adjustment of the HNO 3 removal processes (e.g. wet and dry deposition) might be important to further improve the analysis. Removal of HNO 3 by wet deposition processes occurs within a few days in the lower troposphere and results in the loss of HO x species, which may also explain a part of the overestimation in HO x species concentrations. Meanwhile, a large uncertainty in both observed and simulated OH concentrations in the free troposphere remains an important issue (e.g. Hudman et al., 2007). Many other factors in the chemical transport processes affect the overall model performance, and they may obstruct further improvements by the data assimilation.

Estimated emission sources
The simultaneous optimization of multiple species leads to complex chemical interactions which together determine the estimated emissions. In particular, the imperfect representation of OH fields may cause large uncertainties in the NO x and CO emissions inversion (Müller and Stavrakou, 2005; Jones et al., 2009; Pison et al., 2009; Hooghiemstra et al., 2011). Müller and Stavrakou (2005) demonstrated that the optimization of CO emissions constrained by both CO and
In our system, as shown in Figs. 4 and 5 and discussed in Sect. 4.2, all the assimilated data significantly influence concentrations of OH, NO 2, and CO. The assimilation of OMI NO 2 data generally increases (decreases) the OH concentration in the tropics (extratropics) by 15 %, which affects the atmospheric CO lifetime and influences the CO emission inversion. Meanwhile, the higher CO emissions lead to a decrease in OH abundances and slightly increase the NO 2 concentration in the extratropics. The simultaneous data assimilation thus provides comprehensive constraints on the emission inversion. It is expected that the simultaneous data assimilation provides a better estimate of the emissions than the inversion run because the concentration assimilation may reduce some of the model errors. However, this will not be the case for all model errors. For instance, errors in boundary layer venting or deposition may be compensated in our assimilation system by (incorrectly) changing the emissions. The a priori and a posteriori emissions estimated from data assimilation are shown in Fig. 14 and listed in Tables 6, 7, and 8. Note that the a priori surface emissions for the simulation years 2006-2007 were obtained by linearly extrapolating the 1995 and 2000 inventories in time.
Fig. 14. Global distributions of the surface CO emissions (in 10 −10 kg m −2 s −1 ) (left panels), the surface NO x emissions (in 10 −11 kg m −2 s −1 ) (centre panels), and the lightning NO x emissions (in 10 −11 kg m −2 s −1 ) (right panels), averaged over 16-30 January (upper 6 panels) and July (lower 6 panels) in 2007. The a priori emissions (upper rows) and the analysis increment (lower rows), i.e. the difference between the a posteriori and the a priori, are shown for each panel. The red (blue) colour indicates an emission increase (decrease) for the analysis increment.
Table 8. Same as in Table 7, but for the regional surface CO emissions (e(CO), in Tg CO yr −1 ).

NO x emissions
The data assimilation changes the global total NO x emissions from 42.8 to 42.9 Tg N yr −1 in January and from 46.7 to 52.0 Tg N yr −1 in July.The a posteriori and the a priori emissions differ more significantly at the regional scale.The analysis increment is generally positive over Eastern China, North America (only in January), Australia, Northern India (only in January), Southeast Asia, and Southern Africa.An obvious increment is observed over Eastern China, with a factor of up to about 1.6 in January.Over the Eastern United States, the a posteriori emissions are higher than the a priori emissions in January, but are lower in July.The a posteriori emissions are lower than the a priori emissions over Europe, unlike over other industrial areas.Over Central Africa, the data assimilation increases the emissions in January.Over Northern Africa, the data assimilation decreases the emissions in January, but increases the emissions in July.Most of these features are also reported in Miyazaki et al. (2012).As a result of the data assimilation and the covariance inflation, the mean a posteriori error for the surface NO x emissions typically ranges from 12 to 60 %, with smaller relative errors over polluted areas than over clean areas.The mean differences between the a priori and the a posteriori emissions are generally larger than both the a posteriori error and the variability (i.e., standard deviation) of the a posteriori emissions estimated during the analysis period.
The analysis increment structures obtained from the data assimilation strongly depend on the assumption made on the a priori emission. In CHASER, the 1995 and 2000 emission inventories are extrapolated to the simulation years 2006-2007. This procedure may give spurious results for certain regions, as described in Miyazaki et al. (2012). However, the bottom-up emissions obtained from the newer inventories (EDGAR version 4.2 (European Commission, 2011), GFED version 3.1, and GEIA) for the year 2007 show a similar difference with the a posteriori emissions. This indicates common problems in the emission inventories (e.g. too low emissions over Eastern China, the Eastern United States in January, Central Africa in January, Northern Africa in July, and Southern Africa in January, and too high emissions over Europe in January, the Eastern United States in July, and South America in July). Note that the a posteriori emissions are closer to the newer inventories than the a priori emissions in some cases (e.g. over the Eastern United States, Eastern China, Europe, and Southeast Asia). In particular, the a priori emissions in Spain are unrealistically high and differ strongly from both the a posteriori emissions and the newer inventories.
The simultaneous data assimilation system results in NO x emissions somewhat different from those of the emission inversion system, in which only surface emissions are optimized (brackets in Tables 6 and 7). This indicates that the direct adjustment to the concentration fields by the data assimilation has an important effect on the emission inversion, with a regional difference of up to 40 % over industrial areas and up to 30 % over biomass burning areas. For instance, the emissions over Central Africa in the simultaneous data assimilation are smaller than in the emission inversion, which is attributed to the increased NO 2 concentration in the middle and upper troposphere mainly due to the assimilation of TES O 3 data. The smaller emissions over the Eastern United States for July in the simultaneous data assimilation result from the larger NO 2 concentrations in the middle and upper troposphere, primarily caused by the adjustment made directly to the concentrations through the assimilation of OMI NO 2 data.
The January and July mean global surface NO x emissions of 47.4 Tg N yr −1 estimated from the data assimilation are slightly larger than the annual mean emissions estimated from previous studies (e.g. 42.1 Tg N yr −1, Müller and Stavrakou, 2005; 40.3 Tg N yr −1, Jaeglé et al., 2005; 45.4 Tg N yr −1, Miyazaki et al., 2012). Differences in analysis years and the focus on only two months may be the primary contributors to the difference in NO x emission estimates. The NO x emissions are generally larger over industrial areas in winter and over soil/desert areas in summer than in other seasons; this may also contribute to the larger NO x emissions estimated from this study compared to the annual mean emissions. Meanwhile, the comparison against the January and July mean a priori emissions (44.7 Tg N yr −1 ) and the newer inventories (40.4 Tg N yr −1 ) implies general underestimations in the emission inventories. On the regional scale, the 11.0 Tg N yr −1 estimated over East Asia (80-150° E, 10-50° N) for July 2007 from OMI observations (Zhao and Wang, 2009) is comparable to our estimate of 10.2 Tg N yr −1. The 0.465 Tg N estimated over the Eastern United States (102-64° W, 22-50° N) from the OMI observations for March 2006 (Boersma et al., 2008a) is also comparable to our estimate of 0.485 Tg N.
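The two-month means quoted here follow directly from the January and July global totals given earlier in this subsection (42.8 to 42.9 Tg N yr −1 in January and 46.7 to 52.0 Tg N yr −1 in July). The short check below reproduces that arithmetic; it assumes those totals refer to the same quantity as the 47.4 and 44.7 Tg N yr −1 means, and the variable names are illustrative.

```python
# Global NOx emission totals quoted in the text (Tg N per year).
jan = {"a_priori": 42.8, "a_posteriori": 42.9}
jul = {"a_priori": 46.7, "a_posteriori": 52.0}

def two_month_mean(key):
    """Unweighted mean of the January and July totals."""
    return (jan[key] + jul[key]) / 2.0

prior_mean = two_month_mean("a_priori")      # 44.75, quoted as 44.7 in the text
post_mean = two_month_mean("a_posteriori")   # 47.45, quoted as 47.4 in the text
relative_change = 100.0 * (post_mean - prior_mean) / prior_mean

print(f"a priori mean:     {prior_mean:.2f} Tg N/yr")
print(f"a posteriori mean: {post_mean:.2f} Tg N/yr")
print(f"relative change:   {relative_change:.1f} %")
```

The global change is therefore only about 6 %, much smaller than the regional differences discussed above.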

CO emissions
Because of the long chemical lifetime of CO in the troposphere, the CO emission inversion requires an assimilation cycle with a long assimilation window (i.e. by using the 4D-Var assimilation technique; e.g. Hooghiemstra et al., 2011) in order to obtain enough constraints from observations. The CO emissions estimated from this study, based on a one-month calculation, may not have been sufficiently constrained by the observations. Further, the simultaneous data assimilation corrects the CO concentrations from the MOPITT data obtained at the 9 pressure levels and at the surface, whereas the emissions are optimized using only the data obtained at 700 hPa (see Sect. 2.1.3). Consequently, the simultaneous data assimilation system can start by adjusting concentrations, and then the emissions will adjust more slowly, depending on the averaging kernel profile and the DOFs. The estimated CO emissions are thus more strongly constrained by the observations in the emission inversion run than in the full assimilation run. Therefore, only the CO emissions estimated by the emission inversion system are presented, as depicted in Fig. 14 and listed in the brackets in Tables 6 and 8.
CHASER shows a large underestimation in the simulated CO fields in the northern extratropics, as commonly revealed by many CTMs (Shindell et al., 2006). The underestimated CO fields might be mostly attributed to an underestimation of the surface CO emissions along with an overestimation of OH. Correspondingly, the assimilation of MOPITT data largely increases the surface CO emissions in the northern extratropics both in January (+66 %) and July (+25 %). The large increase in the CO emissions is mainly attributed to the increase over industrial areas, especially over Eastern China in both seasons, by a factor of 2-3. The large positive increment is consistent with the results of Arellano et al. (2004), who showed that anthropogenic emissions over Asia are too low in EDGAR v3.2. The decreased emissions over North Africa and the increased emissions over Australia and South Asia (especially in January) are also consistent with recent estimates (Jones et al., 2009; Fortems-Cheiney et al., 2011). The large increments obtained for Central and North Africa indicate a large uncertainty in biomass burning in the GFED2 inventory, as similarly suggested by Kopacz et al. (2010). The larger emissions in winter than in summer in the US, Europe, and East Asia are also commonly revealed by recent inversions, which could be due to a combination of emissions from residential heating and vehicle cold starts (e.g. Kopacz et al., 2010). The newer inventories show lower emission values than the a priori emissions over Europe, whereas the data assimilation further increases the emissions from the a priori emissions. In contrast, the a posteriori emissions are significantly larger than both the a priori emissions and the newer inventories over Eastern China. Over the Eastern United States, the data assimilation decreases the emissions; however, the newer inventories show even lower emissions. These results imply different error characteristics in the different bottom-up emission inventories.
Our a posteriori January and July mean estimate for the surface CO emission is 1393 Tg CO yr −1, which is about 20 % higher than the a priori emissions, mainly due to increased emissions by up to 60 % in the Northern Hemisphere in January. This is within 10 % of the results from previous estimates of 1342-1502 Tg CO yr −1 (Arellano et al., 2004; Arellano and Hess, 2006), 1390 Tg CO yr −1 (Hooghiemstra et al., 2011), 1391 Tg CO yr −1 (Pison et al., 2009), 1393 Tg CO yr −1 (Kopacz et al., 2010), 1440 Tg CO yr −1 (Jones et al., 2009), and 1504 Tg CO yr −1 (Fortems-Cheiney et al., 2011). The a posteriori emissions are also much larger than the newer inventories (with a January and July mean global emission of 892 Tg CO yr −1 ).
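The "within 10 %" statement can be verified from the numbers listed in this paragraph. The sketch below computes each previous estimate's deviation relative to the 1393 Tg CO yr −1 obtained here; normalizing by our own estimate is an assumption, since the text does not say which value serves as the reference.

```python
# Global surface CO emission estimates as listed in the text (Tg CO per year).
ours = 1393.0
previous = [1342.0, 1502.0, 1390.0, 1391.0, 1393.0, 1440.0, 1504.0]
newer_inventories = 892.0

# Percentage deviation of each previous estimate from the present one.
deviations = [100.0 * (value - ours) / ours for value in previous]
max_abs_deviation = max(abs(d) for d in deviations)
ratio_to_inventories = ours / newer_inventories

print(f"largest deviation from previous estimates: {max_abs_deviation:.1f} %")
print(f"a posteriori / newer inventories: {ratio_to_inventories:.2f}")
```

Under this convention the largest deviation is about 8 %, and the a posteriori total exceeds the newer inventories by roughly 56 %.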

Lightning NO x sources
The data assimilation provides strong constraints on the magnitude and the distribution of LNO x. The global LNO x amount is increased from 4.4 to 5.2 Tg N yr −1 in January and from 5.7 to 7.3 Tg N yr −1 in July. The estimated emissions are within the range of the annual global LNO x source of 5 ± 3 Tg N yr −1 given by Schumann and Huntrieser (2007). The large increase in July corresponds to the significant increase over the Eurasian continent, North America, Southeast Asia, tropical South America, and Central Africa. The data assimilation also changed the vertical profile of the LNO x sources in both the tropics and the extratropics. The large changes in the three-dimensional distribution of LNO x obtained from the data assimilation indicate that the Price and Rind (1992) lightning parameterization used in the simulation does not fully capture the observed distribution of lightning activity, as also suggested by Allen and Pickering (2002). In particular, the data assimilation generally increases LNO x in the upper troposphere both in the tropics and the extratropics. This suggests that the C-shape vertical profile of lightning NO x assumed in the parameterization may place too much mass near the surface and too little in the middle troposphere, as also suggested by Ott et al. (2010). TES O 3 and OMI NO 2 data provided particularly strong constraints on LNO x sources. However, a lightning signal in satellite observations of NO 2 columns is often obscured by the high contributions from (anthropogenic) boundary layer pollution and biomass burning (Martin et al., 2002, 2007; Boersma et al., 2005). In addition, the increase of the NO/NO 2 ratio with height in the troposphere reduces the relative sensitivity to lightning-produced NO 2. The lightning signal is also almost comparable to the measurement uncertainty for tropospheric NO 2. Thus, further careful considerations are required for LNO x estimates, which will be discussed in a separate study.

Relative importance of the emission and concentration optimization on the tropospheric O 3 analysis
As a result of the simultaneous optimization of the emissions and the concentrations, the global tropospheric O 3 burden, which is calculated for the region below the tropopause height determined from the vertical temperature gradient (−2 K km −1 ) in the model, is decreased by 2.5 % in January and is increased by 5.6 % in July in the full assimilation run (Table 9).The obvious increase in July, from 346.8 to 366.1 Tg O 3 , is almost equally attributed to the enhanced emissions of the precursors and the direct adjustment to the concentration fields.The simultaneous optimization provides important contributions to the tropospheric ozone budget analysis, by optimizing its precursor emissions and reducing model errors while taking the chemical feedback into account.
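Two quantities in this paragraph lend themselves to a quick numerical illustration: the lapse-rate tropopause used to bound the burden, and the quoted July burden change. The sketch below is a simplified assumption-laden version: it takes the tropopause as the base of the lowest layer whose vertical temperature gradient is not steeper than −2 K km −1, ignores near-surface inversions and any layer-depth criterion, and uses a synthetic temperature profile. None of the names come from the CHASER code.

```python
import numpy as np

def lapse_rate_tropopause(height_km, temp_k, threshold=-2.0):
    """Height (km) of the base of the lowest layer with dT/dz >= threshold (K/km)."""
    z = np.asarray(height_km, dtype=float)
    t = np.asarray(temp_k, dtype=float)
    gradient = np.diff(t) / np.diff(z)            # one value per layer, K/km
    candidates = np.nonzero(gradient >= threshold)[0]
    if candidates.size == 0:
        return None                                # no tropopause found in the profile
    return float(z[candidates[0]])

# Synthetic profile: -6.5 K/km up to 11 km, isothermal above (illustration only).
z = np.arange(0.0, 21.0, 1.0)
t = np.where(z <= 11.0, 288.0 - 6.5 * z, 288.0 - 6.5 * 11.0)
tropopause_km = lapse_rate_tropopause(z, t)

# July burden change quoted in the text: 346.8 -> 366.1 Tg O3.
burden_change_percent = 100.0 * (366.1 - 346.8) / 346.8
print(tropopause_km, round(burden_change_percent, 1))
```

The percentage change evaluates to about 5.6 %, matching the value quoted for the full assimilation run in July.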
Fig. 15 panel titles: Emission inversion − Control; Emission-fixed assim − Control.
Figure 15 shows the relative importance of the emission inversion and the direct concentration assimilation on the vertical O 3 profiles. The emission inversion largely changes the O 3 profiles in the PBL, especially below 900 hPa. This demonstrates the importance of optimizing O 3 precursor fields in correcting the near-surface O 3. The obvious impact in the PBL, with a mean difference of up to 15 %, is found in the tropics and at northern mid-latitudes in July, associated with changes in biomass burning and anthropogenic emissions, respectively. The regional differences are more pronounced over the polluted regions of the northern mid-latitudes, Central Africa, and South America in July, with a maximum difference of 30 % (figure not shown). Even in the free troposphere, the O 3 analysis is significantly affected by the emission changes through vertical transport of O 3 and its precursors. However, the direct concentration adjustment dominates the changes in the O 3 profiles in the free troposphere in the combined data assimilation. The simultaneous adjustment of the emissions and the concentrations is thus a powerful approach to optimizing the whole tropospheric O 3 profile.
The sum of these two individual effects mostly explains the difference between the full assimilation run and the control run (figure not shown).The O 3 changes in the tropical troposphere are an exception to this rule in that the changes estimated from the sum of these two individual effects are slightly (∼15 %) larger than the changes estimated from the full assimilation run.This may indicate too large emission adjustments and resultant O 3 productions in the emission inversion run.The spatial pattern of the changes in the O 3 profiles obtained from emission inversion run and the fixed-emission assimilation run is very different (figure not shown), confirming the independent adjustments realized from the emission and concentration optimizations.
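The additivity argument above compares the sum of the two individual effects with the change in the full assimilation run. A minimal sketch of that comparison follows; the four input arrays stand for O 3 fields from the control, emission inversion, fixed-emission assimilation, and full assimilation runs, and the numbers are invented for illustration.

```python
import numpy as np

def additivity_residual(control, emis_inversion, fixed_emis_assim, full_assim):
    """Sum of the two individual effects minus the full-assimilation change."""
    control = np.asarray(control, dtype=float)
    sum_of_parts = (np.asarray(emis_inversion, dtype=float) - control) \
                 + (np.asarray(fixed_emis_assim, dtype=float) - control)
    full_change = np.asarray(full_assim, dtype=float) - control
    return sum_of_parts - full_change

# Invented O3 values (ppbv) at two grid points; NOT model output.
control = [40.0, 50.0]
emis_inversion = [43.0, 51.0]
fixed_emis_assim = [41.0, 56.0]
full_assim = [44.0, 56.5]

residual = additivity_residual(control, emis_inversion, fixed_emis_assim, full_assim)
print(residual)   # zero where the effects add linearly, positive where the sum overshoots
```

A positive residual, as reported for the tropical troposphere, means the separately estimated effects overshoot the combined adjustment.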

Uncertainties
The EnKF data assimilation provides information about the uncertainty of the analysis.The ensemble spread, estimated as the standard deviation of the simulated concentrations across the ensemble, is a measure of the analysis uncertainty (e.g.Arellano et al., 2007).The uncertainty in the a posteriori fields represented by the analysis spread is reduced if the analysis converges to a true state.This spread is caused by errors in the model input data, chemical or physical parameters, parameterizations, the numerical scheme as well as errors in the measurements assimilated (Boynard et al., 2011).
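The ensemble spread defined above is simply the standard deviation across ensemble members at each grid point. The sketch below shows one way to compute it; the use of the sample standard deviation (`ddof=1`) and the array layout (members along the first axis) are assumptions, as the text does not specify them.

```python
import numpy as np

def ensemble_spread(ensemble):
    """Standard deviation across members; `ensemble` has shape (members, ...)."""
    return np.std(np.asarray(ensemble, dtype=float), axis=0, ddof=1)

# Three hypothetical members, two grid points (e.g. O3 in ppbv).
members = np.array([[48.0, 100.0],
                    [50.0, 100.0],
                    [52.0, 100.0]])
spread = ensemble_spread(members)
print(spread)   # first point has a spread of 2 ppbv, second point has none
```

A grid point where all members agree has zero spread, which is why covariance inflation is needed to keep the filter responsive to new observations.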
Figure 16 shows the distributions of the analysis spread for O 3, CO, and NO 2. The analysis spread typically shows a reduction of the analysis errors due to effective (high quality, high sensitivity, good coverage) observations and an increase due to error growth as represented by the ensemble model forecast and the covariance inflation. Near the surface, the analysis spread of O 3 and CO is generally smaller in the tropics than in the extratropics, corresponding to the latitudinal dependence of TES O 3 and MOPITT CO retrieval sensitivities, respectively. The vertical profile shows that the analysis spread is effectively reduced in the middle troposphere, reflecting the maximum sensitivity of the retrievals at these altitudes. Within the free troposphere, the O 3 analysis spread is relatively high in the tropical upper troposphere. The OSEs showed that the assimilation of MLS HNO 3 data acts to increase the O 3 analysis spread in the tropical upper troposphere during the forecast, through its influence on the NO y species fields during the analysis and because of its large observation errors. In the extratropical upper troposphere and around the subtropical jet streams, the downward propagation of the well-constrained O 3 due to the assimilation of MLS O 3 data helps to reduce the analysis spread. The CO analysis spread is largest at the northern mid-latitudes near the surface, related to large uncertainties in the analyzed CO emissions. The analysis spread of NO 2 is closely related to the emissions near the surface, while it also has a strong latitude-vertical dependence. The analysis spread is generally largest in the upper troposphere, which is a result of the low concentrations that are not well constrained by the OMI data. The large analysis spread in the southern extratropics is related to large relative observation errors of OMI NO 2 data associated with the low concentrations. These large analysis spreads indicate requirements for further constraints from additional observations or higher quality data.
Fig. 16 panel titles: O 3, 700 hPa [ppb]; NO 2, Sfc [ppt]; CO, 700 hPa. Quantity shown: analysis ensemble spread.
The assimilation system can also be used to diagnose model and/or observation errors. We use the difference between analysis and forecast, the so-called analysis increment, to represent short-term systematic errors in the model (Fig. 17). By assuming that the assimilated fields approximate the assimilated data after several assimilation cycles, the averaged analysis increment primarily relates to the persistent model bias. The increment thus represents the adjustment made in the analysis step to bring the model closer to the observations, and the spatial distribution of the averaged increments shows where the model fields are frequently adjusted by the data assimilation. The positive increments obtained for O 3 and CO in the extratropical lower troposphere imply that CHASER tends to underestimate those concentrations in these regions compared to the assimilated data. Positive increments of O 3 are frequently observed over the northern Eurasian continent, around North America, and over the Southern Ocean. The OSEs confirm that the positive O 3 analysis increments in the lower and middle troposphere are due to the TES O 3 data, while the upper tropospheric negative increments are due to both the TES and MLS O 3 data. This implies that the model bias strongly varies with height, because of different contributions of transport and chemical processes. The data assimilation also tends to increase CO over East China and North America near the surface, as similarly shown by Elguindi et al. (2010), whereas it decreases CO in the tropics. The negative NO 2 increments in the tropical and high-latitude troposphere are associated with the assimilation of very small or negative OMI NO 2 concentrations, mainly over the oceans. The large positive NO 2 increments obtained for the extratropical UTLS reflect the fact that the assimilation of MLS O 3 and HNO 3 data tends to compensate for the model underestimation through the inter-species correlation. Knowledge of the model error structure is useful for identifying sources of the model error. Geer et al. (2006) showed that the enhanced skill of the best performing analysis can usually be attributed to better modeling.
Fig. 17. Same as Fig. 16, but for the analysis increment. Lower panels show the latitude-pressure distribution of the percentage ratio of the zonal mean analysis increment to the zonal mean analysis ensemble mean concentration. The red (blue) colour indicates relatively positive (negative) values.
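The averaged analysis increment discussed above is the time mean of analysis minus forecast over the assimilation cycles, optionally expressed as a percentage of the mean analysis. The following sketch illustrates the calculation with invented values; the array layout (cycles along the first axis) and the function names are assumptions made for this example.

```python
import numpy as np

def mean_analysis_increment(analyses, forecasts):
    """Time-mean of (analysis - forecast); inputs have shape (cycles, ...)."""
    analyses = np.asarray(analyses, dtype=float)
    forecasts = np.asarray(forecasts, dtype=float)
    return np.mean(analyses - forecasts, axis=0)

def increment_percent(analyses, forecasts):
    """Mean increment as a percentage of the time-mean analysis."""
    mean_analysis = np.mean(np.asarray(analyses, dtype=float), axis=0)
    return 100.0 * mean_analysis_increment(analyses, forecasts) / mean_analysis

# Two cycles, two grid points; invented ensemble-mean concentrations.
forecasts = [[40.0, 90.0], [42.0, 88.0]]
analyses = [[44.0, 86.0], [46.0, 86.0]]

increment = mean_analysis_increment(analyses, forecasts)
print(increment)                      # positive: model tends to be too low there
print(increment_percent(analyses, forecasts))
```

A persistently positive mean increment at a location indicates that the forecast is repeatedly pulled upward, consistent with a low model bias relative to the assimilated data.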

Conclusions
An advanced data assimilation system for tropospheric chemical compositions, the CHASER-DAS, is developed based on the CHASER model and the LETKF scheme.The data assimilation system is applied to integrate observation information obtained from multiple satellite measurements, namely, NO 2 data from OMI, O 3 data from TES, CO data from MOPITT, and O 3 and HNO 3 data from MLS.The data assimilation provides multiple constraints on tropospheric composition and allows us to simultaneously optimize the atmospheric distributions of various chemical compositions together with the emissions of O 3 precursors (NO x and CO) while taking their chemical feedbacks in the CO-OH-NO x -O 3 system into account.In the simultaneous data assimilation system, improved atmospheric concentrations of chemically-related species have the potential to improve the emission inversion, while the improved emissions estimates will benefit the atmospheric concentration analysis through a reduction in the model forecast error.A covariance localization technique is applied to neglect the covariance among non-related or weakly-related variables which may suffer significantly from errors in the ensemble sampling and the forecast model.
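The idea of neglecting covariances between unrelated variables can be illustrated on a small sample covariance matrix. The sketch below multiplies the ensemble covariance element-wise by a 0/1 mask of permitted variable pairs. This is a conceptual illustration only: the mask is hypothetical, and in an LETKF the same effect is typically obtained by choosing which observations update which state variables rather than by forming the covariance matrix explicitly.

```python
import numpy as np

def localized_covariance(ensemble, allowed):
    """Sample covariance of `ensemble` (members x variables) masked by `allowed`."""
    x = np.asarray(ensemble, dtype=float)
    perturbations = x - x.mean(axis=0)
    covariance = perturbations.T @ perturbations / (x.shape[0] - 1)
    return covariance * np.asarray(allowed, dtype=float)

# Hypothetical three-variable state; pair (0, 2) is treated as unrelated.
rng = np.random.default_rng(0)
ensemble = rng.normal(size=(20, 3))
allowed = np.array([[1, 1, 0],
                    [1, 1, 1],
                    [0, 1, 1]])

cov_localized = localized_covariance(ensemble, allowed)
print(cov_localized)
```

Zeroing a pair removes any spurious correlation that a small ensemble might produce between those two variables, at the cost of discarding any real relationship between them.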
The improvement obtained by the assimilation demonstrates that multi-species data assimilation provides valuable information on various chemical fields.The OmF analysis confirmed significant error reductions for both bias and RMSE from the data assimilation.The standard deviation around the mean of the OmF is generally comparable to the observation error, indicating that the data assimilation is successfully performed.Significant reductions of both bias (by 85 %) and RMSE (by 50 %) against independent data sets for various chemical fields show that multi-species data assimilation is a very effective way of combining observation information and compensating for systematic model errors.The improvements include enhanced tropospheric NO 2 columns over industrial areas (with a global mean bias reduction of 40-85 %), especially over China, reduced positive O 3 bias in the middle and upper troposphere (by 60 %), reduced negative CO bias in the Northern Hemisphere in the lower troposphere (by 40-90 %), especially over East Asia and North America, and a reduced negative HNO 3 bias in the extratropical UTLS (by 70-85 %).Comparisons against ozonesonde and aircraft data confirmed improvements in the vertical profiles of O 3 and its precursors in the free troposphere and the UTLS through the data assimilation.The data assimilation removes most of the bias from the middle troposphere to the lower stratosphere against ozonesonde data, from 30-40 % to within 10 %.The results confirm that the assimilated satellite data have highly valuable information about the tropospheric chemical processes, although further improvements are required for the lower tropospheric processes.
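The OmF (observation minus forecast) diagnostics mentioned above can be summarized by a mean, an RMSE, and a standard deviation that is then compared with the stated observation error. The sketch below computes these three numbers; the values are invented and the function is an illustrative assumption, not the diagnostic code used in the study.

```python
import numpy as np

def omf_statistics(obs, forecast):
    """Return (mean, RMSE, sample standard deviation) of observation minus forecast."""
    omf = np.asarray(obs, dtype=float) - np.asarray(forecast, dtype=float)
    return omf.mean(), np.sqrt(np.mean(omf ** 2)), omf.std(ddof=1)

# Invented observation and forecast values in matching units.
obs = [10.0, 12.0, 14.0, 16.0]
forecast = [9.0, 13.0, 13.0, 17.0]

omf_mean, omf_rmse, omf_std = omf_statistics(obs, forecast)
print(omf_mean, omf_rmse, omf_std)
```

An OmF standard deviation close to the observation error, with a mean near zero, is the kind of consistency the text refers to when stating that the assimilation is successfully performed.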
OSEs have been conducted to quantify the relative importance of each data set on constraining the emissions and concentrations.The assimilation of each individual dataset has a strong influence on both assimilated and non-assimilated species through the use of inter-species error correlations and through the chemical model.For instance, the assimilation of upper tropospheric O 3 and HNO 3 obtained from MLS was useful to reduce the bias in the tropospheric NO 2 columns.Comparisons against independent ozonesonde data showed that both MLS and TES O 3 data largely improve the O 3 profiles in the free troposphere and the UTLS.Note that all the assimilated data contribute to the global mean O 3 bias reduction compared to ozonesonde data in the middle troposphere (between 750 and 450 hPa) in July, through their influences on various chemical states that affect O 3 variations.Especially this last result demonstrates the strength of the simultaneous assimilation of multiple datasets for different species.These inter-species influences can be tightly associated with the changes in OH.The simultaneous assimilation increased tropospheric OH concentrations in July by 5-15 % in the tropics and the Southern Hemisphere mainly due to the assimilation of OMI NO 2 and TES O 3 data, respectively.The large improvement in July may be related to summertime active chemical processes in the Northern Hemisphere.
In comparison to the a priori emissions based on bottom-up inventories (EDGAR3.2+GFED2.1+REAS1.1), the optimized emissions of both NOx and CO are generally higher over most industrial areas, especially in the northern midlatitudes, implying that the emission inventories underestimate sources. The NOx emissions estimated from the simultaneous data assimilation are different from those from the emission inversion system in which only the emissions are optimized from observations. The results indicate a large uncertainty in the a posteriori NOx emissions due to model errors when estimating from NO2 data only, with an uncertainty of up to 40 % over industrial areas and up to 30 % over biomass burning areas, as measured by the impact of the concentration assimilation on the a posteriori emissions. The simultaneous assimilation of multiple chemical observations is very useful for representing the chemical processes realistically by removing model errors, and it has an important effect on the emission inversion. The CO emissions estimated in this study may not have enough constraint from observations, because the calculation period is too short and the observational information is insufficient. Nevertheless, comparison of our results to previous inverse modeling studies (e.g. Kopacz et al., 2010) is very encouraging. The uncertainties in the a priori emissions, based on an extrapolation of year 1995 and 2000 inventories, caused large increments especially over anthropogenic source areas. The data assimilation also increases the lightning NOx sources over land, especially in boreal summer, indicating that the lightning parameterization used in the simulation has a large uncertainty.
As a result of the simultaneous optimization, the tropospheric O3 burden is increased by 5.6 % in July, with almost equal contributions from the emission optimization and the direct adjustment to the concentration fields. The emission optimization dominated the changes in the O3 profiles in the PBL in the tropics and at northern mid-latitudes, whereas the direct concentration adjustment was much more important in the free troposphere. This reveals the importance of the simultaneous adjustment of the emissions and concentrations for the tropospheric ozone budget and profile analyses.

Discussions: future challenges
The CHASER-DAS provides valuable information for the future development of both models and observations. The ensemble spread can be a measure of the analysis uncertainty. The observed large analysis spreads for O3 and NO2 in the tropical upper troposphere and near the surface indicate a requirement for further constraints from additional observations or high quality data to improve the analysis. The analysis increment obtained during the data assimilation cycle primarily relates to persistent model biases. The positive analysis increments obtained for O3 and CO imply that CHASER tends to underestimate (overestimate) O3 (CO) concentrations in the lower/middle troposphere and tends to overestimate (underestimate) them in the upper troposphere. This information is useful to identify sources of the model error and improve the performance of both model and data assimilation. The large analysis spreads and increments near the surface also indicate a requirement for better emission data sets.
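As an illustration of how the ensemble spread can serve as an uncertainty measure, the following minimal sketch computes the spread at a single grid point and its percentage ratio to the ensemble mean (the ratio plotted in the lower panels of Fig. 16). The function names are hypothetical, and the use of the unbiased (k - 1) sample variance is an assumption rather than a documented CHASER-DAS choice.

```python
import math

def ensemble_spread(members):
    """Return (mean, spread) of the ensemble member values at one grid point.

    Assumption: spread is the sample standard deviation with a (k - 1)
    denominator, where k is the ensemble size.
    """
    k = len(members)
    mean = sum(members) / k
    variance = sum((x - mean) ** 2 for x in members) / (k - 1)
    return mean, math.sqrt(variance)

def spread_percent(members):
    """Spread expressed as a percentage of the ensemble mean concentration."""
    mean, spread = ensemble_spread(members)
    return 100.0 * spread / mean
```

For example, three members with values 40, 50, and 60 ppb give a mean of 50 ppb and a spread of 10 ppb, i.e. a relative spread of 20 %.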
The simultaneous assimilation of multiple satellite datasets is an important development for improving chemical weather forecasting (e.g. Kaminski et al., 2008) and better understanding the processes controlling the atmospheric environment. However, further developments are still required. First, more observation data are required to constrain O3 and its precursors, especially near the surface. Retrieval sensitivity to the lowermost troposphere is critical for the emission inversion and the near surface air quality analysis. For instance, adding the near infrared (NIR) channel to the MOPITT retrieval increases the near surface sensitivity (Deeter et al., 2010), which may help to improve the analysis, while the IASI retrievals may contain information on the spatial extent of plumes (Coheur et al., 2009). Also, the emissions of O3 precursors other than NOx and CO, such as VOCs, have a pronounced influence on tropospheric chemistry. Further constraints are required for these fields; in particular, satellite CH2O data may provide significant constraints on VOC emissions. Apart from the lower tropospheric observations, high quality satellite observations in the UTLS are needed for O3 and HNO3, potentially augmented with CO and PAN as well as short-lived gases such as NO2 (ESA, 2012). Second, the model resolution is too coarse to describe small scale processes accurately. Chemical data assimilation requires observations with sufficient spatial and temporal resolution to capture the heterogeneous distribution of tropospheric composition. In order to better take into account the small scale information available in the dataset, it is important to increase the model resolution close to the data set's resolution, as suggested by Pajot et al. (2011) and demonstrated using regional data assimilation systems (e.g. Hanea et al., 2004). In addition, the combined use of satellite and surface in-situ data may provide strong constraints on the near surface analysis at high resolution. Third, introduction of a reasonable bias correction scheme is important to improve the analysis, especially when multiple data sets are simultaneously assimilated (e.g. Dee, 2005).

System ability check based on synthetic observations
It is of great interest to test the ability of the data assimilation system to improve the O3 analysis in the presence of an emission error. We conducted an idealized data assimilation experiment, the so-called twin experiment (e.g. Ghil et al., 1991), by perturbing both the initial condition and the NOx emission. The purpose of this experiment is to demonstrate that the data assimilation is properly implemented and to quantify how the emission optimization influences the O3 analysis, as similarly performed by Constantinescu et al. (2007) and Messina et al. (2011). Under the assumption of the perfect model scenario (i.e. a forecast model provides a perfect representation of the atmosphere), the actual background error (P^b without model errors Q) and observation error (R) statistics can be determined precisely, so that the perfect model experiment allows us to demonstrate the importance of the data assimilation without unexpected model and observation errors.
A time series of a reference solution (or true state) for O3 and NO2 fields was generated by the simulation (without any assimilation) of the CTM using unperturbed emissions (i.e. the a priori emissions used for the real data assimilation). The reference solution was used to obtain artificial observation data and initial conditions for ensemble simulations and to validate the analysis. The artificial observation data were obtained from the true state, with the addition of zero-mean Gaussian random noise as observation errors with standard deviations of 10 % of the reference concentration. It was assumed that observation stations were located at 6.25 % (12.5 %) of the model grid points for O3 (NO2) in the horizontal; the vertical partial column with 3 km resolution was assimilated every 6 h. The state vector includes O3 concentration and NOx emissions. The O3 data were used to update the O3 concentration, while the NO2 data were used to update NOx emissions. The number of the assumed observations was larger than that in the real world, and it will affect the data assimilation performance. However, this idealized setting helps to demonstrate the ability of the system with enough constraints from observations. The simulated fields on 1 November 2007 were used in the initial assimilation cycle, and the analysis for 7-8 November was evaluated. The background error covariance for the initial assimilation cycle was obtained from the lagged average forecast (Hoffman and Kalnay, 1983); the initial ensemble concentration fields were obtained from the reference simulation during 28 October to 4 November 2007.
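The construction of the artificial observations can be sketched as follows. This is only an illustration under stated assumptions: the reference field is flattened to a one-dimensional list, station locations are drawn at random (the text does not say how the stations were placed), and the function name `make_synthetic_obs` and its parameters are hypothetical.

```python
import random

def make_synthetic_obs(truth, station_fraction, rel_error=0.10, seed=0):
    """Sample a reference ("true") field at a subset of grid points and add noise.

    truth            : list of reference concentrations, one per grid point
    station_fraction : fraction of grid points observed (0.0625 for O3 and
                       0.125 for NO2 in the twin experiment described above)
    rel_error        : observation error std as a fraction of the reference value
    Returns a list of (grid_index, observed_value, error_std) tuples.
    Assumption: random station placement, chosen here only for illustration.
    """
    rng = random.Random(seed)
    n_obs = max(1, round(station_fraction * len(truth)))
    stations = sorted(rng.sample(range(len(truth)), n_obs))
    obs = []
    for i in stations:
        sigma = rel_error * truth[i]             # 10 % of the reference value
        noisy = truth[i] + rng.gauss(0.0, sigma)  # zero-mean Gaussian noise
        obs.append((i, noisy, sigma))
    return obs
```

The returned error standard deviations are the values that would populate the diagonal of R in such a perfect model experiment, since the noise statistics are known exactly by construction.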
Because of the biased emissions (constructed based on the annual mean a priori emissions), the model simulation without data assimilation has large errors in the simulated O3 fields in the lower troposphere. The mean O3 RMSE normalized by the background concentration averaged over 10° S-50° N latitudinal bands at 950 hPa is 23.7 % for the model simulation, which is almost the same as the initial error of 25.0 %. The assimilation of O3 data reduces it to 16.5 %. The assimilation of NO2 data helped to improve the O3 analysis by reducing the errors included in the O3 simulation due to biased NOx emissions; the normalized O3 RMSE is 14.2 % with a regional mean NOx emission bias (RMSE) reduction of 41 (30) %. The assimilation of both O3 and NO2 data provided the best performance analysis, with a normalized O3 RMSE of 11.7 %, which is almost equivalent to the assumed observation error (i.e. 10 %). In contrast, in the free troposphere (e.g. at 500 hPa), assimilation of O3 data provided a much more significant improvement of the O3 analysis than that provided by the NO2 data. These results confirm that the simultaneous optimization of O3 concentration and its precursor emissions is a powerful framework for the tropospheric chemistry analysis.
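The normalized RMSE quoted above can be illustrated with a short sketch. The exact normalization used in the study is not spelled out in the text; here the RMSE against the reference solution is simply divided by the mean reference concentration, which is an assumption, and the function name is hypothetical.

```python
import math

def normalized_rmse_percent(analysis, truth):
    """RMSE of the analysis against the reference field, in % of the mean reference value.

    Assumption: normalization by the mean reference concentration over the
    same set of grid points; other conventions (e.g. pointwise) are possible.
    """
    n = len(truth)
    rmse = math.sqrt(sum((a - t) ** 2 for a, t in zip(analysis, truth)) / n)
    return 100.0 * rmse / (sum(truth) / n)
```

For instance, an analysis of 55 and 45 ppb against a reference of 50 ppb at both points gives a normalized RMSE of 10 %, the same magnitude as the observation error assumed in the twin experiment.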

Fig. 2. Schematic diagram of the correlation matrix between observations and the state variables. Satellite data used for the data assimilation are listed in the left column. The model variables updated during the data assimilation are listed in the top row. The blue (gray) colour indicates that correlations between the observed variables and the model variables are considered (neglected using the variable localization technique). See Sect. 3.3 for details.

Fig. 3. Correlations between species in the background error covariance matrix, estimated from the LETKF ensemble at 950 hPa (left) and 500 hPa (right) averaged over 15-20 July 2007. The global mean of the covariance estimated for each grid point is plotted. The matrix includes concentrations of all the predicted species, surface NOx emission (NOx-emi.), surface CO emissions (CO-emi.), and lightning NOx sources (LNOx). Ox is the sum of O3 and O(1D), and NOx is the sum of NO, NO2, and NO3. The red (blue) colour represents positive (negative) correlations.

Fig. 4. Latitudinal distributions of the effect of data assimilation on the mean concentration of OH, averaged between 800 and 550 hPa for 16-30 January 2007 (left) and 16-30 July 2007 (right). The percentage difference of the zonal mean concentration, averaged between 800 and 550 hPa, between the assimilation runs and the control run is shown for six different assimilation runs: the full assimilation run (with all the data, black) and the five OSEs with TES O3 data (marble), OMI NO2 data (light blue), MOPITT CO data (green), MLS O3 data (red), and MLS HNO3 data (yellow). A positive (negative) value indicates that the assimilation run has a higher (lower) concentration than the control run.

Fig. 7. Latitudinal distributions of the mean OmF (upper panels) and its standard deviation around the mean (lower panels) estimated in the observation space for each assimilated data set, averaged over the period 16-30 January 2007. The results are shown for the data assimilation run (red) and the control run (blue).

Fig. 8. Global distributions of the tropospheric NO2 columns (in 10^15 molec cm^-2), averaged over the period 16-30 January 2007. The results are shown for OMI (left columns), SCIAMACHY (middle columns), and GOME-2 (right columns). Upper rows show the tropospheric NO2 columns obtained from the satellite retrievals (OBS); centre rows from the control run (Cntl); and lower rows from the data assimilation run (Assim). The averaging kernel of each retrieval is applied to the control run and data assimilation fields. The red (blue) colour indicates relatively high (low) values.

Fig. 9. Global distributions of the tropospheric O3 columns (in DU) and O3 mixing ratio (in ppb), averaged over 16-30 July 2007. The results are shown for OMI/MLS O3 columns (left columns), TES O3 mixing ratio at 300 hPa (middle columns), and MLS O3 mixing ratio at 215 hPa (right columns). Upper rows show the satellite retrievals (OBS); centre rows from the control run (Cntl); and lower rows from the data assimilation run (Assim). The red (blue) colour indicates relatively high (low) values.

Fig. 10. Global distribution of the CO mixing ratio (in ppb) and HNO3 mixing ratio (in ppb), averaged over 16-30 July 2007. The results are shown for MOPITT CO mixing ratio at 500 hPa (left columns), TES CO mixing ratio at 700 hPa (middle columns), and MLS HNO3 mixing ratio at 215 hPa (right columns). Upper row shows the satellite retrievals (OBS); centre row from the control run (Cntl); and lower row from the data assimilation run (Assim). The red (blue) colour indicates relatively high (low) values.

Fig. 12. The mean relative difference of the vertical O3 profiles between ozonesondes and the data assimilation with (red dashed) and without (red solid) the bias correction for TES O3 data during 7-30 January 2007 (left) and 7-30 July 2007 (right).

Fig. 15. The latitude-pressure distribution of the relative difference of zonal mean O3 mixing ratio (in %) between the emission inversion run and the control run (left) and the emission-fixed assimilation and the control run (right) averaged over 16-30 July 2007. The red (blue) colour indicates relatively high (low) values in the inversion/assimilation run.

Fig. 16. Analysis ensemble spread of O3 (left), CO (centre), and NO2 (right) averaged over 16-30 July 2007. Upper panels show the global distribution at 700 hPa. Lower panels show the latitude-pressure distribution of the percentage ratio of the zonal mean analysis ensemble spread to the zonal mean analysis ensemble mean concentration. The red (blue) colour indicates relatively high (low) values.

Table 1. List of satellite observations used for the data assimilation.

Table 2. List of observations used for the validation.

Table 3. List of ozonesonde stations used for the validation.

Table 4. The performance of the data assimilation for different parameters: the horizontal localization length (loc) and the ensemble number (ens). The ten-day mean (averaged over 20-30 January 2007) global mean RMS innovation of the OmF is shown for each assimilated data set. The control (CTL) simulation was conducted with loc = 450 km for NOx emissions and with 600 km for CO emissions, lightning NOx, and the concentrations, and ens = 48. The simulations with different loc values were conducted with ens = 48. The smallest RMS innovation for each comparison is shown in bold.

Table 5. Comparisons between the data assimilation run and the satellite retrievals. The results are obtained from 15-day averages (from the 16th to the 30th of each month) for January and July in 2007. Shown are the global spatial correlation (Corr), the global mean difference (Bias), and the global root-mean-square error (RMSE). The model simulation results (without data assimilation) are shown in brackets.

Table 9. The 1-day average (on the 19th of each month in 2007) global tropospheric O3 burden (Tg O3) obtained from the control run, the emission inversion run, and the full data assimilation run.