PM 10 Data Assimilation over Europe with the Optimal Interpolation Method

This paper presents experiments of PM 10 data assimilation with the optimal interpolation method. The observations are provided by BDQA (Base de Données sur la Qualité de l'Air), whose monitoring network covers France. Two other databases (EMEP and AirBase) are used to evaluate the improvements in the analyzed state over January 2001 and for several outputs (PM 10, PM 2.5 and chemical composition). The method is then applied in operational-forecast conditions. It is found that the assimilation of PM 10 observations significantly improves the one-day forecast of total mass (PM 10 and PM 2.5), whereas the improvement is not significant for the two-day forecast. The errors on aerosol chemical composition are sometimes amplified by the assimilation procedure, which shows the need for chemical data. Since the observations cover a limited part of the domain (France versus Europe) and since the method used for assimilation is sequential, we focus on the horizontal and temporal impacts of the assimilation and we study how several parameters of the assimilation system modify these impacts. The strategy followed in this paper, with the optimal interpolation, could be useful for operational forecasts. However, considering the weak temporal impact of the approach (about one day), the method has to be improved or other methods have to be considered.


Introduction
State-of-the-art models, in meteorology or in air quality, reasonably approximate the atmospheric state (meteorological fields and chemical composition). However, there are still a lot of uncertainties in modeling atmospheric components (Hanna et al., 1998; Mallet and Sportisse, 2006), in particular aerosols (Roustan et al., 2008), leading to substantial discrepancies with observational data.
Correspondence to: M. Tombette (tombette@cerea.enpc.fr)
Data assimilation (DA hereafter) makes use of observations in order to reduce the uncertainties in input data such as initial conditions or boundary conditions. In some cases, especially for air quality modeling, the purpose may be the evaluation of the emission fluxes, and not necessarily the improvement of the forecast itself (Quélo et al., 2005; Chang et al., 1997). This defines the so-called inverse modeling issue, which is not addressed in this paper.
DA could be applied with different objectives: to produce an analysis, in other words to compute a field as close as possible to the "true" state; to improve the initial conditions in order to improve the forecasts; or to identify uncertain parameters, such as the emission fluxes.
DA is a relatively recent research field in atmospheric chemistry (Austin, 1992; Fisher and Lary, 1995; Riishojgaard, 1996), even if it has been widely applied in meteorology. However, numerous studies were carried out for specific gases and with measurements of diverse nature: in-situ, airborne, satellite. Ozone columns are assimilated with an OI approach in Jeuken et al. (1999) and Planet (1984), and with a 4D-Var approach in Riishojgaard (1996). In Levelt et al. (1998), ozone and carbon monoxide profiles are assimilated within the Mozart model. Elbern et al. (1997), Elbern et al. (2007) and Segers (2002) present several assimilation studies of terrestrial data with a 4D-Var approach and a sequential approach for the latter. Wu et al. (2008) compare four assimilation methods for assimilation of ozone ground measurements: OI, ensemble Kalman filter, reduced-rank square root Kalman filter and 4D-Var.
Published by Copernicus Publications on behalf of the European Geosciences Union.
Aerosol models are now a component of most available chemistry-transport models (CTMs). Performance obtained for PM 10 with various models can be disappointing in comparison to the performance obtained for gas species. For example, the correlation of simulated versus observed concentrations rarely exceeds 50% for hourly PM 10 over Europe (Van Loon et al., 2004), and the Root Mean Square Error (RMSE) is of the order of the concentration values, i.e. 10 µg m⁻³. In contrast, ozone peak forecasts, for example, show correlations that may exceed 70% or 80% and RMSEs around 20% of the concentrations. Meanwhile, performance obtained with purely statistical models for PM 10, which are mainly based on observational data, is much more impressive: the correlations in Hooyberghs et al. (2005) are approximately 70%. It is therefore relevant to investigate data assimilation with physical models for aerosols, in order to take into account the information contained in measurements.
In this paper, the aerosol model is SIREAM (SIze-REsolved Aerosol Model; Debry et al., 2007), embedded in the POLYPHEMUS platform (Mallet et al., 2007). The assimilation method is partially determined by the constraints originating from the thermodynamic model ISORROPIA (Nenes et al., 1998). Indeed, there is currently no adjoint model for SIREAM. First, the thresholds set by ISORROPIA, which is a discontinuous model because of its phase transitions, raise a theoretical problem: how to define the derivatives at the discontinuity points? Second, the code of ISORROPIA has not been written so that it may be automatically differentiated. A variational approach is thus not practical at the moment.
Aerosol measurements seldom correspond to model state variables, which, for a size-resolved model, give the size distribution of the aerosol chemical composition. They are often aggregated data (PM 10, PM 2.5) or optical data (extinction coefficient, optical thickness). It is therefore not straightforward to assimilate this data. One reason for assimilating PM 10 data in this study is that the networks providing PM 10 data are the most extensive. Moreover, since policies about particulate matter have mainly focused on PM 10 up to now, it is important to accurately forecast its concentrations, primarily in urban areas. For example, in Europe, the limit is set to a maximum of 35 days per year during which PM 10 exceeds 50 µg m⁻³ (directive of the European Commission 1999/30/CE). However, the numerical models sometimes miss important events, e.g., because some emission sources, such as wildfires, are not described (Hodzic et al., 2007). DA could be useful in order to compensate for these model deficiencies.
In this paper, we use a simple method for PM 10 DA, namely the OI method. This method was applied for the assimilation of aerosol optical thickness (Generoso et al., 2007; Collins et al., 2001). Variational methods for aerosol assimilation have also been investigated, like the 1D-Var method in Huneeus (2007) or the 4D-Var method in Benedetti and Fisher (2007), but for simplified aerosol models.
This paper addresses the following questions:
1. Are forecasts improved after the assimilation of PM 10 hourly observations over Europe? If so, for how long?
2. Do the analyses produced by the PM 10 assimilation over a sub-domain (here, France) result in an improvement over the remaining part of the domain (western Europe)?
3. Does the analysis produced by the PM 10 assimilation improve the computed chemical aerosol composition?
The first question has no obvious answer. First, the thermodynamic equilibria between the aerosol and gaseous phases could cancel out the corrections after DA if the gaseous concentrations are not corrected at the same time. Second, the aerosol residence time in the atmosphere is of the order of a few days and particles are known for having mostly a regional effect. For this reason, it may be necessary to perform a joint state-parameter estimation. For instance, it would be useful to make corrections on the aerosol emissions.
The second question is related to the timescales of atmospheric transport. It aims to determine whether the improvements in a part of the domain, where most of the measurements are available, can improve the simulation in places where the wind transports the air masses. This could be important for regions where only sparse data is available. The choice of the scale parameters $L_h$ and $L_v$ (see Eq. 5) might then be important, since they determine the distance at which the measurements have an influence in the analysis.
The last question deals with the effects of DA on chemistry. Assimilating a total mass only brings coarse information about chemistry; we cannot, a priori, expect an improvement in the aerosol chemical composition, unless the errors are similar for all the species, which may not be a realistic assumption.

Optimal interpolation
In the OI method, observations, as soon as available, are used to produce an analysis. This analysis is supposed to be a better estimate of the true state: it replaces the current state of the model and it therefore serves as initial conditions for the next model iterations. This procedure is repeated at the frequency of measurements.
The analysis is given by the best estimate (linear and unbiased) in the least-squares sense. The analysis, or analyzed state vector, $x_a$, is the solution of the minimization problem

$$x_a = \arg\min_x J(x), \qquad (1)$$

with $J$ the following cost function:

$$J(x) = \frac{1}{2} (x - x_b)^T B^{-1} (x - x_b) + \frac{1}{2} (y - H(x))^T R^{-1} (y - H(x)). \qquad (2)$$

$x_b$ is the background state vector, or a priori state vector (i.e., the PM 10 forecast provided by the model), $y$ is the vector of observations (measured PM 10), and $H$ is an interpolation function that maps the state $x$ to the observational data. $B$ and $R$ are the matrices of error covariances, for background and observations, respectively.
Upon minimization and under the assumption of the linearity of $H$, $x_a$ is given by:

$$x_a = x_b + K (y - H x_b), \qquad (3)$$

where $K$ is the so-called gain matrix defined as

$$K = B H^T (H B H^T + R)^{-1}. \qquad (4)$$

During the assimilation period, the process is repeated each time observations are available. So, at time $h$, the assimilation produces an analysis $x_a^{(h)}$, which serves as the new model state. Starting from that new state, the model computes a forecast $x_f^{(h+1)}$ at time $h + 1$. This forecast is what is called "background", $x_b$, in the previous paragraphs. The assimilation then takes place to compute $x_a^{(h+1)}$, and so on.
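The analysis step can be illustrated with a minimal NumPy sketch. It assumes a linear observation operator stored as a matrix and the standard OI update $x_a = x_b + B H^T (H B H^T + R)^{-1} (y - H x_b)$; the function name `oi_analysis` and the three-cell toy state are hypothetical and are not taken from the POLYPHEMUS code.

```python
import numpy as np

def oi_analysis(x_b, y, H, B, R):
    """Return the OI analysis x_a = x_b + K (y - H x_b) for a linear H.

    Illustrative helper (not part of any library): K = B H^T (H B H^T + R)^-1.
    """
    innovation = y - H @ x_b
    S = H @ B @ H.T + R
    # Solve S z = innovation rather than forming the inverse of S explicitly.
    return x_b + B @ H.T @ np.linalg.solve(S, innovation)

# Toy example (assumed values): 3 grid cells, one station in the middle cell.
x_b = np.array([10.0, 20.0, 15.0])   # background PM10 (ug/m3)
H = np.array([[0.0, 1.0, 0.0]])      # the station observes cell 1 only
B = 200.0 * np.eye(3)                # uncorrelated background errors
R = 49.0 * np.eye(1)                 # observation error variance
y = np.array([30.0])                 # measured PM10 (ug/m3)
x_a = oi_analysis(x_b, y, H, B, R)
```

With a diagonal $B$, only the observed cell is corrected; spreading the correction to neighbouring cells requires off-diagonal covariances in $B$.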

Specification of the error covariance matrices
The specification of the covariance matrices is crucial, because these matrices determine the corrections to be applied to the background field in order to better match the truth.The main parameters are the variances (diagonal terms), but the covariances are also important because they specify how the information should be distributed over the domain.
A first estimate of the background error variances can be obtained by taking an arbitrary fraction of the climatological variance of the field itself. It is also conceivable to take an estimate of the model-to-observation error. More complex methods, like the Hollingsworth-Lönnberg method, exist (Daley, 1993; Hollingsworth and Lönnberg, 1986).
It is impossible to accurately approximate all coefficients of $B$. We therefore need a simplified representation of the covariances between the grid points. A classical method is the Balgovind approach (Balgovind et al., 1983), according to which the covariance is a function of the horizontal and vertical distances ($r_h$ and $r_v$ respectively) between the two points of interest,

$$f(r_h, r_v) = v \left(1 + \frac{r_h}{L_h}\right) \exp\left(-\frac{r_h}{L_h}\right) \left(1 + \frac{r_v}{L_v}\right) \exp\left(-\frac{r_v}{L_v}\right), \qquad (5)$$

where $L_h$ and $L_v$ are two homogeneous influence radii and $v$ is the variance estimate.
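As an illustration, the sketch below builds a small background covariance matrix from pairwise distances. It assumes the usual Balgovind form $v\,(1 + r_h/L_h)\,e^{-r_h/L_h}\,(1 + r_v/L_v)\,e^{-r_v/L_v}$; the function names, the point coordinates and the default parameter values are illustrative choices, not the configuration code of the paper.

```python
import numpy as np

def balgovind(r_h, r_v, v, L_h, L_v):
    """Balgovind covariance for horizontal distance r_h and vertical distance r_v."""
    horiz = (1.0 + r_h / L_h) * np.exp(-r_h / L_h)
    vert = (1.0 + r_v / L_v) * np.exp(-r_v / L_v)
    return v * horiz * vert

def background_covariance(xy_km, z_m, v=200.0, L_h=50.0, L_v=200.0):
    """B for n points with horizontal coordinates xy_km (n, 2) and heights z_m (n,).

    Defaults are assumed values for illustration only.
    """
    r_h = np.linalg.norm(xy_km[:, None, :] - xy_km[None, :, :], axis=-1)
    r_v = np.abs(z_m[:, None] - z_m[None, :])
    return balgovind(r_h, r_v, v, L_h, L_v)

# Three points in the same layer, 50 km apart along one axis.
xy = np.array([[0.0, 0.0], [50.0, 0.0], [100.0, 0.0]])
z = np.array([25.0, 25.0, 25.0])
B = background_covariance(xy, z)
```

The diagonal of `B` equals the variance `v`, and the covariance decays smoothly with distance at a rate set by `L_h` and `L_v`.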
The covariances between the observation errors are set to zero since the instrument errors are assumed independent; so $R$ is diagonal. This may not be the case for measurements from the same platform (radiosonde, airborne sensor or satellite), but it is a reasonable assumption for instruments located at different ground stations. We also assume that all stations have the same error variance: $R = rI$, where $I$ is the identity matrix and $r$ the error variance.
The value of the observation error variance $r$ is determined with a χ² diagnosis (e.g., Ménard et al., 2000). Let the forecast at time $h$ be $x_f^{(h)}$, and the observation at time $h$ be $y^{(h)}$. Under the usual assumptions of data assimilation, the departure $d^{(h)} = y^{(h)} - H x_f^{(h)}$ (called "innovation") should have the covariance matrix $H B H^T + R$. If that is true, the scalar

$$\chi^2_h = (d^{(h)})^T (H B H^T + R)^{-1} d^{(h)} \qquad (6)$$

should be equal, on average, to the number of observations (at time $h$). Let $N_h$ be that number of observations. In the previous equations, we assumed that $N_h$ is constant since $H$ and $R$ do not depend on time. In practice, $N_h$ is not constant because the set of available stations changes from one time to the next.
In practice, the χ² diagnosis amounts to checking that $\chi^2_h / N_h$ is about 1. We determined the value of $r$ so that $\chi^2_h / N_h$ remains reasonably close to 1. This can be a difficult optimization because the innovations $d^{(h)}$ depend on $r$.
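The diagnosis can be sketched as follows, assuming that the scalar is the quadratic form of the innovation weighted by the inverse of $H B H^T + R$. The helper name `chi2_ratio` and the toy values are hypothetical.

```python
import numpy as np

def chi2_ratio(x_f, y, H, B, R):
    """Return chi2_h / N_h for one assimilation time.

    chi2_h = d^T (H B H^T + R)^-1 d, with d = y - H x_f the innovation
    and N_h the number of observations available at that time.
    """
    d = y - H @ x_f
    S = H @ B @ H.T + R
    chi2 = float(d @ np.linalg.solve(S, d))
    return chi2 / y.size

# Toy check (assumed values): 4 stations, each observing its own grid cell.
v, r = 200.0, 49.0
H = np.eye(4)
B = v * np.eye(4)
R = r * np.eye(4)
x_f = np.zeros(4)
y = np.sqrt(v + r) * np.ones(4)   # innovations with exactly the expected size
ratio = chi2_ratio(x_f, y, H, B, R)
```

A ratio well above 1 suggests that the prescribed variances are too small for the observed innovations; a ratio well below 1 suggests that they are too large.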

Redistribution over sections and chemical species
The correction applied to the simulated PM 10 at ground is provided by the OI method. The controlled variable is thus the PM 10 concentrations over the whole horizontal domain, in the first layer only (in the reference configuration) or in more layers. Forecast PM 10 concentrations are computed by summing the concentrations of all aerosol species simulated over all sections (size discretization).
After DA, the analyzed PM 10 concentrations are redistributed over the model variables following the initial chemical and size distributions. If $(PM_{10})_b$ and $(PM_{10})_a$ are the PM 10 mass concentrations for the background and the analysis respectively, and $(P_i^j)_b$ and $(P_i^j)_a$ are the concentrations of the chemical species $j$ in the section $i$ for the background and the analysis respectively, then:

$$(P_i^j)_a = \frac{(PM_{10})_a}{(PM_{10})_b} \, (P_i^j)_b.$$

The underlying assumption is that the aerosol relative chemical composition and size distribution are well represented in the model.
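A minimal sketch of this redistribution for a single grid cell is given below. It assumes that every species and section is rescaled by the same factor $(PM_{10})_a / (PM_{10})_b$, which preserves the background relative composition and size distribution; the function name and the array layout (species by section) are illustrative.

```python
import numpy as np

def redistribute(P_b, pm10_a):
    """Rescale background concentrations P_b[species, section] to match pm10_a.

    Every entry is multiplied by (PM10)_a / (PM10)_b, so the relative chemical
    composition and size distribution of the background are kept unchanged.
    """
    pm10_b = P_b.sum()
    if pm10_b <= 0.0:
        raise ValueError("background PM10 must be positive to be rescaled")
    return P_b * (pm10_a / pm10_b)

# Toy cell (assumed values): 2 species x 2 sections, background PM10 = 10 ug/m3.
P_b = np.array([[1.0, 2.0],
                [3.0, 4.0]])
P_a = redistribute(P_b, pm10_a=20.0)
```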

Simulations
The aerosol model used in this study is SIREAM, coupled to the chemistry-transport model Polair3D. SIREAM stands for SIze-REsolved Aerosol Model, and is described in detail in Debry et al. (2007). SIREAM includes 16 aerosol species: 3 primary species (mineral dust, black carbon and primary organic species), 5 inorganic species (ammonium, sulfate, nitrate, chloride and sodium) and 8 organic species predicted with the Secondary Organic Aerosol Model (SORGAM; Schell et al., 2001). In the usual configuration, SIREAM includes 5 bins logarithmically distributed over the size range 0.01 µm to 10 µm. All these models are embedded in the POLYPHEMUS system, available at http://cerea.enpc.fr/polyphemus/ and described in Mallet et al. (2007).
The simulations presented here, with and without assimilation, are carried out at a continental scale, over Europe, and for one month (January 2001). A previous study, Sartelet et al. (2007), evaluated the model configuration for the year 2001 with comparisons to three databases (also used in this study and described hereafter) and with respect to the performance of other CTMs used in Europe (Chimere, EMEP, ...). Polyphemus shows a tendency to underestimate PM 10, and to overestimate nitrate concentrations in wintertime. Other models show similar behaviour. The configuration of the simulations of this paper is essentially the same as in Sartelet et al. (2007). The main characteristics of the configuration are summarized below.
The domain covers the area from 10.75° W to 22.75° E in longitude and from 34.75° N to 57.75° N in latitude, with a 0.5° step. There are five vertical layers: 0-50 m, 50-600 m, 600-1200 m, 1200-2000 m and 2000-3000 m.
The meteorological fields are provided by the European Centre for Medium-Range Weather Forecasts (ECMWF, http://www.ecmwf.int/products/data/operationalsystem/). The ECMWF raw data for 2001 is that of the 12-h forecast cycles starting from analyzed fields, and this data has a resolution of 0.36° horizontally, 60 sigma-levels vertically and a timestep of 3 h.
The boundary conditions for aerosol species are interpolated from outputs of the GOddard Chemistry Aerosol Radiation and Transport model -GOCART, Chin et al. (2000) for 2001.
The anthropogenic emissions for gases and aerosols are generated from the EMEP expert inventory for 2001 (available at http://www.emep.int/).
The Regional Atmospheric Chemistry Mechanism (RACM; Stockwell et al., 1997) is used to simulate chemistry. Aerosols and gases are scavenged by dry deposition, rainout and washout. We take into account coagulation and condensation. Nucleation is not included because the diameters of nucleated particles (typically about 1 nm) are below the lowest diameter bound of the model. Aqueous-phase chemistry inside cloud droplets is also taken into account with the Variable Size Resolved Model (VSRM; Fahey and Pandis, 2001; Strader et al., 1998).

Observational data for assimilation and comparison
In this paper, just like in Sartelet et al. (2007), three databases are used: EMEP, AirBase and BDQA. EMEP data is provided only on a daily basis. Hence, in this paper, it is not used for assimilation but for performance assessment.
In contrast, BDQA data is used for assimilation. The reasons for this choice are that it includes hourly concentrations and that the station characteristics are specified: background (rural), suburban, urban, industrial and traffic. The scale at which a station is representative depends on the station type. In the assimilation procedure, the traffic and industrial stations, which generally show high concentrations due to the proximity of sources, have been removed.
Since the characteristics of the AirBase stations are not available, AirBase data will only be used for validation, like the EMEP data. Also note that AirBase compiles several European databases, including BDQA.
Figure 1 shows the locations of the BDQA stations, except the traffic and industrial stations whose high concentrations cannot be represented by our model (at a resolution of 0.5°).

Data-assimilation parameters
The covariances for the background errors are assumed to be in Balgovind form, with a scale parameter $L_h$ set to 1 mesh cell (about 50 km). The background variance is set to 200 µg² m⁻⁶, which derives from an RMSE of about 15 µg m⁻³ for annual model-to-observation comparisons (Sartelet et al., 2007).
We performed several simulations, over January 2001, with different observation error variances $r$ in order to satisfy the χ² criterion (see Sect. 2.2). We finally selected $r = 49$ µg² m⁻⁶. This value is significantly greater than the variance of the measurement error, which is probably less than 25 µg² m⁻⁶. With this variance for observation error and with the 200 µg² m⁻⁶ variance for background, the time evolution of $\chi^2_h / N_h$ is shown in Fig. 2. On average the ratio $\chi^2_h / N_h$ is almost 1, as expected. The ratio shows no clear temporal trend, which is also desirable. Nevertheless, there are significant time variations, even though there are 107 observations per hour (on average) and therefore 2568 observations per day, so that the samples seem large enough for the averaged values to be statistically significant. This suggests that further error modeling would be useful.

Comparisons with other networks
As a first step, we evaluate the improvements due to the assimilation process on networks other than the one used for assimilation. For this purpose, two simulations over one month (January 2001) are carried out: one without DA (Model) and one with DA. For the simulation with DA, every hour, the model forecast is modified by OI, using the data from BDQA stations. This produces a sequence of analyzed states (or analyses), which are thereafter compared to hourly observations.
The statistical measures used to evaluate the results are the Root Mean Square Error (RMSE), the correlation, the Mean Fractional Error (MFE) and the Mean Fractional Bias (MFB). Let $\{o_i\}_{i=1,n}$ and $\{s_i\}_{i=1,n}$ be respectively the observed and the simulated concentrations, with means $\bar{o}$ and $\bar{s}$. The RMSE (in µg m⁻³) and the other indicators (dimensionless) are defined as:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (o_i - s_i)^2},$$

$$\mathrm{correlation} = \frac{\sum_{i=1}^{n} (o_i - \bar{o})(s_i - \bar{s})}{\sqrt{\sum_{i=1}^{n} (o_i - \bar{o})^2 \, \sum_{i=1}^{n} (s_i - \bar{s})^2}},$$

$$\mathrm{MFE} = \frac{1}{n} \sum_{i=1}^{n} \frac{|s_i - o_i|}{(s_i + o_i)/2},$$

$$\mathrm{MFB} = \frac{1}{n} \sum_{i=1}^{n} \frac{s_i - o_i}{(s_i + o_i)/2}.$$

We first present statistics for the comparison with the AirBase data. The large number of AirBase stations makes it possible to compute statistics for each country, as shown in Table 1. Figure 3 represents the RMSEs and the correlations with circles whose diameters are proportional to the statistical indicator: the higher the RMSE or the correlation, the larger the diameter of the circle. It is noteworthy that the statistics for PM 10 are globally improved.
The statistics are clearly better over France: the RMSE decreases by more than 4 µg m⁻³ and the correlation increases by more than 30%. This is obviously due to the fact that the BDQA stations are located in France. In addition, in France, the BDQA and AirBase networks share many stations. The statistics of most bordering or nearby countries (Belgium, Switzerland, Germany, Great Britain or the Netherlands) are also improved, with the exception of the MFB for Great Britain. Countries that are relatively far from France, like Portugal, Poland or Slovakia, show no changes in their statistics. The statistics for Spain do not change either, most likely because there are few stations in the south-west of France and the scale parameter $L_h$ is not large enough to influence Spain. In contrast, RMSE, MFE and MFB for Italy are improved, but the correlation decreases by 8%. Indeed, the number of BDQA stations in the south-eastern part of France, which could influence Italy, is significant. This raises the question of the distance over which the stations of southern France are representative of PM 10 pollution in the direction of Italy. It is possible that the Alps constitute a high barrier for aerosols at ground level, which the error statistics model does not take into account, so that $L_h$ is overestimated in that direction.
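The four indicators can be computed with a few lines of NumPy. The sketch below assumes the usual definitions: Pearson correlation, and fractional error and bias normalized by the mean of the simulated and observed values (as in Boylan and Russell, 2006). The function names and the sample values are illustrative.

```python
import numpy as np

def rmse(o, s):
    """Root mean square error, in the unit of the concentrations."""
    return float(np.sqrt(np.mean((o - s) ** 2)))

def correlation(o, s):
    """Pearson correlation between observed and simulated values."""
    return float(np.corrcoef(o, s)[0, 1])

def mfe(o, s):
    """Mean fractional error (dimensionless, between 0 and 2)."""
    return float(np.mean(2.0 * np.abs(s - o) / (s + o)))

def mfb(o, s):
    """Mean fractional bias (dimensionless, between -2 and 2)."""
    return float(np.mean(2.0 * (s - o) / (s + o)))

# Toy series (assumed values): the simulation is exactly twice the observation.
o = np.array([10.0, 20.0, 30.0])
s = 2.0 * o
scores = {"rmse": rmse(o, s), "corr": correlation(o, s),
          "mfe": mfe(o, s), "mfb": mfb(o, s)}
```

In this toy case the correlation is perfect although the bias is large, which is why the indicators are reported together rather than individually.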
Table 2 shows that DA also improves the statistics for PM 2.5 (4 µg m⁻³ decrease for RMSE and 26% increase for correlation), which could indicate that the a priori distribution over the model bins is relatively reliable.
Fig. 2. $\chi^2_h / N_h$ defined with Eq. (6) in Sect. 2.2. In red, the daily average of $\chi^2_h$ divided by the daily average of $N_h$. The mean over January of $\chi^2_h / N_h$ is 1.03.
Table 3 shows statistics for the comparisons with the EMEP data. Most statistics on PM 10 for the simulation with DA (analysis and one-hour forecast) deteriorate compared to the simulation without DA. Nevertheless, the simulated mean is better with assimilation because DA globally increases the concentrations, and thus lessens the underestimation over all stations. EMEP stations are background stations, whereas the assimilated observations are measured at background, urban and suburban stations. Some DA updates might not be consistent with background concentration levels. Besides, there may not be enough stations to draw reliable conclusions.
The statistics for sulfate, chlorine and sodium are slightly better with DA. On the other hand, the statistics for nitrate are worse. For ammonium, the statistics are stable. The model underestimates PM 10 over the period, so DA tends to add material to the existing aerosol mass. As the redistribution over the chemical species is homogeneous, DA tends to add mass to all species. The species that were overestimated at first, like nitrate and ammonium, are then even more overestimated. Sulfate was underestimated, so DA increases its concentration and the statistics are improved. The overestimation of the nitrate concentration plays a role in the thermodynamic equilibrium, by reducing the mass of chlorine that condenses on the particles. The change of the sodium concentrations is essentially due to DA.
Fewer stations provide measurements for chemical species than for PM 10, and these stations may not be taken into account in the PM 10 statistics. It is therefore difficult to draw conclusions about a general behavior. Moreover, the overestimation of nitrate concentrations is specific to winter conditions (Sartelet et al., 2007). Despite this remark, these results highlight the need for more chemical measurements in the DA method presented here. The partitioning among species could then be corrected by assimilation, while it is constant here. At the moment, without chemical data, a deeper knowledge of the uncertainties on modeled concentrations for each aerosol component would certainly improve the system. The redistribution after DA could then be changed according to tendencies in uncertainties.

Operational forecast
In operational conditions, at time $t_0$, only the data for the past is available. It is possible to assimilate the past data over a few days before $t_0$. The model results from $t_0$ to ($t_0$ + 1 day) are called one-day forecasts, the results from ($t_0$ + 1 day) to ($t_0$ + 2 days) are called two-day forecasts, etc. This operation can be repeated every day ("moving window"); a one-day forecast and a two-day forecast are then available every day. Several five-day DA experiments were carried out: the BDQA data are assimilated every hour during the first three days, after which the model runs freely and produces forecasts for the next two days. The first experiment assimilates data from 1 to 3 January 2001 and forecasts 4 and 5 January 2001; the second experiment assimilates data from 2 to 4 January 2001 and forecasts 5 and 6 January 2001; and so on. Consequently, one-day forecasts are available from 4 to 30 January 2001, and two-day forecasts are available from 5 to 31 January 2001. Table 4 describes the simulations carried out.
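The moving-window schedule can be enumerated with the standard library. The helper below is a hypothetical illustration of the experiment layout described above (three assimilation days followed by two forecast days, shifted by one day at a time); it does not run any model.

```python
from datetime import date, timedelta

def moving_window_experiments(first_day, last_day, n_assim=3, n_forecast=2):
    """List (assimilation days, forecast days) for each daily-shifted experiment.

    An experiment is kept only if its last forecast day does not exceed last_day.
    """
    experiments = []
    start = first_day
    while start + timedelta(days=n_assim + n_forecast - 1) <= last_day:
        assim = [start + timedelta(days=k) for k in range(n_assim)]
        forecast = [start + timedelta(days=n_assim + k) for k in range(n_forecast)]
        experiments.append((assim, forecast))
        start += timedelta(days=1)
    return experiments

experiments = moving_window_experiments(date(2001, 1, 1), date(2001, 1, 31))
one_day = [fc[0] for _, fc in experiments]   # first forecast day of each run
two_day = [fc[1] for _, fc in experiments]   # second forecast day of each run
```

For January 2001 this gives 27 experiments, with one-day forecasts covering 4 to 30 January and two-day forecasts covering 5 to 31 January, consistent with the periods stated above.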
Table 5 summarizes the performance of the model without assimilation, and of the one-day and two-day forecasts, when compared to BDQA observations. As expected, the one-day forecast clearly shows better statistics for PM 10 and PM 2.5 than the simulation without assimilation. The decrease of the RMSE is 1.5 µg m⁻³ for PM 10 and 1.4 µg m⁻³ for PM 2.5, that is, about 10%. The increase of the correlation is more than 10% for PM 10 and PM 2.5. MFE and MFB are also markedly improved; the improvement in MFE allows the model to meet the performance objective of 50% defined by Boylan and Russell (2006); see also Sartelet et al. (2007).
Since the OI method only changes the initial conditions, the model eventually returns to its reference trajectory (without assimilation). Here, the two-day forecasts show a less obvious improvement. They show slightly better statistics than the free-running model, but the decrease of the RMSE is only 0.2 µg m⁻³ and the increase of the correlation is 2%. The period after which the corrections due to DA become essentially ineffective is discussed in the next section.
Figures 4, 5, and 6 show the daily evolution (averaged over the period 4 to 30 January 2001) of the RMSE, the correlation and the mean concentrations respectively, for the model without assimilation and for the forecast. These figures underline the tendency of the assimilation procedure to be almost ineffective after 24 h of forecast. After 12:00 UTC, the differences in RMSE and in mean concentration are lower than 1 µg m⁻³, and the difference in correlation is about 2%. The fact that DA with the OI method has some influence only during such a short period of time (one day in this experiment) is not only a limitation of the OI method. In air quality models, concentrations are not much influenced by initial conditions. The duration of the influence also depends on several parameters (see Sect. 6) and on the pollutant. For ozone, the influence lasts a bit longer; see Wu et al. (2008).

Sensitivity to DA parameters
Because the OI method only modifies the initial conditions, and not the model itself, one may want to evaluate the time and space scales over which DA affects the results. Tests with different configurations were carried out over a shorter period to estimate the effective time scales of the DA impact. The data is assimilated over a period and then the model runs without assimilation for the remaining days. The aim of these tests is to identify the key parameters. The configuration for the present simulations is the same as in the previous section, over the period from 1 to 6 January 2001. As in the previous sections, the simulation without assimilation is compared to the other simulations with DA. For the simulations with DA, hourly data from BDQA stations is assimilated from 1 to 5 January 2001. The forecast starts on 6 January 2001 at 00:00 UTC. Eight simulations are presented here: the reference test and seven alternatives. The different configurations are summarized in Table 6.
In one experiment, the variance for observation errors is equal to the background variance: if $R = rI$ where $I$ is the identity matrix and $r$ a scalar variance, the ratio $\alpha = v/r$ (with $v$ from Eq. 5) is then equal to 1. Note that the results of the OI method are the same for all pairs $(r, v)$ with the same ratio $v/r$ (here equal to 1): the actual value of the variance $r = v$ has no impact.
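This invariance can be checked numerically. The sketch below assumes the standard gain $K = B H^T (H B H^T + R)^{-1}$ with $B = vC$ and $R = rI$, where $C$ is a fixed correlation-like matrix; the random matrices and the function name `gain` are illustrative only.

```python
import numpy as np

def gain(B, H, R):
    """Gain matrix K = B H^T (H B H^T + R)^-1 (assumed standard OI form)."""
    return B @ H.T @ np.linalg.inv(H @ B @ H.T + R)

rng = np.random.default_rng(0)
n, p = 5, 3                          # state size and number of observations
A = rng.normal(size=(n, n))
C = A @ A.T + n * np.eye(n)          # symmetric positive definite shape of B
H = rng.normal(size=(p, n))          # arbitrary linear observation operator

# Two (v, r) pairs with the same ratio v / r = 1.
K_200 = gain(200.0 * C, H, 200.0 * np.eye(p))
K_49 = gain(49.0 * C, H, 49.0 * np.eye(p))

# A pair with a different ratio, for comparison.
K_ref = gain(200.0 * C, H, 49.0 * np.eye(p))
```

`K_200` and `K_49` coincide because the common factor cancels between $B H^T$ and the inverse, whereas `K_ref` differs since its ratio $v/r$ is not the same.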
The Balgovind method is used to represent the horizontal and vertical covariances for the background. The impact of the $L_h$ and $L_v$ parameters is tested. Horizontally, $L_h$ is taken equal to one grid cell (about 50 km) in the reference simulation, and it is changed to two grid cells (about 100 km). Vertically, the reference test performs assimilation only in the first model level (centered at 25 m); hence the results are independent of $L_v$. Three other tests are carried out, respectively with a vertical $L_v$ parameter equal to 200 m with two controlled model levels, 300 m with three controlled levels, and 600 m with three controlled levels.
The redistribution of analyzed PM 10 on chemical species is also investigated. The following cases are considered:
- redistribution on all species (default): it is assumed that the model uncertainties are equivalent for all species;
- redistribution of the corrections only on primary species: it is assumed that the uncertainties are mainly due to the emissions;
- redistribution of the corrections only on inorganic species: it is assumed that the uncertainties are mainly due to the condensation of inorganic species.
The redistribution only on the organic species is not tested. The mass of low volatile material in aerosols is generally not, or only partially, measured by the instruments used in automatic stations. In most PM 10 measurement instruments, the samples are heated to measure the mass of dry particulate matter. For example, the sample temperature of a TEOM (Tapered Element Oscillating Microbalance), which received the certification of US-EPA, is 50 °C. At this temperature the volatile material, such as organic aerosols, is evaporated (Allen et al., 1997; Smith et al., 1997; Salter and Parsons, 1999; Soutar et al., 1999; Green et al., 2001; Josef et al., 2001; Charron et al., 2003). We assume that the observations depend only slightly on organic species; we therefore cannot attribute the difference in total mass to organic species alone.
Figures 7, 8 and 9 show the time evolution of the RMSE, the correlation and the simulated mean respectively, averaged over BDQA stations for the different tests. Table 7 shows the associated statistics, over the forecast day (6 January 2001). All simulations with assimilation improve the RMSE and the correlation. However, the figures show that the influence of DA lasts no more than a few hours. The RMSEs and the correlations are equivalent for all tests after 6 h. For the simulated mean, the figure shows that there could still be a difference after 24 h. The inorganics simulation is almost the same as the reference test where all species are assimilated (RMSEs and correlations are equal), whereas the primary simulation gives very different results for the simulated mean. In the inorganics simulation, the transfer of the PM 10 changes to the inorganic species takes place essentially in the fine mode (less than 2.5 µm diameter), where the major part of inorganics resides. The PM 10 mass is also located in this mode, so the size distribution of corrected PM 10 will be equivalent in both the reference and the inorganics simulations. For this reason, the scavenging, which depends on particle size and which could therefore explain the differences in the PM 10 budgets between the simulations, is not affected. The RMSE for the primary simulation is the highest of the tests with DA. This likely means that, for this case, the errors of the model are mostly due to secondary species, rather than primary species. These errors could then be attributed essentially to physical processes rather than to the emissions description. The simulation where $\alpha = 1$ is close to the simulation without DA, particularly in the evolution of the mean. The simulations where one of the parameters $L_h$ or $L_v$ for the Balgovind method has changed give similar RMSEs on the BDQA network. However, the differences in the concentration mean remain significant for a longer period if the number of controlled levels and the parameter $L_v$ are increased. This shows that DA can partially influence the PM 10 vertical profile, but increasing $L_v$ above 200 m has a limited impact.
Figure 10 shows the maps of the absolute differences between the PM 10 fields of the simulations with DA and the same field without DA, averaged over the first forecast day. As expected, the test with L h =2 shows differences in regions that were not affected by the reference assimilation settings. The northern part of Italy is more affected. In the reference assimilation experiment, the most impacted region is in the north of Spain. There, the highest difference is located in one grid cell, but it does not affect the statistics since there is no station in that cell. Besides this point, southern (Marseille region) and south-eastern France are the most impacted regions, showing that the concentrations in these regions are particularly poorly reproduced by the model. This may be because a few stations in these regions are influenced by the mountains, by the Mediterranean circulation, and by large urban areas. This region is therefore difficult to represent with a continental-scale simulation.
It is noteworthy that the regions impacted by the inorganics and the primary tests are rather different. Actually, the inorganics test shows large differences over marine regions (see west of Corsica). Over marine areas, the changes in thermodynamic equilibria due to DA can be amplified by the presence of sea salt.
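The difference between the reference, inorganics and primary tests lies in which species receive the PM 10 correction. The sketch below illustrates one possible rule, assuming the total-mass increment is shared among the controlled species in proportion to their background concentrations. This proportional rule, the function name and the species names are assumptions made for illustration; the redistribution actually used is defined in an earlier section of the paper.

```python
def redistribute_increment(species, pm10_analysis, controlled=None):
    """Spread a PM10 correction over a subset of aerosol species.

    species       : dict mapping species name -> background concentration
    pm10_analysis : analyzed total mass (same unit as the concentrations)
    controlled    : names of the species that receive the correction
                    (None means all species)

    Assumption of this sketch: each controlled species receives a share
    of the increment proportional to its background concentration.
    """
    names = list(species) if controlled is None else list(controlled)
    total = sum(species.values())
    subtotal = sum(species[name] for name in names)
    updated = dict(species)
    if subtotal <= 0.0:
        return updated  # nothing to scale in the controlled subset

    # A negative increment cannot remove more mass than the subset holds.
    increment = max(pm10_analysis - total, -subtotal)
    for name in names:
        updated[name] = species[name] + increment * species[name] / subtotal
    return updated
```

For example, with hypothetical concentrations `{"sulfate": 4.0, "nitrate": 6.0, "dust": 10.0}` and an analyzed total of 25.0, restricting the correction to `("sulfate", "nitrate")` leaves dust unchanged and places the whole increment of 5.0 on the two inorganic species.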

Conclusions
This paper shows that PM 10 DA with the OI method may be useful for one-day forecasts. In our tests, a mean decrease of 1.5 µg m⁻³ for the RMSE and a 10% increase in the correlation are obtained. For longer forecast periods, the statistics are not improved because the concentrations converge too quickly to the trajectory without DA. The cross-comparison with networks other than the one used for DA makes it possible to evaluate the quality of the analysis for different network types. For example, the EMEP database only contains background stations. In contrast, the data assimilated in this study include observations from urban or suburban stations, which have much higher concentrations. The forecasts thus tend to increase the concentrations accordingly, degrading the statistics for rural stations such as those of the EMEP database.
The background error covariances are important components of the assimilation. In this study, we used a simple parameterization with an influence radius L h , assumed to be the same over the whole domain. This method results in worse statistics in regions where topography would require specific decorrelation lengths L h .
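The influence of the background error covariance on the analysis can be illustrated with a minimal optimal interpolation step. The sketch below uses the standard OI (BLUE) update x_a = x_b + B Hᵀ (H B Hᵀ + R)⁻¹ (y − H x_b) on a one-dimensional grid, with a Balgovind-type correlation of uniform length. The grid, the variances and the observation values are hypothetical and are not those of the paper.

```python
import numpy as np


def balgovind_matrix(coords, length):
    """Correlation matrix (1 + d/L) * exp(-d/L) for 1-D coordinates."""
    ratio = np.abs(coords[:, None] - coords[None, :]) / length
    return (1.0 + ratio) * np.exp(-ratio)


def oi_analysis(xb, y, H, B, R):
    """Standard optimal interpolation update of the background state xb."""
    innovation = y - H @ xb                # observation minus background
    S = H @ B @ H.T + R                    # innovation covariance
    return xb + B @ H.T @ np.linalg.solve(S, innovation)


# Hypothetical setup: five grid cells, one observation in the middle cell.
coords = np.arange(5.0)
xb = np.full(5, 20.0)                      # background PM10
B = 25.0 * balgovind_matrix(coords, 1.0)   # background error covariance
H = np.zeros((1, 5))
H[0, 2] = 1.0                              # observation operator
R = np.array([[25.0]])                     # observation error covariance
y = np.array([30.0])

xa = oi_analysis(xb, y, H, B, R)
```

In this setup the observed cell moves halfway towards the observation (background and observation variances are equal), and the correction decays with distance according to the chosen correlation length. A single length for the whole domain, as in the parameterization above, therefore spreads corrections equally in all regions regardless of topography.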
The sensitivity tests show that, in this specific study, the uncertainties in the condensation process might be greater than the uncertainties in the emissions. These uncertainties could originate from uncertainties in the concentrations of condensable gas species or in the modeling of the condensation process itself. The OI method applied in this paper could be used in operational mode: the model is already running for real-time forecasts over Europe, in the context of tests on the Prév'air platform (http://www.prevair.org/), and the additional computational cost due to OI is slight. Applying other methods like 4D-Var (provided an adjoint model is available) or an ensemble Kalman filter (provided proper uncertainty information is available on inputs) would be much more demanding in terms of computational resources, and it would not necessarily improve the data assimilation efficiency if only the state is controlled (for ozone, see Wu et al., 2008).
For future work, the following studies could be initiated:
1. Using more sophisticated methods to build the background covariance matrix, such as methods based on statistical studies of the simulated fields (e.g. the Hollingsworth-Lönnberg method; Daley, 1993);
2. Implementing inverse methods in order to improve the quality of input data (emissions) and/or parameterizations;
3. Assimilating observations of gases that are seldom measured but important for the formation of secondary inorganic species, like nitric acid (HNO 3 ) or ammonia (NH 3 );
4. Assimilating observations of the aerosol chemical composition (nitrate, sulfate, ammonium, primary and organics); the bias existing for some species could then be lowered;
5. Assimilating optical data from a lidar network, which could improve the vertical distribution of aerosols and, as a result, improve the persistence of DA impacts over the domain.

Fig. 1. Location of BDQA stations used for PM 10 DA. The background stations are marked with a red triangle, the other ones with a black point. The model grid is also shown.

Fig. 2. Time evolution (over January 2001) of the quantity $\chi^2_h/N_h$ defined with Eq. 6 in Sect. 2.2. In red, the daily average of $\chi^2_h$ divided by the daily average of $N_h$. The mean over January of $\chi^2_h/N_h$ is 1.03.

Fig. 3. Map of correlations (a) and RMSEs (b) for each country between the simulation and the AirBase observations for the simulation without DA (blue points) and for the one-hour forecast after assimilation (red points). The circle diameters are proportional to the statistical indicator.

Fig. 4. Hourly evolution (averaged over the 04-01-2001 to 30-01-2001 period) of the RMSE for the PM 10 forecast without assimilation (blue line) and for the one-day forecast (green line).

Fig. 5. Hourly evolution (averaged over the 04-01-2001 to 30-01-2001 period) of the correlation for the PM 10 forecast without assimilation (blue line) and for the one-day forecast (green line).

Fig. 6. Hourly evolution (averaged over the 04-01-2001 to 30-01-2001 period) of the mean concentration for the PM 10 observations (red line), for the forecast without assimilation (blue line) and for the one-day forecast (green line).

Fig. 7. Time evolution of the RMSE for the PM 10 forecasts. The vertical line separates the assimilation period from the prediction period.

Fig. 8. Time evolution of the correlation for the PM 10 forecasts. The vertical line separates the assimilation period from the prediction period.

Fig. 9.

Fig. 10. Maps of the absolute difference of the PM 10 fields (in µg m⁻³). Comparison between the simulation without assimilation and the simulation with assimilation for the eight tests over the day 6 January 2001.
, three databases are used for comparisons:
- the EMEP (European Monitoring and Evaluation Programme) database, available on the EMEP Chemical Co-ordinating Centre (EMEP/CCC) web site at http://www.emep.int/;
- the AirBase database, available on the European Environment Agency (EEA) web site at http://air-climate.eionet.europa.eu/databases/airbase/;
- the BDQA database ("Base de Données sur la Qualité de l'Air": the French national database for air quality that covers France).

Table 1. RMSE, correlation, MFE and MFB of the simulated PM 10 without and with DA (for the model, the analysis and the one-hour forecast), computed with the observations from the AirBase network. The total is computed over all stations, without distinguishing the country. Countries are: Austria (AT), Belgium (BE), Switzerland (CH), Czech Republic (CZ), Germany (DE), Spain (ES), France (FR), Great Britain (GB), Ireland (IE), Italy (IT), the Netherlands (NL), Poland (PL), Portugal (PT), Slovenia (SI) and Slovakia (SK). Period: 1 to 31 January 2001.
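The four indicators named in this caption can be computed as in the sketch below. It assumes the usual definitions: root mean square error, Pearson correlation, mean fractional bias MFB = (2/N) Σ (s − o)/(s + o) and mean fractional error MFE = (2/N) Σ |s − o|/(s + o), returned as fractions rather than percentages. The paper's own definitions are given in an earlier section and may differ in normalization; the function name is hypothetical.

```python
import math


def scores(sim, obs):
    """Return RMSE, correlation, MFB and MFE for paired values.

    Assumes strictly positive concentrations (so that s + o > 0) and
    non-constant series (so that the correlation is defined).
    """
    if len(sim) != len(obs) or len(obs) < 2:
        raise ValueError("sim and obs must have the same length (>= 2)")
    n = len(obs)
    mean_s = sum(sim) / n
    mean_o = sum(obs) / n
    pairs = list(zip(sim, obs))

    rmse = math.sqrt(sum((s - o) ** 2 for s, o in pairs) / n)

    cov = sum((s - mean_s) * (o - mean_o) for s, o in pairs)
    var_s = sum((s - mean_s) ** 2 for s in sim)
    var_o = sum((o - mean_o) ** 2 for o in obs)
    corr = cov / math.sqrt(var_s * var_o)

    mfb = 2.0 / n * sum((s - o) / (s + o) for s, o in pairs)
    mfe = 2.0 / n * sum(abs(s - o) / (s + o) for s, o in pairs)
    return {"rmse": rmse, "corr": corr, "mfb": mfb, "mfe": mfe}
```

Unlike the RMSE, the fractional indicators are bounded (MFB between −2 and 2, MFE between 0 and 2), which makes them less sensitive to a few stations with very high concentrations.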

Table 2. Statistics of the simulation results without and with DA (for the model, the analysis and the one-hour forecast) on the AirBase network for different species. Period: 1 to 31 January 2001.

Table 3. Statistics of the simulation results without and with DA (for the model, the analysis and the one-hour forecast) on the EMEP network for different species. Period: 1 to 31 January 2001.

Table 4. Description of the European simulations carried out in operational-forecast conditions and of their outputs that are compared to observations. t 0 is a given day between 3 and 30 January 2001. "d" stands for day.

Table 5. Statistics of the simulations (model, one-day forecast and two-day forecast) on the BDQA network for PM 10 and PM 2.5 . Period: 4 to 30 January 2001.

Table 6. Configuration of DA for the evaluation of the impact of the assimilation parameters on the forecasts.

Table 7. Scores of the different tests at BDQA stations for the first-day forecast (6 January 2001).