An Ensemble Kalman Filter for severe dust storm data assimilation over China

. An Ensemble Kalman Filter (EnKF) data assimilation system was developed for a regional dust transport model. This paper applied the EnKF method to investigate modeling of severe dust storm episodes occurring in March 2002 over China based on surface observations of dust concentrations to explore the impact of the EnKF data assimilation systems on forecast improvement. A series of sensi-tivity experiments using our system demonstrates the ability of the advanced EnKF assimilation method using surface observed PM 10 in North China to correct initial conditions, which leads to improved forecasts of dust storms. However, large errors in the forecast may arise from model errors (un-certainties in meteorological ﬁelds, dust emissions, dry deposition velocity, etc.). This result illustrates that the EnKF requires identiﬁcation and correction model errors during the assimilation procedure in order to signiﬁcantly improve forecasts. Results also show that the EnKF should use a large inﬂation parameter to obtain better model performance and forecast potential. Furthermore, the ensemble perturbations generated at the initial time should include enough ensemble spreads to represent the background error after several assimilation cycles.


Introduction
Dust storms have drawn much concern during the past two decades for the various impacts on atmospheric environment, biogeochemical cycles, radiative balance and human health. In recent years, many observational programs have been carried out to study Asian dust storms, and much progress has been achieved and improved the understanding of climatic Correspondence to: Z. Wang (zifawang@mail.iap.ac.cn) and synoptic features of soil dust aerosols (Murayama et al., 2001;Mori et al., 2002;Sugimoto et al., 2002;Zhang et al., 2003). On the other hand, in order to provide high spatial and temporal resolution forecasts of Asian dust and reproduce many important observational facts, several numerical models have been developed and used to study the deflation, transport and budget of soil dust over East Asia (Wang et al., 2000;Shao, 2001;Song et al., 2001;Uno et al., 2001;Gong et al., 2003;Shao et al., 2003;Park et al., 2003;Liu et al., 2003;Han et al., 2004). The intercomparison study (DMIP) involving eight dust emission/transport models over Asia found that the model results correctly captured the major dust onset and cessation timing at each observation site. However, the maximum concentrations predicted by each model differed by 2-4 times (Uno et al., 2006), clearly indicating that modeling results of dust storms with these models are significantly model dependent.
Numerical forecasts of dust storms suffer from uncertainties both in initial conditions and in the model itself. Simply comparing model forecasts to observations cannot separate these uncertainties. Using a data assimilation technique we can firstly reduce uncertainty of initial conditions that may lead to improved forecasts and secondly better examine model errors through comparison with observations, since the uncertainty of initial conditions can be maximally reduced by assimilation of observations at initial times. Vautard et al. (2004) investigated the potential of data assimilation of surface ozone concentrations in a chemistry-transport model over the European continent using statistical interpolation and showed that both the analyses and the 1∼2 day forecast are improved. Recently, Yumimoto et al. (2007) applied a four-dimensional variation (4DVAR) to a regional dust model to assimilate NIES Lidar observations over East Asia to inverse Asian dust emissions demonstrating better estimation capability. Niu et al. (2007)   the CUACE/Dust forecast system and showed the capability of short-term forecast improvement. These all indicate the important role of data assimilation to combine observations with modeling in air quality prediction. In these methods, the background error statistics, one of the most important aspects of data assimilation, are usually assumed to be spatially homogeneous, horizontally isotropic, and temporally stationary. This assumption may disagree with the actual errors which may have significant flow-dependence, especially for meso-scale motions. Although the background error statistics can evolve implicitly in 4DVAR during the time window, the complexity of constructing the adjoint matrix and the expensive computation in 4DVAR usually prevent it from common application especially for complicated models. Hanea et al. (2007) used a hybrid Kalman filter algorithm combining the reduced-rank square root (RRSQRT) and the ensemble Kalman filter (EnKF) for ozone simulation in the European Operational Smog (EUROS) atmospheric chemistry transport model. The hybrid algorithm combines the best of both filters with more than 30 model evaluations.
In this study we perform ensemble Kalman filter (EnKF) data assimilation experiments during some severe dust storm episodes in China using surface observations of dust concentrations and a realistic model in order to explore the impacts on forecast skills. The EnKF is an advanced and flexible technique for data assimilation which can calculate flow-dependent statistics from the ensemble forecasts and have been widely used in atmospheric and oceanic applications (Evensen, 1994;Mitchell, 1998, 2001;Houtekamer, 2000, 2002;Houtekamer et al., 2005;Whitaker et al., 2002;Lorenc, 2003;Evensen, 2003Evensen, , 2006Hanea et al., 2007). However the EnKF has not been applied in severe dust storm forecasts. In this study, we made an initial effort to explore the potential problems of this issue with EnKF. If there is at least one occurrence of floating dust phenomenon observed at stations located within 1 latitude degree around the PM 10 station during the day, the contribution of PM 10 observations of this station are considered as mainly coming from dust, and thus selected for assimilation and validation.
Otherwise the data would be discarded. The number of qualified PM 10 observations after quality control is listed in Table 1, clearly indicating irregular sampling of dust storms. Figure 1 presents the distribution of observation sites (green dots) passing through quality control and used for assimilation on 22 March, and the red rectangles represent three selected independent sites that are only used for verification. The Lidar observation at Beijing was performed at the Sino-Japan Friendship Center for Environmental Protection during the same period (Sugimoto et al., 2003). The visibility observations are the surface observations from the China Meteorological Administration (CMA).

Model description and setting
The regional dust transport model (Wang et al., 2000) included deflation, transport, diffusion, and removal processes during the life cycle of the yellow sand particles. This model had been successfully used to study atmospheric trace gases and particles, such as SO x , dust, O 3 and acid rain over East Asia (Wang et al., 2000(Wang et al., , 2002Uematsu et al., 2003). The advanced deflation module of yellow sand has been designed after detailed analysis of the meteorological conditions, landform, and climatology from daily weather report at about 300 local weather stations in north China. Details about the model are described by Wang et al. (2000). The simulation domain used ranges from (75 • E, 16 • N) to (146 • E, 60 • N) consisting of 72 by 45 horizontal grid cells and 18 vertical layers as shown in Fig. 1. A heavy dust storm occurred in northern China and reached Beijing on 20 March 2002, with peak concentration of Total Suspended Particles (TSP) reaching 10.9 mg m −3 , 54 times higher than the National Air Quality Standard of China (Sun et al., 2004). Model analysis with Lidar observation of this dust storm in Beijing revealed the source and transport path of the dust and further explained the reasons for the occurrence of such extremely high dust concentrations (Sugimoto et al., 2003). It not only swept over most parts of China but also reached Korea and Japan. Using model simulation, Park et al. (2003) studied dust emissions from the source areas of it. In addition, Shao et al. (2003) simulated it with an integrated modeling system and found the model could accurately predict the spatial pattern and temporal evolution of dust concentration. Han et al. (2004) developed a size-segregated aerosol model and coupled this with a regional air quality model to simulate the dust storms of 15-24 March 2002. In this study, we developed a regional chemical transport model combined with EnKF data assimilation method to improve the forecast performance and to investigate the vertical structure of this super dust storm during the period of 15-25 March 2002 in East Asia.

Data assimilation with Ensemble Kalman Filter
The basic idea of the EnKF (Evensen, 1994) is to construct a Monte Carlo ensemble such that the mean of the ensemble is the best estimate, and the ensemble error covariance is a good estimate of the forecast error covariance.
At the current assimilation time t (for notational simplicity, the t time subscript will be dropped), we assume that we have an ensemble of forecasts that randomly sample the forecast errors, denoted by The EnKF performs an ensemble of similar assimilation cycles, i=1, . . ., m, with each member updated to a different realization of the observations: where H is the observation operator that maps the model states to the observation space. In Eq. (1), y i ≈N(0, R) is a perturbed observations from observations y, and R is the observation error covariance which is diagonal with variance equal to 10% of the value of y. The gain matrixK is defined aŝ It can be formed without ever explicitly estimating and storing the full forecast error covarianceP b , but byusing the following equations to calculateP b H T and HP b H T directly (Evensen, 1994, Houtekamer andMitchell, 1998): In Eqs. (3) and (4), Once each member is updated, we take the analyzed ensem- x a i as the optimal analysis. In this study, the initial background ensemble perturbations are generated by adding random amplitude and phase shifts to the first-guess x(x, y, z) as follows: , ω ∈ N(0, l 2 y ), η ∈ N(0, l 2 z ). l x , gf, l y and l z represent the standard deviations of the phase perturbations and are assumed to be about 200 km in this study, while the amplitude perturbation a is assumed to be 20% of the first guess. In order to prevent the usage of negative values, we generate a using an exponential function as a = e γ ×sd− 1 2 ×sd 2 where γ is randomly drawn from a normal distribution with zero mean and variance equal to 1, and sd is equal to 20%. A parameter α≥1 is introduced to allow for inflation of the forecast error variance since the ensemble spread itself may be too small to draw the model states to the observations. In this case, Eq. (2) will be rewritten as In the study here, the parameter α increases with time from 1 to 640 (estimated according to the likely minimal root mean square error). In addition, the value for dust concentrations should be positive. Therefore, the analysis would be set to be zero if it is negative.

Results
Two sets of model runs with and without the assimilation schemes addressed above were performed to test the performance of the EnKF used in the regional transport model of dust. Firstly, the tests were performed once a day during 15-25 March 2002 with initial perturbations generated on 03:00 UTC, 15 March to check the overall impact of the assimilation on 24-h forecasts.
To give an impression of the anisotropic nature of the horizontal correlations, we present some examples of horizontal correlations of surface concentrations with respect to the points shown with the black dots in Figs. 2 and 3. The correlations are directly estimated from 50-member ensembles on 03:00 UTC, 16 and 20 March. It can be seen that the spatial structures are dependent on the flow which displays the most important property of EnKF compared to other traditional data assimilation methods such as optimal interpolation (OI) and three-dimension variational assimilation (3D-Var). In OI and 3D-Var algorithms, the statistics in background error co-  variance are generally taken to be isotropic and largely homogeneous with little variation in time, which is not consistent with the real systems. The root mean square (RMS) errors between daily averaged PM 10 observations and the simulated 24 h-averaged forecasts (d<10 µm) without assimilation (line rectangle) and the analyses with EnKF assimilation (shaded rectangle) at three independent observation stations were calculated during 16-25 March 2002 (Fig. 4). It can be seen that the RMS errors of the assimilated results are much smaller than that without assimilation totally, which clearly shows the potential ability of the EnKF method used in the regional transport model. The RMS errors of the 24 h-averaged forecasts without (line rectangle) and with (shaded rectangle) assimilation for all sites are shown in Fig. 5. The difference between observations and the 24 h forecasts with assimilation are smaller than those without assimilation, showing that the forecasts are also improved after EnKF assimilation, but not obviously for the whole area.
An independent Lidar observed the dust extinction coefficients in Beijing in the same period. Figure 6 gives a comparison between Lidar observations and the modeled dust extinction coefficient with and without assimilation. These results illustrate that use of the EnKF method solely assimilating surface PM 10 concentration changed the vertical structures of dust distribution and significantly improved the modeling results in Beijing compared with Lidar observations. Two peaks of dust concentration distribution exhibited on 20 March in Beijing with EnKF assimilation agree well with the observed dust extinction coefficients near the surface. For higher layers about 500-1000 m above the ground, it is possible that observations on 20 March might be missing because the dust layer is too thick. This prevents the penetration of the lidar signal, so that we cannot compare the results in detail. Generally speaking, the vertical distribution after assimilation is improved with the RMS error reduced (Fig. 6d) and the correlation a little increase (Fig. 6e). This may be largely

Observation
With assimilation Without assimilation (c 3 Fig. 8. Comparisons of assimilation results and simulation with observed daily-mean PM 10 observations at three independent observation stations, respectively.  due to the vertical structure in background error covariance calculated from the ensemble, which propagates the surface PM 10 observations to high levels. Surface dust concentrations and visibility at 3 h intervals are compared in Beijing in Fig. 7. On 20 March, the two troughs of visibility correspond to the two peaks of dust concentrations with assimilation but just one peak of the simulated results without assimilation, with the bigger trough corresponding to the higher peak. This clearly demonstrates the important role ofthe ensemble Kalman filter plays. The correlation between the 3 h surface visibility and dust concentrations during the period of 20-21 March is −0.69 without  assimilation and −0.73 with assimilation, which exceed the significance of 99%. Figure 8 shows the assimilated results of surface dust aerosols (d<10 µm) with assimilation once a day at three independent observed stations, respectively (a, b, c as shown in Fig. 1). It shows that the assimilation analyses are closer to the observations than in the simulation without EnKF assimilation, especially at Shijiazhuang (b) and Shanghai (c). The assimilated results in Dalian (a) are not so close to the observations as the simulation, which may be due to the low resolution of the model. If the model resolution is relatively high the assimilated results are better, such as the results of (b) and (c).
We here select point A (Dalian) to explain the important of resolution. For Dalian, the analyses on 17 and 21 March are worse than in the simulation, while the others are comparable to the simulation. From Fig. 9 we can see that the variation of observed PM 10 at two nearly points (Dalian and Qinhuangdao) are quite different, especially on 17 March and 21 March, indicateing that spatial variability of the dust storm is very strong and the correlation coefficient is in fact small. However, the resolution of model is 1 • ×1 • , which may be too low to resolve the spatial variability of such processes, and then the dust concentrations calculated from the model are similarly distributed in a relatively large region. So the correlations estimated directly from the ensemble forecasts are distributed in a relatively large region, which does not agree with the actual one, so it can bias the assimilated results. Figure 10 gives the correlation distribution of Dalian on 17 March (upper panel) and 21 March (bottom panel), in which the black dot and the green dot denote the position of Dalian and Qinghuangdao respectively. We can see that the correlation coefficients of these two points are larger than 0.8. Therefore, the method to solve it is to increase the model resolution. Figures 11 and 12 give surface distribution of 24 haveraged dust concentration over East Asia without (a) and with assimilation (b) on 20 21 March, respectively. The contours show the difference between situations with and without assimilation. Compared with the PM 10 observations, the forecast concentrations on 20 March are much larger than the observations, especially in the north part of North China, while smaller in the south part. After EnKF assimilation, the average of one-day forecast concentrations decreases in the north part and increases in the south part of North China. On 21 March, the forecasts after assimilation decrease in the middle and south part and increase in north and southwest part of North China. Overall, the EnKF assimilation compensates the model deficiency and improves the forecasts.

Conclusion and discussion
The correlation patterns of several selected points shown in Figs. 2 and 3 prove that EnKF can calculate the flowdependent statistics which may not be expected in other traditional assimilation techniques. To use surface PM 10 observations for data assimilation of dust storms, it is necessary to select them according to the surface synoptic dust event reports. This study shows that, using advanced methods such as EnKF, the assimilation of surface PM 10 observations can provide better initial conditions and lead to improved forecasts of dust storms. However, the forecasts still have much room for improvement. First, the current PM 10 observations that are reported to SEPA are only daily-averaged. For fast changing processes such as dust storms, this study shows that much more frequent observations are needed to correctly describe the fast evolution structure. Denser observational networks are also necessary to specify the spatial variability of such processes as air pollution (see Fig. 8). Second, the model errors are the main contributors to forecast errors, at least in some regions. The assimilation can provide good initial conditions, but a forecast with large errors can result from model errors (see Fig. 5). Therefore, significant improvement of forecasts it requires identification and correction model errors during the assimilation procedure. This may be achieved by either four-dimensional variation method or augmented EnKF. This should be a priority for further studies in this direction.
The dust concentrations vary very rapidly and are generally independent in different dust process. Therefore, the ensemble perturbations generated at the initial time may not have enough ensemble spread to represent the background error after several assimilation cycles. In this study we found it is necessary to use a large inflation parameter (α defined in Eq. (5). In future study, we will try our best to specify the model errors (e.g. errors in meteorological fields, dust emissions, dry deposition velocity) to overcome this limitation. In addition, we also found that the analysis may have negative values for dust concentrations. We use a simple approach that sets negative values to zero. More skillful methods should be explored in further studies.