Interactive comment on “ Ensemble filter based estimation of spatially distributed parameters in a mesoscale dust model : experiments with simulated and real data ”

In this paper, the authors attempt to estimate dust erodibility maps for North Africa and the Arabian peninsula, based on observations and model calculations. Their methodology is based on Bayesian inference, in particular an ensemble Kalman filter. The authors explain methodology, test it in controlled experiments (Observing System Simulation Experiments) and finally apply their method to real (MODIS) observations of AOT. Independent forecasts using the new erodibility maps are validated against MODIS

surface winds (or surface wind stress) to calculate dust emissions. Using the Saharan desert as a test bed, a perfect model Observation System Simulation Experiments (OSSEs) with 40 ensemble members, and observations of aerosol optical depth (AOD), the EAKF is shown to recover correct values of erodibility at about 80 % of the points in the domain. It is found that dust advected from upstream grid points acts as noise 10 and complicates erodibility estimation. It is also found that the rate of convergence is significantly impacted by the structure of the initial distribution of erodibility estimates; isotropic initial distributions exhibit slow convergence while initial distributions with geographically localized structure converge more quickly. Experiments using observations of Deep Blue AOD retrievals from the MODIS satellite sensor result in erodibility 15 estimates that are considerably lower than the values used operationally. Verification shows that the use of the tuned erodibility field results in better predictions of AOD over the Western Sahara and Arabia.

Introduction
Uncertainty in initial conditions, incorrect boundary conditions, and model inadequa- 20 cies render forecasts of the atmosphere generated using Numerical Weather Prediction (NWP) models inaccurate. To obtain the best initial conditions possible, estimation techniques (e.g. data assimilation) are used to combine the state estimates given by the model and those given by the observations. There are a multitude of data assimilation (DA) techniques used in the geophysical community. The first truly operational similation. The theoretical development includes different formulations of the ensemble filter (Bishop et al., 2001;Tipett et al., 2003;Zupanksi, 2005;Hodyss, 2012) and their inter comparisons (Lawson and Hansen, 2004;Lei et al., 2010). Ensemble based DA has been applied to an entire gamut of atmospheric (Majumdar et al., 2002;Whitaker et al., 2004) and oceanographic problems. Houtekamer et al. (2005) have done en- 15 semble based DA in a global model using real satellite and other observations. The performance of ensemble DA in mesoscale models has been investigated by Dirren et al. (2007), by using radio soundings and aircraft observations in the Weather Research and Forecasting model. Wang et al. (2008) (Anderson, 2001) is employed within the DART framework (Anderson et al., 2009;Whitcomb, 2008).
the state (AOD and the dust concentration) and parameters (erodibility as a proxy for source region) related to dust production are estimated by assimilating observations of AOD.

5
estimation experiments with both simulated and real satellite observations are performed.
We find that a 40 member ensemble is able to successfully tune the spatially distributed erodibility parameters in the Observation System Simulation Experiment (OSSE). The covariance between the local erodibility and locally produced dust drives the tuning 10 of the erodibility parameters. Advected dust, as opposed to locally produced dust, confounds the covariance signal thereby acting as noise. The tuning improves if the perturbations in parameters are correlated over length scales of about 5 grid points (400 km). Guided by the success of the OSSE, we have performed tuning experiment using satellite data of Aerosol Optical Depth (AOD). The tuned values are significantly 15 smaller than the operational values of erodibility. The verification experiments show that using the tuned erodibility map leads to a significant decrease in the mean absolute error of AOD forecast over large parts of Western Sahara and Arabia. This paper is organized as follows. The model is described in Sect. 2. The tuning experiments using simulated observations are presented in Sects. 3, 4 and 5. Sec-20 tion 3 describes the setup of the simulated data tuning experiments. Section 3 also discusses the tuning of erodibility at a particular grid point in detail. The tuning of erodibility over the whole domain is discussed in Sect. 4. In this section the perturbations in the erodibility at each grid point are assumed to be independent. The case of correlated perturbations in erodibility is considered in Sect. 5. The tuning experiments with real 25 satellite data are described in Sect. 6. The tuned erodibility is used to run verification experiments whose results are presented in Sect. 6

COAMPS mesoscale aerosol model
The meteorological community, over the years, has developed many mesoscale models (e.g. WRF, Skamarock et al., 2005) for researching and forecasting weather phenomenon. The Coupled Ocean/Atmosphere Mesoscale Prediction System (COAMPS) (Hodur, 1997;Chen et al., 2003) is a mesoscale model used to simulate various atmo-5 spheric (Doyle and Bond, 2001) and oceanographic phenomenon (May et al., 2011). It is used not only for basic research, but operationally by the US Navy. The atmospheric model is non-hydrostatic and fully compressible. It allows for nested grids in which the resolution can increase up to a few meters. COAMPS employs staggered horizontal and vertical grids with a terrain following sigma Z system for the vertical coordinate. 10 COAMPS includes advanced parametrizations for subgrid scale mixing, radiation, cumulus parametrization and explicit moist physics. In this work COAMPS is run with 30 vertical levels. Throughout this work the model uses a resolution of 81 km in the horizontal.
Both the research and operational versions of COAMPS includes a dust module to 15 model the generation, transport and physical effects of aerosols particles, including their size and physical transformations (Liu et al., 2003(Liu et al., , 2007. The module includes simulation of sinks such as sedimentation, dry deposition and wet removal. The integration of the aerosol module provides outputs of various quantities like mass loading, size distribution, optical depth etc. COAMPS can be used for research purposes. 20 The details of the COAMPS dust aerosol model are as follows. The vertical dust flux F at a particular grid point (i , j ) is given by Westphal et al. (1988) as, where the subscript i , j denotes the latitude and longitude index, respectively. The dust is generated only if u * i ,j > u * t i ,j where u * t i ,j is a threshold friction velocity. The amount of dust mobilized depends upon the transfer of (atmospheric) momentum to the earth's surface. This transfer of momentum is proportional to the surface stress τ (Gill, 1982). The friction velocity u * is related to the surface stress through u * = τ/ρ 5 where ρ is the density. Using theory and experimentation it is shown that the dust flux is proportional to the fourth power of the friction velocity (Gillette and Passi, 1988;Nickling and Gillies, 1993). This proportionality forms the basis of Eq. (1). The dust is mobilized because the surface wind erodes the land surface. Different land surfaces have different susceptibilities to erosion by wind. The susceptibility basically depends 10 on the type of soil covering the land surface. For example, a land surface covered by thick vegetation is less susceptible to erosion than one covered with loose and disturbed soil. The production of dust requires a threshold friction velocity to be reached before dust particles can be lifted from the surface. This threshold friction velocity is represented by u * t . At a given grid point dust is not mobilized for u * < u * t . The values 15 of u * t for various land types have been estimated using field experiment data and laboratory experimentation (Gillette and Passi, 1988). Various modeling studies (Westphal et al., 1988;Liu et al., 2007) use a value of u * t = 0.6 m s −1 for all land types for simplicity. Given a particular model grid box, the whole grid box need not be covered by erodible land. Therefore even if u * exceeds u * t , only a part of the grid box which is erodible may 20 emit dust. This is quantified by the erodibility α i ,j in Eq. (1) as a spatial weighting function. At each grid point the erodibility has a value between 0 (no emission) and 1.0 (all emission). The value of the constant k in Eq. (1) is taken from Westphal et al. (1987), which in turn was motivated by Gillette (1981). This value of k is the slope of the linear fit to the scatter plot of experimentally obtained flux data for various values of friction velocity. Since the current study focuses on satellite data assimilation, we model only the actively optical and transportable dust with an assumed diameter of 2 micron for microphysical purposes. The amount of dust in the atmosphere is quantified by the dust concentration (c m ) in µg m −3 . The Aerosol Optical Depth (AOD) is another measure of the amount of dust.
The AOD at a particular grid point i , j is obtained by vertically integrating the dust light extinction over the atmospheric column, which is simply defined here as the mass concentration times a mass extinction efficiency (a e ) taken as 0.5 m 2 g −1 .
where z is the height. Hence, here we are assuming that AOD is linearly proportional to total mass concentration. In reality, dynamics of dust particle size, especially large particles near sources can be quite complicated. However for the purpose of this work this assumption is valid because we want to tune the dust emitting areas to the first 10 order.
The dust generated at various locations in the domain is mixed vertically and advected horizontally. The dust in the atmosphere at a given grid point and vertical level is due to local generation and that advected from upstream areas. The share of the advected and local dust in the total dust depends on meteorological conditions, specif- 15 ically the wind field. The amount of local dust depends on the erodibility and friction velocity at that grid point. It is possible that for a particular grid point at a particular time at some vertical levels the advected dust dominates while at other levels the local dust constitutes the major portion of the dust. In general the total dust contains contributions from local production and dust transported from other areas. Since AOD is the vertical 20 integral of dust concentration, the total AOD at a grid point has contribution from local and transported dust.
The AOD at a particular grid point can be expressed as, The transported AOD is due to dust that is produced in upstream regions and advected by winds. The local generation is given by the dust flux F i ,j . Therefore, The horizontal area element is represented by da and the time step is given by dt. Substituting for the dust flux from Eq. (1), AOD i ,j can be expressed as, Though the sink term is not mentioned in these equations, the actual model calculates the removal of the aerosol. Equation (2) decomposes the total AOD into the local and advected component. This decomposition is central to the understanding of the tuning of erodibility as will be evident in Sect. 4. The erodibility plays an important 10 role in the calculation of AOD. In this work α is used to denote the erodibility vector whose components (α i ,j ) are the erodibilities at various grid points. Accurate forecasts of dust production and transport depend critically on an accurate map of erodibility (Liu et al., 2007). The determination of the value of α at various locations on the earth is a formidable task. Many researchers (Westphal et al., 1988;Tegen and Fung, 1994;15 Park and In, 2003;Walker et al., 2009) have made significant efforts to produce maps of α for important dust producing regions of the earth. The efforts made by these researchers involve the analyses of different types of landforms and the variation of their properties with season etc. These efforts involve the visual inspection of atlases and also observations of Aerosol Optical Depths (AOD). 20 In the current work we aim to use satellite observations of AOD to estimate an optically active α in the North African region by employing an ensemble based estimation approach. Note that the satellite observations of the total AOD, that is the left hand side of Eq. (2)  Prediction System (NOGAPS) global model (Hogan and Rosmond, 1991). An ensemble of these boundary conditions is used so that each ensemble member is a different realization of meteorology. The perfect model experiment uses a particular ensemble member of meteorology. The AOD "observations" from the assumed perfect model are assimilated into the imperfect model and the following question is posed: are the per- 15 fect values of erodibility recovered by the ensemble based tuning? Because one has defined the imperfect model to be different from perfect model only in α, the OSSE represents the best case scenario. Compared to nature, the model has many errors apart from imperfect α. Hence if the perfect α values are not recovered in the OSSE then α will certainly not be tuned correctly using real data. 20 The OSSE is run using the meteorology of June/July 2009, corresponding to the well-known peak in the Saharan dust production and its westward transport into the subtropical Atlantic Ocean. The operational values of α over the Sahara domain are shown in Fig. 1a and are used in the operational run of COAMPS. These values are used here for the perfect model run. An index of the frequency with which the threshold 25 friction velocity is achieved in June/July 2009 at 12:00 Z is shown in Fig. 1b. This index is the fraction (expressed in percent) of the total days in June/July 2009 period that u * exceeded 0.6 m s −1 at 12:00 Z. The observations of AOD are used to update the AOD, Introduction dust concentration and the erodibility map. Note that the dust concentration is a threedimensional field. Operationally a threshold value of u * t = 0.6 is used. However using the threshold value of u * t = 0.6 in the OSSE would tune values only in high friction velocity regions, thus complicating the interpretation of OSSE results. Therefore in the OSSE a value of u * t = 0.0 is used.

5
Throughout this work an ensemble size of N = 40 is used. Each ensemble member has a different initial value of α. At each grid point the ensemble for α i ,j is obtained by sampling 40 ensemble members from a Gaussian distribution ξ(0. 25, 0.25) where 0.25 is the mean and the standard deviation (spread). The ensemble members with negative values are set equal to 0.01. This distribution defines the initial guess. The 10 maps of the mean and standard deviation of this initial guess are shown in Fig. 1c and d, respectively. The model is spun up by integrating ensemble members for 60 h starting at 00:00 Z, 10 June 2009. The first DA cycle is implemented at 12:00 Z, 12 June 2009. The DA cycling frequency is 24 h. That is, the DA cycle (update) is implemented at 12:00 Z, every day. This frequency for update is chosen because real satellite data 15 is available at 12:00 Z every day. The OSSE is run for 48 days ending on 18 July 2009 at 12:00 Z, so that there are 48 update cycles. Ensemble analysis boundary conditions from a global model (NOGAPS) are used every 6 h. These ensemble analysis are obtained by the local Ensemble transform technique (McLay et al., 2010). Only the AOD is observed. The dust concentration and erodibility are not observed. In this work 20 meteorological observations are not assimilated.
Data assimilation experiments with only state (dust concentration) estimation were performed. The forecasts did not improve with state estimation alone. The results from this experiment show that improving the initial conditions in c m is not important. This is because the source and sinks of dust are strong over a 24 h period and play the key 25 role in deciding the forecast. In other words over a 24 h period the sources dominate the dust transport especially over areas of strong dust generation.
The theory underlying parameter estimation is the same as that of state estimation. Therefore, the state is augmented by the parameters (α) and data is assimilated. How-ACPD 12,2012 Ensemble filter based estimation of mesoscale parameters Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | ever parameters do not have dynamical equations to integrate them in time which gives rise to some problems. The next section describes these problems and the methodology used in this work to address these problems.

Spread in α
The state variables (temperature, wind speed etc.) have dynamics evolution equations 5 which are used to integrate these variables forward in time between consecutive updates. However the parameter α which is tuned in this work does not have a dynamics evolution equation. At a particular grid point we use the following equation to step forward each ensemble member of α i ,j where k denotes a particular ensemble member and t denotes time.
The lack of a dynamic evolution equation for α gives rise to another issue -that of spread in α. Theory states that each time data assimilation updates α, the spread in α must decrease or remain constant. The smaller the spread in α the less impact observations in succeeding update cycles will have. In the absence of a dynamic evolution equation for α where errors grow in time, the prior spread at a particular update is simply the posterior spread at the last update cycle. This problem is addressed in this work by using conditional inflation (Aksoy et al., 2006). If the posterior spread in α i ,j falls below a particular threshold value (∆α th ) the posterior perturbations in α i ,j are scaled so that the spread is equal to a particular fixed value (∆α fix ).
whereα i ,j is the mean erodibility. In this work the threshold value used is ∆α th = 0.05 and ∆α fix = 0.05. These values are chosen after experimentation with different values. If the mean of posterior α i ,j is close to the limits of α i ,j (0 and 1) then a different strategy is employed. If the mean is less than 0.05 or more than 0.095, the spread is set equal to 0.015. Also if the posterior mean decreases below 0.03 (increases above 0.97) it is reset to 0.03 (0.97). One more issue is the risk of unphysical values of the posterior parameter ensemble. Negative values of α i ,j are physically meaningless and it is possible that an update results in 5 negative values for some members of the posterior α i ,j . In this work such ensemble members with negative values are set equal to 0.01. Before presenting the results of tuning over the whole domain, in the next section the tuning of α i ,j (at a single grid point) in the OSSE is explained.

Tuning at a grid point
It is instructive to consider the tuning of α i ,j at a single grid point, allowing the illumination of various issues involved in ensemble based parameter estimation. The full three dimensional COAMPS model is run with assimilation of simulated AOD data every 24 h. In the experiment described in this section each α i ,j is updated using AOD observation only at that grid point i , j . In this experiment the AOD is observed at all grid points. 15 The tuning of α i ,j at point K (Fig. 1a) as the update cycles proceed is shown in Fig. 2a. It shows the mean and standard deviation of the α i ,j estimate as the update cycles proceed. The red line shows the truth, that is, the operational value of α i ,j at this point K. The mean and standard deviation of the initial guess is 0.3 and 0.2, respectively. As the update cycles proceed the estimate of α i ,j approaches the correct value. 20 For this grid point, by 20 cycles the correct value is recovered.
In this example, the α i ,j update uses AOD observations only at the same grid point, the mean of the posterior (or update) at any update cycle is given by, Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | error variance is given by var(AOD obs ) which is set to 10 % of the mean observation. This observational error variance is motivated by AOD satellite data whose error is typically 10 % of mean observation. The other terms are calculated from the short term ensemble forecast (the prior). The covariance cov α prior , AOD prior plays an important role in the update equation. This covariance exists because of the relation between 5 AOD prior and α prior which is given by Eq.
(2). The short term ensemble forecast gives 40 different AOD realizations. Each realization of AOD prior corresponds to a particular realization of local variables, non-local variables and meteorology. The local (at point K) variables are α prior and u * . The non-local variables are α and u * at regions that are upstream of point K. The meteorological variable of interest is wind because it advects 10 dust from upstream regions. Therefore the uncertainty in the AOD prior ensemble is due to the uncertainties in local α, local u * , upstream α, upstream u * and winds. From Eq.
(2), the uncertainty in prior AOD can be written as, var(AOD prior ) = var(AOD local prior ) + var(AOD transport prior ) The contribution of uncertainty in local variables is contained in the first term on the 15 right hand side (rhs) of Eq. (4). The contribution from non-local variables and winds is given by the the second term on the rhs of Eq. (4). Out of the total spread of α prior only a part is correlated with the α prior . This part is the first term of Eq. (4). The remaining spread is due to that in the transported AOD given by the second term which acts as advective additive noise. Given a particular magnitude of advective noise, the strength 20 of the covariance between AOD prior and α prior depends on the magnitude of the friction velocity. The time series of u * and cov α prior , AOD prior is shown in Fig. 2b. The covariance (scaled up by a factor of 10) is shown by the green curve. The covariance tends to be higher for higher values of u * . This is because stronger local generation helps the covariance signal to rise above advective noise. The prior AOD and α ensembles Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | due to local α and additive noise. A finite size ensemble is used to estimate the true covariance. Because of the small size of the ensemble it is expected that the ensemble estimate of covariances will not match the true covariance. Such covariances are termed spurious. Spurious covariance is basically an inaccurate estimate of the true covariance due to sampling errors. For example (Fig. 2b) the negative covariance val-5 ues at update cycle 4 and 11 are spurious. The finite (small) size of the ensemble (40 in this work) is the reason for these spurious covariances. The estimates of AOD are shown in panel Fig. 2d. The red curve in Fig. 2d is the truth which is same as the mean observation of AOD. In this work the mean of the simulated observations is not perturbed. The difference between the observation mean and prior 10 AOD mean is termed the "innovation". (AOD obs − AOD prior ) is the innovation in Eq. (3). At update cycle 6 the innovation is about 1.0. The difference (α up − α prior ) is termed the increment or correction. At each update cycle the covariance, the innovation and the uncertainty in AOD together dictate the correction in α. The sign of the innovation, along with the prior covariance decides the sign of the increment. The product of 15 the covariance and innovation is scaled by the uncertainty in the AOD estimates. The uncertainty in AOD estimates is given by the denominator in Eq. (3). This quantifies the confidence one should place in the innovation. Notice that the increment in α is almost zero at update cycle 4 in Fig. 2a, though the covariance (Fig. 2b) is ∼ 0.2. This is because the innovation (Fig. 2d) is zero at this update cycle. As the update cycles 20 proceed the corrections at each update cycles progressively push the α value towards its true value (0.48). The uncertainty in the estimate of α progressively decreases. After about 20 update cycles the α estimate almost coincides with truth and as a result the mean forecasted AOD and the observed AOD almost coincide (Fig. 2d). The innovation becomes small after cycle 20 and hence the corrections are small. 25 The estimate of covariance obtained using the ensemble plays a central role in deciding the quality of tuning. Spurious covariance can seriously hamper successful tuning and since they are unavoidable it is important to properly account for them. This issue of spurious covariances is especially important when AOD observations at many differ-ACPD 12,2012 Ensemble filter based estimation of mesoscale parameters Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | ent spatial locations are available and used in the estimation. In principle α at a single grid point is updated using all available observations of AOD. The covariance between α and AOD at the observed grid point informs what part of the innovation is used in the increment. This covariance is calculated using the ensemble. If a very large ensemble size is used the ensemble covariance is more accurate. With a small ensemble size the 5 estimated covariance tends to be inaccurate especially if the true covariance is small. The true covariance with a grid point geographically far away tends to be smaller and hence the estimated covariance should be trusted less for far away grid points. The concept of a cutoff radius or a localization radius is widely used in ensemble based filtering work to address the problem of spurious covariances (Hamill et al., 2001). The 10 cutoff radius, c, dictates the distance over which observations are used to calculate the correction. This is achieved by defining a (localization) function that decays as one moves away from the grid point being tuned. The Gaspari-Cohn function (Gaspari and Cohn, 1999) is used in this work. The width of this function is governed by the value of c. In the current work, we will run experiments with various values of c. The functions 15 corresponding to the values of c used in this work are shown in Fig. 3 as dashed curves at a single grid point. This grid point is marked 0 on the x-axis. The dashed cyan curve shows the Gaspari-Cohn function corresponding to c = 5 grid points. In this work the distance is mentioned is units of grid points. The horizontal resolution used in this work is 81 km (5 grid points is equal to 400 km). 20 The functions peak at the grid point marked 0. This is the grid point where α is being tuned. The Gaspari-Cohn function value at a particular grid point is used as a multiplicative factor to decrease the correction due to observation at that grid point. The functions are shown in one dimension along a latitude circle in Fig. 3. However actual functions are defined in two dimensions. Practically, observations at all grid points more than a distance of 2c from the point of interest do not have any impact on the corrections.
In the next section the results of tuning α (all over the domain) is described.

Tuning with uncorrelated α perturbations in OSSE
In the last section it was assumed that observations of AOD are available at all points in the domain, whereas in reality actual satellite observations are available for many (but not all) locations. At any given update cycle the satellite observations are sparse. This sparseness of satellite observations is mimicked in the OSSE by observing AOD 5 at 20 % of grid points in the domain. These 20 % grid points are randomly chosen at each update cycle. In the satellite observations however the sparse regions need not change randomly with time. The observational error is set equal to 10 % of the mean observation, consistent with instrument accuracy. We begin with an experiment in which the cutoff radius is set to zero. This means that α i ,j at any given grid point uses only the AOD observation at that grid point. A cutoff radius is not imposed in the vertical. The mean and standard deviation of the initial guess α is shown in Fig. 1c, d. The perturbations in α in initial guess are uncorrelated. The result of this experiment, that is, the ensemble mean of the tuned α (after 48 cycles) over the domain is shown in Fig. 1e. The uncertainty in this estimated mean  The success of the tuning experiment is further quantified by comparing the tuned α value at each grid point to the true α value at that grid point. α at a particular grid point is deemed to be (correctly) tuned if its tuned value lies within 0.05 of the true value at that grid point. Otherwise it is deemed to be untuned. This criterion of 0.05 (in absolute units of α) is an arbitrary choice. This criterion is used throughout this work 5 to determine the quality of tuning. The distribution of the tuned and untuned points over the domain is shown in Fig. 4a. The grid points colored with yellow are those tuned successfully. The white grid points are the untuned grid points. The blue contours enclose areas with high friction velocity. These contours correspond to areas in which the friction velocity is above 0.6 m s at least 20 % of times Fig. 4b. The red contours enclose areas with high true erodibility (more than 0.25). Some of the areas with high friction velocity are marked S1, S2 and S3. Notice that in these areas the tuning is successful. In the horn of Africa (S1) almost all the grid points are successfully tuned. Recall that the friction velocity gives rise to the covariance signal. Consequently areas with strong friction velocity tend to be tuned well. Some areas with weak friction velocity 15 are marked W1, W2, W3 and W4. These areas tend to be poorly tuned. Consider area W1. Note that W1 is an area of weak friction velocity sandwiched between areas of high friction velocity on its north and south. Not only does it have a weak signal but also high advection noise because it lies in an area of high erodibility (it is enclosed by the red contour). As pointed out in Sect. 3.3 the advection noise is additive noise. 20 The combination of low friction velocity (small local signal of AOD), and large amounts of advected AOD makes it difficult to correctly estimate the erodibility parameters. The situation is similar with area W3 in arabia and W2 in the center of the domain. The area W4 in the south Sahara has weak friction velocity and low erodibility.
Using this criterion of 0.05 the number of grid point successfully tuned is counted 25 and expressed as percentage of the total number of grid points. This percentage is shown in Fig. 5  The experiment discussed above (dashed magenta curve in Fig. 5) used a value of c = 0. However using a cutoff radius of 0 prohibits the update at any grid point from using observations in adjoining areas. To assess the impact of using more observations, tuning experiments are run with non-zero values of c. The solid magenta (squares) curve in Fig. 5 shows the percentage of grid points tuned correctly for an 5 experiment with uncorrelated initial α perturbations and a cutoff radius equal to 20 grid points. Clearly the results degrade compared to the c = 0 experiment (dashed magenta curve) even though the update of α at any given grid point uses more observations in c = 20 experiment than that in c = 0 experiment. Recall that apart from the observation of AOD, a good estimate of the covariance between α at the point being updated and 10 AOD at the location of observation is also important for correct tuning. Apparently, in the c = 20 tuning experiment the ensemble does not correctly estimate the covariance between α at any given grid point being updated and the neighboring location where the observation is available. This leads to the degradation of the tuning because along with many observations, the update uses many bad covariances. The reason for the 15 bad covariances is a combination of the effect of advective noise and the small size of the ensemble. The covariance between α at a particular point and AOD at another point is partly controlled by the correlation between α perturbations at these two points. For the experiment described in this section, the updates do not result in correlating the α perturbations, that is, the initially uncorrelated α perturbations remain uncorrelated 20 at the end of all the update cycles. In the current experiment the perturbations are uncorrelated and hence dust generated by all the points within the neighborhood of point of interest contributes to the advective noise. The small ensemble finds it difficult to capture the local signal, due to this advective noise resulting in spurious covariances. Hence including the observations of AOD from neighboring grid points degrades the  However one would like to use as many observations as possible by setting c > 0. The main hurdle to using c > 0, is the advective noise. What can be done to address this problem? A possible solution to this problem is to correlate the perturbations in neighboring α thereby reducing the advective noise. Also, the results of the experiments in this section suggest that the assimilation of observations does not impose a 5 correlation structure in the α field. That is, the observations are unable to recover the correlation structure (if any) between initially uncorrelated α. Can the assimilation of observations recover the correlation structure if the α perturbations are initially correlated? The next section considers the issue of initially correlated α perturbations. 10 In this section the initial perturbations of α are spatially correlated. Some examples of the correlation functions between the α perturbations are shown in Fig. 3. The point marked 0 on the x-axis is the point of interest. The solid blue curve gives the correlation between the α perturbations at point 0 and that at various neighboring points along the latitude circle corresponding to a correlation length scale of l = 20 grid points. The 15 standard deviation of this correlation function is l = 20. This correlation is constructed by first sampling from (uncorrelated) ξ(0. 25, 0.25) and then constructing an ensemblemember by ensemble-member weighted average. These weights are chosen proportional to a two-dimensional gaussian function with standard deviation of l = 20. The cyan curve shows the correlation function for l = 5. The gray curve shows the corre-20 lation function for l = 0, that is, independent perturbations. The correlation function of any grid point in the experiment described in the Sect. 4 looks like the gray curve. The term "correlation function" will imply correlations between α perturbations (between two grid points).

Tuning with correlated α perturbations in OSSE
The red ( for l = 5, c = 5. This is because for the l = 20 experiment as the update cycles proceed the correlation function narrows to about 5 grid points. Also because c = 5, effectively only observations within a radius of about 5 grid points are used to update α at any grid point. The tuned map at the last update cycle for l = 20, c = 5 experiment is shown in 5 Fig. 5i. The tuned map corresponding to the l = c = 20 experiment (solid red (squares) curve in Fig. 5) is shown in Fig. 5g. Clearly the tuned map in Fig. 1i recovers the perfect map shown in Fig. 1a more accurately than does the l = c = 0 experiment (Fig. 1e) or the l = c = 20 experiment (Fig. 1g). Comparing Fig. 1f, j the estimate from the l = 20, c = 5 experiment is constrained better than l = c = 0 experiment as can be inferred 10 from the lower values of spread in Fig. 1j. The spatial distribution of tuned points for l = 20, c = 5 experiment is shown in Fig. 4b. Comparing this figure with Fig. 4a correlating perturbations and using more observations leads to tuning gains in high advection/low friction velocity regions like W1, W2, W3 and W4. This confirms our hypothesis that correlating perturbations leads 15 to decrease in advection noise thereby improving the covariance estimates. This also indicates that for this particular problem, on an average over the domain, an emergent correlation length scale is about 5 grid points (400 km). Imposing a correlation function of l = 5 is leading to better covariances. This does not mean that advection mainly happens over a length scale of 5 grid points. Advection most probably is taking place over 20 longer length scales. However, the linear signal due to advection survives only over a length scale of l = 5 grid points.
Various other experiments with different values of l and c are run to further investigate the interplay between correlation length and cutoff radius. The red and blue curves correspond to experiments with correlation length scales l = 20 and l = 10 (800 km), This shows that if l > c than the correlation length is effectively l = c as far as the data assimilation is concerned. As the update cycles proceed, l converges to 5 grid points.

5
Before this convergence happens since c < l , the observational information beyond a distance of c is not used. For the l = 10 experiments with c = 10 (blue circles) and c = 20 (blue squares) the behavior is similar to the l = 20 experiment with similar values of c. This shows that l > 5 is too broad for this problem and if c is specified longer than 5 then the data assimilation narrows the correlation function to 5 grid points. Lastly  Figure 7 shows the tuning at a particular point marked x in 15 W1 area in Fig. 4. The dashed magenta line uses observations only at the same grid point and hence the updates take place only when data is available at that grid point. Though the solid magenta curve has access to more observations, the covariance estimates are not good enough because the perturbations are not correlated. The red curve (squares) uses observations over a length scale of c = 20 while as the updates 20 progress the correlation narrows to 5 grid points. Consequently, the estimate does not converge towards the perfect value of α at this grid point very well. The solid green, blue and red curves converge smoothly because the correlation is over a scale of 5 grid points. These curves have access to more observational information and improved signal because of correlation. 25 The results from all these experiments suggest that the observations are able to uncover the correlation scale between neighboring α field provided the initial α perturbations are correlated over a broad length scale. This correlation scale for this problem ACPD 12,2012 Ensemble filter based estimation of mesoscale parameters is about 5 grid points. As seen in Sect. 4, if the initial α perturbations are uncorrelated, the observations are not able to impose a correlation structure as the updates proceed. The sensitivity of the OSSE tuning results to ensemble size was found by running experiments with smaller ensemble sizes. As noted, for an ensemble size of N = 40, about 85 % of the grid points are tuned for the l = 20, c = 5 experiment (solid red curve 5 in Fig. 5). This percentage decreases to 75 %, 60 % and 45 % for an ensemble size of 20, 10 and 5, respectively.
These results from the OSSE experiments provide valuable insights into the tuning of erodibility. They show that under ideal circumstances the erodibility is amenable to tuning given realistic observational coverage and errors. Ideal circumstances mean that 10 the only model error is imperfect values of erodibility. Even so it provides confidence in the tuning methodology to proceed with experiments with real data. The next section describes the tuning experiments with real satellite data.

Real data
MODIS Deep Blue data (Remer et al., 2005;Hsu et al., 2004Hsu et al., , 2006Shi et al., 2011) is 15 used for the experiments with real data. The satellite data is averaged over a box of 3 grid points (about 240 km) to obtain super observations. The errors in the observations could be correlated. The averaging serves to decorrelate these errors. The super observations are assimilated into the COAMPS model using the ensemble based tuning methodology. Here the observational error is set equal to 0.15 + 10 % AOD units, but 20 realistically the errors for some locations can be considerably greater (Shi et al., 2011). Shi et al. (2011)  is set to 0.6 m s −1 . It has to be noted that the experiment with real data is completely separate from the OSSE experiment described in Sects. 3, 4 and 5.
The operational values of α (Fig. 1a) are used as the mean of the initial guess. The ensemble perturbations in are correlated over a length scale of 5 grid points. The standard deviation of α at each grid point is set equal to 0.25. The negative ensemble 5 members are set equal to 0.01. The mean and standard deviation of this initial guess are shown in Fig. 8a and b, respectively. The standard deviation in some areas in Fig. 8b is lower than 0.25. This is because in these areas the mean of the initial guess has low values and therefore the ensemble members below the value of zero are set equal to 0.01. This decreases the standard deviation below 0.25. In this experiment the correlation cutoff radius is set equal to 5. After 28 update cycles the mean of tuned α converges to values shown in Fig. 8c. The standard deviation in the mean of these tuned values is shown in Fig. 8d. Considering the tuned map, on an average in west Sahara and Arabia the parameter estimation results in lower values of α compared to the operational values (Fig. 1a). In the South Sahara region (between latitude 5 and 15

10
• N) α converge to higher values. During the tuning process the ocean values of α are set equal to zero within the model. Comparing Fig. 8b and d it is evident that the standard deviation in the mean of the tuned values decreases to about 0.05 compared to the initial guess standard deviation. The performance of these tuned maps in forecasting AOD is compared to that of 20 the operational map. The verification is done over a period independent of the tuning period. Recall that the tuning experiment for the real data runs from 12:00 Z, 12 June 2009 to 12:00 Z, 8 July 2009 (28 days). The period from 8 July 2009 to 30 July 2009 (19 days) is used for verification. Two separate data assimilation (verification) experiments are run over the verification period. In these experiments only the dust concentration 25 is estimated. The erodibility parameter map is held fixed. The first DA experiment uses the operational erodibility map (Fig. 1a) and the second uses the tuned erodibility map (Fig. 8c) also assimilated to generate the posterior. This is not a problem because 24 h is long enough for the dust generation and transport to render the forecast almost independent of the initial conditions. The source of dust, that is the erodibility values, plays a dominating role in deciding the spatial distribution of dust over the 24 h period. Consider the verifications of these two experiments on a particular day. Figure 9 10 shows the mean estimates of AOD on 20 July 2009 at 12:00 Z. Figure 9e shows the satellite observations of AOD at 12:00 Z, 20 July 2009. The right side panels (Fig. 9b, d) corresponds to the operational experiments. Figure 9a, c shows the estimates from the tuned experiment. The same MODIS data are assimilated into each of these experiments. The prior shown in Fig. 9a is the mean of the 24 h ensemble forecast launched 15 starting from the posterior AOD on 19 July 2009 for the tuned experiment. This forecast for the operational experiment is shown in Fig. 9b. Comparing Fig. 9a, b to e, the tuned forecast agrees with the observations more than the operational forecasts. Figure 9c shows the posterior AOD field corresponding to the prior in Fig. 9a. Figure 9d shows the posterior corresponding to Fig. 9b. The same data (Fig. 9e) is assimilated into the 20 tuned and operational priors to obtain posteriors in Fig. 9c, d and hence these posteriors are similar. These posteriors are used as initial conditions to launch the next 24 h ensemble forecasts. These forecasts (priors) valid at 12:00 Z, 21 July are shown in panels a, b in Fig. 10. The satellite observations on 21 July 2009 are shown in Fig. 10c. The tuned forecast (Fig. 10a) matches better with the observations (Fig. 10c) 25 than does the operational forecast (Fig. 10b). Note that these tuned and operational forecasts used similar initial conditions in AOD, which are given by panels c, d of Fig. 9.
In spite of these similar initial conditions the operational forecast (prior) is different from the tuned forecast on 21 July with operational forecasts giving higher AOD values on Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | 21 July. Note that the same meteorology is used in both the operational and tuned experiments. The only difference between the tuned and operational experiments is the different maps of erodibility. Therefore the difference between these forecasts is due to different values in the erodibility maps. The lower values of AOD in tuned forecasts are attributable to lower values of erodibility in the tuned map (Fig. 8c) compared to the 5 operational map (Fig. 1a). Both on 20 and 21 July the tuned forecasts give lower values of AOD thus resulting in better verifications compared to the operational forecasts. The comparison of verifications over the 19 days is performed by using mean absolute error which is calculated as follows. At each particular grid point, for each 24 h lead time the absolute difference 10 between the mean operational forecast and the MODIS observation is calculated.
Then at each grid point, the average of ε op over different forecasts is the mean absolute error for the operational model. Similarly, the absolute difference between the mean tuned forecast (AOD tu ) and observation is calculated. 15 ε tu = |AOD tu − AOD obs | At each grid point, the average of ε tu over different forecasts is the mean absolute error for the tuned model. At each grid point, the operational and tuned mean absolute errors are used to calculate the metric difference mean absolute error (dMAE), This metric is a simple and convenient way to quantify the comparative performance of the operational and tuned models in forecasting the AOD, at each grid point. If dMAE > 0 it means that the operational model errs more than the tuned model in forecasting the AOD. If dMAE > 0 at a particular grid point then the tuned model outperforms the operational model. On the other hand if dMAE < 0 it means that the operational map 25 performs better at that grid point. The dMAE corresponding to the tuned map in Fig. 8c is shown in Fig. 11 with contours of high friction velocity overlaid. Figure 11a shows the dMAE calculated for the forecast lead time of 24 h. The tuned map outperforms the operational map largely in the Western Sahara and Arabia regions. The tuned map gives better forecasts than the operational map to some extent in the horn of Africa. In most of the other regions 5 the MAE is within −0.1 and +0.1 indicating that the tuned and operational forecast are almost similar. There are a couple of pockets near Central Sahara where the tuned map gives degraded performance. These areas are blue in color. Panels b, c in Fig. 11 show the dMAE for longer lead times of 48 and 96 h respectively. Comparing Fig. 11ac it is clear that broadly the pattern of areas where the tuned model outperforms the 10 operational model are similar for all lead times. However, comparing the red areas in the vicinity of point S in Fig. 11a, b the tuned model performs better over a larger region for the 96 h forecast compared to the 24 h forecast. Also the magnitude of improvement of the tuned model is higher for longer lead time in this area. This is also true in the Arabian peninsula. An important dMAE feature that develops with longer lead times is 15 in the vicinity of points O and W off the coast of Africa. The red color near point O in Fig. 11c indicates that the tuned model gives a better forecast at 96 h; whereas the tuned model is as good as the operational model in this area at 24 h. In Fig. 12 the relative performance of the tuned and operational models is further probed by inspecting the AOD forecasts at each of the marked points in Fig. 11. 20 The time series of AOD forecasts at point S are shown in Fig. 12a. See the legend in Fig. 12b. The black curve shows the AOD observations. The dashed green curve shows u * scaled by a factor of 5, during the verification period. Clearly, u * is above the threshold level of 0. exceeds the threshold value during the tuning and verification periods, respectively. So at this point out of the total number of cycles (28) in the tuning period, u * exceeds the threshold value 78 % of times. This point is an example of a grid point where u * is very strong both during the tuning and verification periods. Because the signal is strong during the tuning period this point is "tuned correctly", decreasing the value 5 from 0.32 to 0.04. The phrase "tuned correctly" should be carefully interpreted. We don't know the values of erodibility in nature. Because the (tuned) forecasted AOD matches well with the observations at this point, we draw the conclusion that the tuned value is correct. The dashed blue curve matches well with the observations while the operational forecast (blue curve) is too high. The low value of tuned AOD can be directly 10 attributed to the lower value of tuned erodibility at point S. Note that the tuned AOD not only has a smaller bias (with respect to the observations) compared to the operational AOD values but also a smaller standard deviation. In both the tuned and operational model the 96 h forecast is higher than the 24 h forecast. This suggests that there is some accumulation of dust over the 96 h. This accumulation seems to be more for 15 the operational than the tuned model as the separation between red and blue curves is larger for the operational model. This accumulation might be because in the operational model the production is more because of higher erodibility of 0.32 (compared to 0.04). The higher dMAE of 2.9 at 96 h compared to 2.2 at 24 h means that the operational model errs more (compared to the tuned model) at 96 h than at 24 h. Note that both the 20 operational and tuned forecasts follow the trend in the friction velocity (green curve). In Fig. 11, consider the white area to the lower right of point S, around the point marked P. The verification for this point is shown in Fig. 12b. At this point u * exceeds the threshold value for about half the time during both tuning (60 %) and verification (55 %) periods. This point has low value of erodibility in the operational map. Because 25 the operational values are "correct" the tuning methodology does not change this value much. The inference that these values are "correct" is drawn from the fact that at point P both operational and tuned models do (equally) well in predicting the observations. ACPD 12,2012 Ensemble filter based estimation of mesoscale parameters  Fig. 11 consider the point N near the Red Sea. The time series of AOD at this point in shown in Fig. 12c. The downward correction in tuned value from 0.18 to 0.03 is reflected in a bias correction in the AOD forecasts. Also the tuned forecasts has a lower standard deviation. Note that at this point u * is strong only for 17 % of time and yet it gets tuned correctly. This grid point implies the quality of tuning need not directly 5 depend only on the number of times u * exceeds the threshold value. It is possible that at a given grid point u * is strong only a couple of times and the covariance signal is enough to tune the point correctly. Both the operational and tuned models follow the trend in the friction velocity. Now consider the point M in the red sea (Fig. 11). The time series at this point is shown in Fig. 12d. The erodibility value at any location in the ocean is 0 in the operational map. In the tuning experiment the erodibility is set to zero at all points in the ocean. Hence the tuned value of erodibility is 0 at the point M. At this point the tuned model performs worse than the operational model in forecasting the AOD. All the AOD at this location is advected from upstream areas. It is not clear which areas are upstream of this point. It is possible that the land areas south of this 15 point are the upstream areas. These southern areas have higher values in the tuned map than in the operational map.
The case of point V located left of point N is interesting. At this point the tuned value degrades the forecast resulting in a dMAE of −0.5 (Fig. 12e). The tuned forecasts over estimates the observations. The tuned value (0.25) of erodibility is higher than 20 the operational value. In the vicinity of this area the u * is quite weak during the tuning period. It is possible that the tuning at this point is due to spurious covariances. Note that the u * is quite strong during the verification period.
The point U, close to point V, at the center of the domain has a positive skill score of 0.4 (Fig. 12f). Though the u * at his point is not strong during the tuning period, this 25 point lies close to the areas of strong u * . Therefore it gets tuned correctly to a value of 0.17 which bring the tuned forecast more in agreement with the observations.
The verification at the points marked O and W in Fig. 11, off the coast of Africa is shown in Fig. 12g and h, Fig. 12h, the observed AOD is quite high between day 9 and day 11. Both the tuned and operational models fail to capture this event at 24 h lead time. However the operational model performs marginally better than the tuned model at 96 h, giving an AOD forecast of 1.1 for day 9. This contributes to the dMAE of −0.1 at the lead time of 96 h. The inspection of these verifications (panels in Fig. 12 quite close to the 24 h forecast curve, for the tuned and operational models separately. This indicates that the local sources play the dominant role in deciding the AOD rather than the errors in the AOD initial conditions. Consider Fig. 11c, the red areas coincide with areas with high operational erodibility. In these areas (Western Sahara and Arabia) the operational α was corrected by the 20 tuning to a lower value. In the (white) areas other than Western Sahara and Arabia, both operational and tuned maps perform equally well. The improvement of forecasts in Western Sahara and Arabia is because tuning leads to a better model of dust generation, by decreasing the erodibility. However an improvement in the dust generation over the red areas does not impact the forecasts in the other areas. This means that 25 the effect of tuning is localized in space. This might be because of the model error in dust transport. Both the tuned and operational models used the same meteorology and therefore the same winds. These might be different from the winds in nature. Both the tuned and operational model suffers from the model error in meteorology. Because of ACPD 12,2012 Ensemble filter based estimation of mesoscale parameters Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | this model error in dust transport, the improved dust generation in the red areas might not necessarily improve dust forecasts in other areas. In this work meteorological variables are not estimated.
In almost all the panels in Fig. 12, the 96 h forecasts are higher than the 24 h forecasts, for the tuned and the operational model separately. The higher AOD at longer 5 lead time points to accumulation of AOD either from local production or from upstream transport. The higher AOD at 96 h explains the larger coverage of the red area in the Sahara in Fig. 12c compared to Fig. 12a. For example, the area near (17.5 • N,0), is white for the 24 h lead time, but is red for the 96 h forecast. This area has a weak friction velocity. Most probably the AOD in this area is coming mainly from the adjoining areas in the north and south which are rich in friction velocity. The tuned and operational model give similar AOD forecast for 24 h, but at 96 h the operational model gives a much higher AOD than the tuned model's 96 h forecast resulting in a positive dMAE at 96 h. Because both the models use the same meteorology and sinks, the higher operational AOD at 96 h is due to higher production in the upstream areas. This higher 15 production is due to the higher operational erodibility in the Sahara. The tuned and operational forecasts were compared to climatology and it was found that neither were able to outperform the climatology.

Conclusions and further work
In this work the EAKF with 40 ensemble members is successfully applied to the tuning 20 of spatially distributed erodibility parameters over the Sahara and Arabia. OSSE experiments showed that these parameters could be successfully tuned given observations of AOD if an appropriate correlation structure is applied to the initial perturbations of erodibility and a supporting localization radius is applied. A priori one does not know the appropriate length scale that should be used for the correlation structure. If an initial correlation structure is not imposed, the data assimilation does not induce a correlation among the erodibility perturbations. However, if a long correlation length scale is im- posed along with a long cutoff length scale then at each grid point the correlation length scale converges towards the appropriate correlation length scale. Various experiments with different long length scales were run. It is found that the appropriate length scale for this problem is 5 grid points which is about 400 km. The tuning experiment with a correlation length scale of 5 grid points resulted in the best tuning in OSSE. This 5 methodology of specifying a long length scale and allowing it to converge to the appropriate value is introduced in this work. It would be exciting to investigate whether this methodology of determining the appropriate length scale is effective in other parameter estimation problems. In the current problem the true parameter map is static in that it does not change with time. It would be interesting to test this methodology in a problem 10 where the true parameter map evolves in time.
We have not done tuning experiments with different resolutions. Hence it is not clear if the correlation length scale of 400 km will prove to be best length scale for this problem if a different resolution is used. We suspect that this length scale will converge to a particular value if the resolution is gradually increased. 15 It is very important to choose an appropriate value for the cutoff radius. The cutoff radius limits the area within which observations are used, thereby preventing the filter from using spurious covariances. If the cutoff radius is larger than the correlation length scale the tuning degrades. This is because then the update uses covariances from regions that are uncorrelated in erodibility. These covariances are potentially spu-20 rious because the covariances between AOD at different points is partly controlled by correlation between the erodibility perturbations at these points.
The tuning methodology is implemented with MODIS satellite data. In general, it is found that the operational model overestimates observed AOD. The ensemble based tuning correctly identifies this high bias and corrects the operational erodibility maps to forecasts. In the Western Sahara and Arabia areas the gain in forecast accuracy due to tuning is as high as 1.5 AOD units. In both the OSSE and experiments with satellite data the tuned maps are robust to changes in the initial guess, even with only 40 ensemble members. It is found that the tuned forecast does not outperform the climatology of AOD observations.

5
The friction velocity plays a central role in the generation of dust. The quality of tuning depends on whether the signal due to friction velocity rises above the noise due to advection of AOD from upstream regions. A weak signal could result in spurious tuning. The areas that are expected to be tuned are those with strong friction velocities during the tuning period. Therefore areas with low friction velocity during the tuning period 10 and high friction velocity during the verification period might not show a gain in forecast accuracy. A potential solution to this problem would be to run the tuning experiment for a longer period, say for a whole year so that almost every region experiences strong friction velocities. Then the tuned map can be verified for another year. This approach too has a potential problem. The true erodibility probably changes from year to year. 15 Therefore having different years for tuning and verification might not be a good idea.
The meteorological state is not estimated in this work. Meteorological observations could be assimilated to estimate this state which can potentially correct the errors in transport thereby improving the forecasts. Another potential set of parameters to be tuned are the sink of the dust.   Wang, X., Barker, D. M., Snyder, C., and Hamill, T. Fig. 5. The spatial distribution of the quality of tuning for this experiment is shown in Fig. 4a. (g) and (h) Mean and standard deviation of the tuned α map, respectively, for the tuning experiment with correlation length scale l = 20 and cutoff radius c = 20 described in Sect. 5. For this experiment the initial guess of α map is correlated over a length scale of 20 grid points (∼ 1620 km). The mean and spread of the initial guess for this experiment is shown in (c) and (d), respectively. The tuned map performs worse at recovering the perfect values (a) than the tuned map in (e) as shown by the red (squares) curve in Fig. 5. (i) and (j) Mean and standard deviation of the tuned α map, respectively, for the tuning experiment with correlation length scale l = 5 and cutoff radius c = 5 described in Sect. 5. The tuned map in (i) is more successful in recovering the perfect map in (a) than the tuned map in (g). This is because the erodibility is correlated over a length scale of 5 grid points. A length scale of 20 grid points is too long. The spatial distribution of quality of tuning for this experiment is shown   colored contours enclose areas with strong friction velocity (Figure 1(b)) . The red contours enclose areas with high erodibility (Figure 1(a)). Higher friction velocities gives rise to a stronger covariance signal which helps tuning. Areas enclosed by red contours have higher advective noise which impedes tuning.
(a) Shows the quality of tuning for the tuning experiment corresponding to Figure 1(e). The areas with strong friction velocity, for example S1, S2 and S3 tend to be tuned correctly. Some areas with weak friction velocities are marked W1, W2, W3 and W4. These tend to be tuned poorly. Out of the total number of grid points about 70% are (correctly) tuned in this experiment. See the dashed magenta curve in Figure 5. (b) Shows the quality of tuning for the tuning experiment corresponding to Figure 1(i). Many untuned grid points in areas of weak friction velocity in panel(a) are successfully tuned in this experiment. This is because correlating the erodibility at neighboring grid points leads to a decrease in advective noise. More importantly any given grid point uses more observations because of the longer cutoff compared to panel (a). About 85% of the total grid points are tuned correctly. See the solid red curve in Figure 5.  (Fig. 1b). The red contours enclose areas with high erodibility (Fig. 1a). Higher friction velocities gives rise to a stronger covariance signal which helps tuning. Areas enclosed by red contours have higher advective noise which impedes tuning. (a) Shows the quality of tuning for the tuning experiment corresponding to Fig. 1e. The areas with strong friction velocity, for example S1, S2 and S3 tend to be tuned correctly. Some areas with weak friction velocities are marked W1, W2, W3 and W4. These tend to be tuned poorly. Out of the total number of grid points about 70% are (correctly) tuned in this experiment. See the dashed magenta curve in Fig. 5. (b) Shows the quality of tuning for the tuning experiment corresponding to Fig. 1i. Many untuned grid points in areas of weak friction velocity in (a) are successfully tuned in this experiment. This is because correlating the erodibility at neighboring grid points leads to a decrease in advective noise. More importantly any given grid point uses more observations because of the longer cutoff compared to (a). About 85% of the total grid points are tuned correctly. See the solid red curve in Fig. 5   It is seen that as the update cycles proceed the correlation length scale converges towards 5 grid points at this grid point.
The localization function does not change with the update cycles and hence the dashed yellow curve is the same in all the panels.    (e) Satellite observations. The tuned prior (panel (a)) looks more similar to these observations than does the operational prior (panel(b)). (a) and (b) Priors using the tuned (Figure 8(c)) and the operational maps (Figure 1(a)) respectively. These are the means of the 24 hour ensemble forecast launched from the ensemble analysis on 19 July.

(c) and (d)
Posteriors obtained by assimilating satellite data into the priors shown in (a) and (b) respectively.
(e) Satellite observations. The tuned prior (panel (a)) looks more similar to these observations than does the operational prior (panel(b)).  (c) Satellite observations. The tuned forecast (panel(a)) matches better with the observations (panel(c)) than does the operational forecast (panel(b)). Even the though the initial conditions in AOD are similar on 20 July (Figure 9(c) and (d)) the operational forecast gives higher AOD values than the tuned forecast. This is because the tuned values (Figure 8(c)) of erodibility are smaller than the operational values (Figure 1(a)). (c) Satellite observations. The tuned forecast (a) matches better with the observations (c) than does the operational forecast (b). Even the though the initial conditions in AOD are similar on 20 July (Fig. 9c, d) the operational forecast gives higher AOD values than the tuned forecast. This is because the tuned values (Fig. 8c) of erodibility are smaller than the operational values (Fig. 1a).

28887
ACPD 12,2012 Ensemble filter based estimation of mesoscale parameters   (Figure 1(a)). In the white areas the tuned and operational maps perform equally well. The green contours enclose areas in which the friction velocity is strong over the tuning period. Some points are marked with letters. The verification time series at these points is shown in Figure 12.   Fig. 11. Result of the verification experiment. The color shows the difference mean absolute error (dMAE). Each panel is for a different lead time. Red colors are areas where the tuned map (Fig. 8c) verifies better compared to operational map (a). In the white areas the tuned and operational maps perform equally well. The green contours enclose areas in which the friction velocity is strong over the tuning period. Some points are marked with letters. The verification time series at these points is shown in Fig. 12