Est modus in rebus : analytical properties of multi-model ensembles

S. Potempski and S. Galmarini
European Commission – DG Joint Research Centre, Institute for Environment and Sustainability, 21020 Ispra (VA), Italy
Institute of Atomic Energy, 05-400 Otwock-Swierk, Poland
Received: 3 March 2009 – Accepted: 23 May 2009 – Published: 1 July 2009
Correspondence to: S. Galmarini (stefano.galmarini@jrc.it)
Published by Copernicus Publications on behalf of the European Geosciences Union.


Introduction
The use of ensemble techniques in atmospheric dispersion is becoming an increasingly popular research topic as well as an application. A large number of modeling communities opt for joining forces in a common multi-model effort to improve their results, thus moving from the "deterministic" approach typical of the 1980s and 1990s to the statistically based ensemble approach of the last decade.
As described by Galmarini et al. (2004a), there are several ways in which an ensemble can be constructed: either as a group of model runs produced by different modeling systems, or with one model and different input data or model settings. In this work we will mainly focus on multi-model ensembles, in which the models in principle have "nothing more" in common than the modeled case. Examples of this kind can be found in Stull et al. (1997), Krishnamurti et al. (2000), Dabberdt and Miller (2000), Ziehmann (2000), Galmarini et al. (2001, 2004a, b, 2008), Delle Monache and Stull (2003), Killip et al. (2003), Vijaya Kumar et al. (2003), Williford et al. (2003), McKeen et al. (2004), Mutemi et al. (2004), Yun et al. (2005), Delle Monache et al. (2006a), Mallet and Sportisse (2006), Van Loon et al. (2007), Vautard et al. (2009), Wang et al. (2009), and Potempski et al. (2008). The multi-model ensemble technique constitutes the extreme application case of the ensemble technique and is therefore worth attention. Ensemble weather prediction systems should serve as an example of techniques built on a robust theory relating to predictability and uncertainty. Approaches based on either singular vectors or bred vectors have been developed from that theoretical framework and are used in operational activities such as those of ECMWF and NCEP (Atger, 1997; Buizza and Palmer, 1995; Buizza, 1997; Buizza et al., 1999; Molteni, 1996; Toth and Kalnay, 1993, 1997; see Kalnay, 2004 for more bibliography).
Regardless of the methodology selected, it is rather indisputably recognized that the treatment of several model results produces an overall improvement in the quality of the model simulations when compared with measurements. It is our opinion, however, that the technique deserves renewed attention from the scientific community, which should attempt to define its boundaries of applicability by formal methods rather than only through examples and applications. An attempt in this direction was made recently in Riccio et al. (2007), where the ensemble results of Galmarini et al. (2004b) were reproduced in a Bayesian context and a formal explanation was presented.
To date, several questions remain unanswered. Among them, one that is constantly present in the case of ensemble dispersion modeling relates to the way in which the ensemble should be set up. In other words, which criteria should be adopted to guarantee that the ensemble results will always be superior to those of any individual member? Moreover, how should the members be selected?
It is extremely interesting to note that all the multi-model ensembles presented in the literature in the last decade or so found their reason for existence in the opportunity of joining forces, rather than in the analysis of the model characteristics, the specificity and peculiarity of their results, and the way these might have affected the ensemble. This interesting aspect of the sociology of science can be easily verified. The ensembles are made out of existing groups of scientists that get together, offering their model results and having in return the added value of the ensemble. It feels equally good to be on one side or the other of the ensemble mean, since it does not make a difference as long as the latter compares well with the measurements and no member is either too near or too far from it. No a priori investigation is known to have been conducted on the degree of kinship each model has within the constituted community, or on the models' independence and complementarity. Models are not selected for a specific common or exclusive characteristic they show; their results are just assumed to be appropriate for that specific application. In this context, the ensemble practice relies on what we could define as phenotypical model distinction. Models differ in a limited number of modules or characteristics, or simply in the data used. The differentiation of a common model genotype occurs whenever the model is adopted or used in a specific modeling environment (modeling group or modeling application). In most cases the ensemble practice brings together models that show no substantial differences and that are available since they survived the natural selection of individual model evaluation. Another important element motivating ensemble model practices, and in particular the multi-model ensemble, is the relationship between the multi-model ensemble result and scientific consensus. Examples in this sense are the last IPCC reports or, on a much smaller scale and in a different context, the ENSEMBLE activity (Galmarini et al., 2004a). The concurrence of different results, originating from different sources, to the
determination of an ensemble is an optimal method to represent all the available scientific evidence (including its variability) and a way to facilitate agreement around a synthetic and relatively comprehensive result.
Whatever motivates the choice of ensemble modeling in atmospheric dispersion and air quality, we feel compelled to point out that no investigation has ever been published on the fundamental elements that define an ensemble of atmospheric transport and dispersion model results and on the theoretical requirements that define it. This paper is a humble attempt to give formulas and ways to identify a priori the ensemble characteristics, and to try to give a formal definition to a number of aspects that have never been discussed in the context of atmospheric dispersion modeling. The main aim of this paper is to introduce some basic properties of multi-model ensemble systems, which can be deduced from general characteristics of the statistical distributions of the ensemble members with the help of mathematical analysis. We have identified a precise number of questions that we wish to address in this work, namely: 1. Is the ensemble average result always superior to that of the individual members?
2. If one of the models has a substantially higher variance than the others, should it be removed from the ensemble when calculating the ensemble average, in order to minimize the ensemble variance? Under which conditions?
3. How should an average of the ensemble be modified in order to extract an optimal representation from all its members?
4. Is there a condition which guarantees that the variance of the ensemble mean is less than that of any individual model?
The paper is structured as follows: in the second section we present the basic assumptions; the analysis for the uncorrelated multi-model ensemble then follows. In the fourth section we generalize the results to the case where the models must be considered as correlated. In Sect. 5 we summarize the multi-dimensional case. The last section contains the conclusions. In the Appendix we include some technicalities related to the multi-dimensional case.
An explanation is probably due on the title of this paper. The expression "Est modus in rebus" is a verse from the Satires (1, 1, 106-107) by the Latin poet Quintus Horatius Flaccus (Horace). The sentence can be translated as "There is an optimal condition in all things", which in the original text is followed by the sentence "There are therefore precise boundaries beyond which one cannot find the right thing" (sunt certi denique fines | quos ultra citraque nequit consistere rectum). We think that these expressions summarize quite well the central topic of this paper, and we hope this will emerge clearly in what follows.

The starting point
Our work starts from the article of Van Loon et al., 2007 (hereafter VL2007), in which a simple formula was presented to explain the advantage of a multi-model ensemble system for long-term ozone simulations in comparison with a single-model approach. The formula was obtained under the assumption that the statistical distributions of model results and observations were identical and independent. More precisely, they considered daily ozone maxima in summertime determined by seven models and observations made at a number of stations. After bias correction, the Talagrand diagram created from the model results (constructed by counting the number of measured values that fall in each bin; an even distribution of the measurements indicates that the ensemble covers the spectrum of measurement values, while any deviation from that structure indicates biased ensemble behaviour) became flatter, which allowed them to conclude that the actual ozone concentration has statistical properties similar to those of the ensemble members. This justified the assumption that both simulated and observed concentrations could be sampled randomly from identical distributions (i.e. their variances are all equal), which in turn led to the following formula for the mean square error (msqe):

S² = b² + σ²(1 + 1/m)  (1)

In Eq. (1) σ² and b stand for the variance and the ensemble bias respectively, while m represents the number of ensemble members. Equation (1) shows the advantage of the ensemble approach, as it demonstrates that the msqe of the ensemble (after bias correction) is always less than that of any individual model (m = 1).
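Equation (1) can be checked with a small Monte Carlo experiment. The sketch below assumes, as in VL2007, that the m members and the observation are all drawn from the same Gaussian distribution, with a common ensemble bias; the particular values of σ, b and m are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma, bias, m, n = 1.0, 0.5, 7, 200_000   # illustrative values

# m ensemble members and the observation, all drawn from the same
# distribution (the VL2007 assumption), plus a common ensemble bias.
members = rng.normal(bias, sigma, size=(n, m))
obs = rng.normal(0.0, sigma, size=n)

ens_mean = members.mean(axis=1)
msqe = np.mean((ens_mean - obs) ** 2)

# Analytical value from Eq. (1): S^2 = b^2 + sigma^2 * (1 + 1/m)
analytical = bias**2 + sigma**2 * (1 + 1 / m)
print(msqe, analytical)   # the two values should agree closely
```

For m = 1 the analytical value is b² + 2σ², so any m > 1 reduces the msqe, as the text states.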
In general, the assumption of identical statistical distributions is obviously a simplification, as the models differ in terms of the parameterizations used or the numerical concept, not to mention the use of distinct meteorological data. Also, in the case considered in VL2007, the Talagrand diagram, although improved after bias correction, was still not perfect. An analysis we conducted on 204 different realizations produced by four atmospheric dispersion models applied to the ETEX-1 case (Girardi et al., 1998), using ECMWF Ensemble Prediction System (EPS) weather fields, demonstrates that the spreads of the results can sometimes differ significantly among the models (see Fig. 1), and as a consequence so can the variances.
Regardless of these additional considerations, the simple formulation given in Eq. (1) illustrates very well, and quite synthetically, the basic idea of the multi-model ensemble approach. Starting from this, we would like to take a slightly deeper look at the analysis based on the statistical characteristics of the mean square error and present a more general mathematical formalism. We will start our considerations with the simplest one-dimensional case, i.e. when the simulation results can be described by a scalar-valued random variable for each model (for instance when the multi-model ensemble is applied at a single point in space and time); then we will provide analogous formulas for the multi-dimensional case, i.e. when the problem is described by a vector-valued random variable.
First we introduce the notation and some relevant assumptions. The model results we want to focus on are those of a prediction of the concentration levels of an unspecified substance at a single point in space and time. Let us assume that by using a set of m atmospheric dispersion models we obtain m simulations of the concentration evolution at that point. We assume that at the same point measurements were collected for the same variable. Both model data and observations are characterized by some estimation of the variability or the error. For the models this can be done, for example, by perturbing model parameters and some input data using a Monte Carlo technique or through sensitivity analysis (e.g. Saltelli et al., 2006). The formal description of this situation implies that the values predicted by the models are represented by random variables x_j, j = 1, …, m, where x_j corresponds to the data produced by model j; each x_j has a statistical distribution characterized by a probability density function (pdf) with bias and variance, which we denote as b_j and σ_j², respectively. Analogously we also assume that the observed values have some uncertainty characterized by a random variable y with a pdf described by the variance σ_o² (we can treat the measurements as unbiased). In this context we do not need to specify any particular form of the pdf, neither for the models nor for the observations. What we need to know, however, are the biases and the variances.
An important aspect that requires clarification from the beginning is the level of diversity, independence, or correlation that we expect each of the m models of the ensemble to exhibit as an a priori condition. These concepts or definitions are most of the time ignored, or taken for granted, thus leaving every reader with his own interpretation and sometimes misunderstanding. To avoid that, we define explicitly the conditions we want the ensemble members to satisfy. The first condition we could impose on the members is that the individual models are independent. In this sense we need to specify whether we intend independence of the systems or of the results. Two models could be defined as independent if they are structurally different, in other words if they are based on different modeling approaches or philosophies, or if they are based on different parameterizations of the physical processes. At the same time they might be considered independent (or partially so) because they calculate atmospheric dispersion starting from meteorological fields originating from different weather models. In this context we will not distinguish between these two instances and we will refer to both cases indistinctly as independent models. More formally, the independence of two systems can be expressed by the independence of random variables, i.e. two variables z_1 and z_2 representing two models are independent whenever their joint probability can be calculated as the product of the individual ones, i.e.
p(z_1, z_2) = p(z_1)p(z_2). This condition is reasonable in the case in which z_1 is the result of a model and z_2 is a measurement, but it is not difficult to imagine that it will not necessarily apply to two models. In fact, in this case the condition applies to all results extracted from the two pdfs and implies that there is no possibility of a synergic contribution of the two models to the same result. In general this is a condition that is difficult to satisfy and verify for any atmospheric model. We will therefore relax the independence condition, requiring instead that the members of our ensemble be uncorrelated in the sense of uncorrelated random variables. This is a more realistic assumption, to the extent that it applies to the average behavior rather than to the intrinsic properties of each model. Uncorrelatedness is in fact defined as E{z_1 z_2} = E{z_1}E{z_2}, where E{} is the expectation value. Uncorrelatedness includes the VL2007 condition of independence of identical pdfs, but also the condition of different pdfs or partly overlapping ones, as depicted schematically in Fig. 2. Hence we have transferred the notion of correlation (or independence) from random variables to the models. We are aware of the fact that this is not a commonly used term for ensemble systems, but it allows us to use precise mathematical formulations.
Any conclusion derived for the uncorrelated case will then be extended to the correlated case, so that the whole model space is covered. In fact, by looking at Fig. 2 it is also clear that when we cover the correlated and uncorrelated model spaces we automatically include the dependent model class (dotted area), which is complementary to the independent model class.
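The distinction between uncorrelated and independent variables discussed above can be illustrated with a minimal numerical sketch: below, w is a deterministic function of z (hence fully dependent on it), yet their correlation vanishes. The particular variables are chosen purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
z = rng.normal(size=500_000)
w = z**2 - 1.0   # a deterministic function of z: fully dependent on it

# The sample correlation is (numerically) zero even though w is entirely
# determined by z, so uncorrelatedness is strictly weaker than independence.
corr = np.corrcoef(z, w)[0, 1]
print(round(corr, 3))
```

This is why requiring only uncorrelated ensemble members is a weaker, and hence more realistic, condition than requiring independent ones.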

The case of an ensemble of uncorrelated model results
We assume that the results predicted by the m models can be represented in ensemble form. We define the ensemble representative as any combination of model results, in the form of an average, a generic linear combination of model results, the median, or any other percentile, and we denote it by x. The ensemble value is also supposed to be independent of the measurements, as are all the members of the ensemble.
Finally, we use the standard notation E(z) and V(z) for the expectation and variance, respectively, of any random variable z.
Using the notation from Sect. 2 we can introduce the bias (b) and mean square error (written S²) of the ensemble as:

b = E(x − y),  S² = E{(x − y)²}  (2)

Under these assumptions the following formula holds:

S² = V(x − y) + b²  (3)

Equation (3) is a direct consequence of the definitions of bias and msqe, namely:

S² = E{(x − y)²} = V(x − y) + [E(x − y)]² = V(x − y) + b²  (4)

Using well-known properties of the variance and the assumed independence of the models from the observations, from Eq. (3) we obtain:

S² = V(x) + σ_o² + b²  (5)

At this stage we define x explicitly as a linear combination of the ensemble members, i.e.:

x = Σ_{j=1}^{m} α_j x_j  (6)

It is also reasonable to assume that the coefficients α_j are normalized, i.e.:

Σ_{j=1}^{m} α_j = 1  (7)
Since the models are uncorrelated, expression Eq. (5) leads to the following equation:

S² = Σ_{j=1}^{m} α_j² σ_j² + σ_o² + b²  (8)

Please note that Eq. (1), derived in VL2007, is a very special case of Eq. (8): it is obtained from the latter by taking the mean (i.e. all α_j = 1/m) and assuming all the variances to be equal, i.e. σ_j² = σ_o² = σ². More generally, Eq. (8) offers the possibility of finding optimal coefficients α_j that minimize the msqe. This can be cast as an optimization problem.
Find α_1, …, α_m such that:

Σ_{j=1}^{m} α_j² σ_j² = min,  subject to Σ_{j=1}^{m} α_j = 1

We then proceed by removing the bias from the model results first (i.e. introducing new variables x′_j = x_j − b_j), or by seeking the solution of the minimization problem in the form x = Σ_j α_j x_j + β, with β representing the bias of the ensemble. In the second case we need an additional equation for β, which can simply be the postulate that the ensemble is not biased, and which leads to the following solution for β: β = −Σ_j α_j b_j. Please note that the variances of the biased and non-biased random variables are the same: V(x) = V(x′). In one way or the other the bias can easily be removed. Therefore from now on we will consider that this operation was done a priori and that the models are all unbiased. The solution of the minimization problem is equivalent to minimizing the following Lagrangian function:

L(α_1, …, α_m, λ) = Σ_{j=1}^{m} α_j² σ_j² − λ(Σ_{j=1}^{m} α_j − 1)  (9)

which leads to the linear system of equations:

2σ_j² α_j − λ = 0 for j = 1, …, m;  Σ_{j=1}^{m} α_j = 1  (10)

Equation (10) can be solved explicitly. By determining α_j from the first m equations and applying the normalization condition Eq. (7) we get:

α_j = (1/σ_j²) / Σ_{k=1}^{m} (1/σ_k²)  (11)

As a matter of fact Eq. (11) corresponds to the minimization of the variance V(x), which is widely used in a number of different applications for a large spectrum of problems, such as optimal interpolation in meteorology or the Kalman filter (Gandin, 1964; Talagrand, 1997; Kalnay, 2003 and references therein). Equation (11) allows us to determine the optimal variance of the ensemble as the variance of the optimal linear combination, i.e.:

V(x_opt) = (Σ_{k=1}^{m} 1/σ_k²)^{−1}  (12)

From Eq. (12), because of the minimization of the msqe, the optimal variance is always less than any individual model variance. In fact for any k = 1, …, m we have:

1/σ_k² ≤ Σ_{j=1}^{m} 1/σ_j²

which immediately implies that:

V(x_opt) ≤ σ_k²

For the case where all the individual variances are equal, Eq. (12) leads to V(x_opt) = σ²/m, corresponding to the formula used in VL2007.
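The optimal weights and the optimal variance are straightforward to compute numerically; in the sketch below the model variances are illustrative, and the equation numbers in the comments refer to the numbering used above.

```python
import numpy as np

def optimal_weights(variances):
    """Weights alpha_j = sigma_j^-2 / sum_k sigma_k^-2 (Eq. 11)."""
    inv = 1.0 / np.asarray(variances, dtype=float)
    return inv / inv.sum()

def optimal_variance(variances):
    """V(x_opt) = (sum_k sigma_k^-2)^-1 (Eq. 12)."""
    return 1.0 / np.sum(1.0 / np.asarray(variances, dtype=float))

sigmas2 = [0.5, 1.0, 2.0, 4.0]          # hypothetical model variances
alpha = optimal_weights(sigmas2)
v_opt = optimal_variance(sigmas2)
print(alpha, v_opt)

# The optimal variance never exceeds the smallest individual variance:
assert v_opt <= min(sigmas2)
```

Note how the model with the smallest variance receives the largest weight, while the weights still sum to one.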
In summary, Eqs. (11) and (12) can be applied to multi-model ensemble systems to produce the optimal linear combination of model results, which minimizes the msqe. This answers questions 1 and 3 set in the introduction.
In the remaining part of this section we try to demonstrate some generally valid properties of multi-model ensemble systems based on the formalism we have introduced and derived.
The first question that can be raised concerns linear combinations other than the optimal one (such as the ensemble mean): do they have the property that the variance of the combination is always less than the variance of any individual model? In other words, if the optimal weights are not chosen, what can be said about the statistical properties of the ensemble mean? What conditions guarantee that the variance of the ensemble mean is lower than the lowest model variance?
Let us consider the simplest case with two models of variances σ_1², σ_2² such that σ_2² = p σ_1² for some p > 1. The combined variance is equal to:

V(x) = α_1² σ_1² + α_2² σ_2² = (α_1² + p α_2²) σ_1²

For example, for α_1 = α_2 = 1/2 (i.e. the ensemble mean) the requirement V(x) < σ_1² produces the condition p < 3. This suggests that in the case of models with different variances it is better if the difference between them is not too large; otherwise the combination described above would not produce an ensemble variance smaller than that of the individual models.
In general the following implication holds:

If σ_m²/σ_1² ≤ m + 1, then V(x_m) ≤ σ_1²

where x_m represents the ensemble mean and we assume that the models are enumerated in ascending order of their variances, i.e. σ_1² ≤ σ_2² ≤ … ≤ σ_m².

Proof: Indeed, since σ_j² ≤ σ_m² for j ≥ 2, we have:

V(x_m) = (1/m²) Σ_{j=1}^{m} σ_j² ≤ (1/m²)[σ_1² + (m − 1)σ_m²]

and thus, whenever σ_m² ≤ (m + 1)σ_1²:

V(x_m) ≤ (1/m²)[σ_1² + (m² − 1)σ_1²] = σ_1²

A question that follows from Eq. (12) is whether it is possible to define a priori the limits of variability of the ensemble variance. Since σ_1² ≤ σ_j² ≤ σ_m² for j = 1, …, m, we have m/σ_m² ≤ Σ_j 1/σ_j² ≤ m/σ_1², and therefore the optimal ensemble variance ranges as follows:

σ_1²/m ≤ V(x_opt) ≤ σ_m²/m  (13)

Equations (12) and (13) show that it is preferable to aggregate models whose individual variances are not very different (i.e. whose relative ratio is close to 1). If a model could be guessed to have a variance very large and different from the others, then it would be preferable to exclude it from the ensemble mean. However, when the models' variances are known, there is no need to exclude any model, since the optimal weights given by Eqs. (11) and (12) assure that a small weight is assigned to a model with a large variance, and the optimal ensemble variance is always lower than the lowest model variance.
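The variance-ratio condition can be checked numerically in the two-model example; the sketch below uses hypothetical variances with p on either side of the m + 1 = 3 threshold.

```python
import numpy as np

def mean_variance(variances):
    """Variance of the plain ensemble mean of uncorrelated models."""
    v = np.asarray(variances, dtype=float)
    return v.sum() / len(v) ** 2

# Two models with sigma_2^2 = p * sigma_1^2: the ensemble mean beats
# the best individual model only while p < 3 (= m + 1 for m = 2).
for p in (2.5, 3.5):
    v = mean_variance([1.0, p])
    print(p, v, v < 1.0)
```

For p = 2.5 the mean's variance stays below σ_1² = 1, while for p = 3.5 it does not, matching the bound derived above.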
Moreover, if a model were excluded, the optimal variance calculated with m − 1 models would be greater than the optimal variance calculated with m models, as shown by the following inequality:

(Σ_{j=1}^{m−1} 1/σ_j²)^{−1} ≥ (Σ_{j=1}^{m} 1/σ_j²)^{−1}  (14)

The latter shows that σ_m² can be large but its contribution to the optimal representation will be very small (i.e. α_m → 0 as σ_m² → ∞). In other words, even a model (or models) with a huge variance cannot deteriorate the ensemble result if an optimal combination of model results is taken as the ensemble representative. To corroborate this conclusion we can combine inequality Eq. (14) with the fact that the optimal variance is always less than any individual model variance, finally obtaining the following estimation:

V(x_opt^(m)) ≤ V(x_opt^(m−1)) ≤ min(σ_1², …, σ_{m−1}²)

This shows that adding "a bad model" (i.e. a model with a big variance) does not make the estimation worse, as the msqe is bounded by the smallest individual variance anyway; hence this answers question 2 in the introduction. Incidentally, Eq. (14) is also valid for the variance of the ensemble mean, but as shown above there is no guarantee that an individual model does not produce a smaller msqe than the mean of the ensemble.
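The behaviour α_m → 0 as σ_m² → ∞ can also be illustrated numerically; the three "good" variances below are hypothetical, and the bad model's variance is made progressively larger.

```python
import numpy as np

base = np.array([1.0, 1.5, 2.0])          # hypothetical good-model variances
for bad in (10.0, 100.0, 10_000.0):       # an increasingly "bad" extra model
    variances = np.append(base, bad)
    inv = 1.0 / variances
    alpha = inv / inv.sum()               # optimal weights, Eq. (11)
    v_opt = 1.0 / inv.sum()               # optimal variance, Eq. (12)
    print(bad, alpha[-1], v_opt)
# alpha[-1] -> 0 and v_opt approaches the 3-model optimum as bad -> infinity
```

As stated in the text, the optimal combination automatically marginalizes the high-variance member, so there is no need to exclude it when the variances are known.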
On the other hand, a big difference between the highest and lowest variances indicates that there is no agreement among the models in estimating the uncertainty, so the ratio σ_m²/σ_1² can be used as an indicator of the coherence of the multi-model ensemble simulations. If this ratio is close to 1 and the model-predicted values are also close, then there is very good agreement within the ensemble.
At the end of this section we would like to add some comments on another possible way of obtaining the weights of Eq. (11), namely by using the maximum likelihood principle (see for example Kalnay, 2003; Sasaki, 1969; Parrish and Derber, 1992; Lorenc, 1986). Let us assume that x represents the truth and that the conditional probability distributions are given by Gaussian pdfs, i.e.:

p(x_j | x) = (2πσ_j²)^{−1/2} exp(−(x_j − x)²/(2σ_j²))

Then the likelihood of x being the truth is given by the following formula:

L(x | x_1, …, x_m) = Π_{j=1}^{m} p(x_j | x)

Hence the most likely value of x can be found by maximization of the likelihood function x → L(x | x_1, …, x_m), which after taking the logarithm and neglecting constant terms leads to the minimization of the so-called cost function:

J(x) = Σ_{j=1}^{m} (x − x_j)²/(2σ_j²)

The solution of this problem is given by Eqs. (6) and (11). The difference between this approach and the preceding one is that here we explicitly assumed a Gaussian distribution, while previously we made no particular assumption on the pdf. On the other hand, the minimization of the cost function is with respect to x, not to the parameters α_j; hence it shows that for Gaussian pdfs the appropriate linear combination produces an optimal solution. The same cost function can also be obtained using a Bayesian interpretation (Edwards, 1972; Kalnay, 2003).
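A brute-force minimization of the cost function J(x) over a fine grid reproduces the variance-weighted combination; the model values and variances below are invented for illustration.

```python
import numpy as np

x_mod = np.array([1.0, 1.4, 0.8])   # hypothetical (unbiased) model values at one point
sig2 = np.array([0.5, 1.0, 2.0])    # their hypothetical error variances

# Cost function J(x) = sum_j (x - x_j)^2 / (2 sigma_j^2), evaluated on a grid
grid = np.linspace(-2.0, 4.0, 60_001)          # spacing 1e-4
J = ((grid[:, None] - x_mod) ** 2 / (2 * sig2)).sum(axis=1)
x_numeric = grid[np.argmin(J)]

# Closed-form minimizer: the weighted combination with alpha_j ~ 1/sigma_j^2
alpha = (1 / sig2) / (1 / sig2).sum()
x_analytic = alpha @ x_mod
print(x_numeric, x_analytic)   # the two estimates agree to grid resolution
```

Setting J′(x) = 0 gives x = Σ_j (x_j/σ_j²) / Σ_j (1/σ_j²), i.e. exactly the weights of Eq. (11), which is what the grid search confirms.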

What if the models are correlated?
While the formulas of the optimal combination for independent models correspond to optimal interpolation in meteorology, are generally well known, and have already been applied in a number of completely different areas (wherever independent measurements are considered), in this section we extend the results to a more complicated situation, in which the models can no longer be treated as uncorrelated. In particular we derive analogous formulas for the optimal linear combination of multi-model results and its variance. While this case might be perceived as an academic exercise, it is in fact a more realistic representation of the behavior of ensembles of atmospheric dispersion models. We want to verify whether properties analogous to those derived for the uncorrelated multi-model ensemble can also be obtained in the correlated case.
Let us consider the problem of minimizing the msqe under the assumption that the statistical distributions of the model results are not necessarily uncorrelated. If the ensemble is represented by a linear combination of the model results (according to Eqs. 6-7), then the formula for the ensemble variance is as follows:

V(x) = Σ_{i=1}^{m} Σ_{j=1}^{m} α_i α_j Cov(x_i, x_j)

where Cov(x_i, x_j) stands for the covariance of the random variables x_i and x_j; we use the notation σ_i² = Cov(x_i, x_i), where the indices i, j correspond to model numbers.
The minimization problem can then be reformulated as follows, where, as in the previous case, we consider the unbiased case. Find α_1, …, α_m such that:

(Kα, α) = min,  subject to Σ_{j=1}^{m} α_j = 1

This leads to a system of linear equations that can be written in block form as:

[ 2K  −l ] [ α ]   [ 0 ]
[ l^T   0 ] [ λ ] = [ 1 ]  (15)

where K is the covariance matrix of dimension m × m (its ij-th element is Cov(x_i, x_j)), α is the vector of coefficients (α_i), and l = [1, …, 1]^T.
With (·,·) denoting the dot product and l = [1, …, 1]^T, the system gives the equations:

2Kα = λl,  (l, α) = 1  (16)

Actually this system solves the minimization problem for the quadratic form (Kα, α) with the normalization condition Eq. (7); in fact V(x) = (Kα, α). This is also true for the uncorrelated case, in which the matrix K contains only the variances as diagonal elements, i.e. K = diag(σ_1², …, σ_m²). To facilitate the treatment of the equations we take advantage of the fact that the covariance matrix is symmetric, which allows us to apply the spectral theorem and write it in the form K = USU* (Strang, 2003), where U is a unitary matrix (i.e. its columns are orthonormal vectors, mutually orthogonal and of norm 1, ||u||² = (u, u) = 1; the same is true for the rows) and S is a diagonal matrix containing the eigenvalues of K. The columns of U are eigenvectors of the matrix K and U^{−1} = U*. The asterisk denotes the adjoint operator (complex conjugation and transposition), defined by the relation (Uu, v) = (u, U*v); since our case is real, U* = U^T. A unitary matrix also preserves the dot product, i.e. (Uu, Uv) = (u, v).
Equation (15) can also be solved explicitly. Repeating a procedure similar to the one used before, we find:

α = K^{−1}l / (l, K^{−1}l)  (17)

and hence the optimal variance:

V(x_opt) = (Kα, α) = (l, K^{−1}l)^{−1}  (18)

where we have assumed that the matrix K has no zero eigenvalue. It is known that the covariance matrix K is non-negative definite (Feller, 1968), so this implies that all its eigenvalues must be positive. In fact a zero eigenvalue would correspond to the situation where a model (or a combination of models) has zero variance, which means that the non-biased model is ideal (i.e. it does not produce errors).
Since l = Σ_j e_j, where e_j is the j-th versor (e_j = [0, …, 1, …, 0]^T with 1 only in the j-th position), the denominator of Eq. (18) can be rewritten in terms of the elements of the matrix U = (u_ij) as follows:

(l, K^{−1}l) = (U^T l, S^{−1} U^T l) = Σ_{j=1}^{m} (1/s_j) (Σ_{i=1}^{m} u_ij)²

In comparison with Eq. (12) an additional term appears (as an effect of the correlation), while the eigenvalues s_j play the role of the variances. Thus in the remaining part of this section we investigate the properties of the multi-model ensemble as we did for the uncorrelated case, to verify whether the results already obtained can be extended by replacing the variances with the eigenvalues. First let us assume that all the eigenvalues of the covariance matrix are equal (s_1 = … = s_m = s), which in the uncorrelated case corresponds to all the variances being the same. Since U preserves the norm, Σ_j (Σ_i u_ij)² = ||U^T l||² = ||l||² = m, so from Eq. (18) we can easily conclude that:

V(x_opt) = s/m

which is in accordance with the uncorrelated case (and of course all the weights α_k will be equal). Consider now the case of the ensemble mean (i.e. α = (1/m) l). The implication analogous to that of the uncorrelated case can be formulated as follows: if s_m/s_1 ≤ m, then V(x_m) ≤ s_1 ≤ … ≤ s_m, where x_m represents the mean of the ensemble (we assume that the eigenvalues are ordered such that s_1 ≤ … ≤ s_m).
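In the correlated case the optimal weights and variance follow from Eq. (17); a minimal sketch with a hypothetical covariance matrix, cross-checked via the spectral decomposition K = USU^T, is given below.

```python
import numpy as np

# A hypothetical symmetric, positive-definite covariance matrix for 3 models
K = np.array([[1.0, 0.3, 0.2],
              [0.3, 2.0, 0.5],
              [0.2, 0.5, 1.5]])
l = np.ones(3)

# alpha = K^{-1} l / (l, K^{-1} l);  V(x_opt) = (l, K^{-1} l)^{-1}
Kinv_l = np.linalg.solve(K, l)
alpha = Kinv_l / (l @ Kinv_l)
v_opt = 1.0 / (l @ Kinv_l)

# Cross-check via the eigendecomposition K = U S U^T (spectral theorem):
s, U = np.linalg.eigh(K)
denom = np.sum((U.T @ l) ** 2 / s)
print(alpha, v_opt, 1.0 / denom)
```

Since each unit vector e_j is itself an admissible (normalized) weight vector, V(x_opt) can never exceed the smallest diagonal entry of K, i.e. the smallest individual model variance.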
Proof. This can easily be verified because:

V(x_m) = (1/m²)(Kl, l) = (1/m²) Σ_{j=1}^{m} s_j (Σ_{i=1}^{m} u_ij)² ≤ (s_m/m²) Σ_{j=1}^{m} (Σ_{i=1}^{m} u_ij)² = s_m/m ≤ s_1

since Σ_j (Σ_i u_ij)² = ||U^T l||² = m. In comparison with the uncorrelated case we see that the condition above is slightly more restrictive (s_m/s_1 ≤ m versus σ_m²/σ_1² ≤ m + 1), which is the effect of taking into account the correlation terms. This condition cannot be relaxed to s_m/s_1 ≤ m + 1, as the following example shows. Consider the covariance matrix:

K = [[1, ε], [ε, 2]]

where ε > 0 is a parameter. One can calculate that the eigenvalues are s_{1,2} = (3 ∓ √(1 + 4ε²))/2. For instance for ε = 1/2 we have 2 < s_2/s_1 ≤ 3, i.e. s_2/s_1 ≤ m + 1, yet V(x_m) = (3 + 2ε)/4 = 1 > s_1 ≈ 0.79; indeed the condition s_2/s_1 ≤ m is not satisfied (by the way, V(x_opt) ≤ 1). Modifying this example by letting ε → −1/2 and putting 3 as the second variance (instead of 2), one can conclude that slightly correlated models can have a better bound than m. However, there is no easy way to find a general analytical expression providing a continuous relation between the correlated and uncorrelated cases.
Finally, also in the correlated case general bounds for the optimal variance can be obtained. Namely, the following estimations hold:

(l, K^{−1}l) = Σ_{j=1}^{m} (1/s_j)(Σ_{i=1}^{m} u_ij)² ≥ (1/s_m) Σ_{j=1}^{m} (Σ_{i=1}^{m} u_ij)² = m/s_m

and analogously:

(l, K^{−1}l) ≤ m/s_1

Hence we get estimations for the optimal variance equivalent to those of the uncorrelated case.
In such a way we can conclude that the following estimations for the optimal variance hold:

s_1/m ≤ V(x_opt) ≤ s_m/m

In summary, we have obtained results similar to those of the uncorrelated case, but with the variances of the models replaced by the eigenvalues of the covariance matrix K.
An interesting point relates to the fact that taking correlation into account may improve the msqe. As an example let us consider the following covariance matrix:

K = [[1, a], [a, p²]]

for which the optimal variance is:

V_opt^corr = (p² − a²)/(p² − 2a + 1)

while for a = 0 (i.e. the uncorrelated case) we have:

V_opt^uncor = p²/(1 + p²)

It can easily be checked that V_opt^corr < V_opt^uncor for a ∈ (2p²/(1 + p²), p). This example shows that if we consider two different ensembles, the first consisting of two uncorrelated models with variances σ_1² = 1 and σ_2² = p², and the second of two correlated models with the same variances, then there are conditions under which the second system produces a lower mean square error than the first.
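The two-model example above can be checked numerically; the choice p² = 4 below is arbitrary, and the three values of a are picked around the threshold 2p²/(1 + p²).

```python
import numpy as np

p2 = 4.0                      # sigma_2^2 = p^2, with sigma_1^2 = 1 (illustrative)
p = np.sqrt(p2)

def v_opt(a):
    """Optimal ensemble variance (l, K^{-1} l)^{-1} for K = [[1, a], [a, p^2]]."""
    K = np.array([[1.0, a], [a, p2]])
    l = np.ones(2)
    return 1.0 / (l @ np.linalg.solve(K, l))

v_uncorr = v_opt(0.0)                  # = p^2 / (1 + p^2)
threshold = 2 * p2 / (1 + p2)          # correlation starts to help beyond here
for a in (threshold * 0.9, threshold * 1.05, 0.99 * p):
    print(a, v_opt(a), v_opt(a) < v_uncorr)
```

Below the threshold the correlated ensemble is worse than the uncorrelated one, while for a in (2p²/(1 + p²), p) it is strictly better, as claimed in the text.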
At the end of this section we add that there is another way to obtain the formulas for the solution of the minimization problem Eq. (14), namely by applying the spectral decomposition of the matrix $K$ (Strang, 2003). Considering $K$ as a linear operator we have
$$Kv = \sum_{j=1}^{m} s_j\,(v, \varphi_j)\,\varphi_j,$$
where $\varphi_j$ are the eigenvectors of the matrix $K$ (hence $\varphi_j = Ue_j$), forming an orthonormal basis. Using the same method as previously one obtains the formula equivalent to Eq. (18):
$$V(x_{opt}) = \left(\sum_{j=1}^{m}\frac{(l,\varphi_j)^2}{s_j}\right)^{-1}.$$
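The equivalence of the spectral formula and the direct formula $V(x_{opt}) = 1/(K^{-1}l,l)$ can be verified on a generic positive-definite covariance matrix (synthetic, for illustration only):

```python
import numpy as np

rng = np.random.default_rng(0)
m = 4
A = rng.standard_normal((m, m))
K = A @ A.T + m * np.eye(m)         # a generic positive-definite covariance matrix
l = np.ones(m)

# direct formula: V(x_opt) = 1/(K^{-1} l, l)
V_direct = 1.0 / (l @ np.linalg.solve(K, l))

# spectral formula: V(x_opt) = ( sum_j (l, phi_j)^2 / s_j )^{-1}
s, U = np.linalg.eigh(K)            # columns of U are the eigenvectors phi_j
V_spectral = 1.0 / np.sum((U.T @ l) ** 2 / s)

print(np.isclose(V_direct, V_spectral))   # True
```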

Summary of multi-dimensional case
In this section we provide a summary of the multi-dimensional case; all the technical details are given in the Appendix. By the multi-dimensional (or, more generally, multivariate) case we mean the situation when the results of the simulations are described by a vector-valued random variable. Typically this is the situation when the simulation domain contains a number of spatial-temporal points and we want to include the correlation between them. Another possibility is the multivariate case where different variables are taken into account, such as the concentrations of various species. In fact the main difference between the one- and multi-dimensional cases lies in taking into consideration the correlation among different points or variables. We are therefore adding an additional level of complexity to the cases analysed so far.
Analogously to the one-dimensional case we take a random vector $X$ as the representative of the ensemble, defined as a linear combination of random vectors $X_j = [x_{1j},\ldots,x_{nj}]^T$ representing the multi-dimensional distribution of model $j$:
$$X = \left[\sum_{j=1}^{m}\alpha_{1j}x_{1j},\;\ldots,\;\sum_{j=1}^{m}\alpha_{nj}x_{nj}\right]^T,$$
where $m$ is the number of models and $n$ denotes the dimension of the random vectors (for example the number of points in some area). As previously, we assume that the weights $\alpha_{ij}$ are normalized, i.e.
$$\sum_{j=1}^{m}\alpha_{ij} = 1 \quad \text{for } i = 1,\ldots,n. \qquad (19)$$

We assume that the models are independent of the observations, represented by the random vector $Y = [y_1,\ldots,y_n]^T$. We consider the non-biased case, as a procedure similar to the one-dimensional case can also be applied.
The first question is how to extend the definition of the square error. A natural way is the following one:
$$S_2^{(1)} = \sum_{i=1}^{n} E\{(x_i - y_i)^2\}. \qquad (21)$$
The mean square error can then be expressed as the average over all points, i.e. $msqe = S_2^{(1)}/n$. First we assume that the models are mutually uncorrelated. Then the problem of finding the optimal coefficients which minimize Eq. (21) leads to the minimization of the following Lagrange function (we can omit the observation term):
$$L(\alpha_{11},\ldots,\alpha_{1m},\ldots,\alpha_{n1},\ldots,\alpha_{nm},\lambda_1,\ldots,\lambda_n) = \sum_{i=1}^{n}\sum_{j=1}^{m}\alpha_{ij}^2 E\{x_{ij}^2\} - \sum_{i=1}^{n}\lambda_i\left(\sum_{j=1}^{m}\alpha_{ij}-1\right).$$
It can easily be seen that in this way we obtain $n$ separate systems of linear equations of the form Eq. (9), and therefore Eqs. (10) and (12) can be applied at each point separately (for $i=1,\ldots,n$). However, if we also want to include correlations between the points (i.e. to consider the situation when the distributions $x_{1j},\ldots,x_{nj}$ are correlated for any $j$), it seems that we should use the expression $E\{(X-Y)(X-Y)^T\}$ representing the covariance matrix. Under the assumption of independence between models and measurements this leads to $E\{XX^T\}$, i.e. the $n \times n$ matrix
$$\left[E\{x_k x_l\}\right]_{k,l=1}^{n}. \qquad (22)$$
Then, as a generalization of Eq. (21), we use the following formula:
$$S_2^{(2)} = \sum_{k,l=1}^{n} E\{(x_k - y_k)(x_l - y_l)\}, \qquad (23)$$
which corresponds to taking into account all the elements of the matrix Eq. (22). As msqe we take the average over all points, i.e. $msqe = S_2^{(2)}/n$; it consists of two terms, the first related to the covariance matrix of the models ($V(X)$) and the second to the observations ($V(Y)$). It can easily be noticed that the previous Eq. (21) corresponds to taking into account only the diagonal elements of the matrix Eq. (22). The matrix Eq. (22) should be distinguished from the covariance matrix introduced in Sect. 4: the latter describes the correlation between different models at one point, while Eq. (22) defines the correlation between the points for a linear combination of model results.
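The point-by-point decoupling in the uncorrelated case can be illustrated with synthetic per-point model variances; the per-point weights and optimal variance below follow the Eq. (10)/(12) pattern referenced in the text:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 3, 4                                  # points x models
var = rng.uniform(0.5, 2.0, size=(n, m))     # synthetic E{x_ij^2} per point and model

# with mutually uncorrelated models and no cross-point terms, the Lagrange
# system separates: at each point i the optimal weights are inverse-variance weights
w = (1.0 / var) / (1.0 / var).sum(axis=1, keepdims=True)
assert np.allclose(w.sum(axis=1), 1.0)       # normalization Eq. (19) at every point

V_opt_per_point = 1.0 / (1.0 / var).sum(axis=1)   # optimal variance at each point
print(V_opt_per_point <= var.min(axis=1))    # never worse than the best model anywhere
```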
It should also be added that, due to the well-known property of covariance (Feller, 1968)
$$|E\{x_k x_l\}| \le \sqrt{E\{x_k^2\}\,E\{x_l^2\}},$$
saying that the off-diagonal elements are bounded by the diagonal ones, Eq. (21) can be used to estimate an upper bound of Eq. (23).
An extension to the correlated case is straightforward: we generalize Eq. (23) by also including the terms related to the correlations between the models, in addition to the already considered correlations between the points. The generalized mean square error is then $msqe = S_2^{(3)}/n$, where $S_2^{(3)}$ has the same form as Eq. (23) but the expectations now also contain the cross-model terms $E\{x_{ki}x_{lj}\}$, $i \ne j$ (Eq. 24). It can be observed that in both cases, Eqs. (23) and (24), the first term of the formulas can be expressed as $V(X) = (K\alpha,\alpha)$, where:

- for the uncorrelated case $K$ is a block diagonal matrix, $K = \mathrm{diag}(C_1,\ldots,C_m)$, where $(C_j)_{kl} = E\{x_{kj}x_{lj}\}$ is the $n \times n$ covariance matrix of model $j$;

- for the correlated case $K$ is the full block matrix with $n \times n$ blocks $(C_{ij})_{kl} = E\{x_{ki}x_{lj}\}$ describing the dependence between models $i$ and $j$.

Using this notation we can obtain the formulas for the optimal weights and covariance matrix shown in Table 1.
For both cases we can obtain ensemble properties similar to those of the one-dimensional case, namely
$$\frac{\min\sigma(K)}{m} \le \frac{V(X_{opt})}{n} \le \frac{\max\sigma(K)}{m},$$
where by $\min\sigma(K)$ and $\max\sigma(K)$ we denote the minimal and maximal elements of the spectrum $\sigma(K)$ of the covariance matrix $K$, respectively, i.e. the minimal and maximal eigenvalues.
Moreover,
$$\frac{V(X_{opt})}{n} \le \min_j s_n^{(j)},$$
where $s_i^{(j)}$, $i=1,\ldots,n$; $j=1,\ldots,m$ are the eigenvalues of $K$ according to the block notation, put in increasing order, i.e. $s_1^{(j)} \le \ldots \le s_n^{(j)}$. Finally, if $\max\sigma(K)/\min\sigma(K) \le m$, then
$$\frac{V(X_m)}{n} \le \min\sigma(K),$$
where $\sigma(K)$ is the spectrum of the matrix $K$ and $X_m$ is the ensemble mean.
We can see that we have a priori estimations analogous to the one-dimensional case: the optimal msqe is always bounded by the minimal and maximal eigenvalues divided by the number of models. Similarly, the optimal msqe is always bounded by the maximal eigenvalue of the best individual model. Finally, the ratio between the highest and lowest eigenvalues can be used to find the condition under which the msqe of the ensemble mean is lower than that produced by any individual model.
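These a priori bounds can be verified numerically. The sketch below builds a synthetic full $mn \times mn$ covariance matrix for the correlated case and checks the eigenvalue bounds on the optimal msqe, using the constraint matrix $I_R$ introduced in the Appendix:

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 3, 4                                  # points, models
N = n * m
A = rng.standard_normal((N, N))
K = A @ A.T + N * np.eye(N)                  # synthetic mn x mn covariance (correlated case)

# I_R stacks m identity blocks; the constraint I_R^T alpha = l enforces normalization
I_R = np.vstack([np.eye(n)] * m)
l = np.ones(n)
C = np.linalg.inv(I_R.T @ np.linalg.solve(K, I_R))   # optimal covariance (I_R^T K^{-1} I_R)^{-1}
V_opt = l @ C @ l                            # optimal first term of msqe, (Cl, l)

s = np.linalg.eigvalsh(K)
print(s[0] / m <= V_opt / n <= s[-1] / m)    # a priori eigenvalue bounds hold
```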

Conclusions
In this study, by means of an analytical formulation, we have tried to establish some aspects never presented before regarding the relationship between the statistical behaviour of ensemble members and the related expectations of the ensemble. The considerations presented here have been deduced with the well-known and extensively applied practice of ensemble dispersion modeling in mind.
The results obtained show the importance of knowing the bias and variance of the statistical distributions of the models used in multi-model ensemble systems, and how useful this information can be in defining the ensemble characteristics and in guaranteeing that the behaviour of the ensemble will fulfil the expectations. The results apply to both categories of correlated and uncorrelated models (or model results) covering the whole model space, and can be summarized as follows:

- By choosing an appropriate combination of model results we can find an optimal representative of the ensemble that, after bias correction, minimizes the mean square error. This is equivalent to the minimization of the quadratic form defined by the covariance matrix under the normalization condition; in fact the mean square error is expressed in terms of the quadratic form determined by the covariance matrix.
Table 1. Formulas in the multi-dimensional case.

Uncorrelated case: optimal weights $\alpha_j = C_j^{-1}Cl$; optimal covariance $C = \left(\sum_{j=1}^{m}C_j^{-1}\right)^{-1}$.

Correlated case: optimal weights $\alpha_j = \left(\sum_{i=1}^{m}C^{(-1)}_{ji}\right)Cl$, where $C^{(-1)}_{ij}$ are the sub-matrices of $K^{-1}$; optimal covariance $C = \left(\sum_{i,j=1}^{m}C^{(-1)}_{ij}\right)^{-1}$.

- Some general a priori estimations for the optimal variance and msqe have been obtained, which show that a multi-model ensemble has clear advantages compared with the one-model approach. This is expressed by the analytical formula demonstrating that msqe is bounded by the maximal variance or eigenvalue of the covariance matrix divided by the number of models. Similarly, the lower bound of msqe is determined by the minimal variance or eigenvalue, also divided by the number of models. If we assume that all the variances or eigenvalues of the appropriate covariance matrices are uniformly bounded for all the models, then the a priori estimations show that the asymptotic behavior of the optimal msqe is O(1/m) as m → ∞. It can also be seen that putting the models in ascending order (with respect to the variances or eigenvalues of the covariance matrix) determines the lowest possible location of the starting point of the asymptotic curve. This curve is "idealized", as in reality no precise information is available on model biases and variances, and adding a new ensemble member can also change the smallest or largest variance. Hence a real curve will be shifted up according to Eq. (5) and slightly deformed. However, the a priori estimations remain true regardless of our knowledge of the models' variances and biases. For any representation of the ensemble other than the optimal one we can expect some deviation from the "ideal curve", and this also shows, to some extent, how far we are from the optimal combination. In fact we can simply say that other linear representations cannot behave better than the optimal one.
- We have derived the condition under which the mean of the ensemble still gives more accurate results, in the sense of the minimization of msqe, than any individual model. This condition is expressed in terms of the ratio between the highest and lowest variances or eigenvalues of the appropriate covariance matrix. When the condition is not fulfilled, one can consider removing the responsible member from the ensemble. However, in the sense of msqe, the ensemble results in principle cannot be deteriorated even by a model with a large variance, provided the optimal combination of model results is taken as the representative of the ensemble. It should be considered that eliminating a result from an ensemble is not an easy practice (especially for predictions), since there is no way to recognize whether a single model is wrong, the ensemble is wrong, or the case has a low predictability.
- If there is nothing wrong with any model, then the ratio between the highest and lowest variances or eigenvalues can be considered as an indicator of the coherence of the multi-model ensemble. The bigger the ratio, the higher the disagreement among the models in estimating the uncertainty. In particular, when this ratio is greater than the number of models, the ensemble mean may be worse than the best single model, and special attention should be paid to taking the optimal representative of the ensemble. In this sense there is a relation between the coherence of the multi-model ensemble and the applicability of the ensemble mean, and this relation can be expressed simply as the ratio between the biggest and smallest eigenvalues of the appropriate covariance matrix. Thus the knowledge of the maximal and minimal eigenvalues also has a practical meaning: when there is no big difference between them, one can guarantee good coherence among the models.
-We have also demonstrated that the same properties of multi-model ensemble are valid in the most general case in which both correlations between the models and between the points or variables are taken into account.
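The O(1/m) asymptotic behavior discussed above can be illustrated with a short sketch; the variance range [0.5, 2] is an assumed, uniformly bounded example:

```python
import numpy as np

# optimal msqe of m uncorrelated models with variances drawn from [0.5, 2]:
# V(x_opt) = 1/sum(1/sigma_j^2) decays like O(1/m), between min/m and max/m
rng = np.random.default_rng(3)
for m in (2, 8, 32, 128):
    var = rng.uniform(0.5, 2.0, size=m)      # synthetic model variances
    V_opt = 1.0 / np.sum(1.0 / var)
    assert var.min() / m <= V_opt <= var.max() / m   # a priori bounds at each m
    print(m, V_opt)                          # V_opt shrinks as m grows
```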
Although it is outside the scope of the paper, we would like to add that it can easily be proved that the same formulas for the optimal covariance can also be applied in the Kalman filter procedure to find the optimal solution for both the gain matrix and the ensemble representation at the same time. This means that instead of using the ensemble mean it is better to take the optimal combination of model results according to the formulas shown in the paper.
The analysis produced points quite clearly toward the fact that one should acquire both the bias and the variance of each ensemble member (e.g. Delle Monache and Stull, 2003; Mallet and Sportisse, 2006; McKeen et al., 2005; Pagowski et al., 2005). To estimate the variance, three approaches can be used:

1. Some of the models have built-in features to calculate the variability of their results (e.g. Dabbert and Miller, 2000; Draxler, 2001; Stohl, 2005). This can be done by incorporating a kind of Monte Carlo simulation into the models, for example by perturbing some crucial parameters.
2. More advanced approaches could be based on using meteorological data from Ensemble Prediction Systems (EPS), for example the ones available at ECMWF or NCEP. Additionally one can perturb initial data or model parameters.
3. A general approach based on sensitivity analysis can in principle also be applied (e.g. Saltelli, 2002; Hanson and Hemez, 2004; Saltelli et al., 2006; Campolongo et al., 2007).

All these approaches may not cover all possibly varying aspects of a model. However, they are existing and applicable methods that can be used to produce useful information on the model variance. Of course the estimation of the variance can make simulation times much longer, in particular if it is to be based on EPS data. Bias correction methods have already been applied, in particular in air quality problems where a sufficient amount of measurement data is available (Delle Monache et al., 2006; Delle Monache et al., 2008; Wilczak et al., 2006; Zupanski et al., 2007).
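A minimal sketch of the first approach — Monte Carlo estimation of a model's output variance by perturbing a crucial parameter. Here `toy_model` is a hypothetical stand-in for a real dispersion model run, and the 10% perturbation size is an assumed choice:

```python
import numpy as np

rng = np.random.default_rng(4)

def toy_model(diffusivity):
    # hypothetical placeholder for a dispersion simulation returning a concentration
    return 100.0 / (1.0 + diffusivity)

nominal = 0.5                                # nominal parameter value (assumed)
runs = [toy_model(nominal * (1.0 + 0.1 * rng.standard_normal()))
        for _ in range(500)]                 # perturbed Monte Carlo runs
variance = float(np.var(runs))               # sample variance of the model output
print(variance > 0.0)
```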
The estimation of the correlation between the models can be based on various statistical tests (Lehman, 1986), for example using the Pearson correlation coefficient (see Rodgers and Nicewander, 1988 for an overview of different approaches). However, it may require representative statistical material, which can be acquired from long-term studies.
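As a sketch of this estimation, the Pearson coefficient can be computed on paired long-term outputs of two models; the series below are synthetic stand-ins sharing a common signal:

```python
import numpy as np

rng = np.random.default_rng(5)
common = rng.standard_normal(1000)                 # shared signal (e.g. the same episodes)
model_a = common + 0.5 * rng.standard_normal(1000) # model A = signal + its own error
model_b = common + 0.5 * rng.standard_normal(1000) # model B = signal + its own error

r = np.corrcoef(model_a, model_b)[0, 1]            # Pearson correlation coefficient
print(0.0 < r < 1.0)                               # strongly positively correlated models
```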
It should also be added that if the individual models' pdfs are known, we can combine them using the optimal weights to calculate the ensemble pdf. This is in accordance with the general concept of applying the ensemble approach (Dabbert and Miller, 2000; Galmarini et al., 2004; Stull et al., 1997; Riccio et al., 2007; Potempski et al., 2009) to perform predictions relying on a stochastic rather than a deterministic paradigm.
Although the mathematical framework used in the paper is not very sophisticated, it shows how some basic results can be obtained. We also wanted to demonstrate that a formal mathematical approach can be useful for obtaining general properties of multi-model ensemble systems. It should also be added that a similar kind of analysis can be made for metrics other than msqe, such as the maximum norm. Obviously the estimations we presented can be applied to any multi-model ensemble system, not necessarily one related to atmospheric dispersion models.
The real open issue still remains, however: the connection between ensemble coherence, or agreement among the models, which as we have seen can be predicted quite nicely with the analysis above, and uncertainty, which deals with the relationship between the ensemble and the observational data. The two are not coinciding concepts, and they are extremely relevant especially in the case of atmospheric dispersion ensemble forecasts. In some cases the confusion between the two, and an excessive confidence in ensemble coherence as a proxy of a good result, can lead to unwanted consequences, nicely summarized already in 1568 by Pieter Bruegel the Elder in the painting presented in Fig. 3. We will try to address this aspect too in a formal way in the future.
Theoretical framing of practices is what we feel mostly needed at this stage of development of the ensemble dispersion technique and activities.

A1 Uncorrelated case
First we consider the problem of finding the coefficients $\alpha_{ij}$ which minimize expression Eq. (23), similarly to the one-dimensional case. This leads to the minimization of the following Lagrange function (the observation term can be omitted):
$$L(\alpha_{11},\ldots,\alpha_{1m},\ldots,\alpha_{n1},\ldots,\alpha_{nm},\lambda_1,\ldots,\lambda_n) = \sum_{k,l=1}^{n}\mathrm{Cov}_{kl}(X) - \sum_{i=1}^{n}\lambda_i\left(\sum_{j=1}^{m}\alpha_{ij}-1\right), \qquad (A1)$$
where $\mathrm{Cov}_{kl}(X)$ denotes the $(k,l)$ element of the covariance matrix Eq. (22), i.e.
$$\mathrm{Cov}_{kl}(X) = \sum_{j=1}^{m}\alpha_{kj}\alpha_{lj}E\{x_{kj}x_{lj}\}.$$
Then, after differentiating Eq. (A1) with respect to all the coefficients $\lambda_i$ and $\alpha_{rs}$, we get the normalization Eq. (19) and
$$2\sum_{k=1}^{n}\alpha_{ks}E\{x_{rs}x_{ks}\} - \lambda_r = 0 \quad \text{for } r=1,\ldots,n \text{ and } s=1,\ldots,m, \qquad (A2)$$
where the first index denotes the point number and the second the model number. This system is not separable for each point, but it can be written in a more convenient way: by grouping Eq. (A2) for each model we get
$$C_j\alpha_j = \tfrac{1}{2}\lambda \quad \text{for } j=1,\ldots,m, \qquad \sum_{j=1}^{m}\alpha_j = l, \qquad (A3)$$
where $\alpha_j = [\alpha_{1j},\ldots,\alpha_{nj}]^T$ for $j=1,\ldots,m$, $\lambda = [\lambda_1,\ldots,\lambda_n]^T$, $l = [1,\ldots,1]^T$, and $C_j$ is the covariance matrix of dimension $n \times n$ for model $j$:
$$(C_j)_{rk} = E\{x_{rj}x_{kj}\}.$$
In fact the system Eq. (A3) solves the minimization problem for the quadratic form $(K\alpha,\alpha)$ with the condition Eq. (19) (analogous to Eq. 7), where $K$ is the block diagonal matrix $K = \mathrm{diag}(C_1,\ldots,C_m)$ and $\alpha = [\alpha_1,\ldots,\alpha_m]^T$. One can note that $V(X) = (K\alpha,\alpha)$, and this relation justifies the generalized formula of msqe Eq. (23) as a natural extension of the one-dimensional case. From Eq. (A3) we observe that the equations are coupled only via the Lagrange multipliers and the normalization equation. Then again we can find explicit formulas for the optimal weights: namely, $\alpha_j = \tfrac{1}{2}C_j^{-1}\lambda$, and from the last equation of Eq. (A3) we have $\lambda = 2\left(\sum_{j=1}^{m}C_j^{-1}\right)^{-1}l$, hence
$$\alpha_j = C_j^{-1}Cl, \qquad \text{where } C = \left(\sum_{j=1}^{m}C_j^{-1}\right)^{-1}. \qquad (A5)$$
This formula is a natural extension of Eq. (10) to the multi-dimensional case.
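The optimal-weight formulas of this subsection can be checked numerically on synthetic per-model covariance matrices: the weights sum to one at every point, and the optimal value $(Cl,l)$ coincides with the quadratic form $(K\alpha,\alpha)$:

```python
import numpy as np

rng = np.random.default_rng(6)
n, m = 3, 4
Cs = []
for _ in range(m):
    A = rng.standard_normal((n, n))
    Cs.append(A @ A.T + n * np.eye(n))       # synthetic per-model covariance C_j

C = np.linalg.inv(sum(np.linalg.inv(Cj) for Cj in Cs))   # C = (sum_j C_j^{-1})^{-1}
l = np.ones(n)
alphas = [np.linalg.inv(Cj) @ C @ l for Cj in Cs]        # alpha_j = C_j^{-1} C l

assert np.allclose(sum(alphas), l)           # normalization: weights sum to 1 per point
V_opt = l @ C @ l                            # optimal value (Cl, l)
V_check = sum(a @ Cj @ a for a, Cj in zip(alphas, Cs))   # (K alpha, alpha), block diagonal K
print(np.isclose(V_opt, V_check))            # True
```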
In order to obtain the optimal formula for the first term of the square error we rewrite it as $V(X) = \sum_{j=1}^{m}(C_j\alpha_j,\alpha_j)$. Then, by applying Eq. (A5), we get
$$V(X_{opt}) = \sum_{j=1}^{m}(C_jC_j^{-1}Cl,\,C_j^{-1}Cl) = \left(Cl,\,\sum_{j=1}^{m}C_j^{-1}Cl\right),$$
where
$$C = \left(\sum_{j=1}^{m}C_j^{-1}\right)^{-1}, \qquad (A7)$$
which leads finally to the following formula:
$$V(X_{opt}) = (Cl, l).$$
Hence we have obtained an expression similar to the correlated models case Eq. (18), however with a different covariance matrix. The matrix $C$ defined by Eq. (A7) can be considered as the optimal covariance matrix. Before starting the examination of the properties of the optimal msqe we would like to add that Eq. (A7) also has a Bayesian interpretation. Namely, if we consider for example the case $m=2$ and assume that $X_1$, $X_2$ have Gaussian pdfs given that $X_t$ is the truth (i.e. $X_1|X_t \sim N(X_t,C_1)$ and $X_2|X_t \sim N(X_t,C_2)$), then the pdf of $X_t|X_1,X_2$ is also Gaussian with distribution $N(X,C)$, where $C^{-1} = C_1^{-1}+C_2^{-1}$ (Riccio, Giunta, Galmarini, 2007).

We start investigating the properties of the optimal combination of model results with the simple situation where all the sub-matrices $C_j$ are the same (i.e. all the models have the same distributions). Hence $C_j = C_0$ for all $j$, and from Eq. (A7) we get $C = C_0/m$. If we denote the eigenvalues of $C_0$ as $s_1 \le \ldots \le s_n$, then by applying the same technique as in Sect. 3 (note that $\|l\|^2 = n$) we get similar estimations:
$$\frac{s_1}{m} \le \frac{V(X)}{n} \le \frac{s_n}{m}.$$
Let us now consider the general case and use the spectral representation of each $C_j$: $C_j = U_jS_jU_j^*$, where $U_j$ and $S_j$ are unitary and diagonal matrices, respectively. For convenience we put the eigenvalues of $C_j$ (i.e. the elements of $S_j$) in increasing order, $s_1^{(j)} \le \ldots \le s_n^{(j)}$. If we denote $w_j = U_j^*Cl$, then (denoting by $(v)_i$ the $i$-th element of any vector $v$) we have
$$(C_j^{-1}Cl,\,Cl) = \sum_{i=1}^{n}\frac{(w_j)_i^2}{s_i^{(j)}} \ge \frac{\|Cl\|^2}{s_n^{(j)}}.$$
Hence, since $(Cl,l) = \sum_{j}(C_j^{-1}Cl,Cl) \ge m\|Cl\|^2/\max_j s_n^{(j)}$ and by the Schwarz inequality $(Cl,l)^2 \le \|Cl\|^2\|l\|^2 = n\|Cl\|^2$, we finally get the following estimation:
$$\frac{V(X_{opt})}{n} \le \frac{\max_j s_n^{(j)}}{m} = \frac{\max\sigma(K)}{m}.$$
To prove the lower bound for the optimal msqe we proceed in the following way: for any $\alpha = [\alpha_1,\ldots,\alpha_m]^T$ we have
$$V(X) = \sum_{j=1}^{m}(C_j\alpha_j,\alpha_j) \ge \min\sigma(K)\sum_{j=1}^{m}\|\alpha_j\|^2,$$
where the vector $\alpha$ fulfils the normalization condition Eq. (19).
In order to find the lower bound we minimize the norm, i.e.
$$\min_{\alpha}\sum_{j=1}^{m}\|\alpha_j\|^2 = \min_{\alpha}\sum_{i=1}^{n}\sum_{j=1}^{m}\alpha_{ij}^2.$$
Taking into account the condition Eq. (19), this can be done by minimizing the inner sum separately for each $i=1,\ldots,n$. It can easily be found that the minimum is reached at $\alpha_{ij} = \frac{1}{m}$ for any $i,j$, which leads to the following inequality:
$$\sum_{i=1}^{n}\sum_{j=1}^{m}\alpha_{ij}^2 \ge \sum_{i=1}^{n}\sum_{j=1}^{m}\frac{1}{m^2} = \frac{n}{m},$$
true for any vector $\alpha$ satisfying Eq. (19). Hence finally we obtain
$$V(X_{opt}) \ge \min\sigma(K)\,\frac{n}{m}, \qquad \text{i.e.} \qquad \frac{V(X_{opt})}{n} \ge \frac{\min\sigma(K)}{m}.$$
In this way we have shown that for the multi-dimensional case the same estimation is valid as for the one-dimensional case.
The other generally valid estimation is
$$\frac{V(X_{opt})}{n} \le \min_j \max\sigma(C_j).$$
Proof. This can be proved in the following way: as the weights $\alpha$ are chosen to minimize the Lagrange function, then for any $j$ (putting all the weight on model $j$, i.e. $\alpha_j = l$ and $\alpha_i = 0$ for $i \ne j$) the following inequality holds:
$$V(X_{opt}) \le (C_jl, l) \le n\max\sigma(C_j),$$
so taking the minimum over $j$ we get the estimation.
Combining this with the previous estimations we have
$$\frac{\min\sigma(K)}{m} \le \frac{V(X_{opt})}{n} \le \min\left(\frac{\max\sigma(K)}{m},\ \min_j\max\sigma(C_j)\right).$$
Let us finally consider the case where the average is taken as the representative of the ensemble, which means that for any $i,j,k$ we have $\alpha_{ij} = \alpha_{kj} = \frac{1}{m}$. Then the following implication holds:
$$\frac{\max\sigma(K)}{\min\sigma(K)} \le m \quad \Longrightarrow \quad \frac{V(X_m)}{n} \le \min\sigma(K) \le \min_j\frac{(C_jl, l)}{n},$$
where $X_m$ is the ensemble mean, i.e. the ensemble mean produces a lower msqe than any single model.

Proof. This is a simple consequence of the following estimation: with $\alpha_{ij} = \frac{1}{m}$ we have $\|\alpha\|^2 = \frac{n}{m}$, so
$$V(X_m) = (K\alpha,\alpha) \le \max\sigma(K)\,\frac{n}{m} \le n\min\sigma(K) \le (C_jl,l) \quad \text{for any } j.$$
Hence, as for the one-dimensional case, the condition that the ensemble mean produces a lower msqe than any single model can be expressed by the ratio between the highest and lowest eigenvalues of the covariance matrix.
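This implication can be checked numerically by constructing a covariance matrix with a prescribed spectrum satisfying the ratio condition (a synthetic example with $n=3$, $m=4$):

```python
import numpy as np

rng = np.random.default_rng(7)
n, m = 3, 4
N = n * m
Q, _ = np.linalg.qr(rng.standard_normal((N, N)))   # random orthogonal basis
s = np.linspace(1.0, m - 0.5, N)                   # eigenvalues with s_N/s_1 <= m
K = Q @ np.diag(s) @ Q.T                           # covariance with that spectrum

alpha = np.full(N, 1.0 / m)                        # ensemble-mean weights alpha_ij = 1/m
V_mean = alpha @ K @ alpha                         # (K alpha, alpha)

assert s[-1] / m <= s[0]                           # the ratio condition holds
assert V_mean / n <= s[0] + 1e-9                   # mean msqe below the minimal eigenvalue

# msqe of every single model j: (C_jj l, l)/n, bounded below by min sigma(K)
l = np.ones(n)
single = [l @ K[j*n:(j+1)*n, j*n:(j+1)*n] @ l / n for j in range(m)]
print(all(V_mean / n <= sj + 1e-9 for sj in single))   # mean beats every single model
```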
Finally, we add that some other estimations can be proved in a similar way, for example the generally valid bound $\frac{V(X_m)}{n} \le \frac{\max\sigma(K)}{m}$, where $X_m$ is the ensemble mean.

A2 Correlated case
For the correlated case, in order to simplify the calculations, we use block matrix notation. Let $C_{ij}$ ($i,j = 1,\ldots,m$) be the $n \times n$ matrix expressing the dependence between the two models $i$ and $j$ at all $n$ points (strictly, how model $i$ is correlated with model $j$):
$$(C_{ij})_{kl} = E\{x_{ki}x_{lj}\}.$$
Obviously we have $C_{ii}^T = C_{ii}$ and $C_{ij}^T = C_{ji}$ for $j \ne i$. Please note that, strictly speaking, the matrix $C_{ij}$ (except for $C_{ii}$) should not be treated as a covariance (or correlation) matrix, as it is not symmetric. In fact we ought to speak rather of the pair $C_{ij}$ and $C_{ji}$, as these two matrices together fully describe the mutual correlation between models $i$ and $j$.
If we introduce the matrix $K$ of dimension $mn \times mn$ as the block matrix
$$K = \begin{pmatrix} C_{11} & \cdots & C_{1m} \\ \vdots & \ddots & \vdots \\ C_{m1} & \cdots & C_{mm} \end{pmatrix}$$
and the vectors $\alpha_j = [\alpha_{1j},\ldots,\alpha_{nj}]^T$, for $j = 1,\ldots,m$, and $\alpha = [\alpha_1,\ldots,\alpha_m]^T$ (of dimension $mn$), then for the first term of Eq. (24) we get
$$V(X) = (K\alpha,\alpha). \qquad (A11)$$
The matrix $K$ fully describes the two types of correlations, between points and between models, so it generalizes the previously used covariance matrices and has all the required properties. First of all, due to the above-mentioned properties of the matrices $C_{ij}$, the matrix $K$ is symmetric.
Secondly the matrix K is also positive semi-definite i.e. (Kv,v) ≥ 0 for any vector v.
Proof. To prove it, consider the auxiliary random variables $Z_i = \sum_{j=1}^{m}\alpha_{ij}x_{ij}$ for $i = 1,\ldots,n$, and $Z = \sum_{i=1}^{n}Z_i$. Then the variance of $Z$ is equal to the first term of Eq. (24), i.e. $V(X)$. According to Eq. (A11), for any $\alpha$ we have $(K\alpha,\alpha) = V(Z) \ge 0$, and as the variance is always non-negative, this completes the proof.
This proof also shows that the first term of the generalized msqe (i.e. $V(X)$) can be defined as the variance of the sum, taken over all the points, of the linear combinations of the multi-model results (which to some extent also justifies the notation used). On the other hand it is represented by the quadratic form $(K\alpha,\alpha)$, which corresponds to the previously considered cases.
If we use as previously the vectors $\lambda = [\lambda_1,\ldots,\lambda_n]^T$ and $l = [1,\ldots,1]^T$, then the problem of finding the coefficients minimizing Eq. (24) (or Eq. A11) can be solved by using the Lagrange function of the following form:
$$L(\alpha,\lambda) = (K\alpha,\alpha) - (\lambda,\, I_R^T\alpha - l),$$
where $I_R$ is the $mn \times n$ matrix obtained by stacking $m$ copies of the $n \times n$ identity matrix, so that the normalization Eq. (19) reads $I_R^T\alpha = l$. Minimization with respect to all the elements of the vectors $\alpha$ and $\lambda$ would lead to $nm+n$ equations. We simplify it by operating on the vectors $\alpha_j$ and $\lambda$ and by calculating the Gâteaux derivative with respect to $\alpha$ along some vector $\delta$:
$$\frac{d}{dt}L(\alpha+t\delta,\lambda)\Big|_{t=0} = 2(K\alpha,\delta) - (I_R\lambda,\delta).$$
As the above equation is valid for any vector $\delta$, we get the following system of equations:
$$2K\alpha = I_R\lambda, \qquad I_R^T\alpha = l. \qquad (A14)$$
Hence $\alpha = \tfrac{1}{2}K^{-1}I_R\lambda$ and $\lambda = 2(I_R^TK^{-1}I_R)^{-1}l$, so that $\alpha = K^{-1}I_RCl$. Please note that the matrix $I_R^TK^{-1}I_R$ has dimension $n \times n$ (so the inverse operator makes sense), whereas the matrix $K$ is of dimension $mn \times mn$. By $C$ we denote $(I_R^TK^{-1}I_R)^{-1}$. Then we can calculate the optimal first term of Eq. (24), $V(X_{opt})$, as
$$V(X_{opt}) = (K\alpha,\alpha) = \tfrac{1}{2}(\lambda,\,I_R^T\alpha) = (Cl, l). \qquad (A15)$$
The formula is similar to the previous ones: if we write the inverse matrix $K^{-1}$ in block form as $K^{-1} = [C^{(-1)}_{ij}]_{i,j=1}^{m}$, then $C = \left(\sum_{i,j=1}^{m}C^{(-1)}_{ij}\right)^{-1}$. Thus Eq. (A15) generalizes the previously obtained expressions for the optimal covariance. In fact the multi-dimensional case with uncorrelated models corresponds to the situation where all the off-diagonal sub-matrices $C_{ij}$ of the matrix $K$ vanish; then the inverse of the matrix $K$ can be computed block by block for the matrices $C_{ii}$, which corresponds to Eq. (A7). On the other hand, if we consider the one-dimensional case with correlated models, then the dimension of the matrices $C_{ij}$ is $1 \times 1$, so they become single elements of the matrix $K^{-1}$ from Sect. 3, and Eq. (A14) leads to Eq. (18).
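The block identity behind this formula — the sum of the $n \times n$ sub-matrices of $K^{-1}$ equals $I_R^TK^{-1}I_R$ — can be verified on a small synthetic example:

```python
import numpy as np

rng = np.random.default_rng(8)
n, m = 2, 3
N = n * m
A = rng.standard_normal((N, N))
K = A @ A.T + N * np.eye(N)                  # synthetic positive-definite mn x mn matrix
Kinv = np.linalg.inv(K)

# sum of the n x n sub-matrices C^(-1)_ij of K^{-1} ...
S = sum(Kinv[i*n:(i+1)*n, j*n:(j+1)*n] for i in range(m) for j in range(m))

# ... equals I_R^T K^{-1} I_R, so C = (sum_ij C^(-1)_ij)^{-1}
I_R = np.vstack([np.eye(n)] * m)
print(np.allclose(S, I_R.T @ Kinv @ I_R))    # True
```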
The rest of the section is devoted to proving estimations similar to those obtained previously for the simpler cases.
We start with a general estimation for $V(X_{opt})$. The estimation analogous to the previous ones is valid:
$$\frac{\min\sigma(K)}{m} \le \frac{V(X_{opt})}{n} \le \frac{\max\sigma(K)}{m},$$
where $\sigma(K)$ is the spectrum of the matrix $K$ defined by Eq. (33), while $\min\sigma(K)$ and $\max\sigma(K)$ represent the minimal and maximal eigenvalues, respectively.
Proof. First we write the matrix $I_R^TK^{-1}I_R$ using the spectral decomposition of the matrix $K = USU^*$, with the eigenvalues of $S$ put in increasing order, $s_1 \le \ldots \le s_N$, $N = mn$. Let $w = Cl$; then, using $UU^* = I$ and $I_R^TI_R = mI$, this leads to the following expressions:
$$(Cl, l) = (I_R^TK^{-1}I_Rw,\, w) = (K^{-1}I_Rw,\, I_Rw) \ge \frac{\|I_Rw\|^2}{s_N} = \frac{m\|w\|^2}{s_N}.$$
Then, as previously, using the Schwarz inequality $(Cl,l)^2 \le \|w\|^2\|l\|^2 = n\|w\|^2$ we get the similar estimation
$$V(X_{opt}) = (Cl,l) \le \frac{n\,\max\sigma(K)}{m}.$$
Similarly to the uncorrelated models case one can also find the lower bound, $V(X_{opt}) \ge \frac{n\,\min\sigma(K)}{m}$. This can be proved in the same way as before, since for any vector $\alpha$ fulfilling the normalization condition the inequality $(K\alpha,\alpha) \ge \min\sigma(K)\|\alpha\|^2 \ge \min\sigma(K)\frac{n}{m}$ holds. Moreover $V(X_{opt}) \le (C_{jj}l,l)$ for any $j$, saying that even in the presence of a high eigenvalue the optimal msqe is still bounded by the maximal eigenvalue obtained from the best single model. Now let us consider the case where the average is taken as the representative of the ensemble, which means that $\alpha = \alpha_1 = \ldots = \alpha_m = [\frac{1}{m},\ldots,\frac{1}{m}]^T$. Let us assume that the matrix $K$ has eigenvalues $s_1 \le \ldots \le s_N$, where $N = mn$. Then we have the implication
$$\frac{s_N}{s_1} \le m \quad \Longrightarrow \quad \frac{V(X_m)}{n} \le s_1 \le \min_j\frac{(C_{jj}l,l)}{n},$$
where $X_m$ is the ensemble mean.
In this way we have obtained results analogous to the uncorrelated models case.

Fig. 3 .
Fig. 3. "The parable of the blind leading the blind", by Pieter Bruegel the Elder (1568), courtesy of the Museo e Gallerie Nazionali di Capodimonte, Naples.