Quantification of the modelling uncertainties in atmospheric release source assessment and application to the reconstruction of the autumn 2017 Ruthenium 106 source

Using a Bayesian framework for the inverse problem of estimating the source of an atmospheric release of a pollutant has proven fruitful in recent years. Through Markov chain Monte Carlo (MCMC) algorithms, the statistical distribution of the release parameters, such as the location, the duration, and the magnitude, as well as the likelihood covariances, can be sampled so as to obtain a complete characterisation of the source. In this study, several approaches are described and applied to improve on these distributions, and therefore to obtain a better representation of the uncertainties. First, a method based on ensemble forecasting is proposed: physical parameters of both the meteorological fields and the transport model are perturbed to create an enhanced ensemble. To account for model errors, the importance of each ensemble member is represented by a weight, sampled together with the other variables of the source. Secondly, the choice of the statistical likelihood is shown to alter the nuclear source assessment, and several suitable distributions for the errors are recommended. Finally, two advanced designs of the covariance matrix associated with the observation error are proposed. These methods are applied to the case of the detection of Ruthenium 106 of unknown origin in Europe in autumn 2017. Posterior distributions meant to identify the origin of the release, to assess the source term, and to quantify the uncertainties associated with the observations and the model, as well as densities of the weights of the perturbed ensemble, are presented.

The shape of the posterior distribution strongly depends on the uncertainties related to the modelling choices and the data. The objective of this study is to investigate the various sources of uncertainty compounding the problem of source reconstruction, and to propose solutions to better evaluate them, i.e., to increase our confidence in the reconstructed posterior distribution.
The quantification of the uncertainties largely depends on the definition of the likelihood and its components. The choice of the likelihood is the concern of section 2.1. As mentioned above, the likelihood term quantifies the fit between the measurements y and a given parametrisation of the source x. To make these two quantities comparable, a set of modelled concentrations corresponding to the observations is computed. These concentrations, also called predictions, are the results of the dispersion of the source x, and therefore depend on the parametrisation of the transport model and on the meteorological fields.
In this paper, and in the case of a source of unknown location, the predictions are written as $y^S = H_{x_1,x_2} q$, where H is the observation operator, the matrix representing the resolvent of the atmospheric transport model, and $H_{x_1,x_2}$ its definition for a source of coordinates $(x_1, x_2)$. The predictions therefore vary linearly with the release vector q.
However, H is not linear in the coordinates. When the coordinates are unknown, they may be investigated in a continuous space. Since the computation of a matrix $H_{x_1,x_2}$ for a specific couple of coordinates is expensive, a set of observation operators linked to specific locations is computed on a regular mesh G prior to the Bayesian sampling presented in section 3.2.3. The observation operator at an arbitrary location is then interpolated from this set of operators pre-computed on G.
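The excerpt does not specify the interpolation scheme; the following minimal sketch assumes bilinear interpolation of the pre-computed operators, with illustrative names (interpolate_operator, H_grid) that are not from the paper.

```python
import numpy as np

def interpolate_operator(x1, x2, grid_lons, grid_lats, H_grid):
    """Bilinearly interpolate an observation operator at coordinates (x1, x2)
    from operators pre-computed on a regular mesh G.

    H_grid: array of shape (n_lon, n_lat, n_obs, n_release_steps),
            the operators pre-computed at the mesh nodes.
    Assumes (x1, x2) lies strictly inside the mesh.
    """
    i = np.searchsorted(grid_lons, x1) - 1
    j = np.searchsorted(grid_lats, x2) - 1
    # Fractional position inside the enclosing mesh cell
    tx = (x1 - grid_lons[i]) / (grid_lons[i + 1] - grid_lons[i])
    ty = (x2 - grid_lats[j]) / (grid_lats[j + 1] - grid_lats[j])
    return ((1 - tx) * (1 - ty) * H_grid[i, j]
            + tx * (1 - ty) * H_grid[i + 1, j]
            + (1 - tx) * ty * H_grid[i, j + 1]
            + tx * ty * H_grid[i + 1, j + 1])
```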
Equation (1) can be expanded as
$$p(x|y) \propto p_{likelihood}(y \,|\, H_{x_1,x_2}(m)\, q, R)\; p_{prior}(x_1, x_2, \ln q, R, ...)$$
where uncertainties are embodied in:
- the observations y;
- the physical models: the meteorological fields m and the dispersion H;
- the likelihood definition: its choice and the design of its associated error covariance matrix R;
- the representation error: the release rates q as a discrete vector (while the release is a continuous phenomenon), the observation operator $H_{x_1,x_2}$ as an interpolation, and the observations, for which the corresponding predictions are calculated in a mesh containing them;
- the choice of the priors.
In this paper, we focus on the uncertainties emanating from the physical models and the definition of the likelihood.

Objectives of this study
This study is a continuation of a previous study by the authors (Dumont Le Brazidec et al., 2020). It aims at exploring the various sources of uncertainty compounding the inverse problem and at proposing solutions to better evaluate them: three key issues are investigated. First, in section 2.1, we investigate the design of the likelihood distribution, which is the main ingredient in the definition of the posterior distribution. Secondly, we propose two new designs of the likelihood covariance matrix to better evaluate errors in section 2.2. Finally, in section 2.3, we describe an ensemble-based method for taking into account the uncertainties related to the meteorological fields and the atmospheric transport model: we construct H as a weighted sum of observation operators created out of diverse physical parameters.
Subsequently, the interest of these three propositions is illustrated with an application to the 106Ru release of September 2017, for which a description of the context, the observation dataset, and the state of the art of the release event is first introduced.

In the field of source assessment and, more precisely, radioactive material source assessment, most of the likelihoods are defined as Gaussian (Yee, 2008; Winiarek et al., 2012; Saunier et al., 2013; Bardsley et al., 2014; Yee et al., 2014; Winiarek et al., 2014; Rajaona et al., 2015; Tichý et al., 2016) or, more recently, log-normal (Delle Monache et al., 2008; Liu et al., 2017; Saunier et al., 2019; Dumont Le Brazidec et al., 2020) or similar to a log-normal (Senocak et al., 2008). The multivariate Gaussian probability density function (pdf), of mean Hx and covariance matrix R, is written as
$$p(y \,|\, Hx, R) = \frac{1}{\sqrt{(2\pi)^{N_{obs}} |R|}} \exp\left( -\frac{1}{2} (y - Hx)^\top R^{-1} (y - Hx) \right). \quad (2)$$
In this section, we assume that the covariance matrix R is equal to rI, where r is a positive diagonal coefficient. The cost function, i.e., the negative of the log-likelihood, of the Gaussian pdf is written (up to a normalisation constant) as
$$J_{Gaussian}(y; Hx, R) = \frac{1}{2} \left\| y - Hx \right\|^2_{2,R^{-1}} + \frac{N_{obs}}{2} \ln(r), \quad (3)$$
with N_obs the number of observations of the problem. The cost function is a matter of judgement; it measures how detrimental a difference between an observation and a prediction is. When the observation and the prediction are equal, the likelihood part of the cost should be zero, and it should increase as the difference between the observation and the prediction values grows.

With the assumption R = rI, choosing a Gaussian is equivalent to tremendously favouring high values: the Gaussian cost function value of an observation-prediction couple (y = 100 mBq.m⁻³, y^S = 120 mBq.m⁻³) is a hundred times greater than that of (y = 10 mBq.m⁻³, y^S = 12 mBq.m⁻³). In other words, reducing the difference between the observation and the prediction of the first couple is an objective a hundred times more important for the algorithm than for the second couple.
This behaviour carries over to real case studies: every measurement negligible in magnitude compared to the greatest ones will be negligible in the source retrieval.
We think that the whole measurement set should bring information. More generally, the following inventory lists the criteria that a good likelihood choice for nuclear source assessment should fulfil:
- positive support: the likelihood should be defined on the semi-infinite interval [0, +∞[, since the observations and predictions are all positive by nature;
- symmetry between the prediction vector and the observation vector, i.e., p(y; Hx, R) = p(Hx; y, R): the couple (y = 20 mBq.m⁻³, y^S = 40 mBq.m⁻³) should have the same penalty as the couple (y = 40 mBq.m⁻³, y^S = 20 mBq.m⁻³);
- relativity: the cost should depend on the ratio of the observation to the prediction rather than on their magnitudes; the ratio of the cost function value of a couple (20 mBq.m⁻³, 40 mBq.m⁻³) to that of a couple with the same observation-to-prediction ratio should be mitigated and close to 1 as a general rule;
- existence of a covariance matrix, and of a term able to play the role of the modelled predictions. Indeed, the likelihood measures the difference between the observations and the predictions, which should therefore appear as a parameter of the distribution. Distributions with a location parameter comply with this requirement.
In particular, three distributions were found to satisfy these diverse criteria: the log-normal distribution, already used in several studies, and the log-Laplace and log-Cauchy distributions, whose cost functions are, up to a normalisation constant (Satchell and Knight, 2000; McDonald, 2008),
$$J_{log-normal}(y; Hx, R, y_t) = \frac{1}{2} \left\| \ln(y + y_t) - \ln(Hx + y_t) \right\|^2_{2,R^{-1}} + \frac{N_{obs}}{2} \ln(r), \quad (4)$$
$$J_{log-Laplace}(y; Hx, R, y_t) = \left\| \ln(y + y_t) - \ln(Hx + y_t) \right\|_{1,R^{-1}} + N_{obs} \ln(r), \quad (5)$$
$$J_{log-Cauchy}(y; Hx, R, y_t) = \sum_{i=1}^{N_{obs}} \ln\left( j_t + r + \frac{\left( \ln(y_i + y_t) - \ln((Hx)_i + y_t) \right)^2}{r} \right), \quad (6)$$

where y_t is a positive threshold vector ensuring that the logarithm is defined for zero observations or predictions, and j_t is a second threshold specific to the log-Cauchy case. The ℓ2 and ℓ1 norms are defined as $\|v\|_2 = \sqrt{v^\top v}$ and $\|v\|_1 = \sum_i |v_i|$, respectively. These three distributions are subsumed by the Generalised Beta Prime (or GB2) distribution (Satchell and Knight, 2000; McDonald, 2008) and share a common point: all three are shaped around the difference between the logarithm of the observation and the logarithm of the prediction. Due to the logarithmic property ln(a/b) = ln(a) - ln(b), several of the criteria previously defined are met. First, the cost is a function of the ratio of the observation to the prediction. Secondly, a location term appears as $\ln((Hx)_i + y_t)$. And finally, with the help of a square or an absolute value, symmetry between the observation and the prediction is guaranteed. Each of these choices requires a threshold (and even two for the log-Cauchy case, discussed at the end of this section) whose value significantly impacts the results and will be discussed later on.
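To make the three cost functions concrete, here is a minimal sketch assuming R = rI with a scalar variance r; the log-Cauchy form follows the reconstruction in equation (6), and all names are illustrative.

```python
import numpy as np

def log_costs(y, y_pred, r, y_t, j_t=0.1):
    """Cost functions (negative log-likelihoods up to constants) of the
    log-normal, log-Laplace and log-Cauchy choices, assuming R = r * I."""
    n_obs = y.size
    d = np.log(y + y_t) - np.log(y_pred + y_t)  # log of observation/prediction ratio
    j_lognormal = 0.5 * np.sum(d**2) / r + 0.5 * n_obs * np.log(r)
    j_loglaplace = np.sum(np.abs(d)) / r + n_obs * np.log(r)
    j_logcauchy = np.sum(np.log(j_t + r + d**2 / r))
    return j_lognormal, j_loglaplace, j_logcauchy

# Relativity check: couples sharing the same observation-to-prediction ratio
# get almost the same cost when y_t is small compared to the values.
print(log_costs(np.array([20.0]), np.array([40.0]), 1.0, 0.1))
print(log_costs(np.array([200.0]), np.array([400.0]), 1.0, 0.1))
```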
Due to its ℓ2 norm, the log-normal cost drives most of the penalty onto the large (relative) differences and removes almost all penalty from the small differences. With the ℓ1 norm, the log-Laplace curve of the relative quantity is flatter in comparison. This means that the inverse modelling will not be sensitive to one couple in particular, even if for this couple the difference between the observation and the prediction is large. The motivation for using the log-Laplace is therefore to avoid having outliers drive the entire sampling, i.e., the entire search for the source. The log-Cauchy distribution has the most interesting behaviour and mixes log-normal and log-Laplace natures. The logarithm mitigates the penalty of large differences, but also removes any penalty from the small differences. The rationale for using the log-Cauchy distribution is consequently to avoid outliers, while at the same time not taking into account negligible differences.
For all choices, the value of y_t is crucial to evaluate the penalty on a couple involving a zero observation or prediction. In other words, it determines how the cost of a zero observation and a non-zero prediction (or the contrary) should compare to that of a positive couple. We consider that the penalty on a couple (20 mBq.m⁻³, 0 mBq.m⁻³) should be a large multiple of the penalty on (400 mBq.m⁻³, 100 mBq.m⁻³). As a consequence, a "good" threshold for the log-normal distribution in a case involving large released quantities should lie between 0.5 mBq.m⁻³ and 3 mBq.m⁻³. Using the same principle, acceptable thresholds for the log-Laplace or the log-Cauchy distributions range between 0.1 mBq.m⁻³ and 0.5 mBq.m⁻³.
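This reasoning can be checked numerically: the sketch below compares, for the log-normal cost and a few illustrative thresholds, the penalty of the zero couple (20, 0) mBq.m⁻³ with that of the positive couple (400, 100) mBq.m⁻³; the variance r factors out of the ratio.

```python
import numpy as np

def lognormal_penalty(y, y_pred, y_t):
    # Likelihood part of the log-normal cost for one couple, up to the 1/r factor
    return 0.5 * (np.log(y + y_t) - np.log(y_pred + y_t)) ** 2

for y_t in [0.1, 0.5, 1.0, 3.0, 10.0]:
    ratio = lognormal_penalty(20.0, 0.0, y_t) / lognormal_penalty(400.0, 100.0, y_t)
    print(f"y_t = {y_t:5.1f} mBq/m3: zero-couple penalty is {ratio:5.1f}x the positive one")
```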

We should also consider that the log-Cauchy distribution needs the second threshold j_t to be properly defined. Indeed, if for a couple both the observation y_i and the prediction (Hx)_i are equal (usually both equal to zero), then r will naturally tend towards 0, so that J_log-Cauchy tends to −∞, as can be seen in equation (6) with j_t equal to zero. To prevent this, we can define j_t = 0.1 mBq.m⁻³ in equation (6). As will be shown later, the choice of the likelihood has, in practice, a significant impact on the shape of the posterior distribution. Hence, to better describe the uncertainties of the problem, the approach proposed here is to combine and compare the distributions obtained with these three likelihoods.

The likelihood definition, and therefore the shape of the posterior distribution, is also greatly impacted by the modelling choice of the error covariance matrix R. The matrix is of size N_obs × N_obs. In real nuclear case studies, the number of observations can be large, whereas Bayesian sampling methods can usually estimate efficiently only a limited number of variables. To limit the number of variables, most of the literature (Delle Monache et al., 2008; Winiarek et al., 2012; Saunier et al., 2013; Rajaona et al., 2015; Tichý et al., 2016; Liu et al., 2017) describes the error covariance matrix as a diagonal matrix with a unique and common diagonal coefficient. This variance accounts for the observation error (built-in sensor noise and bias), the model error (uncertainty of the meteorological fields and transport model), and the representation error between an observation and a modelled prediction (e.g., Rajaona et al., 2015). Saunier et al. (2019) analyse the impact of such a choice on the same case study as this paper. Indeed, the error is a function of time and space and is obviously not common to every observation-prediction couple.
This critical reduction can lead to paradoxes. With R = rI, the variance r captures an average of the error variances of the observation-prediction couples. As seen in appendix A, some observations can contaminate the set of measurements and artificially reduce the value of r, which prevents the densities of the variables from being well spread. Specifically, let us consider the non-discriminant observations. These are observations for which, for any probable source x, the observation is almost equal to the prediction. In other words, a non-discriminant observation is an observation which never contributes to discriminating any probable source from another. It is an observation for which, if R were modelled as a diagonal matrix with N_obs independent terms, the variance r_i corresponding to this observation would be very small or zero. If we model R as rI, then r, capturing an average of the N_obs variances r_i, decreases artificially.
To deal with the existence of these observations, we propose to use two variances, r_1 and r_nd. The discriminant observations are associated with the variance r_1, while the non-discriminant observations are associated with the variance r_nd during the sampling process. Necessarily then, r_nd tends to a very small value.

In the following, we refer to this algorithm as the observation sorting algorithm. A justification of this clustering using the Akaike information criterion (AIC) is proposed in appendix B.
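The excerpt does not give the sorting procedure in pseudo-code; a minimal sketch of one plausible implementation follows, assuming non-discriminance is tested against a pilot sample of probable sources and using the threshold d = 0.1 mBq.m⁻³ introduced in section 3.2.3.

```python
import numpy as np

def sort_observations(y, pilot_predictions, d=0.1):
    """Flag non-discriminant observations.

    pilot_predictions: array (n_sources, n_obs) of predictions Hx for a
                       pilot sample of probable sources x.
    An observation is non-discriminant if, for every probable source, the
    prediction differs from it by less than d (in mBq/m3); it is then
    assigned the variance r_nd instead of r_1 during the sampling.
    """
    close = np.abs(pilot_predictions - y[None, :]) < d  # (n_sources, n_obs)
    return close.all(axis=0)                            # True -> non-discriminant
```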
We now propose a second approach to improve the design of the covariance matrix R and the estimation of the uncertainties. We propose to cluster the observations according to their spatial position into k groups, where observations of the same cluster are assigned the same observation error variance. This proposal is based on the fact that the modelling part of the observation error is a spatially dependent function. With this clustering, the source variables of interest are x = (x_1, x_2, ln q, r_1, ..., r_k), and R is a diagonal matrix where the i-th diagonal coefficient is assigned a variance r_j with j ∈ {1, .., k}, according to the cluster to which the observation y_i belongs. Using both methods, the set of variables x to retrieve becomes (x_1, x_2, ln q, r_1, ..., r_k, r_nd), as sketched below.
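A minimal sketch of the spatial clustering and of the resulting diagonal of R, assuming a standard k-means on the observation coordinates (scikit-learn is used for illustration; the paper's exact setup may differ):

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_variance_indices(obs_lon, obs_lat, k=3, seed=0):
    """Assign each observation to one of k spatial clusters; observation i
    is then given the error variance r_j of its cluster j."""
    coords = np.column_stack([obs_lon, obs_lat])
    return KMeans(n_clusters=k, random_state=seed, n_init=10).fit_predict(coords)

def diagonal_of_R(labels, r_values, non_discriminant=None, r_nd=None):
    """Diagonal of R: r_{labels[i]} for each observation, overridden by r_nd
    for non-discriminant observations when both designs are combined."""
    diag = np.asarray(r_values)[labels]
    if non_discriminant is not None:
        diag = np.where(non_discriminant, r_nd, diag)
    return diag
```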

Modelling of the meteorology and transport
As explained in section 1.3, the linear observation operator H is computed with an atmospheric transport model, which takes meteorological fields as inputs. More precisely, the Eulerian ldX model, part of IRSN's C3X operational platform (Tombette et al., 2014) and validated on the Algeciras incident, the ETEX campaign, as well as on the Chernobyl accident (Quélo et al., 2007) and the Fukushima accident (Saunier et al., 2013), is the transport model used to simulate the dispersion of the plume, and therefore to build the observation operators. To improve the accuracy of the predictions, and therefore reduce the uncertainties, the height of the release is distributed between two layers, defined between 0 and 40 meters and between 40 and 120 meters, the space being discretised vertically in layers of various heights. Each parameter range has been set up based on the work of Girard et al. (2014), and the perturbations of the deposition and Kz (vertical turbulent diffusivity) processes are described in table 1. The ensemble of observation operators is therefore computed from a collection of parameters, obtained by sampling the values of the parameters inside these intervals.
Once the set of operators has been built, the idea is to combine them linearly to get a more accurate forecast. A weight w_i can be associated with each observation operator of the ensemble:
$$H = \sum_{i=1}^{N_e} w_i H_i,$$
which results in combining linearly the predictions of each member of the ensemble. Each weight w_i is a positive variable to be retrieved. The weights can be included in the set of variables sampled by the MCMC algorithm. They are dependent on each other through the necessary condition $\sum_{i=1}^{N_e} w_i = 1$, so that N_e − 1 weight variables are added to x = (x_1, x_2, ln q, R, w_1, ..., w_{N_e−1}).
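A minimal sketch of the weighted combination, with illustrative array shapes; the last weight is deduced from the simplex constraint, so only N_e − 1 weights are sampled:

```python
import numpy as np

def ensemble_prediction(H_members, q, w_free):
    """Predictions y_S = sum_i w_i H_i q from the weighted ensemble.

    H_members: array (n_members, n_obs, n_steps), the ensemble of operators.
    q:         release vector (n_steps,).
    w_free:    the N_e - 1 sampled weights; the last one closes the simplex.
    """
    w = np.append(w_free, 1.0 - np.sum(w_free))   # enforce sum(w) == 1
    member_predictions = H_members @ q            # (n_members, n_obs)
    return w @ member_predictions                 # linear combination of predictions
```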
Several methods are used to explore the reliability, accuracy (level of agreement between forecasts and observations), skill, discrimination (whether different observed outcomes can be discriminated by the forecasts), resolution (whether observed outcomes change as the forecast changes), and sharpness (tendency to forecast extreme values) of probabilistic forecasts (Delle Monache et al., 2006). Rank histograms are used to evaluate the spread of an ensemble. Reliability diagrams (graphs of the observed frequency of an event plotted against the forecast probability of the event) and ROC curves (which plot the false positive rate against the true positive rate for several probability thresholds) are used to measure the ability of the ensemble to discriminate (Anderson, 1996).
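As an illustration of the first diagnostic, a minimal rank-histogram sketch (ties between predictions and observations are ignored for simplicity):

```python
import numpy as np

def rank_histogram(y, ensemble_predictions):
    """Histogram of the rank of each observation within the ensemble.

    ensemble_predictions: array (n_members, n_obs).
    A flat histogram indicates a well-dispersed ensemble; ranks piled up at
    the top end indicate that the members systematically underestimate.
    """
    n_members = ensemble_predictions.shape[0]
    ranks = (ensemble_predictions < y[None, :]).sum(axis=0)
    return np.bincount(ranks, minlength=n_members + 1)
```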

Choice of the priors
In a Bayesian framework, the prior knowledge on the control variables of the 106Ru source must be described. Following Dumont Le Brazidec et al. (2020), and because no prior information is available, the prior probabilities of the longitude, latitude, observation error, and member weights are assumed to be uniform.

Parallel tempering algorithm
The weights are sampled with a folded-normal transition kernel, $w_i^k \sim FN(w_i^{k-1}, \sigma_w^2)$, where FN is the folded-normal distribution, σ_w the prior standard deviation of the weights, and $w_i^{k-1}$ the value of the weight of member i before the walk.
The variances of the transition probabilities are chosen based on experimentation and are set to σ_x1 = σ_x2 = 0.3, σ_lnq = 0.03, σ_r = 0.01, and σ_w = 0.0005. All variables are always initialised randomly. The locations of the transition probabilities are the values of the variables at the current step. When the algorithm to discriminate pertinent observations presented in section 2.2 is used, we consider that a prediction and an observation can be considered equal if their difference is less than d = 0.1 mBq.m⁻³, which is inferred from the receptor detection limits described in Dumont Le Brazidec et al. (2020). Ten chains at temperatures t_i = c^i with c = 1.5 are used in the parallel tempering algorithm.
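A minimal sketch of the weight transition and of the temperature ladder described above (the renormalisation of the remaining weights onto the simplex is omitted):

```python
import numpy as np

rng = np.random.default_rng(0)

def propose_weight(w_prev, sigma_w=0.0005):
    """Folded-normal random walk FN(w_prev, sigma_w^2): the Gaussian draw is
    reflected at zero so that the proposed weight stays positive."""
    return abs(rng.normal(w_prev, sigma_w))

# Temperature ladder of the parallel tempering algorithm: t_i = c**i, c = 1.5.
# Chain i targets p(x|y)**(1/t_i); neighbouring chains periodically propose
# to exchange their states with a Metropolis acceptance rule.
temperatures = [1.5**i for i in range(10)]
```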

Summary
To see the impacts of the techniques proposed in section 2 (i.e., using diverse likelihoods, new designs of the error covariance matrix, and the ensemble-based method), the pdfs of the variables describing the 106Ru source are sampled in various configurations:
- a comparison between the longitude pdf reconstructed with or without the observation sorting algorithm presented in section 2.2 is provided in section 3.3.2, with the enhanced ensemble, to estimate its efficiency;
- section 3.3.3 presents results obtained using the HRES meteorology to analyse the impact of using several likelihood distributions. The observation sorting algorithm is applied and two cases are explored: no spatial clustering of the observations, x = (x_1, x_2, ln q, r_1, r_nd), and spatial clustering of the observations with the corresponding modelling of the error covariance matrix R as described at the end of section 2.2, x = (x_1, x_2, ln q, r_1, ..., r_k, r_nd), where we use k = 3;
- the enhanced ensemble with uncertainty of the dispersion parameters based on the 10-member ERA5 EDA is analysed in section 3.3.4. Afterwards, pdfs of the source variables x are sampled using the enhanced ensemble: x = (x_1, x_2, ln q, r_1, r_nd, w_1, ..., w_Ne). Results are reconstructed with the help of the observation sorting algorithm and diverse likelihoods.

Note that, when the observation sorting algorithm is used, in all cases, approximately half of the observations are sorted as non-discriminant.

Study of the interest of the observation sorting algorithm
We present here an experiment supporting the observation sorting method. A reconstruction of the source variables is proposed using the enhanced ensemble of observation operators, restricted to its first 30 members for the sake of computation time. The enhanced ensemble is studied later, in section 3.3.4. A log-Laplace likelihood with a threshold equal to 0.1 mBq.m⁻³ is used in two cases: with or without applying the observation sorting algorithm. The source variable vector is therefore x_with = (x_1, x_2, ln q, r_1, r_nd, w_1, ..., w_Ne) (with the observation sorting algorithm) or x_without = (x_1, x_2, ln q, r, w_1, ..., w_Ne) (without). Figure 2 represents the longitude pdf for both source variable vectors. The blue histogram, which represents the longitude pdf in the case where the observation sorting algorithm is applied, is far more spread out than in the case without: it ranges from 60 to 61.2 degrees, while the red longitude density (without the observation sorting algorithm) ranges from 60.8 to 61.2, i.e., the pdf extent with sorting is three times the spread without.
The mean of the observation error variance r samples in the case without the observation sorting algorithm is 0.399. In the case with the observation sorting algorithm, the discriminant observation error variance r_1 is equal to 0.63 and the non-discriminant observation error variance r_nd tends to a very low value. We note x_with the set of sources sampled (and therefore considered as probable) when using the observation sorting algorithm, and x_without the set of sources sampled without. For example, a source x in x_with has a longitude x_1 in [60, 61.2]. By definition of the log-Laplace pdf in equation (5), the fact that r_nd tends to a very low value confirms that non-discriminant observations are observations which are not able to discriminate between any of the sources in x_with.
What happens when the observation sorting algorithm is not used (red histogram), i.e., with the basic design R = rI, is that the observation error variance r is common to all observations. The resulting r = 0.399 is therefore a compromise between r_1 = 0.63 and r_nd ∼ 0, the variances sampled with the sorting algorithm. In other words, the uncertainty on the discriminant observations is reduced compared to the uncertainty found when applying the sorting algorithm. That is, the confidence in the discriminant observations is artificially high. This is why the extent of the resulting posterior pdfs is reduced compared to the case with the observation sorting algorithm. More precisely, the probability of the most probable sources (here, longitudes between 60.8 and 61.2) is increased and the probability of the least probable sources (longitudes between 60 and 60.8) is decreased.
The observation sorting algorithm is a clustering algorithm that avoids this harmful compromise. Observations with very high likelihood (i.e., very small observation error variance, very high confidence for all probable sources), namely the non-discriminant observations, are assigned a specific observation error variable. In this way, the uncertainty variance associated with the other observations is far more appropriate. This clustering is fully justified, as explained in appendix B.
Finally, note that sampling the longitude posterior distribution using y_d, the set of discriminant observations, instead of y yields a pdf very similar to the blue density in Fig. 2. In other words, without the observation sorting algorithm, considering observations which cannot discriminate between a source of longitude 60 and a source of longitude 60.8 actually decreases the probability of the source of longitude 60, which is nonsensical. The significant difference between the two pdfs makes the application of the algorithm necessary.

Sampling with the HRES data, several likelihoods, the observation sorting algorithm, and with or without observation spatial clusters
In this section, we study two cases. First, we assess the impact of the choice of the likelihood on the reconstruction of the control variable pdfs (x = (x_1, x_2, ln q, r_1, r_nd)) and, secondly, we investigate how assigning diverse error variance terms to the observations according to a spatial clustering can affect the results (x = (x_1, x_2, ln q, r_1, r_2, r_3, r_nd)). In this second case, clusters are computed with a k-means algorithm with k = 3. The observation clusters are presented in Fig. 3. As can be seen in Fig. 3, the first cluster gathers the western and central Europe observations, with a very low error variance, which indicates that the predictions of the model are a good fit to these observations. A second cluster corresponds to the eastern Europe observations, with a large variance (r = 0.64), which is consistent with the fact that most of the significant measurements belong to this cluster.

Finally, the third cluster corresponds to Russia, with r = 0.24. We now present the reconstruction of the pdfs in the two scenarios. Figures 4.a, 4.c, and 5.a show the marginal pdfs of the variables describing the source using the observation sorting algorithm and several likelihoods, with the HRES meteorology. The longitude pdf support ranges from 58.3 to almost 61.5 degrees, and the latitude support from 55 to 57 degrees. The extent of the combined longitude pdfs is far greater than the extent of any longitude pdf reconstructed using a single likelihood distribution. This shows that using a single likelihood is not enough to aggregate the whole uncertainty of the problem. We can see on these graphs that the choice of the likelihood has a greater impact than the choice of the threshold on the final coordinate pdfs. Nevertheless, they are in general all consistent on the indicated area of the Ruthenium release, especially given that the observation operator interpolation step is 0.5 degrees.
The daily Total Retrieved Released Activity (TRRA) was mostly significant on the 25th of September. The overlap of the release pdfs is larger than the overlap of the coordinate pdfs; probable TRRA values range from 60 to 200-250 TBq.
However, the daily TRRA pdfs obtained from the log-normal and the log-Laplace choices are significantly impacted by the threshold value choice.

Sampling from the enhanced ensemble with weights interpolation
Before reconstructing the pdfs of the Ruthenium source using the ensemble of observation operators, we study the dispersion of this enhanced ensemble, created by sampling the transport model parameters and the ERA5 EDA. Fifty members are used to create the enhanced ensemble.
The original ERA5 EDA meteorology is under-dispersive, as can be seen in Fig. 6a: in blue is drawn the HRES meridional wind prediction, and in red the EDA meridional wind mean prediction with the maximum and minimum values, for a random location in Europe. Most of the time, the ensemble values do not even cover the HRES value, which indicates that the ensemble has a small dispersion. ECMWF ensemble forecasts indeed tend to be under-dispersive in the boundary layer, especially in the short range (Pinson and Hagedorn, 2011; Leadbetter et al., 2020). Furthermore, the release height parameter is unlikely to have a significant impact. Therefore, with only two sampled variables (Kz and the dry deposition velocity) likely to have a significant impact, we consider that 50 members is an adequate number.
To examine the spread of the ensemble of observation operators, we need to define a reference source x_ref for which the predictions of each member of the ensemble are computed and then compared with the 106Ru observations. Note that the choice of the reference source is necessarily arbitrary and biases the results. We use the most probable source from Saunier et al. (2019) as the reference source: it is located at [60, 55], with a release of 90 TBq on the 25th of September and of 166 TBq on the 26th of September. The corresponding rank diagram (Fig. 6b) shows that the predictions often underestimate the observation value. A ROC curve and a reliability diagram are provided in appendix C to study the ensemble of observation operators more deeply.
We now study the impact of adding the meteorological and transport uncertainties into the sampling process: x = (x_1, x_2, ln q, r_1, r_nd, w_1, ..., w_Ne). The integration of an enhanced ensemble to deal with the meteorological and dispersion uncertainties has a very interesting impact on the reconstruction of the TRRA. In Fig. 7d, the TRRA ranges from 100-150 to 300-350 TBq, and the standard deviation (std) of the joint multi-model TRRA is therefore far larger than the std of the joint HRES TRRA. Note also that with the log-normal likelihood and a threshold of 0.5 mBq.m⁻³, 1 mBq.m⁻³, or 3 mBq.m⁻³, or with the log-Laplace and a threshold of 0.5 mBq.m⁻³, the release is split between the 25th and the 26th, as can be noted in Fig. 7e and 7f. In other words, the integration of member weights adds uncertainty not only over the magnitude of the release but also over its timing (here, the day). However, the pdfs of the longitude and latitude are not significantly impacted, as can be seen in Fig. 7a and 7b.

The pdfs of the member weights are displayed in Fig. 8 for several likelihoods and thresholds. Only the weight pdfs with high medians are included in the graphs, for reasons of visibility. Several remarks are in order. Member 27 is always included in the mix of weights which defines the interpolated observation operator used to make the predictions, and is often one of the most important parts of this mix, if not the most important. Since the ERA5 EDA is not very dispersive, we can make the hypothesis that the weight density of a member mainly depends on the dispersion parameters used to create this member (or observation operator).

For the operator of member 27, the release is mainly on the height layer between 40 and 120 meters (>90%), its constant deposition velocity is 0.6 × 10⁻³ m.s⁻¹, which is very close to the lower bound (minimal possible deposition), and its Kz has been multiplied by 0.47. It also corresponds to member 6 of the ERA5 EDA.
Member 44 is present 5 times and corresponds to a deposition velocity of 1.2 × 10⁻³ m.s⁻¹, a release mainly on the second layer (80%), and a Kz multiplied by 0.34. Member 42 is present 4 times and corresponds to a deposition velocity of 2.1 × 10⁻³ m.s⁻¹, a release also mainly on the second layer (83%), and a Kz multiplied by 0.41. Member 47 is present 2 times and corresponds to a deposition velocity of 0.92 × 10⁻³ m.s⁻¹, a release split between the two layers, and a Kz multiplied by 1.1.
Two hypotheses can be made. First, the weight of member 27 (and of member 47) is large because it has a very small deposition velocity, which suggests that the deposition velocity is overestimated when using the standard choice. Secondly, Kz may well be overestimated, and the release should be on the second layer (between 40 and 120 m), since these are the two common points between the three most relevant members (27, 44, and 42). Note that a first trial with 30 members (instead of the 50 members used here) led to the same conclusions (except for the height of the release, on which nothing could be concluded).

Summary and conclusions
In this paper, we proposed several methods to quantify the uncertainties in nuclear atmospheric source inverse modelling.
We first investigated the choice of the likelihood. We described, with both a theoretical discussion and an application, why it can have a major impact on the shape of the final probability density function. We also discussed the need to better design the error covariance matrix, to prevent observations with low discrimination power from artificially decreasing the spread of the posterior distributions. Moreover, we provided a method to add the meteorological and dispersion uncertainties to the reconstruction of the distributions of a source, improving its evaluation. Following a Bayesian approach, each member of the enhanced ensemble is given a weight which is sampled in the MCMC algorithm.

A full reconstruction of the variables describing the source of the Ruthenium 106 released in September 2017 and their uncertainty was provided. The three methods have proven to be useful. In particular, the choice of the likelihood was shown to have a great influence on the distribution of the release coordinates. On the other hand, the addition of the weights as variables to be sampled had a major impact on the shape of the pdfs of the magnitude and time of the release.
We intend to apply the methods developed in this paper to the release of the Fukushima-Daiichi accident.

Appendix A: False paradox of the discriminant and non-discriminant observations

Let us suppose that the cost function is computed from a Gaussian likelihood; with R = rI, we then have
$$J(y; Hx, r) = \frac{1}{2r} \sum_{i=1}^{N_{obs}} \left( y_i - (Hx)_i \right)^2 + \frac{N_{obs}}{2} \ln(r).$$
Suppose we add to this set of observations y of size N_obs a second set of observations $y^* = 0_{N^*_{obs}}$ of size N*_obs for which the corresponding predictions are also zero. This can happen, for instance, if we add observations preceding the accident, as the sketch below illustrates numerically.
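A minimal numerical sketch of the effect, with synthetic misfits: for a Gaussian likelihood with R = rI, the maximum-likelihood variance is the mean squared misfit, so padding the dataset with zero observation-zero prediction couples drives r towards 0.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic misfits y_i - (Hx)_i for 100 informative observations
misfits = rng.normal(0.0, 1.0, size=100)

for n_zero in [0, 100, 1000]:
    padded = np.concatenate([misfits, np.zeros(n_zero)])  # add zero couples
    r_mle = np.mean(padded**2)  # maximiser of the Gaussian likelihood in r
    print(f"{n_zero:5d} zero couples added: r_mle = {r_mle:.3f}")
```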
The cost function of the extended observation set then reads
$$J([y, y^*]; Hx, r) = J(y; Hx, r) + \frac{N^*_{obs}}{2} \ln(r),$$
since for each observation of the new set, y_i = (Hx)_i = 0. Suppose now that N*_obs goes to infinity; the observation error variance r should then tend to 0. As r → 0, the sampled distributions become more and more peaked. Therefore, in this configuration, adding observations anterior to the accident will degrade the reconstructed distributions.

Appendix B: Justification of the observation sorting algorithm using the Akaike information criterion

We compare two models of the error covariance matrix: model (0), where R = rI with a single variance, and model (d), where the observations are split into two groups with distinct variances. The Akaike information criterion (AIC) of a model with k variables and maximum likelihood L is AIC = 2k − 2 ln(L), so that
$$AIC(0) - AIC(d) = -2 + 2 \ln\left( \frac{L(d)}{L(0)} \right),$$
where L(0) and L(d) are the maximum likelihoods obtained with the first and second model, respectively. Using Gaussian likelihoods to facilitate the calculations, and up to normalisation constants, the maximum likelihood of each group is a function of its sum of squared misfits $S_j = \sum_{i \in group\ j} (y_i - (Hx)_i)^2$, with N_1, N_2 the numbers of observations assigned to the first and second group, respectively (N_1 + N_2 = N_obs). Comparing the maximum likelihoods of the two models, we can note that the smaller S_2 (and therefore the bigger S_1), the better the model (d) compared to (0). The observation sorting algorithm exactly aims at selecting a large set of observations (the non-discriminant observations y_nd) with a very small sum of squared misfits S_2 = S_nd.

According to the AIC criterion, the model (d) with two variables is judged useless if the average likelihood of the observations y_1 linked to group 1 is close to the average likelihood of the observations y_2 linked to group 2. The observation sorting algorithm exactly aims at creating two groups of observations: one where the average likelihood of the observations is close to 0 and another where the average likelihood of the observations is high. In other words, the ratio S_1/S_2 tends towards infinity (depending on the choice of d) and the ratio N_1/N_2, for example, is equal to 1. That is, the sorting algorithm creates two groups such that the corresponding coordinates (S_1/S_2) and (N_1/N_2) in Fig. B1 are as far away from negative values as possible. Therefore, the AIC criterion fully justifies the clustering accomplished by the observation sorting algorithm.
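The comparison can be reproduced with a minimal sketch, assuming Gaussian likelihoods with per-group maximum-likelihood variances; the exact algebra of the missing equations is reconstructed here, not quoted from the paper.

```python
import numpy as np

def max_loglik(misfits):
    """Maximum Gaussian log-likelihood (up to constants) of a group of
    misfits, with its variance r fitted by maximum likelihood."""
    n = misfits.size
    r_hat = np.mean(misfits**2)
    return -0.5 * n * np.log(r_hat) - 0.5 * n

def aic_gain(misfits_1, misfits_2):
    """AIC(0) - AIC(d): positive values favour the two-variance model (d)."""
    loglik_0 = max_loglik(np.concatenate([misfits_1, misfits_2]))  # one variance
    loglik_d = max_loglik(misfits_1) + max_loglik(misfits_2)       # two variances
    return (2 * 1 - 2 * loglik_0) - (2 * 2 - 2 * loglik_d)
```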

Appendix C: ROC and reliability diagram of the observation operators ensemble
A ROC curve and a reliability diagram are computed using the reference source defined in section 3.3.4 to assess the ability of the forecast to discriminate between events and non-events, and its reliability, respectively (Delle Monache et al., 2006). A good