Comment on acp-2020-1129

The authors present a technical and modelling study of Bayesian inversion applied to the airborne Ru-106 observations made in Europe in 2017. The technical part of the paper focuses on three aspects. First, the choice of the likelihood function: three likelihood functions are proposed, based on four criteria that a likelihood function should ideally fulfil. Second, an alternative formulation of the error covariance matrix is proposed, in particular to discriminate between informative and non-informative observations; the latter lead to artificially low inferred errors, which the alternative error covariance matrix remedies. Third, the observation operator is replaced by a weighted linear combination of ensemble members, whose weights are sampled by the Bayesian inference. In the second part of the paper, these techniques are illustrated by applying Bayesian inference to the airborne Ru-106 observations. The results show an (impressive) agreement with the most likely source location (the Mayak institute in Russia).


Major comments
1/ I suggest reformulating parts of Section 1.2:
-"An efficient way to use these forecasts to better estimate uncertainties is to combine them": it is not clear to me what the authors mean here. Please be more specific. (If you combined the members of an ensemble into a best estimate of the true state of the atmosphere, and used only that best estimate, you would lose the uncertainty information.)
-"This approach is known as multimodel ensemble forecasting (Zhou and Du, 2010).": it is not clear to me what "This approach" refers to. If you use the same model with perturbed input parameters and/or perturbed physics, I would not call it multimodel ensemble forecasting.
-What is "sequential aggregation"? Do you mean sequential in time (= an ensemble of deterministic forecasts run at different starting times)?
-"An aggregated forecast is then formed by the weighted linear combination of the forecasts of the ensemble": I wonder why the authors want to create a best realization, rather than extracting the uncertainty from the ensemble. I suggest adding some discussion to explain the reasoning for this.
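For clarity, the kind of aggregation being questioned above can be sketched as follows (a generic illustration with made-up member values and weights; the sum-to-one constraint on the weights is an assumption, not taken from the paper):

```python
# Aggregated forecast as a weighted linear combination of ensemble members.
# Each row is one member's forecast at three grid points (made-up values);
# the weights would normally be inferred, here they are fixed for illustration.
forecasts = [
    [1.0, 2.0, 3.0],
    [1.5, 2.5, 3.5],
    [0.5, 1.5, 2.5],
]
weights = [0.5, 0.3, 0.2]  # assumed nonnegative and summing to one

aggregated = [
    sum(w * member[j] for w, member in zip(weights, forecasts))
    for j in range(len(forecasts[0]))
]
print(aggregated)  # a single "best" field; the spread across members is discarded
```

The point of the comment above is visible in the last line: the aggregation collapses the ensemble into one field, so the member-to-member spread, which carries the uncertainty information, is lost.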
2/ While the discussion on the choice of the likelihood function is valuable and interesting, I'm not sure if all arguments made in the paper are valid.
-I am not sure I can agree with the discussion in Lines 111-117. There, you ignore the fact that the uncertainties will be larger for the larger observation-prediction couple; both observation-prediction couples could incur the same penalty if they have the same relative uncertainty, which is not unlikely considering the error from the atmospheric transport and dispersion model. Therefore, I would rather say that the problem you mention is a result of the oversimplified error covariance, rather than of the Gaussian likelihood itself.
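To make the point about relative uncertainty concrete, here is a minimal sketch (with made-up numbers; the 20% relative-uncertainty level is an assumption) showing that under a Gaussian likelihood two couples of very different magnitudes incur the same penalty once the error standard deviation is taken proportional to the signal:

```python
def gaussian_penalty(y, hx, sigma):
    """Quadratic cost term of a Gaussian likelihood: (y - Hx)^2 / (2 sigma^2)."""
    return (y - hx) ** 2 / (2.0 * sigma ** 2)

# Two observation-prediction couples with the same 20% relative error.
rel_err = 0.20
for y, hx in [(1.0, 1.2), (100.0, 120.0)]:
    sigma = rel_err * y  # uncertainty scales with the observed magnitude
    print(gaussian_penalty(y, hx, sigma))  # same penalty for both couples
```

With a fixed (magnitude-independent) sigma, by contrast, the larger couple would dominate the cost, which is the behaviour criticized in Lines 111-117; the fix lies in the error covariance, not necessarily in abandoning the Gaussian form.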
-Similarly, you cannot make a statement about relative differences in line 125 without considering the uncertainty.
-Note also that some authors use a Gaussian likelihood but work with ln(y) and ln(Hx).
-In Section 3.3.3, the posterior is shown for the different likelihood functions. It can be seen that the posteriors do not overlap much for the longitude and latitude parameters, but they do overlap for the Total Retrieved Released Activity (TRRA). While it is not explicitly stated in the paper, was the operator H calculated on a grid with grid spacings of 1°? That would imply that the likelihood for the location is extremely sharp, hinting at an unphysically small uncertainty in the location for all considered likelihood functions. I would expect that, if the uncertainties were larger for all likelihood functions, the posteriors would overlap much more, as is already the case for the TRRA. The conclusion would then be that the likelihood has some impact on the posterior shape, but not too much, which is what I would expect a priori.
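As a sketch of the log-transformed variant mentioned above (the log-space standard deviation is an illustrative assumption): applying the Gaussian cost to ln(y) and ln(Hx) makes the penalty depend only on the ratio y/Hx, so low and high concentrations with the same multiplicative mismatch are treated identically:

```python
import math

def log_space_penalty(y, hx, sigma_ln):
    """Gaussian cost on log-transformed values: (ln y - ln Hx)^2 / (2 sigma^2).

    Equivalent to a log-normal likelihood on y; only the ratio y/Hx matters."""
    return (math.log(y) - math.log(hx)) ** 2 / (2.0 * sigma_ln ** 2)

sigma_ln = 0.5  # assumed log-space standard deviation (illustrative)
print(log_space_penalty(1.0, 2.0, sigma_ln))      # factor-of-two mismatch at a low level
print(log_space_penalty(100.0, 200.0, sigma_ln))  # same factor-of-two mismatch, same cost
```

This is one way to obtain the relative-error behaviour discussed above while keeping a Gaussian functional form.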

3/ Threshold values
Line 157: "As a consequence, it can be deduced that a "good" threshold for the log-normal distribution in a case involving important quantities released should lie between 0.5 mBq.m⁻³ and 3 mBq.m⁻³." Could you explain how these values were deduced? I am concerned that the thresholds mentioned here are large compared to instrumental detection thresholds, which might explain why many observations in Central Europe are non-informative (r = 0.09) in Fig. 3. As an alternative, De Meutter and Hoffman (2020) formulated likelihood functions that explicitly consider detections, non-detections, false alarms and misses.
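Since the treatment of values near the detection threshold is central here, a generic sketch of how a likelihood can handle non-detections explicitly is a censored model: below the detection limit, the likelihood uses the probability mass under the limit instead of a density. (This is a standard censored-Gaussian construction with made-up limit and error values, not the specific formulation of De Meutter and Hoffman (2020).)

```python
import math

def censored_gaussian_loglik(y, hx, sigma, detection_limit):
    """Log-likelihood treating values below the detection limit as censored.

    Detection:     usual Gaussian density at the measured value.
    Non-detection: probability mass below the detection limit (Gaussian CDF),
    so a prediction well under the limit is barely penalized."""
    if y >= detection_limit:  # detection: Gaussian density
        return (-0.5 * ((y - hx) / sigma) ** 2
                - math.log(sigma * math.sqrt(2.0 * math.pi)))
    # non-detection: P(Y < detection_limit | Hx)
    z = (detection_limit - hx) / sigma
    return math.log(0.5 * (1.0 + math.erf(z / math.sqrt(2.0))))

# Illustrative numbers (not from the paper): limit 0.1 mBq/m3, sigma 0.05
limit, sigma = 0.1, 0.05
print(censored_gaussian_loglik(0.0, 0.01, sigma, limit))  # non-detection, small prediction
print(censored_gaussian_loglik(0.0, 0.5, sigma, limit))   # non-detection, large prediction
```

Under such a construction, non-detections remain informative (they penalize large predictions) without requiring a concentration threshold well above the instrumental one.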

4/
Line 179: "Indeed, the error is a function of time and space and is obviously not common for every observation-prediction couple." I wonder why the authors do not prescribe the uncertainty on the observation and the prediction, and make it observation-specific. In De Meutter et al. (2021), the observation uncertainties are combined with the prediction uncertainties, which were obtained from an ensemble. As a result, the uncertainty on the input is no longer a parameter that needs to be inferred. Ideally, the distribution of these uncertainties should also be consistent with the likelihood function, which could be mentioned in Section 2.1.

5/ There is limited discussion on the results using the spatial clustering (Lines 379-383). Could you provide some discussion, for instance whether you would recommend it or not, and why? And what is the effect of changing the threshold (please see also my comment 3)?
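To make the suggestion in comment 4/ concrete: an observation-specific total uncertainty can be built by combining the instrumental error with a model error estimated from the ensemble spread, assuming the two error sources are independent. (A minimal sketch; the function name and all numbers are illustrative, not the implementation of De Meutter et al. (2021).)

```python
import statistics

def combined_sigma(sigma_obs, ensemble_predictions):
    """Observation-specific total uncertainty: instrumental variance plus
    model variance estimated from the ensemble spread (assumed independent)."""
    sigma_mod = statistics.pstdev(ensemble_predictions)
    return (sigma_obs ** 2 + sigma_mod ** 2) ** 0.5

# Illustrative: one observation with 0.05 mBq/m3 instrumental error and
# ten ensemble predictions of the corresponding concentration (made-up values).
members = [1.0, 1.3, 0.8, 1.1, 0.9, 1.2, 1.05, 0.95, 1.15, 0.85]
print(combined_sigma(0.05, members))
```

With such a per-observation sigma fed into the likelihood, the input uncertainty no longer needs to be inferred as a free parameter.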

6/ Enhanced ensemble
It is not surprising that a pointwise comparison gives the result in Figure 6a: a ten-member global ensemble can only represent the uncertainty on large spatio-temporal scales (which is of interest here, since you perform a long-range atmospheric transport and dispersion calculation). Also, it seems strange to suggest compensating for underdispersiveness in the weather data by perturbing the atmospheric transport and dispersion model. The latter has its own uncertainties, which should ideally be taken into account. In Lines 419-428, the discussion is inconsistent with the (incorrect) motivation for perturbing ldX.

7/ In the conclusions, it is stated: "Moreover, we provided a method to add meteorological and dispersion uncertainties to the reconstruction of the distributions of a source, improving its evaluation." However, no improvement is mentioned or discussed in Section 3.3.4.

Minor comments
Line 6: Firstly, ... Secondly, ..., Finally, ...
Line 55: "modelling choices": it is not clear what is meant by this. Is it the atmospheric transport and dispersion model, or does it also include the likelihood and error covariance?
Line 56 (see also the above comment): "The objective of this study is to investigate the various sources of uncertainties compounding the problem of source reconstruction": but the title suggests modelling uncertainties, which I would associate with the atmospheric transport and dispersion modelling.
Line 58: "The quantification of the uncertainties largely depends on the definition of the likelihood and its components." Could you clarify this?
Section 1.4: the section numbering is confusing. I suggest using more sections, for instance a new section for "Summary and Conclusions".
Line 108: "... the likelihood part of the cost should be zero and it should increase when the difference between the observation and the prediction values grows.": you mean the cost part of the likelihood.
Page 5, criteria for the likelihood function: there is a contradiction between the first and the fourth criterion. The likelihood should indeed measure the difference between observations and predictions (fourth criterion), so that the positive support requirement becomes invalid (first criterion). If you consider the differences, I would rather suggest that it should be symmetric around its maximum, which should be at 0 (zero difference between observation and prediction).
Lines 232-238: I suggest omitting this.
Table 1: the spatial resolution, vertical resolution and time resolution: are these for ldX and not for ERA5? Furthermore, was ldX run forward in time, with one simulation for each day and each grid point? And did this grid have grid spacings of 1°, while the output grid spacings are 0.28125°? I suggest making this information more explicit.
Line 299-300: units are missing for the variances.
Line 301: "When the algorithm to discriminate pertinent observations presented in section 2.2 is used, ..." What are "pertinent" observations? Previously, you used the terms "discriminant" and "non-discriminant"?
Line 333-334: units are missing for the error variances.
Line 341-342: same as above.
Line 405: "... and the standard deviation (std) of the joint multi-model TRRA is therefore far more important than the std of the joint HRES TRRA." What is the meaning of standard deviation here? And what do you mean by "important"?