Articles | Volume 26, issue 2
https://doi.org/10.5194/acp-26-1229-2026
© Author(s) 2026. This work is distributed under the Creative Commons Attribution 4.0 License.
Methane fluxes from Arctic & boreal North America: comparisons between process-based estimates and atmospheric observations
Download
- Final revised paper (published on 26 Jan 2026)
- Supplement to the final revised paper
- Preprint (discussion started on 04 Jun 2025)
- Supplement to the preprint
Interactive discussion
Status: closed
Comment types: AC – author | RC – referee | CC – community | EC – editor | CEC – chief editor
- RC1: 'Comment on egusphere-2025-2150', Anonymous Referee #1, 15 Jul 2025
- RC2: 'Comment on egusphere-2025-2150', Anonymous Referee #2, 27 Jul 2025
- AC1: 'Comment on egusphere-2025-2150', Hanyu Liu, 03 Nov 2025
Peer review completion
AR – Author's response | RR – Referee report | ED – Editor decision | EF – Editorial file upload
AR by Hanyu Liu on behalf of the Authors (03 Nov 2025)
Author's response
Author's tracked changes
Manuscript
ED: Referee Nomination & Report Request started (06 Nov 2025) by Frank Dentener
RR by Anonymous Referee #1 (26 Nov 2025)
RR by Anonymous Referee #2 (28 Nov 2025)
ED: Publish as is (18 Dec 2025) by Frank Dentener
AR by Hanyu Liu on behalf of the Authors (29 Dec 2025)
Manuscript
This study presents an update of the WetChimp wetland model intercomparison that was published several years ago. The new intercomparison makes use of model submissions to the Global Carbon Project. The results show a significant reduction in inter-model spread compared with the previous intercomparison, and closer agreement with atmospheric measurements evaluated using STILT over North America. This is regarded as a sign of good progress in developing these models. In my view, as explained below, the study should also consider another possible explanation. The comparison with WetChimp is only indirect, since its results were not included in the evaluation against atmospheric measurements. Furthermore, the evaluation using atmospheric data concentrates on R2, for a reason that remains unclear. Once these concerns are addressed, I see no reason to withhold publication of a study that could provide a useful new reference.
GENERAL COMMENTS
The risk of model intercomparisons is that they might steer model development in the direction of the “mean model”. It is tempting to interpret a convergence in model results as progress towards uncertainty reduction. This is only true, however, if the models converge to the true state. The evaluation that is presented does not provide evidence that this is the case.
Atmospheric measurements are used to test the quality of the wetland emission estimates. But, for a reason that is not clear, they are not used to confirm that the WetChimp submissions are less realistic. The argument that they are rests only on the convergence of the results and the size of the emissions. The analysis of the new submissions suggests that models with lower emissions are more accurate, based on the amplitude of the concentration increments, but the argument is again rather indirect, as this comparison also did not include the WetChimp emission estimates. I propose to either redo the analysis using the WetChimp fluxes, or, if that is not possible, acknowledge this shortcoming of the method that is used.
The model evaluation method uses R2 as the metric of agreement with the observations. R2 is limited, however, in that it does not penalize a wrong enhancement amplitude. The observed concentration variability is explained mostly by the weather; differences in emissions show up rather in the concentration increments, which are not captured by R2. A more logical choice would have been to use RMSE as the evaluation metric. This should either be tried, or an explanation should be given of why it was not done. Note that RMSE is not the same as the metric shown in Figure 4, although that figure does provide an evaluation based on the size of the mean concentration increment.
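To illustrate the point numerically (a minimal sketch with synthetic numbers, not taken from the manuscript, and assuming R2 denotes the squared Pearson correlation): a model with perfect timing but half the enhancement amplitude still attains a perfect R2, while RMSE flags the error.

```python
import numpy as np

# Synthetic "observed" enhancements and a model with perfect timing
# but an amplitude that is a factor of two too low.
obs = np.array([1.0, 4.0, 2.0, 6.0, 3.0, 5.0])
model = 0.5 * obs

# Squared Pearson correlation: blind to the amplitude bias.
r2 = np.corrcoef(obs, model)[0, 1] ** 2

# RMSE: directly penalizes the wrong enhancement amplitude.
rmse = np.sqrt(np.mean((model - obs) ** 2))

print(f"R2   = {r2:.2f}")    # 1.00 despite the factor-of-two error
print(f"RMSE = {rmse:.2f}")  # 1.95, a substantial error
```

The same contrast would apply to any evaluation that ranks models by correlation alone.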
Based on the results in Figure 6, it is suggested that simpler diagnostic models perform better than more sophisticated prognostic models. This raises the question, however, of how independent the model results are of the data used to evaluate them. Simpler models are more easily tuned to the existing measurements than sophisticated mechanistic models. Could that explain why they score better? I was surprised to see that the evaluation is based only on ambient air measurements, without any mention of the flux measurements that are made at several sites in the study domain. These might admittedly provide a less independent means of evaluation, but comparing the performance of the different model categories against them would nevertheless provide useful additional information.
SPECIFIC COMMENTS
Line 75, how about regional models for the study domain? I understand that this model inter-comparison evaluates global models, but results from regional models might nevertheless provide useful information for evaluating them.
Line 94, the purpose of this sentence in relation to the previous is not clear. Is it meant to provide further justification for afternoon measurements? Or is it meant to indicate a limitation that will anyway play a role? Please rephrase to clarify.
Line 99, Don’t the campaigns in Alaska offer a useful opportunity for further validation? If so, why was it not used?
Line 168: From a simple back-of-the-envelope calculation, it seems that the 1–1.5 ppb already represents the high latitudes, because the global decay due to OH should be faster.
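For reference, the envelope calculation with assumed round numbers (a global CH4 lifetime against OH of roughly 9.7 years and a background of roughly 1900 ppb; neither value is taken from the manuscript):

```python
import numpy as np

# Assumed round numbers (not from the manuscript): global CH4 lifetime
# against OH ~9.7 yr, background mixing ratio ~1900 ppb.
lifetime_days = 9.7 * 365.0
background_ppb = 1900.0
transport_days = 10.0  # length of the back trajectories

# First-order OH loss over a 10-day trajectory at the *global* rate:
global_loss_ppb = background_ppb * (1.0 - np.exp(-transport_days / lifetime_days))
print(f"loss over 10 days at the global rate: {global_loss_ppb:.1f} ppb")  # ~5.4 ppb
```

This gives roughly 5 ppb at the global loss rate, so a prescribed 1–1.5 ppb decay already corresponds to a considerably slower, high-latitude loss.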
Figure 1: Does 'daily' mean that the footprints shown in this figure represent only the influence of a one-day back trajectory? The text mentions that 10-day back trajectories are used, which raises the question of why mean 1-day footprints are shown here. Is the 'mean' evaluated over 2007–2017? If so, this should be mentioned explicitly.
Line 191: “The remaining sites …” You might want to add a reference to Figure 1 where these sites are indicated as red circles.
Line 205: Did you test how reliably the apparent Q10 approximates Q10 for the models that use a Q10 formulation? (and for which its value is known)
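Such a test could take the form of a log-linear fit; a minimal synthetic check (hypothetical numbers, not the authors' procedure) shows that the fit recovers a known Q10 exactly in the noise-free case, so deviations for the Q10-based models would point at confounding factors:

```python
import numpy as np

# Hypothetical flux-temperature samples generated from a known Q10 = 2.5,
# to check whether the "apparent Q10" regression recovers the true value.
true_q10 = 2.5
t = np.linspace(0.0, 20.0, 50)          # soil temperature (degC)
flux = 1.2 * true_q10 ** (t / 10.0)     # F(T) = F_ref * Q10^(T/10)

# Apparent Q10 from a log-linear fit: ln F = ln F_ref + (ln Q10 / 10) * T
slope, _ = np.polyfit(t, np.log(flux), 1)
apparent_q10 = np.exp(10.0 * slope)

print(f"apparent Q10 = {apparent_q10:.2f}")  # 2.50 in this noise-free case
```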
Line 215: How about the seasonality of anthropogenic emissions?
Line 220: But anthropogenic emissions inventories provide estimates for each year, so reasonably accurate IAV estimates exist for the anthropogenic part.
Line 227: Could it be that WetChimp led to a consensus about the mean flux that might explain some degree of convergence?
Line 229: Is this also true for the models that are common to both experiments?
Line 275: How are emissions from fresh water accounted for in the current study?
Figure 2: An explanation about the error bar should be added in the figure caption.
Line 315-317: It is not clear why Q10 would correlate with the average methane emission (which indeed seems not to be the case). Wouldn’t it have been more logical to assess Q10 against R2 or against the seasonal amplitude?
Line 357: It would be useful to add standard deviations to the points in figure 6 corresponding to the averages over climate forcing data and anthropogenic emission inventories.
Line 376: Figure 5 is referred to for a relation between Q10 and flux variations, but this figure relates Q10 to the mean flux rather than its variation.
Line 395-398: This rightly mentions that the explained variance of PC1 has no relation to the true variance. More useful, however, would have been to explain what the comparison of these numbers does mean. Right now, it is unclear why these numbers are mentioned at all.
Line 405: ‘so this analysis of spatial distribution’ It is not clear what ‘this analysis’ refers to. The PCR analysis is not weighted to areas with stronger observational coverage, is it?
Figure 8: There is no reference in the text to panels d–f, and an explanation is missing of what the mean standardized flux is and how it was derived.
Line 420: ‘this change in magnitude improves …’ This cannot be concluded because the WetChimp flux estimates were not included in the comparison to observations.
Line 422: ‘most consistent with atmospheric observations’ only concerns the R2, whereas it is not clear that R2 is best metric to evaluate the consistency with atmospheric observations.
Line 432: ‘Overall, we argue …’ It should be made clear that this conclusion only holds for the current analysis of emissions from North America. Since the models are global, there is still the possibility that other regions turn the overall outcome in the opposite direction.
TECHNICAL CORRECTIONS
Line 180: ‘initially’ instead of ‘preliminary’?
Line 324: “contribute to<o>” (?) but are not “the primary >the< cause”