Articles | Volume 26, issue 10
https://doi.org/10.5194/acp-26-7261-2026
© Author(s) 2026. This work is distributed under the Creative Commons Attribution 4.0 License.
Technical note: A flexible framework for precision reduction of WRF inputs and outputs to balance storage efficiency and scientific fidelity
Download
- Final revised paper (published on 27 May 2026)
- Supplement to the final revised paper
- Preprint (discussion started on 29 Oct 2025)
- Supplement to the preprint
Interactive discussion
Status: closed
Comment types: AC – author | RC – referee | CC – community | EC – editor | CEC – chief editor
- RC1: 'Comment on egusphere-2025-4811', Anonymous Referee #1, 25 Nov 2025
  - AC1: 'Reply on RC1', Jiandong Wang, 24 Mar 2026
- RC2: 'Comment on egusphere-2025-4811', Milan Klöwer, 27 Jan 2026
  - AC2: 'Reply on RC2', Jiandong Wang, 24 Mar 2026
Peer review completion
AR – Author's response | RR – Referee report | ED – Editor decision | EF – Editorial file upload
AR by Jiandong Wang on behalf of the Authors (24 Mar 2026)
Author's response
Author's tracked changes
Manuscript
ED: Referee Nomination & Report Request started (27 Mar 2026) by Duncan Watson-Parris
RR by Milan Klöwer (21 Apr 2026)
ED: Publish as is (24 Apr 2026) by Duncan Watson-Parris
AR by Jiandong Wang on behalf of the Authors (26 Apr 2026)
Manuscript
This manuscript proposes to reduce the storage size of weather and climate data without compromising scientific integrity, and investigates various precision truncation strategies (combined with lossless compression) using data from Weather Research and Forecasting (WRF) simulations. The authors choose 2016 as the WRF simulation period, with 4-D data assimilation. Results were compared with hourly 2 m air temperature and humidity, 10 m wind speed, and precipitation.
Relative compression is measured as the percentage of the original data size that remains after further lossless compression with bzip2 or gzip.
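A minimal Python sketch of this relative-size metric, under stated assumptions: the field is stored as float32, truncation means rounding to a number of significant decimal digits, and a synthetic temperature series stands in for real WRF fields. Because the choice of denominator is exactly what the report later asks about, the hypothetical helper `storage_percentages` returns both readings: the compressed truncated size relative to the raw full-precision bytes, and relative to the compressed full-precision bytes.

```python
import bz2
import gzip
import math
import struct


def round_sig(x: float, digits: int) -> float:
    """Round x to `digits` significant decimal digits (0, inf and NaN pass through)."""
    if x == 0.0 or not math.isfinite(x):
        return x
    return float(f"{x:.{digits - 1}e}")


def pack_float32(values) -> bytes:
    """Serialize values as little-endian 32-bit floats (assumed storage type)."""
    return struct.pack(f"<{len(values)}f", *values)


def storage_percentages(values, digits, compress=gzip.compress):
    """Compressed size of the truncated field as a percentage of
    (a) the raw full-precision bytes and (b) the compressed full-precision bytes.
    Assumption: truncation = rounding to significant decimal digits."""
    raw_full = pack_float32(values)
    comp_full = compress(raw_full)
    comp_trunc = compress(pack_float32([round_sig(v, digits) for v in values]))
    return {
        "vs_raw": 100.0 * len(comp_trunc) / len(raw_full),
        "vs_compressed_full": 100.0 * len(comp_trunc) / len(comp_full),
    }


# Synthetic "2 m temperature" series in kelvin (illustrative only, not WRF output).
field = [280.0 + 10.0 * math.sin(i / 50.0) + 0.001 * ((i * 7919) % 1000)
         for i in range(10_000)]

for codec_name, codec in [("gzip", gzip.compress), ("bzip2", bz2.compress)]:
    for d in (3, 5, 7):
        pct = storage_percentages(field, d, codec)
        print(f"{codec_name} {d} digits: {pct['vs_raw']:.1f}% of raw, "
              f"{pct['vs_compressed_full']:.1f}% of compressed full precision")
```

The two percentages can differ substantially for the same experiment, which is why a paper reporting a single figure such as "69 %" needs to state which denominator it uses.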
Errors introduced by compression are quantified with the root-mean-square error (RMSE) of the encoded values against reference values, the Pearson correlation coefficient R, and the normalized mean bias (NMB). Additional metrics for the impact on extreme precipitation include the number of days exceeding the 95th or 99th percentile of wet-day precipitation, the maximum 1-day and 5-day precipitation totals, the annual count of days with daily precipitation above 10 mm, the count of wet days and their total precipitation over the year, and the simple daily intensity index derived from these.
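These metrics can be sketched in plain Python. The definitions are assumptions rather than the manuscript's exact formulas: standard RMSE and Pearson R, NMB = Σ(M − O) / ΣO × 100 % over N paired samples, and ETCCDI-style precipitation indices with a wet day defined as at least 1 mm and percentile thresholds taken from the wet days of a reference series. The function and key names (`error_metrics`, `precip_indices`, `R95p_days`, and so on) are hypothetical.

```python
import math
import statistics


def error_metrics(model, obs):
    """RMSE, Pearson R and normalized mean bias (NMB, in %) over N paired samples."""
    n = len(model)
    if n != len(obs) or n < 2:
        raise ValueError("model and obs must have the same length (at least 2)")
    diffs = [m - o for m, o in zip(model, obs)]
    rmse = math.sqrt(sum(d * d for d in diffs) / n)
    mean_m, mean_o = statistics.fmean(model), statistics.fmean(obs)
    cov = sum((m - mean_m) * (o - mean_o) for m, o in zip(model, obs))
    ss_m = sum((m - mean_m) ** 2 for m in model)
    ss_o = sum((o - mean_o) ** 2 for o in obs)
    r = cov / math.sqrt(ss_m * ss_o)  # undefined if either series is constant
    nmb = 100.0 * sum(diffs) / sum(obs)
    return {"N": n, "RMSE": rmse, "R": r, "NMB_percent": nmb}


def precip_indices(daily_mm, reference=None, wet_mm=1.0):
    """ETCCDI-style indices for one year of daily precipitation totals (mm).

    Assumption: percentile thresholds come from the wet days of `reference`
    (e.g. the full-precision run); by default the series itself is used.
    """
    ref = daily_mm if reference is None else reference
    ref_wet = [p for p in ref if p >= wet_mm]
    cuts = statistics.quantiles(ref_wet, n=100, method="inclusive")
    p95, p99 = cuts[94], cuts[98]  # 95th and 99th percentiles of wet days
    wet = [p for p in daily_mm if p >= wet_mm]
    prcptot = sum(wet)  # total precipitation on wet days
    return {
        "R95p_days": sum(p > p95 for p in daily_mm),
        "R99p_days": sum(p > p99 for p in daily_mm),
        "Rx1day": max(daily_mm),
        "Rx5day": max(sum(daily_mm[i:i + 5]) for i in range(len(daily_mm) - 4)),
        "R10mm": sum(p >= 10.0 for p in daily_mm),  # ETCCDI convention: >= 10 mm
        "wet_days": len(wet),
        "PRCPTOT": prcptot,
        "SDII": prcptot / len(wet) if wet else 0.0,  # simple daily intensity index
    }


# Illustrative usage with made-up numbers (not WRF or station data).
print(error_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0]))
print(precip_indices([0.0, 2.0, 12.0, 0.5, 5.0] * 73))
```

Passing the full-precision run as `reference` keeps the percentile thresholds fixed across truncation levels, so changes in the day counts reflect the truncation rather than a shifting threshold.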
The paper is generally well written. The results are encouraging but not new (see my point below on the literature review), and the authors do not provide final compression results for the optimal strategy, so it is unclear why the reader should invest in this extra data-compression step. The paper would strongly benefit from improvements in clarity.
The fundamental limitation of the paper is that it is not properly situated within the extensive literature on data truncation and compression; it cites only three references: Baker et al. (2016), Poppick et al. (2020, lossy), and Walters and Wong (2023).
The following work extensively investigated truncation strategies:
M Klöwer, M Razinger, JJ Dominguez, PD Düben, TN Palmer, Compressing atmospheric data into its real information content. Nat. Comput. Sci. 1, 713–724 (2021).
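For comparison with decimal significant-digit truncation, the sketch below shows binary round-to-nearest (ties to even) of a float32 mantissa to `keepbits` bits, the kind of bit-level rounding discussed in that line of work. It covers only the rounding step; choosing `keepbits` from the real information content, the core of Klöwer et al. (2021), is not shown, and the function names are hypothetical.

```python
import struct


def float32_bits(x: float) -> int:
    """Bit pattern of x after conversion to IEEE-754 float32 (round to nearest)."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_to_float32(bits: int) -> float:
    """Inverse of float32_bits."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def bitround(x: float, keepbits: int) -> float:
    """Round finite x (as float32) to `keepbits` explicit mantissa bits,
    round-to-nearest with ties to even; the discarded bits become zero."""
    if not 0 <= keepbits <= 23:
        raise ValueError("float32 has 23 explicit mantissa bits")
    bits = float32_bits(x)
    drop = 23 - keepbits
    if drop == 0:
        return bits_to_float32(bits)
    half_minus_one = (1 << (drop - 1)) - 1
    last_kept = (bits >> drop) & 1          # parity of the last kept bit (ties to even)
    mask = 0xFFFFFFFF ^ ((1 << drop) - 1)   # zero out the discarded mantissa bits
    # A carry out of the mantissa correctly increments the exponent.
    rounded = (bits + half_minus_one + last_kept) & mask
    return bits_to_float32(rounded)


for k in (23, 16, 10, 7):
    print(k, bitround(287.15, k), hex(float32_bits(bitround(287.15, k))))
```

Unlike decimal rounding, this leaves runs of zero bits at the end of every value, which lossless codecs such as gzip or bzip2 can exploit directly.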
Moreover, several works have explored neural lossy compression:
L Huang, T Hoefler, Compressing multidimensional weather and climate data into neural networks. ICLR (2023).
T Han, S Guo, W Xu, L Bai, et al., CRA5: Extreme compression of ERA5 for portable global climate and weather research via an efficient variational transformer. arXiv preprint arXiv:2405.03376 (2024).
P Mirowski, D Warde-Farley, M Rosca, et al., Neural compression of atmospheric states. arXiv preprint arXiv:2407.11666 (2024).
How does this work differ from the conclusions of all these previous works? Is the difference that the compressed data are used as inputs to WRF simulations? This should be made explicit.
Several parts of the paper were unclear:
* The article would benefit from an illustration of the input and output variables of the Weather Research and Forecasting model, and a schematic of how the data interact. Am I right that truncating the input data to WRF affects the WRF output, and that this is why, for the same output truncation, different input truncations can reduce the relative compressed size of the outputs? Are the input variables forcing variables, or are they also weather data? It is only at line 325 that we can infer that output variables are not recursively fed back into the model, since output-only truncation can happen after the model has run.
* The relationship between the 1622 stations and the surface data is unclear. Do the authors have access to simulated and data-assimilated dense surface data?
* What are the inputs and outputs of WRF? Do the authors re-run WRF for each input-data truncation strategy?
* What is N in the error formulas: the number of input/output data points in 2016, or the number of discrete station measurements? How does the distribution of observation points compare with that of the data used in WRF?
* It was unclear whether the 69% reported for 5 significant digits in the input data (WRF_5) means that:
1) original data at full precision are further compressed using lossless gzip
2) 5-digit truncated input data are further compressed using lossless gzip
3) the ratio of the storage size of 2) to that of 1) is computed.
* When the authors write that the baseline dataset has 837 GB of input data, are these full-precision data or data compressed with gzip / bzip2?
* A table summarising the effective storage space after various truncation strategies, as well as compression, would be very useful.
* The color scheme of Fig. 6 is confusing and does not correspond to previous figures.
* Also missing is a visual showing one measurement (e.g., 2 m temperature) with its FORTRAN representation and the corresponding truncations.
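A text-mode version of such a visual could look like the sketch below. It assumes a single-precision (4-byte) REAL, WRF's usual default build, uses a made-up 2 m temperature of 287.1534 K, and rounds to a given number of significant decimal digits as a stand-in for the manuscript's truncation; the helper names are hypothetical.

```python
import math
import struct


def round_sig(x: float, digits: int) -> float:
    """Round x to `digits` significant decimal digits (assumed truncation scheme)."""
    if x == 0.0 or not math.isfinite(x):
        return x
    return float(f"{x:.{digits - 1}e}")


def float32_bitstring(x: float) -> str:
    """'sign exponent mantissa' bits of x stored as an IEEE-754 float32 (REAL*4)."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    s = f"{bits:032b}"
    return f"{s[0]} {s[1:9]} {s[9:]}"


def as_float32(x: float) -> float:
    """The value actually stored after conversion to float32."""
    return struct.unpack(">f", struct.pack(">f", x))[0]


t2 = 287.1534  # hypothetical 2 m temperature in kelvin
print(f"{'digits':>6}  {'rounded':>9}  {'stored as float32':>18}  s exponent mantissa")
for digits in (7, 6, 5, 4, 3):
    v = round_sig(t2, digits)
    print(f"{digits:>6}  {v:>9}  {as_float32(v):>18.10f}  {float32_bitstring(v)}")
```

For most values, decimal rounding leaves the trailing mantissa bits non-zero, because numbers such as 287.2 have no exact binary representation; any gain from subsequent lossless compression then presumably comes from repeated values rather than from runs of zero bits.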