Technical note: A flexible framework for precision reduction of WRF inputs and outputs to balance storage efficiency and scientific fidelity

Wu, Shang; Wong, David C.; Wang, Jiandong; Jin, Yuzhi; Li, Junjun; Lu, Chunsong

doi:10.5194/acp-26-7261-2026

Articles | Volume 26, issue 10

https://doi.org/10.5194/acp-26-7261-2026

© Author(s) 2026. This work is distributed under
the Creative Commons Attribution 4.0 License.


Technical note | 27 May 2026


Download

Final revised paper (published on 27 May 2026)
Supplement to the final revised paper
Preprint (discussion started on 29 Oct 2025)
Supplement to the preprint

Interactive discussion

Status: closed

RC1:
'Comment on egusphere-2025-4811', Anonymous Referee #1, 25 Nov 2025

This manuscript proposes to reduce the storage size of weather and climate data without compromising scientific integrity, and investigates various precision truncation strategies (combined with lossless compression) using data from Weather Research and Forecasting (WRF) simulations. The authors choose 2016 as the WRF simulation period, with 4-D data assimilation. Results were compared with hourly 2 m air temperature and humidity, 10 m wind speed, and hourly precipitation.

Relative data compression is measured as a percentage of the original data size after further compression with bzip2 or gzip.

Metrics of errors due to data compression consist of the RMSE of the encoded values vs. reference values, the Pearson correlation R, and the Normalized Mean Bias (NMB). Additional metrics for assessing impacts on extreme precipitation include the number of days exceeding the 95th or 99th percentile of wet days, the maximum 1-day or 5-day precipitation total, the annual count of days with daily precipitation over 10 mm, the count of and total precipitation on wet days over a year, and the simple daily intensity index derived from these.
The paper is generally well written. The results are encouraging but not new (see my point below on the literature review), and the authors do not provide final compression results for the optimal strategy; it is thus unclear why the reader should care about doing this extra work of data compression. The paper would strongly benefit from improved clarity.

The fundamental limitation of the paper is that it is not properly situated in the comprehensive literature of data truncation and data compression, beyond three references: Baker et al (2016), Poppick et al (2020, lossy), Walters and Wong (2023).

The following work extensively investigated truncation strategies:

M Klöwer, M Razinger, JJ Dominguez, PD Düben, TN Palmer, Compressing atmospheric data into its real information content. Nat. Comput. Sci. 1, 713–724 (2021).
Moreover several works have explored neural lossy compression:

L Huang, T Hoefler, Compressing multidimensional weather and climate data into neural networks. ICLR (2023).

T Han, S Guo, W Xu, L Bai, et al., Cra5: Extreme compression of era5 for portable global climate and weather research via an efficient variational transformer. arXiv preprint arXiv:2405.03376 (2024).

P Mirowski, D Warde-Farley, M Rosca, et al., Neural compression of atmospheric states. arXiv preprint arXiv:2407.11666 (2024).
How does this work differ from the conclusions in all these previous works - is it by using the compressed data as inputs to WRF simulation? This should be made explicit.

Several parts of the paper were unclear:
* The article would benefit from an illustration of the input and output variables of the Weather Research and Forecasting model, and a schematic of how the data interact. Am I right that the truncation of input data to WRF has an impact on the output results coming from WRF, and that this is the reason why, given the same output truncation, different input truncations can reduce the relative compression size of the outputs? Are input variables forcing variables, or are they also weather data? It is only on line 325 that we can infer that output variables are not recursively fed back into the model, since output-only truncation can happen after the model is run.
* The relationship between the 1622 stations and the surface data is unclear. Do the authors have access to simulated and data-assimilated dense surface data?
* What are the inputs and outputs to the WRF? Do the authors re-run the WRF at different input data truncation strategies?
* What is the N in the error formulas: is it the number of input/output datapoints in 2016? Or is it the number of discrete observation station measurements? How is the distribution of observation points compared to that of the data used in the WRF?
* It was unclear if 69% at 5 significant digits in the input data (WRF_5) meant that:

1) original data at full precision are further compressed using lossless gzip

2) 5-digit truncated input data are further compressed using lossless gzip

3) the ratio of the storage size 2) over 1) is computed.
* When the authors write that the baseline dataset has 837GB of input data, are these full precision data or data compressed using gzip / bzip2?
* A table summarising the effective storage space after various truncation strategies, as well as compression, would be very useful.
* The color scheme of Fig. 6 is confusing and does not correspond to previous figures.
* What is also missing is a visual showing one measurement (e.g., temperature at 2m) with the FORTRAN representation and corresponding truncations.
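
The three-step reading of the "69% at 5 significant digits" figure queried in the list above can be sketched as follows. This is only an illustration of that interpretation, not the authors' procedure: the field is synthetic random data standing in for 2 m temperature, the values are stored as single precision, and `round_sig` and `gzip_size` are hypothetical helper names.

```python
import gzip
import math
import random
import struct


def round_sig(x: float, digits: int) -> float:
    """Round x to `digits` significant decimal digits (decimal rounding)."""
    if x == 0.0 or not math.isfinite(x):
        return x
    exponent = math.floor(math.log10(abs(x)))
    return round(x, digits - 1 - exponent)


def gzip_size(values) -> int:
    """Size in bytes of the values packed as float32 and gzip-compressed."""
    raw = struct.pack(f"<{len(values)}f", *values)
    return len(gzip.compress(raw))


random.seed(0)
# Synthetic stand-in for a 2 m temperature field in kelvin (not WRF data).
field = [285.0 + 10.0 * random.random() for _ in range(50_000)]
truncated = [round_sig(v, 5) for v in field]

size_full = gzip_size(field)       # step 1: full precision, then gzip
size_trunc = gzip_size(truncated)  # step 2: 5 significant digits, then gzip
ratio_percent = 100.0 * size_trunc / size_full  # step 3: size 2) over size 1)

print(f"{ratio_percent:.1f}% of the gzip-compressed full-precision size")
```

Under this reading the baseline is itself gzip-compressed, so the percentage would differ from one computed against the uncompressed size, which is why the definition matters.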

Citation: https://doi.org/10.5194/egusphere-2025-4811-RC1
- AC1: 'Reply on RC1', Jiandong Wang, 24 Mar 2026
  
  Thank you for your comments. Please refer to the attached PDF file for our detailed responses.
  
  Citation: https://doi.org/10.5194/egusphere-2025-4811-AC1
RC2:
'Comment on egusphere-2025-4811', Milan Klöwer, 27 Jan 2026

# Summary

The authors present a study on the impacts of lossy compression in WRF simulations over the contiguous United States. The precision of input or output fields is truncated (rounded) and errors are analysed across different variables and derived diagnostics. The authors conclude with a strategy for how to truncate WRF data in their case.

The paper addresses lossy compression, the prevailing solution to growing data archives in weather and climate modelling. Given that lossy compression is either not applied or analyses hardly reach the level conducted here, I also see this as a timely contribution, as it suggests safe levels of compression errors. Novel is the analysis of errors arising from rounding input data; most compression studies deal with output data and therefore ignore the effect of rounding errors on a simulation. In that sense I am generally happy to accept this study, but only after major changes: the methods aren't well reported, some subjective choices are unjustified, and much of the literature in this field from recent years is not cited, so its conclusions are not discussed in relation to the results here. I find much around the choice of error metrics problematic and insufficiently motivated and discussed, particularly when they are used to define an "optimal truncation strategy".

I start with some major points. Note that some of the minor points may repeat issues, as I wrote the minor points first. Feel free to cross-reference your answers.

# Major

1) Precision truncation method

It is not stated how the precision of floating-point numbers is actually truncated. Generally this is known as rounding (given you apply the truncation to numbers and not e.g. spectra), so I suggest you adopt the term "rounding". There are many rounding modes available, so please state the details here. I strongly suggest you apply the IEEE 754 standard round-to-nearest, ties-to-even mode; it's an IEEE standard for a reason. Other rounding modes are bit shaving, setting or grooming (terminology used by Zender et al.); however, they have a bias towards or away from zero and don't deal with ties properly. You never talk about bits, so it's unclear to me whether you actually round in binary (zeroing trailing mantissa bits) or whether you round in decimal, like round(x*10^N)/10^N. I can't find this in the provided Fortran code (no readme provided). If you don't round in binary, then you truncate the precision without actually setting bits to 0, e.g. 0.1 has a 001100110011... mantissa in binary. The lossless compressors will still be able to compress this somewhat, but you're essentially giving the compressors a much harder time than if the data were rounded in binary. Please state your methods; the cited Walters and Wong (2023) also don't seem to explain this in detail. How ties are dealt with is also important: IEEE 754 introduced alternating rounding of ties up and down (to the nearest even number) to avoid an away-from-zero bias. Given you talk about biases in your error analyses, the role of ties is unclear a priori.

Related is that you don't state whether you're dealing with single or double precision numbers. If you're doing binary rounding, then the compressor can take care of the additional 4 bytes of zeros, but if you round in another base that may matter, and it is also relevant to the stated size of your baseline dataset being ~3TB.
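
To make the distinction between binary and decimal rounding concrete, here is a minimal sketch of binary rounding with round-to-nearest, ties-to-even applied to single-precision values. It assumes float32 storage, and the names `float32_bits`, `bits_to_float32` and `round_mantissa` are hypothetical; it is not taken from the authors' Fortran code, whose rounding method is precisely what is unclear.

```python
import math
import struct


def float32_bits(x: float) -> int:
    """Bit pattern of x after conversion to IEEE 754 single precision."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_to_float32(bits: int) -> float:
    """Inverse of float32_bits."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def round_mantissa(x: float, keepbits: int) -> float:
    """Keep `keepbits` of the 23 explicit float32 mantissa bits, rounding to
    nearest with ties to even, so that all trailing bits become zero."""
    if not 0 < keepbits < 23:
        raise ValueError("keepbits must be between 1 and 22")
    if not math.isfinite(x):
        return x
    drop = 23 - keepbits
    bits = float32_bits(x)
    half = 1 << (drop - 1)
    lowest_kept = (bits >> drop) & 1
    # Adding (half - 1) plus the lowest kept bit rounds ties to even;
    # a carry out of the mantissa correctly increments the exponent.
    bits = (bits + half - 1 + lowest_kept) & 0xFFFFFFFF
    bits &= ~((1 << drop) - 1) & 0xFFFFFFFF
    return bits_to_float32(bits)


# 0.1 is "round" in decimal, yet its float32 mantissa has no trailing zeros.
print(format(float32_bits(0.1), "032b"))
# After binary rounding to 10 mantissa bits the last 13 bits are zero.
print(format(float32_bits(round_mantissa(0.1, 10)), "032b"))
```

The zeroed trailing bits are what a lossless compressor can exploit; a decimally rounded value such as 0.1 still carries a full, non-zero mantissa.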

2) Lossless codecs

There are many other lossless codecs available, yet you focus on bzip2 and gzip, both of which are about 30 years old. Newer ones are Zstandard, Blosc, LZ4, or pcodec. Why don't you compare with those? There might be a reason why you want to use those older codecs, but if they have an advantage over newer ones, please state it. Also related is that bzip2 just seems to be the better choice in your case, but you don't actually report on (de)compression speed (gzip might be faster). I suggest comparing against at least one compressor of the newer generation, e.g. Zstandard. Each of them also has compression options that trade speed for compression, and these need to be stated. With Zstandard, for example, you can choose a really high compression level, which produces smaller compressed files but takes forever. So generally you should always state this tradeoff and the limits you apply, e.g. you don't want a compressor to be slower than 100MB/s or so.
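
A minimal sketch of the kind of reporting asked for here: compression factor together with compression and decompression throughput, with the codec level stated. It uses only codecs shipped in the Python standard library (zlib, which implements the DEFLATE algorithm used by gzip, plus bz2 and lzma); Zstandard is left out only because it is not part of the standard library in many Python versions. The data are synthetic, and `benchmark` is a hypothetical helper.

```python
import bz2
import lzma
import random
import struct
import time
import zlib


def benchmark(name, compress, decompress, raw):
    """Return compression factor and throughput (MB/s) for one codec."""
    t0 = time.perf_counter()
    packed = compress(raw)
    t1 = time.perf_counter()
    restored = decompress(packed)
    t2 = time.perf_counter()
    if restored != raw:
        raise ValueError(f"{name} did not round-trip losslessly")
    megabytes = len(raw) / 1e6
    return {
        "codec": name,
        "factor": len(raw) / len(packed),
        "compress_MB_per_s": megabytes / max(t1 - t0, 1e-9),
        "decompress_MB_per_s": megabytes / max(t2 - t1, 1e-9),
    }


random.seed(1)
# Synthetic field rounded to two decimals and stored as float32 (not WRF data).
values = [round(285.0 + 10.0 * random.random(), 2) for _ in range(100_000)]
raw = struct.pack(f"<{len(values)}f", *values)

codecs = [
    ("zlib level 6", lambda b: zlib.compress(b, 6), zlib.decompress),
    ("bz2 level 9", lambda b: bz2.compress(b, 9), bz2.decompress),
    ("lzma preset 6", lambda b: lzma.compress(b, preset=6), lzma.decompress),
]
results = [benchmark(name, c, d, raw) for name, c, d in codecs]

for r in results:
    print(f"{r['codec']:14s} factor {r['factor']:5.2f}  "
          f"compress {r['compress_MB_per_s']:8.1f} MB/s  "
          f"decompress {r['decompress_MB_per_s']:8.1f} MB/s")
```

Timings from such a small in-memory sample are only indicative; a real comparison would use the actual WRF files and the I/O path of the target system.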

There is currently no discussion on data formats in the manuscript. While the rounding can be applied to any format, as it just operates on floating-point numbers in some arrays, the use of different lossless codecs may be integrated into the format (e.g. netCDF with zlib compression compresses fields but not the header, so you don't have to decompress to read the metadata). So for the tool you built, which data formats does it operate on?

3) Subjective choice on the significant digits

You round input and/or output data to 3, 4, or 5 significant digits. However, this range seems to come out of nowhere. Why not 6 significant digits or 2? For some variables 3 digits might be overkill if the uncertainty is only within 2x of the values. For others, like CO2, 5 significant digits might be at the edge of what's acceptable, as it's a well-mixed concentration with the variance being relatively small compared to its mean value. For a more systematic analysis on this see Klöwer et al. 2021, who also advocate for the round+lossless compression method but employ IEEE rounding, use information theory to determine the number of bits to keep, and use newer lossless codecs that generally achieve 10-20x compression. You achieve at most 4x compression (Fig 2), which isn't better than ECMWF's/ERA5's linear packing (also called quantization) into 16-bit integers (but that bounds an absolute rather than a relative error).
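
One way to relate the 3 to 5 significant decimal digits to mantissa bits is to match worst-case relative rounding errors. The sketch below rests on that assumption only: rounding to N significant digits has a worst-case relative error of 0.5 × 10^(1−N), and keeping k explicit mantissa bits with round-to-nearest has a worst-case relative error of 2^−(k+1). It is not the information-based criterion of Klöwer et al. 2021, and `keepbits_for_digits` is a hypothetical name.

```python
import math


def keepbits_for_digits(digits: int) -> int:
    """Smallest number of explicit mantissa bits whose worst-case relative
    rounding error does not exceed that of `digits` significant decimal digits."""
    if digits < 1:
        raise ValueError("digits must be at least 1")
    # 2**-(k + 1) <= 0.5 * 10**(1 - digits)  <=>  k >= (digits - 1) * log2(10)
    return math.ceil((digits - 1) * math.log2(10))


for digits in (2, 3, 4, 5, 6):
    k = keepbits_for_digits(digits)
    # A float32 stores 1 sign bit, 8 exponent bits and 23 explicit mantissa bits.
    print(f"{digits} significant digits ~ {k} mantissa bits "
          f"({1 + 8 + k} of 32 bits carry information)")
```

Under this error-matching assumption, 3, 4 and 5 significant digits correspond to 7, 10 and 14 mantissa bits respectively.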

4) Missing literature

The word "compr" occurs in only 3 independent (not from the authors) studies cited. So you are clearly not discussing your findings well against the existing literature. Under References below I'm listing a few studies that are relevant for this study. It's not just about citing them but about actually writing a manuscript (especially the introduction and discussion) that builds on top of their findings. Please rewrite your manuscript to account for the results of these studies.

5) Error metrics

You use 3 error metrics: an absolute error (RMSE), a correlation and a bias. They seem to be subjectively chosen and their use isn't motivated. What about a relative error, an error in variance, a maximum error (yielding a much stronger bound), or the number of zeros preserved (important for precipitation)? What about other suggested metrics like the structural similarity index measure (SSIM), see Baker et al 2019, or a spectral error (used to identify the introduction/removal of grid-scale variability)? These error metrics are also important to discuss relative to the distribution of variables. E.g. temperature is more linearly distributed (with a higher entropy on a linear scale compared to a logarithmic scale) but other variables may be logarithmically distributed (wind speed, precipitation, global specific humidity). This affects the meaning of absolute vs relative error. Either error metric may be dominated by compression errors on outliers. Please include this in the discussion of your results.

Then for the "optimal truncation strategy" you decide that NMB < 1% is a sufficient condition for an acceptable compression error. Why is that? If I have [1.5001, 0.4999] and truncate this to [2, 0] (round to nearest integer) then NMB = 0, but we have increased the variance (from 1/2 to 2) and the maximum absolute error is 1/2, which might be unacceptably high. I generally propose to use a (normalized) mean and maximum absolute error or a (normalized) mean and maximum relative error, depending on the data distribution. Furthermore, a spectral error is often used to investigate the impact on the small scales, as rounding can introduce artificial gradients (jumps from one representable number to the next) or smooth out gradients if neighbouring cells are rounded to the same value. I would reject the idea of formulating an "optimal strategy" based solely on one metric (NMB) and definitely expect a discussion around the chosen error metrics (what they measure and what they don't) and a strong justification of why you choose what you choose.
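
The two-value example above can be checked numerically. The sketch assumes the common definition NMB = Σ(rounded − reference) / Σ(reference), which may differ in detail from the manuscript's formula, and uses the sample variance.

```python
import statistics

reference = [1.5001, 0.4999]
rounded = [float(round(v)) for v in reference]  # round to nearest integer

errors = [r - x for r, x in zip(rounded, reference)]

# Assumed definition: normalized mean bias = sum of errors / sum of reference.
nmb = sum(errors) / sum(reference)
max_abs_error = max(abs(e) for e in errors)
var_before = statistics.variance(reference)  # sample variance (n - 1)
var_after = statistics.variance(rounded)

print(f"rounded values     : {rounded}")
print(f"NMB                : {nmb:.2e}")
print(f"max absolute error : {max_abs_error:.4f}")
print(f"variance           : {var_before:.4f} -> {var_after:.4f}")
```

The bias vanishes because the two errors cancel, while the variance and the maximum error, which NMB cannot see, change substantially.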

6) Inconsistent conclusions

On line 205 you write "truncation impacts are variable-dependent", highlighting the need for precisions chosen differently by variable (supported by Fig 3). However, when you present your "optimal truncation strategy", while still mentioning the variable dependency, you don't conclude that an adaptive strategy should adjust to different variables. Instead you talk only about seasons and regions. Applying a different precision by variable clearly seems to be the more optimal way, so why call your strategy "optimal" when you're leaving potential on the table? If this isn't possible for practical reasons then state this. But any modern data format (netCDF, HDF5, Zarr, ...) would allow you to round variables differently, see connection to (1).

# Minor

40: Not sure what you mean by "shift" here. Both global and regional models are used operationally?

52: But this is not the fault of gzip or bzip2; the problem is that trailing mantissa bits are high entropy and hence incompressible. Rephrase this to make it clear to the reader? Especially because you're using lossless compression later.

54: terminology: absolute and relative error? (Or generally any error metric?) Yes a relative error can be expressed in significant digits but that's just the unit, significant bits would be another?

56: State that this is also known as (bit) rounding? "both operational workflows": what does "both" refer to here?

57: What does "lightweight" mean? It surely doesn't consume memory but maybe you want to say "fast" or "cheap"?

57: "Straightforward to implement" I somewhat disagree: IEEE-754 round to nearest tie has its complexities but it's certainly an (IEEE) standard and therefore widely available and accepted.

58: "utilities" -> "compressors"?

59: You certainly make that statement based on previous publications. References?

64: Certainly agree with this statement, but for an analysis across variables see Klöwer et al. 2021

66: climate -> weather events? This sounds like floods, storms or heatwaves/coldsnaps to me?

67: Given this sentence I don't know what you mean in the former. Please clarify?

69: "can alter" -> this discussion is missing that the changes introduced by lossy compression may not be statistically significant? If lossy compression is applied right, then the compression error should be masked by the analysis error.

76: on "nonlinear sensitivity" missing the point here that you also apply lossy compression to the initial conditions used for WRF. State that explicitly? Otherwise the "nonlinear sensitivity" of data compression is confusing

104: I'm missing here a discussion how significant digits are translated to bits, see major point

104: It's unclear why you chose 3-5 significant digits and not more or less. For some variables 5 significant digits are more clearly overkill, say instantaneous cloud cover of e.g. 0.12345, whereas for others they are not, e.g. CO2 at 428.63ppm. Motivate this range here?

108: Delete "simultaneously"? There's a 1-yr simulation run in between "input" and "output", so hardly simultaneous?

139: Why do you use RMSE as an error metric (which quantifies an absolute error) although your precision truncation yields relative compression errors? Sure, the relative error is therefore somewhat predictable (given it's bounded), but I suggest a discussion here of whether all variables should be evaluated using an absolute error. Surely a wind speed of 0 vs 1m/s makes a difference, but if it's 80 or 81 m/s probably not?

165: "efficiency" -> "factor" or "ratio". If you actually mean efficiency, then introduce what efficiency means. Intuitively I think of efficiency as performance per resource. So it's unclear what this refers to here, could also be compressed size per (de)compression speed/time?

171: I don't understand why input compression should affect output compression? Isn't there an entire simulation in between introducing high-entropy mantissa bits again?

181: This has been highlighted by Zender et al. and others, please cite those and present your result in discussion to their findings?

182: Do you want to add a discussion about compression speed here? How fast are both compressors in your case?

190: Is Temperature in ˚C or Kelvin? O(300K) rounded to 3 significant digits rounds to 1K/˚C increments? Units are otherwise not relevant but an offset from ˚C to Kelvin is. You state this later, state it here?

190: Relative humidity is likely only a post-processed output variable, calculated from temperature to get the saturation vapour pressure. So if you see an error in relative humidity, are you sure it's not due to errors in temperature? Please clarify this dependency.

200: How do you know it's not the relationship to temperature? I would see your point if it was specific humidity, but you are analysing relative humidity here. Also the errors between Fig 3b&c are very similar but not to precipitation?

205: The meaning of "5 significant digits" also depends on whether precipitation is accumulated or a rate in the output. Can you clarify this?

Fig. 3: Swap red-blue colours in e-h to signal worse with red and better with blue?

Fig. 3: Why are errors being reduced for wind? IEEE rounding is theoretically bias-free (due to round to nearest tie to even) so I don't understand what's happening here.

222: Please mention units earlier, see above.

Fig 4: Why do you take the absolute value of RMSE changes? One is better the other one worse, could you clarify first why lower RMSE follows from rounding?

Fig 4: Can you please change the colours for the regions? The red and the green are pretty much indistinguishable for someone with deuteranopia (the most common color vision deficiency), make one brighter the other one darker for example

226: See comment above, can you show that humidity here is actually independent of temperature?

250: How do you know that +-3% is modest? For temperature a 1% error was not acceptable.

316: Why is NMB < 1% a good (single) metric to decide whether your compression error is acceptable? See major point.

322: I don't disagree with that, but that's the holy grail of lossy data compression: how do you choose an acceptable compression error for a (not yet decided) set of scientific objectives?

324: As far as I understand the nudging you applied, it's not just an initial or boundary condition that's altered but also a forcing term in the upper atmospheric levels.

331: fx6...15 would be even more conservative, given the subjective choice of 3...5 why do you conclude that 5 is the "most conservative"? Of course higher values would defeat the point of lossy compression, but the conclusion drawn seems therefore very subjective.

332: I don't understand why you don't suggest fx3 for winds, from Fig 3 that would follow as acceptable? But maybe you first have to explain the impact rounding has on the winds, see comment above.

367: Thanks for providing the code but please provide a readme or documentation, I can hardly look through hundreds of lines of Fortran code to understand what you did or how you organize your code. I was looking for where you actually apply the rounding but struggled to find it.
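
The unit dependence raised in the comments on line 190 can be illustrated with a short sketch: the spacing between neighbouring values that have a fixed number of significant decimal digits depends on the magnitude of the number, and hence on whether temperature is stored in kelvin or degrees Celsius. The sample temperature is arbitrary and `decimal_step` is a hypothetical helper.

```python
import math


def decimal_step(x: float, digits: int) -> float:
    """Spacing between neighbouring values that have `digits` significant
    decimal digits at the magnitude of x (x must be non-zero)."""
    return 10.0 ** (math.floor(math.log10(abs(x))) - digits + 1)


t_kelvin = 293.67               # arbitrary example temperature
t_celsius = t_kelvin - 273.15   # same temperature, offset units

for label, value in (("K", t_kelvin), ("degC", t_celsius)):
    step = decimal_step(value, 3)
    print(f"{value:8.2f} {label:4s}: 3 significant digits -> increments of {step:g}")
```

For this example, three significant digits give 1 K increments in kelvin but 0.1 degree increments in Celsius, so the same nominal precision implies different absolute errors.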

# References

- IEEE Standard for Binary Floating-Point Arithmetic, ANSI/IEEE Std 754-1985, 1–20 (IEEE, 1985); https://doi.org/10.1109/IEEESTD.1985.82928

- Silver, J. D. & Zender, C. S. The compression-error trade-off for large gridded data sets. Geosci. Model Dev. 10, 413–423 (2017).

- Zender, C. S. Bit Grooming: statistically accurate precision-preserving quantization with compression, evaluated in the netCDF Operators (NCO, v4.4.8+). Geosci. Model Dev. 9, 3199–3211 (2016).

- Delaunay, X., Courtois, A. & Gouillon, F. Evaluation of lossless and lossy algorithms for the compression of scientific datasets in netCDF-4 or HDF5 files. Geosci. Model Dev. 12, 4099–4113 (2019).

- Baker, A. H., Hammerling, D. M. & Turton, T. L. Evaluating image quality measures to assess the impact of lossy data compression applied to climate simulation data. Comput. Graph. Forum 38, 517–528 (2019).

- Klöwer, M., Razinger, M., Dominguez, J.J. et al. Compressing atmospheric data into its real information content. Nat Comput Sci 1, 713–724 (2021). https://doi.org/10.1038/s43588-021-00156-2

- R. Underwood, J. Bessac, S. Di and F. Cappello, "Understanding the Effects of Modern Compressors on the Community Earth Science Model," 2022 IEEE/ACM 8th International Workshop on Data Analysis and Reduction for Big Scientific Data (DRBSD), Dallas, TX, USA, 2022, pp. 1-10, doi: 10.1109/DRBSD56682.2022.00006.

- https://arxiv.org/abs/2510.22265

- https://arxiv.org/abs/2503.20031

- https://arxiv.org/abs/2410.03184

Citation: https://doi.org/10.5194/egusphere-2025-4811-RC2
- AC2: 'Reply on RC2', Jiandong Wang, 24 Mar 2026
  
  Thank you for your comments. Please refer to the attached PDF file for our detailed responses.
  
  Citation: https://doi.org/10.5194/egusphere-2025-4811-AC2

Peer review completion

AR – Author's response | RR – Referee report | ED – Editor decision | EF – Editorial file upload

AR by Jiandong Wang on behalf of the Authors (24 Mar 2026) Author's response Author's tracked changes Manuscript

ED: Referee Nomination & Report Request started (27 Mar 2026) by Duncan Watson-Parris

RR by Milan Klöwer (21 Apr 2026)

ED: Publish as is (24 Apr 2026) by Duncan Watson-Parris

AR by Jiandong Wang on behalf of the Authors (26 Apr 2026) Manuscript


Short summary

High-resolution weather and climate simulations produce massive amounts of data, creating major storage challenges. This study explores a method that reduces unnecessary numerical detail by keeping only a limited number of significant digits. The results show that substantial data reduction can be achieved while preserving key physical features.
