A significant uncertainty in assessments of the role of clouds in climate is the characterization of the full distribution of their sizes. Order-of-magnitude disagreements exist among observations of key distribution parameters, particularly power law exponents and the range over which they apply. A study by

The broad range of cloud sizes in the atmosphere poses a significant challenge to the modeling of weather and climate. Small clouds tend to be most numerous, while large clouds have more significant meteorological and climate impacts. An approximate balance means that all size classes contribute to overall cloud cover

Revealingly, independent of the spatial scale or cloud type considered, the measured horizontal dimensions of clouds tend to follow power law distributions such that the number of clouds is proportional to their size to some power

Quantities that exhibit scale-free behaviors, however deterministically complicated they may be, allow for an important mathematical simplification. That is, phenomena measured at any one scale shed light on the behavior at others. They also present a practical challenge, which is the unavoidable limitation that geometrically defined objects must inevitably be measured within a domain of some finite size, i.e., a domain that is

For cloud areas

Estimates of the location of the scale break at

The power law exponent

The lack of consensus among studies on the value of

Here we argue that the choice of fitting method is less important than whether past studies properly accounted for the finite size of the study domain. A finite domain size is a general problem for measuring scale-free quantities. For example,

Similarly, cloud sizes must necessarily be measured within a non-scaling finite domain. It is easy to appreciate that the area of clouds larger than the domain size cannot be measured. A more subtle effect is that the measured numbers of clouds of a given area, even those smaller than the domain area, are highly sensitive to whether clouds that cross the domain edge are included or removed in the measured distribution (an example is shown in Fig.

An example cloud mask derived from GOES satellite imagery, where cloudy pixels are white or orange and clear pixels are dark blue. Clouds that are truncated by the domain edge are marked in orange. The areas of such “truncated clouds” cannot be properly quantified because an unknown portion of each lies beyond the measurement domain.

Whether truncated clouds are included or removed from distribution fits is an issue rarely mentioned in past studies, but those that do consider the effect tend to remove truncated clouds without applying any correction factor

In this study, Sect.

The most straightforward method to fit a power law to empirical measurements of cloud areas is to bin the data into discrete bins of constant width

However, even with logarithmic binning, linear regression has been found to produce a biased estimate of

There are two other linear-regression-based approaches worth mentioning, i.e., cumulative distributions and rank-frequency plots, both of which approximate the integral

An alternative method of fitting a power law to data, maximum likelihood estimation, is argued on empirical grounds to be generally more accurate than linear-regression-based approaches

Evidence supporting the superiority of maximum likelihood estimation put forth by

In fact, because the truncation at

To evaluate the accuracy of the linear regression approach for fitting a power law with finite

Estimated values of the power law exponent, denoted as

As shown in Fig.

Failure rates for fitting

Statistical error in measured counts

Applying a simple rule that least-squares linear regression only be applied to those bins with sufficiently large counts may seem obvious: estimating

In summary, whether binning is done linearly or logarithmically, there may be bias in previously calculated values of

Truncated clouds, which span the domain edge (Fig.

For measurements of atmospheric clouds, we use data from the Advanced Baseline Imager (ABI) aboard the GOES-West (GOES-17) satellite. GOES-West is a geostationary satellite centered at 137° W with a nadir-imaging resolution of approximately 2 km. We use a preprocessed cloud mask product that attempts to classify every pixel as “cloudy” or “clear”, so each “image” is a binary array of pixels set to 1 for cloudy or 0 for clear. A total of 10 processed images are used, each taken at local noon (21:00 UTC) between 1 and 10 June 2021.
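As an illustration of how such a binary array can be turned into a list of cloud areas, the following minimal Python sketch labels connected cloudy pixels and flags clouds that touch the domain edge. It is not the processing chain used in this study: the function name `label_clouds`, the use of 4-connectivity, and the rule that any cloud containing an edge pixel counts as truncated are all assumptions made for the example.

```python
from collections import deque


def label_clouds(mask):
    """Return (areas, truncated) for the clouds in a binary mask.

    `mask` is a list of rows holding 0 (clear) or 1 (cloudy). Pixels are
    joined with 4-connectivity, and a cloud is flagged as truncated when it
    contains at least one pixel on the domain edge. Both choices are
    assumptions of this sketch.
    """
    n_rows, n_cols = len(mask), len(mask[0])
    seen = [[False] * n_cols for _ in range(n_rows)]
    areas, truncated = [], []
    for r0 in range(n_rows):
        for c0 in range(n_cols):
            if mask[r0][c0] != 1 or seen[r0][c0]:
                continue
            # Breadth-first flood fill over one connected cloud.
            queue = deque([(r0, c0)])
            seen[r0][c0] = True
            area, touches_edge = 0, False
            while queue:
                r, c = queue.popleft()
                area += 1
                if r in (0, n_rows - 1) or c in (0, n_cols - 1):
                    touches_edge = True
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    rr, cc = r + dr, c + dc
                    inside = 0 <= rr < n_rows and 0 <= cc < n_cols
                    if inside and mask[rr][cc] == 1 and not seen[rr][cc]:
                        seen[rr][cc] = True
                        queue.append((rr, cc))
            areas.append(area)          # area in pixels
            truncated.append(touches_edge)
    return areas, truncated


# Toy 5 x 5 mask (hypothetical data, not GOES imagery).
mask = [
    [1, 1, 0, 0, 0],
    [0, 0, 0, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 1],
]
areas, truncated = label_clouds(mask)
```

For the toy mask, the sketch finds four clouds with areas of 2, 2, 1 and 1 pixels, of which the first and the last touch the domain edge.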

We use the

We also consider size distributions for more idealized objects. The uniform square lattice, adopted from percolation theory, is a two-dimensional square lattice where every site (or cell) is occupied with uniform probability

A central result of percolation theory is that, as

The percolation model is useful here for studying object size distributions in finite domains because the distribution of cluster sizes is known exactly. In particular, any deviation from power law scaling at the large end of the cluster size distribution is known to exist because the lattice has a finite size. Models similar to the uniform square lattice used here have also been leveraged previously to explain the fractal dimension of precipitating regions

We simulate three 10 000

To test how the domain or lattice size affects the measured area distributions, the binary arrays representing cloud fields or percolation lattices are subdivided as follows: if the shape of the original array is

For each subdomain considered in the cloud imagery, if truncated clouds are removed from the size distributions, bin counts are increasingly undercounted at larger object areas as shown in Fig.

Histograms of cloud areas for several sizes of subdomains from GOES-West. Filled shapes indicate histograms which do not include truncated clouds, while hollow shapes include truncated clouds. Hollow shapes are offset vertically by a factor of 10 for clarity. The vertical dashed lines mark the smallest bin in which 50 % of the objects are truncated by the domain edge for each domain size.

Histogram of cloud areas measured in the

Alternatively, if truncated clouds are included in the histogram, they are placed in a smaller-size bin than that in which they belong. This leads to an

As for Fig.

The effect of miscounting large clouds in a finite domain is also mirrored in the percolation lattices, where either a cutoff regime (an undercounting) or a local maximum (an overcounting) is introduced into the size distribution, respectively (Fig.

Fits from the hypothetical scenario where cloud areas are measured within the 100

The simple remedy of calculating

Example of how a measurement

We recommend, as a simple solution for the errors introduced by domain truncation effects, only analyzing bins containing a small number of truncated clouds

Estimated values of

In Table

Regardless of the domain size, truncation effects occur. For robust power law fits, the resolution

In principle, because the 50 % threshold removes larger objects in the distribution that may be of scientific interest, an algorithm could be devised to correct cloud truncation effects. One such algorithm was used by

One commonly employed method for reducing artifacts caused by domain boundaries in cloud simulations is to utilize doubly periodic simulations that allow fluxes out of one side of the numerical grid to re-enter on the opposite side

The impact of employing periodic domains may easily be examined within percolation lattices. Because each site has an occupation probability that is independent of the surrounding sites, the model can be made periodic simply by changing the site connectivity to be periodic at the lattice boundaries. Specifically, if a lattice of size

Histogram of cluster areas in doubly periodic percolation lattices for several domain sizes.
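The change in site connectivity can be sketched in a few lines of Python. The example below is a hedged illustration rather than the code used for the figure: `cluster_sizes` and its `periodic` flag are hypothetical names, clusters are joined through nearest-neighbor (4-connected) sites by assumption, and a union-find structure stands in for whatever labeling routine is used in practice.

```python
from collections import Counter


def cluster_sizes(lattice, periodic=False):
    """Return the sorted cluster sizes of a square binary lattice.

    With periodic=True, sites on one edge are treated as neighbors of the
    sites on the opposite edge, so no cluster is cut by a boundary.
    """
    n = len(lattice)
    parent = list(range(n * n))

    def find(i):
        # Follow parent links to the root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(i, j):
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_j] = root_i

    for r in range(n):
        for c in range(n):
            if not lattice[r][c]:
                continue
            # Only the "down" and "right" neighbors need to be checked.
            for rr, cc in ((r + 1, c), (r, c + 1)):
                if periodic:
                    rr, cc = rr % n, cc % n   # wrap around the boundary
                elif rr >= n or cc >= n:
                    continue                  # neighbor lies outside the lattice
                if lattice[rr][cc]:
                    union(r * n + c, rr * n + cc)

    roots = Counter(
        find(r * n + c) for r in range(n) for c in range(n) if lattice[r][c]
    )
    return sorted(roots.values())


# Four occupied corner sites (hypothetical toy lattice).
corners = [
    [1, 0, 1],
    [0, 0, 0],
    [1, 0, 1],
]
open_sizes = cluster_sizes(corners, periodic=False)
wrapped_sizes = cluster_sizes(corners, periodic=True)
```

In the toy lattice, the four corner sites form four separate clusters of size 1 when the boundaries are open but a single cluster of size 4 when the boundaries are periodic.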

In this case, as Fig.

Even if the distribution of object sizes does not follow a power law, domain truncation effects may still bias measured size distributions. As an example, consider the distribution of raindrop sizes as measured by the new Differential Emissivity Imaging Distrometer (DEID). The DEID measures raindrop mass by measuring the time it takes for raindrops to evaporate after landing on a hotplate

The main difference between precipitation and cloud size distributions is that precipitation size distributions tend to follow an exponential rather than a power law

Histogram of percolation cluster areas generated in lattices with a site occupation probability equal to 0.5. Plotted counts do not include truncated clusters. For the

Figure

As with power laws, sufficiently large bins in an exponential distribution are dominated by truncated clusters. Applying the same 50 % truncated cluster criterion provides a straightforward method to identify which bins are most influenced by the choices of including or removing truncated clusters. A more accurate size distribution can still be obtained provided that these bins are omitted from the fit.

There is significant disagreement in the literature on what the appropriate choice of distribution should be to describe cloud horizontal areas. Most studies find that cloud areas follow a power law

The present study shows that the choice of fitting method cannot explain the disagreement among observations, particularly for the range of scales over which a power law applies. We find that a linear regression to logarithmically spaced bins is an equally accurate fitting method for power-law-distributed data provided the simple requirement is adopted that bins with fewer than

We suggest that different accounts of cloud power law behavior in the literature are best explained by treatments of clouds whose geometries are “truncated” by the edge of the measurement domain. Removal of truncated clouds from the distribution introduces an artificial “cutoff scale” beyond which clouds can be significantly undersampled, with a resulting distribution consistent with many previous findings

While size distributions measured within a domain of any size are affected by truncation, the effects matter most for the largest clouds. The affected scales are easily identified by computing, for each bin, the fraction of clouds that are truncated. We recommend that power law fits be applied only to bins in which this fraction is less than 50 %.
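The bin selection described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the released code: areas are taken to be integer pixel counts, bins are logarithmic with edges at powers of 2, and the names `truncated_fraction_by_bin` and `bins_to_fit` are hypothetical.

```python
def truncated_fraction_by_bin(areas, truncated):
    """Return (totals, fractions) for logarithmic bins [2**k, 2**(k + 1)).

    `areas` are positive integer pixel counts and `truncated` holds one
    boolean per cloud. Bin k is found exactly with int.bit_length().
    """
    n_bins = max(a.bit_length() for a in areas)
    totals = [0] * n_bins
    n_truncated = [0] * n_bins
    for area, is_truncated in zip(areas, truncated):
        k = area.bit_length() - 1
        totals[k] += 1
        n_truncated[k] += int(is_truncated)
    fractions = [
        n_t / total if total else 0.0
        for n_t, total in zip(n_truncated, totals)
    ]
    return totals, fractions


def bins_to_fit(totals, fractions, threshold=0.5):
    """Indices of non-empty bins whose truncated fraction is below threshold."""
    return [
        k for k, (total, frac) in enumerate(zip(totals, fractions))
        if total > 0 and frac < threshold
    ]


# Hypothetical cloud areas (pixels) and truncation flags.
areas = [1, 1, 1, 1, 2, 3, 2, 5, 6, 9, 12]
truncated = [False, False, False, True, False, False, True,
             False, True, True, True]
totals, fractions = truncated_fraction_by_bin(areas, truncated)
kept = bins_to_fit(totals, fractions)
```

Here the bins covering 1 pixel and 2 to 3 pixels are kept, while the bins covering 4 to 7 pixels (half truncated) and 8 to 15 pixels (all truncated) are excluded from the fit.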

Truncation effects are not limited to power law size distributions, as exponentially distributed objects can be similarly affected. Fortunately, the 50 % truncated object criterion is applicable regardless of the underlying form of the distribution.

The issues and remedies discussed here are not specific to atmospheric clouds and can be applied to size distributions characterizing any other phenomena measured within a finite geometric domain, e.g., with ecological predator–prey models

The result that linear-regression-based fitting methods can accurately estimate the power law exponent

The central issue is the statistical error of bin counts in a histogram. As a conceptual model, consider a large number of experiments that all measure some variable many times and bin the results into a histogram. The count in each bin can be expected to be roughly similar from experiment to experiment but not exactly the same. The “statistical error” is the standard deviation of the bin counts, which could be estimated, e.g., by sampling a large collection of experiments.
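The conceptual model can be mimicked numerically. The sketch below is an assumption-laden illustration: it draws samples from a pure power law with a hypothetical exponent `alpha` (probability density proportional to x to the power of minus `alpha`, with x of at least 1), bins them into logarithmic bins with edges at powers of 2, and estimates the statistical error of each bin count as the standard deviation across repeated experiments. The function names and all parameter values are chosen for the example only.

```python
import math
import random
import statistics


def simulate_bin_counts(n_experiments=200, n_samples=1000, alpha=2.0, seed=1):
    """Run repeated experiments and return one {bin index: count} dict each."""
    rng = random.Random(seed)
    experiments = []
    for _ in range(n_experiments):
        counts = {}
        for _ in range(n_samples):
            # Inverse-transform sampling of a power law with x >= 1.
            x = (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0))
            k = int(math.log2(x))          # bin k covers [2**k, 2**(k + 1))
            counts[k] = counts.get(k, 0) + 1
        experiments.append(counts)
    return experiments


def bin_statistics(experiments):
    """Return {bin index: (mean count, standard deviation of the count)}."""
    bins = sorted({k for counts in experiments for k in counts})
    stats = {}
    for k in bins:
        values = [counts.get(k, 0) for counts in experiments]
        stats[k] = (statistics.mean(values), statistics.stdev(values))
    return stats


stats = bin_statistics(simulate_bin_counts())
for k, (mean, std) in stats.items():
    if mean >= 1:
        print(f"bin [{2**k}, {2**(k + 1)}): mean = {mean:.1f}, std = {std:.1f}")
```

Because the number of samples per experiment is fixed in this sketch, each bin count is binomially distributed, so the estimated standard deviation can be compared with the square root of N p (1 - p), where N is the number of samples and p is the probability of a sample falling in that bin.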

This conceptual model can be made more precise by considering the experiments to be a random counting process consisting of

The advantage of introducing

Standard linear regression packages assume that each data point has Gaussian error. In their Appendix A,

This is incorrect because the central limit theorem also states that the variance of

Kolmogorov–Smirnov

Figure

We suggest that this result explains why the linear regression technique used in Sect.

The above argument applies to measurements that are statistically independent because statistical independence implies that the bin count

A priori, one might expect statistical errors for cloud sizes to be lognormal instead (implying that

Regardless, lognormality in statistical errors of

The method we propose to address domain truncation effects, i.e., omitting bins in which truncated clouds make up more than 50 % of the total, effectively removes the large-object end of the size distribution. If that portion is of interest, an algorithm could in principle be derived to correct for the removal of clouds that are truncated by the domain edge.

Consider the case of cloud area distributions. If cloud locations are statistically independent of the domain edge location, the probability of a cloud being truncated by the domain edge

In general, obtaining an appropriate correction algorithm can be a surprisingly difficult problem. For clouds specifically, there are several issues. First, cloud lengths would likely not be proportional to

This last point is particularly problematic, since it makes simply measuring the relationship between cloud area and cloud length difficult and affected, again, by the choice of the domain size. Consider a hypothetical case where most large clouds are much longer zonally than they are meridionally but whose dimensions are measured in a square domain. The only clouds whose zonal lengths can be accurately estimated are those not truncated by the western or eastern sides of the domain. Such clouds will be predominately

For a more in-depth exploration of the subtleties involved in correcting object size distributions, see Chap. 4 of the MS thesis by

To create an exponential distribution of cluster sizes, in Sect.

Interestingly, the

As in Fig.

Tables

Rate of reliable estimates of the power law exponent

Continuation of Table

Python code for calculating size distributions, which automates the procedures recommended for finite-domain effects, is freely available at

TDD: conceptualization, formal analysis, methodology and writing (original draft preparation). TJG: conceptualization, funding acquisition, supervision, methodology and writing (review and editing).

At least one of the (co-)authors is a member of the editorial board of

Publisher’s note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors.

Karlie N. Rees, Steven K. Krueger and Corey Bois all contributed to discussions about the research. The Center for High Performance Computing at the University of Utah provided data storage and computing services. George Craig and Theresa Mieslinger provided constructive feedback that improved the manuscript during review.

This research has been supported by the National Science Foundation (grant no. PDM-2210179).

This paper was edited by Peer Nowack and reviewed by Theresa Mieslinger and George Craig.