The impact of fluctuations and correlations in droplet growth by collision-coalescence revisited. Part II: Observational evidence of gel formation in warm clouds

In recent papers (Alfonso et al., 2013; Alfonso and Raga, 2017) the sol-gel transition was proposed as a mechanism for the formation of large droplets required to trigger warm rain development in cumulus 15 clouds. In the context of cloud physics, gelation can be interpreted as the formation of the “lucky droplet” that grows by accretion of smaller droplets at a much faster rate than the rest of the population and becomes the embryo for raindrops. However, all the results in this area have been theoretical or simulation studies. The aim of this paper is to find some observational evidence of gel formation in clouds by analyzing the distribution of the largest droplet at an early stage of cloud formation, and to 20 show that the mass of the gel (largest drop) is a mixture of a Gaussian and a Gumbel distributions, in accordance with the pseudo-critical clustering scenario described in Gruyer et al. (2013) for nuclear multi-fragmentation.


Introduction
A fundamental, ongoing problem in cloud physics is associated with the discrepancy between the times modeled and observed for the formation of precipitation in warm clouds.Observational studies show that precipitation can develop in less than 20 min.For example, in Göke et al. (2007), an analysis of radar observations in the framework of the Small Cumulus Microphysics Study (SCMS), demonstrated that maritime clouds increased their reflectivity from − 5 to +7.5 dBZ in a characteristic time of 333 s.Simulations of the collision and coalescence process, such as those described in the review published by Beard and Ochs (1993), require longer times for precipitation formation, unless giant nuclei (aerosols with diameters greater than 2 µm) are incorporated in the simulation.
Numerous mechanisms have been proposed to close the gap between observations and simulations.Some theories explain this phenomenon as an increase in collision efficiencies due to turbulence (Wang et al., 2008;Pinsky and Khain, 2004;Pinsky et al., 2007Pinsky et al., , 2008)), turbulence-enhanced collision rate of cloud droplets (Falkovich and Pumir, 2007;Grabowski and Wang, 2013) or turbulent dispersion of cloud droplets (Sidin et al., 2009).
More recent papers (Onishi and Seifert, 2016;Li et al., 2017, 2018, and Chen et al., 2018) also investigated the effect of turbulence in early development of precipitation.
Other research points to the supersaturation fluctuations resulting from homogeneous (Warner, 1969) and inhomogeneous mixing (Baker et al., 1980), which allow some droplets to grow faster by condensation in areas with larger supersaturation.Cooper (1989) found evidence of faster growth of the larger droplets due to the variability that results from mixing and random positioning of droplets during cloud formation.Shaw et al. (1998) explored the possibility that vortex structures in a turbulent cloud cause variations in droplet concentration and supersaturation (at the centimeter scale), allowing droplets in areas of higher concentration to grow more rapidly.Their calculations show an important widening of the spectrum from this mechanism.Roach (1976) showed that the growth of larger droplets increases due to radiative cooling at the top of stratiform clouds and the addition of sulfate cloud condensation nuclei (CCN), activated as droplets as a result of aqueous-phase chemical reactions (Zhang et al., 1999).In the same manner, Feingold and Chuang (2002) proposed the theory that certain organic compounds (filmforming compounds) can create a layer around droplets that inhibits their growth, causing a fraction of droplets to grow under conditions of higher supersaturation with the consequent widening of the spectrum.The existence of giant CCN is another of the proposed mechanisms.Even at concentrations as low as 1 L −1 , they can contribute significantly to the broadening of the spectrum (Johnson, 1982;Feingold et al., 1999;Yin et al., 2000;Van Den Heever and Cotton, 2007).
More recently, the sol-gel transition has been proposed as a possible mechanism for the formation of embryonic drops that trigger the formation of precipitation (Alfonso et al., 2010(Alfonso et al., , 2013)).Although this phenomenon is not as well known in the field of cloud physics, the sol-gel transition (also known as "gelation" in English-language literature), has been widely studied in other fields of research to explain, for example, the formation of planets (Wetherill, 1990) and aerogels in aerosol physics (Lushnikov, 1978) or the emergence of giant components in percolation theory (Aldous, 1999).
In the framework of cloud physics, the sol-gel phenomenon can be interpreted as the formation of the lucky droplet that becomes the embryo for raindrops and is defined by a transition from a continuous system of small droplets, to another system with a continuous spectrum plus a giant drop (runaway droplet, embryonic drop, gel) that interacts with the system increasing its mass by accretion with the smallest drops.Telford (1955) may be the first to propose the "lucky droplet" model for collision-coalescence of cloud droplets.One of the novelties of Telford's approach was to recognize the shortcomings of the "continuous growth model" and took into account the statistical fluctuations inherent to the collision-coalescence process and its discrete nature.He performed his analysis for a cloud consisting of identical 10 µm droplets together with collector drops with twice the volume (12.6 µm radius).From this initial bimodal distribution, he found that 100 of the 12.6 µm droplets per cubic meter (a 10 −6 fraction), will grow more rapidly than predicted by the continuous growth model, experiencing their first 10 coalescences after a time of approximately 5 min, while the time to undergo 10 collisions assuming continuous growth was about 33 min.
The lucky droplet model was further developed by Kostinski and Shaw (2005), who presented numerical evidence that their model can lead to a rapid development of precipitation.Their analysis was based on the derivation of the distribution of times for N collisions (which gave the result of an Erlang distribution).They concluded that the 10 −6 lucky droplets are expected to reach 50 µm 10 times faster than the average droplet.More recently, Wilkinson (2016) further ad-vanced the model by using large deviation theory (Touchette, 2009).He derived the probability for the time T to undergo N collisions being a very small fraction of its mean value and showed that the timescale for the initiation of precipitation is smaller than the mean time for a single collision.
The results obtained by Kostinski and Shaw (2005) were tested by Dziekan and Pawlowska (2017) by calculating the "luck factor", i.e., how much faster the luckiest droplets grow to r = 40 µm compared to the average droplets.They estimated that the luckiest 10 −3 fraction will cross the size gap around 5 times faster, and the luckiest 10 −5 fraction was around 11 times faster, in good agreement with the results obtained by Kostinksi and Shaw (2005) (about 6 and 9 times faster, respectively).
However, previous efforts in this direction were mainly focused on finding the distribution of times for N collisions (Telford, 1955;Kostinski and Shaw, 2005;Wilkinson, 2016), while we were concentrated on studying the lucky droplet size distribution to determine whether or not the runaway growth process due to collision-coalescence has started.
Recent studies that address the sol-gel transition interpretation in cloud physics (Alfonso et al., 2013;Alfonso and Raga, 2017) analyze the problem from the theoretical and simulation point of view.The aim of the present work here is to find observational evidence of gel formation, taking as a reference recent studies in percolation theory (Botet and Płoszajczak, 2005) and nuclear physics (Botet et al., 2001;Gruyer et al., 2013), which can shed some light on the gel (largest droplet) size distribution during the initial stage of precipitation formation.
The paper is organized as follows: Sect. 2 presents an overview of previous results for both infinite and finite systems.An analysis of the largest droplet distribution from synthetic data obtained from Monte Carlo simulations (for the product and hydrodynamic kernels, respectively) is presented in Sect.3. Sect. 4 is devoted to the analysis of experimental data.Finally, in Sect. 5 we discuss our results accompanied by the relevant conclusions.
2 An overview of previous theoretical and experimental results

Results for infinite systems in coagulation and percolation theory
The most commonly accepted approach to modeling the collision coalescence process in cloud models with detailed microphysics relies upon the Smoluchowski kinetic equation or kinetic collection equation (KCE), governing the time evolution of the average number of particles.The discrete form of this equation can be written as follows (Pruppacher and Klett, 1997): where N (i, t) is the average concentration of droplets with mass x i at time t, and K(i, j ) is the coagulation kernel related to the probability of coalescence of two drops of masses x i and x j .In Eq. ( 1), the first term on the right-hand side describes the average rate of production of droplets of mass x i due to coalescence between pairs of drops, whose masses add up to mass x i , and the second term describes the average rate of depletion of droplets with mass x i due to their collision and coalescence with other droplets.However, the KCE may have a serious limitation in some cases (Lushnikov, 2004) and hence cannot accurately describe the coagulation process.The limitation essentially lies in the fact that the coagulation equation inevitably creates particles with infinite mass.For example, for a multiplicative coagulation kernel (K(i, j ) = Cx i x j ), an attempt to calculate the second moment of the droplet mass spectrum: leads to the result Thus, after t = T gel , the second moment may become undefined, and the total mass of the system starts to decrease (see Appendix A for more details).This result applies to infinite (with negligible fluctuations and correlations) coagulating systems in the thermodynamic limit, which is the limit for a large number K of particles where the volume V is taken to grow in proportion with the number of particles.Then, in the limit K, V → ∞, K/V → N < ∞.The infinite system interpretation of the sol-gel transition assumes the presence of a gel phase (which is not predicted by the KCE equation) and introduces an additional assumption as to whether or not the gel interacts with the infinite size clusters that are not described by the KCE equation.
The other scenario considers that coagulation takes place in a system with a finite number of monomers in a finite volume.This approach is based on the scheme developed by Markus (1968) and Bayewitz et al. (1974) and was studied by Lushnikov (1978Lushnikov ( , 2004)), Tanaka andNakazawa (1993, 1994), and Matsoukas (2015) by using analytical tools and more recently by Alfonso (2015) and Alfonso and Raga (2017) numerically.Within this approach there is no mass loss, and the phase transition is manifested in the emergence of a giant particle that contains a finite fraction of the total mass of the system.Solutions in the post-gel regime were obtained analytically by Lushnikov (2004) and Matsoukas (2015) and numerically by Alfonso and Raga (2017).
The sol-gel transition has been observed experimentally.For example, aerogels in aerosol physics (Lushnikov et al., 1990) and in other theoretical models, such as that of percolation (Botet and Płoszajczak, 2005;Kolb and Axelos, 1990), where there is a close analogy between percolation and droplet coagulation.In bond percolation, each lattice corresponds to a monomer, and a proportion p of active bonds is set randomly between sites.Then clusters of size s are defined as an ensemble of s-occupied sites connected by active bonds.For a definite value of p = p c , a macroscopic cluster appears, corresponding to the sol-gel transition.
Recent results in percolation theory show that the largest cluster follows the Gumbel distribution for subcritical percolation (Bazant, 2000) and, at the critical point, follows the Kolmogorov-Smirnov (K-S) distribution (Botet and Płoszajczak, 2005).The K-S distribution is the distribution of the maximum value of the deviation between the experimental realization of a random process and its theoretical cumulative distribution, and it has following the cumulative distribution: or the equivalent expression: Botet and Płoszajczak (2005) also found evidence (from numerical solutions of the KCE equation) that, for multiplicative coalescence (with a collection kernel proportional to the product of the masses), the largest cluster follows the distribution in Eqs.(5a) and (5b) at the time of the phase transition.At this point, a hypothesis is formed in which the results obtained in percolation are extrapolated in order to find the probability distribution of the largest (runaway) droplet at t = T gel .

Some theoretical and experimental results for finite systems in coagulation theory and nuclear physics
We will now consider some results obtained for finite systems in coagulation theory (Botet, 2011) and in nuclear physics (Gruyer et al., 2013).Unlike those in infinite systems, fluctuations and correlations in a finite system are not negligible.
We must emphasize that phase transitions cannot take place in a finite system.This is due to the fact that a phase transition is defined as a singularity in the free energy or any thermodynamic property of a system.For finite-sized L. Alfonso et al.: The impact of fluctuations and correlations -Part 2 systems, the free energy is proportional to the logarithm of a finite number of exponentials, which are always positive (Bhattacharjee, 2001).Consequently, those singularities are only possible within infinite systems by taking the thermodynamic limit.Thus, for finite systems, the notion of pseudocritical region is introduced (which is the finite system equivalent of a sol-gel transition time).
Some interesting simulation and experimental results were obtained for these systems in Botet (2011) for the Smoluchowski model (1) and in Gruyer et al. (2013) for nuclear multi-fragmentation.Botet et al. (2001) found, from stochastic simulations of coagulation process with the product kernel (for a system of N = 512 monomers), that the distribution of the largest cluster in the pseudo-critical region can be described as a mixture of the well-known Gaussian and Gumbel distributions: In Eq. ( 6), the coefficients θ and (1 − θ ) are the mixture weights (probabilities associated with each component).The individual distributions Gumbel(x, µ 1 , β) and Gauss(x, µ 2 , σ ) are the mixture components.
The Gumbel distribution is one of the asymptotic distributions from extreme value theory (EVT) and has the following form: where µ is the position parameter and β the scale parameter.
The distribution in Eq. ( 6) has its origin in the fact that, for finite systems in the pseudo-critical zone, the system experiences large fluctuations and the gel distribution is a combination of both distributions, a Gumbel and a Gaussian (Gruyer et al., 2013).A similar result was obtained by Botet (2011) using synthetic data from stochastic simulations, for collision probabilities proportional to the product of the masses.The fundamental hypothesis of our work is that the gel mass (largest drop) in the initial phase of precipitation formation is distributed as a mixture of two asymptotic distributions: one Gumbel and one Gaussian, following the pseudocritical clustering scenario described in Gruyer et al. (2013).
3 Analysis of the largest droplet distribution obtained from synthetic data 3.1 Results for the product kernel (K(i, j ) = Cx i x j ) For synthetic data analysis, the empirical distributions of the largest droplet mass (M max ) were obtained from Monte Carlo simulations, following Botet (2011).The species-accounting formulation (Laurenzi et al., 2002) of the stochastic simulation algorithm (SSA) of Gillespie (1975), which rigorously accounts for fluctuations and correlations in a coalescing sys-tem, was used for the stochastic simulation in this work (see Appendix B).
The main difference between the Gillespie's SSA and other Monte Carlo methods based on the simulation particles (SIPs) approach (like the super droplet method developed by Shima et al., 2009) is that the Gillespie's SSA involved the collision of only two physical particles (droplets in our case) per MC cycle, while the approach based on SIP in each MC cycle collides SIP (super droplets, for example), which represents multiple numbers of droplets with the same attributes (radius r or mass in the simplest case) and position.However, Gillespie's SSA works perfectly for our purposes because, due to the finiteness of our systems, our simulations are performed for small volumes with a small number of droplets (in the range 50-300 cm −3 ).
Our methodology of synthetic data analysis consists of generating N-realizations (at each time step) using the algorithm of Gillespie.For each realization, there is a certain distribution of droplets.The largest droplet mass obtained from each distribution at each realization (for a fixed time step) would be the distribution to be fitted to the distribution in Eq. ( 6).Thus, the sample size would be equal to the number of realizations of the Monte Carlo algorithm.
Simulations were performed for the product kernel (K(i, j ) = Cx i x j ), with an initial mono-disperse distribution of 100 droplets of 14 µm in radius (droplet mass 1.15 × 10 −8 g) in a cloud volume of 1 cm 3 , with C = 5.49 × 10 10 cm 3 s −1 .
The product kernel is proportional to the product of the masses of the colliding droplets.It is widely used because analytical solutions of the KCE or Smoluchowski equation (Eq. 1) have been obtained for this kernel by Golovin (1963), Scott (1968), Drake (1972), and Drake and Wright (1972).The value of the constant C (C = 5.49 × 10 10 cm 3 g −2 s −1 ) in the product kernel is the result of the polynomial approximation K(x, y) = A + B(x + y) + Cxy (Long, 1974) of the hydrodynamic collection kernel (Eq.11).
The empirical distribution of the maxima was obtained for 1000 realizations of the stochastic algorithm.There is no need for a larger number of realizations to get better statistics, since the number of realizations in our Monte Carlo algorithm must be equal to the sample size in the application of the block maxima (BM) approach (see the next section for more details).On the other hand, this number is not much bigger than the number of blocks in the data for which the largest droplet maxima was fitted to fog data.
Figure 1a-d present the largest droplet mass empirical distributions obtained at four different times.Note that Eq. ( 6) provides a good fit for the distribution of the mass of the largest droplet (M max ) both around and far from the sol-gel transition time (T gel ), which was calculated from Eq. ( 4) and found to be equal to 1378 s.
Figure 2 presents the time evolution of the coefficient θ , which represents the mixing fraction in Eq. ( 6) for the time interval [500 s, 2000 s].Despite the noisy behavior of the co- efficient θ (due to the finiteness of the system), there is a decreasing trend with time, showing larger values of θ (∼ 0.65) for times close to 500 s and values down to 0.2 at the end of the time interval.This figure indicates that, although the largest droplet distribution is adequately described by a mixture of Gaussian and Gumbel distributions, it has a larger Gumbel component (see Eq. 6) during the early stages of the coagulation process.As time progresses, the Gaussian contribution becomes more important (smaller values of θ ) in providing a better fit to the largest droplet mass distribution.
These findings are in accordance with Gruyer et al. (2013) and Botet (2011): at an early stage of coagulation development, correlations are negligible, and, consequently, the largest fragments can be considered independent random variables.Therefore, the probability distribution of the largest fragment is given by the limit theorem for extremal variables, which states that the maximum of sampleindependent and identically distributed random variables can only converge in distribution in the form of one of three possible distributions: Gumbel, Fréchet or Weibull.
As the coagulation process continues, fluctuations and correlations between droplets increase and the system reaches a critical point (Alfonso and Raga, 2017).Where the largest droplets are no longer independent random variables, the limit theorem for extremal variables no longer applies, and the largest droplet distribution is no longer described by a Gumbel distribution.At later times, away from the pseudocritical region, the Gaussian contribution is the most important part of the largest droplet mass distribution.This can be explained by the additive nature of the process at this stage  6), obtained for a simulation with the product kernel.(Botet, 2011;Gruyer et al., 2013;Clusel and Bertoin, 2008), and the central limit theorem applies.
In the intermediate zone (which can be defined as the pseudo-critical zone), the distribution is well described by a mixture of Gumbel and Gaussian distributions and the weights associated with each distribution are comparable.It is expected that it can be observed that θ = 0.5 at the infinite system critical point, T gel , found to be 1378 s from Eq. ( 4).However, due to the finiteness of the system, the critical point corresponds approximately to a value θ = 0.35 (see Fig. 2).
We can find whether or not a system is in the pseudocritical region by defining the following ratio (Botet, 2011;Gruyer et al., 2013): where w Gumbel = θ and w Gaussian = 1 − θ are the relative weights of the Gumbel and Gaussian distributions, respectively (see Eq. 6).By definition, η = +1, −1 corresponds to pure Gaussian and Gumbel distributions.If −1 < η < 1, the system is in the pseudo-critical domain.
Alternatively, Botet (2011) estimates the limits of the pseudo-critical region as the times when the largest droplet mass standard deviation σ (M max ) calculated from Eq. ( 9) is small.
In Eq. ( 9), N r is the number of iterations of the stochastic simulation algorithm of Gillespie (1975), M max the mass of the largest particle and M max its ensemble mean over all the realizations.Even though the second moment of the distribution M 2 (τ ) diverges (see Eq. 3) for the infinite system, there is no di-vergence of the second moment for a finite system (with no critical behavior).For that case, the standard deviation for the largest particle mass (σ (M max )) is expected to reach a maximum in the vicinity of T gel = [CM 2 (t 0 )] −1 .Moreover, computing the time evolution of the normalized standard deviation σ (M max )/ M max instead of σ (M max ) yielded better results in estimating T gel in Inaba et al. (1999), Alfonso et al. (2008Alfonso et al. ( , 2010Alfonso et al. ( , 2013)), and Alfonso and Raga (2017).
Figure 3a shows the time evolution of σ (M max )/ M max , as an example for the system defined at the beginning of this section.Note that the maximum occurs at T = 1315 s, close to T gel = 1378 s calculated from Eq. ( 4), and the time when the maximum of σ (M max )/ M max occurs is a reliable estimate of the sol-gel transition time for the corresponding infinite system.Botet (2011) defines σ = 0.1σ max as the limits of the pseudo-critical interval, which corresponds to t inf = 0.37T gel and t sup = 1.5T gel (see Fig. 3b).While Eq. ( 8) could be used to determine if a sample collected inside a cloud is in the pseudo-critical region, Eq. ( 9) implies that the time evolution of σ (M max ) is needed, and therefore a practical application is only viable in the case of synthetic data obtained from stochastic simulations or cloud droplet data collected dynamically at different times or cloud levels.

Numerical results for turbulent conditions
In our simulations, turbulent effects were considered by implementing the turbulence-induced collision enhancement factor P Turb (x i , x j ) that is calculated in Pinsky et al. (2008) for a cumulonimbus with dissipation rate ε = 0.1 m 2 s −3 and Reynolds number Re λ = 2 × 10 4 and for cloud droplets with radii ≤ 21 µm.The turbulent collection kernel has the following form: where K g (x i , x j ) is hydrodynamic kernel, which considers collisions between droplets under pure gravity conditions and has the following form: The hydrodynamic kernel takes into account the fact that droplets with different masses (x i and x j and corresponding radii, r i and r j ) have different terminal velocities V (x i ), which are functions of their masses.In Eq. ( 10), E(r i ) and E(r j ) are the collection efficiencies calculated according to Hall (1980).Monte Carlo simulations were performed with an initial bi-modal distribution (200 droplets of 10 µm in radius and 50 droplets of 12.6 µm) for a cloud volume of 1 cm 3 .
As we want to perform simulations for small systems (with a small number of particles) for which fluctuations and correlations are relevant, the number of droplets per cubic centimeter used in the simulations are small.They are of the Figure 3.For the finite system, the normalized standard deviation σ (M max )/ M max of the largest droplet mass versus time (a).The initial number of droplets was set equal to N = 100 droplets of 14 µm in radius in a volume of 1 cm 3 .Simulations were performed with the product kernel K(i, j ) = Cx i x j (with C = 5.49 × 10 10 cm 3 g −2 s −1 ), and N r = 1000 realizations of the stochastic algorithm were performed.The maximum value of σ (M max )/ M max is found to be 1315 s (dashed vertical line) and is very close to the sol-gel transition time (continuous vertical line) for the infinite system (1378 s).In panel (b) the small end of the pseudo-critical domain is estimated as the time where σ (M max ) = 0.1σ max .
same order of the droplet concentrations for each block obtained from observations, which fluctuate between 0 and 392 cm −3 , with an average of 146 cm −3 (see Fig. 6).
The empirical distribution for the largest droplet mass was generated by extracting the maximum from the droplet distribution at each realization for a fixed time step.Additionally, the ratio σ (M max )/ M max is evaluated from 1000 realizations of the Monte Carlo algorithm (see Fig. 4), which reaches its maximum at around 1815 s and serves as an estimate for the sol-gel transition time for the infinite system.Four empirical probability distributions were fitted to the combined distribution (Eq.6) for times in the vicinity of T gel .The results are displayed in Fig. 5a-d.Note also that for this case, the combined distribution (Eq.6) provides a good fit for the largest droplet mass.Moreover, the coefficient θ decreases in time (check Fig. 5), in concordance with the scenario described in Sect.3.1.

Analysis of the largest droplet (gel) radius distribution from observations
In this section, the methodology of analysis described before is applied to a dataset of cloud droplet size distribution (2-50 µm) collected with a Droplet Measurement Technologies fog monitor (FM-120) installed on a hilltop in Are, Sweden.The FM-120 is a single-particle optical spectrometer (Spiegel et al., 2012) that derives size from light scattered from individual droplets that pass through a focused laser beam.The equivalent optical size ranges from 2 to 50 µm.The fog monitor sample volume has a cross-sectional area of 0.25 mm 2 and a flow speed of 14 m s −1 .The raw data consist of each droplet's radius and inter-arrival time (elapsed time since previous particle).More than 7 million droplets were processed over a period of 4 h.The block maxima (BM) approach in extreme value theory (EVT) was applied, which requires dividing the observation period into nonoverlapping periods of equal size and www.atmos-chem-phys.net/19/14917/2019/show the simulated M max distributions in a system with an initial bi-disperse distribution (200 droplets of 10 µm in radius and 50 droplets of 12.6 µm) at four different times.The solid line shows the fit using Eq. ( 6).The simulations were performed for the turbulent hydrodynamic kernel.restricts attention to the maximum observation in each period (see Gumbel, 1958).
Following the BM approach, considering the sectional area and flow speed, the time series was divided into consecutive unit blocks of 1 cm 3 in volume, corresponding to a cloud length of approximately 400 cm (∼ 0.3 s interval in the time series).The droplet distributions in each unit block are equivalent to the distributions obtained for each realization (for a fixed time) of the Monte Carlo algorithm described in the previous section, and each block can be interpreted as an independent realization of a stochastic process.
The maximum (radius of the largest droplet) is recorded from each consecutive unit block in order to generate the distribution for comparison with the theoretical combined dis-tribution described in Eq. ( 6).The sample size corresponds to the number of consecutive blocks in which the time series was divided, which in this case is 49 647 blocks, equivalent to about 4 h of data.Figure 6 displays the number of droplets in each block, which fluctuate between 0 and 392, with an average of 146.Since each block is considered a realization of a random process, the largest droplet radius series must be fitted to the combined distribution in Eq. ( 6) for samples with certain conditions of homogeneity.
The average sample size (number of unit blocks) for which the largest droplet maxima can be fitted to the combined distribution in Eq. ( 6) is then estimated.This expected value can be calculated from the following procedure.The conditional probability P (Admixture |x ), where x is the sample size, is calculated using Monte Carlo simulations.This calculation uses a given number of consecutive blocks with a mixture of distributions.The simulations are carried out by randomly choosing N total samples from the measurements (that consist of consecutive blocks) of size x, fitting the data to the distribution in Eq. ( 6), and determining if they do or do not follow that distribution.The decision is based on application of the Kolmogorov-Smirnov (K-S) goodness of fit test for a confidence level α = 0.05.The experimental statistics for the K-S test can be obtained by arranging the data in ascending order (x 1 , x 2 , . .., x n ) and deriving the maximum difference between the rank statistics (i −1)/n and the theoretically calculated cumulative density function F (x i ): If this value of D n is smaller than a certain threshold value D α n , we accept that the data obey the probability distribution under consideration, and the null hypothesis H 0 cannot be rejected at a significance level α.The significance level α refers to the probability of the assumed distribution pattern being rejected.The limiting values of D α n can be calculated from the K-S cumulative distribution (see Eqs. 5a and 5b).Tables with limiting values can be found in, e.g., Gnedenko (2017).
However, given that the parameters of the distribution F (x) were estimated from the observed data, theoretical limiting values provided by the K-S cannot be used.In this case, the limiting values D α n are smaller than the case with known parameters and must be obtained via Monte Carlo simulations (see Appendix C for more details).Thus, the conditional probability can be calculated as follows: where N 0 is the number of cases for which the null hypothesis H 0 ) at α = 0.05 cannot be rejected.However, what is really needed is the conditional probability P (x |Admixture ), which is the probability that a sample has size x, given that the data (viewed as a time series of maxima for each block) in that sample follow a mixture of distributions.This probability can be calculated using Bayes' theorem from the following expression: By writing this theorem in the form ( 14), we are assuming that the marginal likelihood is considered a normalization factor.Therefore, P (x |Admixture) can be computed using expression ( 14) and then normalized under the requirement that it is a probability mass function (pmf).In Eq. ( 14), the prior probability π(x) is assumed to have a uniform distribution.Thus, the expected value x can be calculated from the following expression: Turning to a concrete example, N total = 100 samples with sizes x = 100, 200, . .., 1000 were randomly selected from the data, and the probability P (Admixture |x ) calculated following Eq.( 13).The probability mass function P (x |Admixture) (pmf) was obtained by applying the procedure previously described and the expected value was found to be x = 544 (about 163 s).
A thorough statistical analysis was conducted by fitting M max to the combined distribution in Eq. ( 6) for 100 samples with sizes at and below the average (100, 200, 300, . . ., 500) that were randomly selected from the entire dataset (49 647 blocks).For each random sample three null (H 0 ) hypotheses were verified: (i) the sample comes from a mixture of distributions (Eq.6), (ii) the sample comes from a Gumbel distribution or (iii) the sample comes from a Gaussian distribution.The three hypotheses were examined by the K-S method with limiting values calculated from Monte Carlo simulations (see Table C1).
The results for sample sizes 100, 200, 300, 400 and 500 are shown in Table 1.As an example, for case 1 (sample size 100) the null hypothesis H 0 at α = 0.05 was rejected for 13, 35 and 92 samples for the mixture, Gaussian and Gumbel distributions, respectively.For case 2 (sample size 200), the null hypothesis was rejected for 27, 58 and 96 samples.Using n = 500 for the mixture of distributions (Eq.6), the null hypothesis H 0 was rejected for 50 samples.For the Gumbel distribution, the null hypothesis was rejected for all the samples (100) and the null hypothesis for the Gaussian distributions was rejected for 83 samples.
The results shown in Table 1 confirm that, for all sample sizes, the mixture of distributions provides a better fit than the Gumbel and Gaussian distributions, confirming the correctness of the choice of the mixture of distributions (Eq.6) for modeling the largest droplet radius.As an example, Fig. 7ad present, for a sample size of n = 500, the largest droplet mass empirical distributions obtained for four different samples that are distributed following the mixture and the corresponding fit of Eq. ( 6).

Discussion and conclusions
An infinite system has two possible evolutionary phases: the ordered phase and the disordered or statistical phase.In the disordered phase there is a continuous droplet distribution and a near equality of the largest and second-largest mass.After the sol-gel transition, there is an ordered phase characterized by the existence of a giant macroscopic droplet (gel) coexisting with an ensemble of microscopic particles.
A finite system can be in the ordered, disordered and pseudo-critical phases, according to the scenario described in Botet (2011) and Gruyer et al. (2013).The ratio η, defined in Eq. ( 8), takes values between η = +1, −1, which correspond to pure Gaussian and Gumbel distributions, and when −1 < η < 1 the system is in the pseudo-critical domain.In the disordered phase, fluctuations and correlations are negligible, there are only a few collision events, and M max is the largest part of randomly distributed droplets.In that case, the distribution of the mass of the largest droplets follow a Gumbel distribution.At later times in the evolution of the finite system, there are many collision events and M max is the result of the coalescence of other droplets.There is an additive process, the central limit theorem applies and the mass (or radius) of the largest droplets follows a Gaussian distribution.
In the pseudo-critical phase, the fluctuations and correlations are no longer negligible and the distribution is of neither of the asymptotic forms (Gumbel or Gaussian).In this case, the fit of the largest droplet mass (gel), is a mixture of a Gumbel (disordered state) and Gaussian (ordered state) distributions.As was demonstrated in the preceding section, this combined distribution (Eq.6) is a good approximation to the largest droplet distribution (gel) in the pseudo-critical region.The fact that the mixture of distributions provides a better fit than the Gumbel and Gaussian distributions shows that the samples selected in our study are mainly in the pseudocritical phase.To confirm this fact, the ratio η was calculated for 1000 samples of size n = 500 selected randomly from the data.Figure 8 shows that for 90 % of the samples the ratio η lies in the interval [−0.9, 0.9], clearly indicating that samples are in the pseudo-critical region.
We could show that the gel radius (largest droplet) is described as a mixture of the two asymptotic distributions because the effect of the collision-coalescence process was in some way isolated for the orographic cloud data analyzed in this report.A similar analysis could be performed for the early stage of a convective cloud formation, before some other processes, e.g., entrainment, mixing, turbulence or ice formation, could obscure the finite system pseudo-critical scenario, and the gel formation that is basically a consequence of the collision-coalescence process could no longer be observed.In this work, the early stage of formation of a warm cloud is viewed in the context of critical phenomena theory and can be thought of as being in ordered, disordered or pseudocritical phases.The disordered phase corresponds to a cloud with a droplet spectrum formed mainly by the cloud condensation nuclei activation process, with an almost random distribution of particles, and the distribution of the mass of the largest droplets is Gumbel.In the pseudo-critical phase a giant droplet (gel) locally coexists with a continuous ensemble of small droplets.As the system considered is finite, there is no sudden change from disordered to ordered phase (solgel transition), but instead there is a pseudo-critical phase in which fluctuations are important and the gel distributes according to Eq. ( 6).The analysis presented here of the largest droplet distribution provides useful insight into the early stages of cloud development in warm clouds.In follow-up studies, the analysis of cloud data at different times or distances from the cloud base would be helpful in identifying the pseudo-critical phase and tracking the transition from the disordered to the ordered phase dynamically.

Figure 1 .
Figure 1.Panels (a)-(d) (histograms) show the largest droplet mass distributions calculated from Monte Carlo simulations at four different times, for a system with an initial mono-disperse distribution of 100 droplets of 14 µm in radius.The solid line shows the fit using Eq.(6).

Figure 2 .
Figure 2. Time evolution of the coefficient θ in Eq. (6), obtained for a simulation with the product kernel.

Figure 4 .
Figure 4. Time evolution of the normalized standard deviation σ (M max )/ M max of the largest droplet mass versus time estimated from the Monte Carlo algorithm.The simulations were performed for the turbulent hydrodynamic kernel with a bi-disperse initial condition (200 droplets of 10 µm in radius and 50 droplets of 12.6 µm) in a volume of 1 cm 3 .

Figure 5 .
Figure 5. Panels (a)-(d) (histograms) show the simulated M max distributions in a system with an initial bi-disperse distribution (200 droplets of 10 µm in radius and 50 droplets of 12.6 µm) at four different times.The solid line shows the fit using Eq.(6).The simulations were performed for the turbulent hydrodynamic kernel.

Figure 6 .
Figure 6.Time series of the number of droplets per block, sampled at a hilltop in Are, Sweden.

Figure 7 .
Figure 7.For four random samples that are distributed following the admixture distribution (with sample size 500), observed (histogram) and fitted (solid line) using Eq.(6).Also shown for each distribution are the p value of the goodness of fit test and the parameter θ indicating the weight of the Gumbel component.

Figure 8 .
Figure 8. Histogram of the ratio η = (w Gaussian − w Gumbel )/ (w Gaussian + w Gumbel ), which measures the distance to the critical point.

Table 1 .
For each sample size, the number of samples with the null hypothesis H 0 rejected at α = 0.05 for all the distributions.