A new Description of Probability Density Distributions of Polar Mesospheric Clouds ( PMC )

In this paper we present a new description about statistical probability density distributions (pdfs) of Polar Mesospheric Clouds (PMC) and noctilucent clouds (NLC). The analysis is based on observations of maximum backscatter, ice mass density, ice particle radius, and number density of ice particles measured by the ALOMAR RMR-lidar for all NLC seasons from 2002 to 2016. From this data set we derive a new class of pdfs that describe the statistics of PMC/NLC events which is different from previously statistical methods using the approach of an exponential distribution commonly named g-distribution. 5 The new analysis describes successfully the probability statistic of ALOMAR lidar data. It turns out that the former g-function description is a special case of our new approach. In general the new statistical function can be applied to many kinds of different PMC parameters, e.g. maximum backscatter, integrated backscatter, ice mass density, ice water content, ice particle radius, ice particle number density or albedo measured by satellites. As a main advantage the new method allows to connect different observational PMC distributions of lidar, and satellite data, and also to compare with distributions from ice model 10 studies. In particular, the statistical distributions of different ice parameters can be compared with each other on the basis of a common assessment that facilitate, for example, trend analysis of PMC/NLC.


Introduction
First studies of probability distributions of Polar Mesospheric Clouds (PMC) and noctilucent clouds (NLC) were reported by Thomas (1995) using data from the UVS instrument on board the Solar Mesosphere Explorer (SME) satellite and from the Solar Backscatter Ultraviolet (SBUV) instrument on the Nimbus-7 satellite over the period [1978][1979][1980][1981][1982][1983][1984][1985][1986], measuring scattered limb albedo at 265 nm and nadir albedo at 273.5 nm, respectively.Thomas (1995) introduced empirical measures in the statistical analysis of PMC brightness distributions.He showed that the frequency distribution of PMC albedo derived from both SME and SBUV satellite data can be approximated by a (normalized) exponential probability function, see Figure 3 in Thomas (1995).Secondly, the author also proposed to use cumulative frequency numbers (the so-called g-function) of clouds g(A) exceeding a certain albedo A, in order to better represent the exponential populations.Examples of g-distributions are The general form of the exponential distribution E(x) with scale parameter α > 0 is defined as a probability density function (pdf) given by E(x) = α exp(−αx) which fulfills the normalization condition of a pdf with ∞ 0 E(x)dx = 1.Thomas (1995) defined the g-function g(x) as the cumulative probability E cum with (1) Taking the logarithm of E yields a straight line ln(E) = ln α − αx.For a given class of values [x 1 ; x 2 ] the likeliness of this class is proportional to the area enclosed by the continuous probability distribution and is obtained by integrating E on the segment length (bin size) ∆x = x 2 − x 1 as x2 x1 Edx = −e −αx2 + e −αx1 .A statistical analysis of ice parameters has to take into account the aspect of specific sensitivities of different instruments.For example the ALOMAR lidar is generally sensitive to a backscatter signal larger than a threshold about 2 -3 •10 −10 m −1 sr −1 (Fiedler et al., 2017).When considering a threshold (x th ) the exponential pdf E(x) is normalized according to A ∞ x th E(x)dx = 1 with a scaling factor A = exp(αx th ).We summarize the properties of the exponential distribution taking into account a threshold in Appendix A.
For a given data sample x i (i = 1, ..., N ) assuming a threshold x i > x th we use the common estimates of mean m and variance s 2 (standard deviation s) with m = 1 In addition we also calculate the median m and mode m which is the value that occurs most frequently in the data sample.
Hence testing a data sample to be exponentially distributed means that mean, median and standard deviation of the sample have to fulfill the following identity: We will use this condition to analyze the ALOMAR data with respect to possible exponential (g-function) distributions.
3.1 Analysis of ALOMAR data on exponential distributions (g-function)

Analysis of maximum backscatter data
We investigate the frequency distribution of maximum backscatter (MBS) data in units of 10 −10 m −1 sr −1 .We assume a threshold of 3 that corresponds to the instrumental sensitivity of the ALOMAR lidar.Then we sort the x=MBS data to a bin size of one per class starting from the threshold value and calculate a frequency histogram.Finally, we normalize the histogram so that the sum of all frequency classes equals one.
Figure 1a shows the frequency distribution of x=MBS data in a semi-logarithm scale.The first impression is that the data points are almost perfectly approximated by a linear regression besides some statistical noise.This indicates that a exponential function describes the distribution of data with a high accuracy.Figure 1b shows the distribution histogram in a original nonlogarithmic representation.We see that the exponential fit matches the data histogram with a high precision.Consequently, the relative error is rather small (6.5 %).The high quality of the fit is also supported by the fact that theoretical mean, median, mode, and standard deviation (µ, ν, η, σ) using Eq.(A1-A4) and estimates of mean, median, mode, and standard deviation ( m, m, m, s) from the data sample derived from Eq. (3) all coincide within their error bars.Now we perform the proposed exponential (g-function) test with m − x th = s = ( m − x th )/ ln(2), see Eq. ( 4), and insert the values from the data sample of mean ( m = 12.0 ± 0.3), median ( m = 9.0 ± 0.4), and standard deviation (s = 9.2 ± 0.5).The error uncertainties have been estimated with bootstrap methods.We find that m − x th = 12.0 − 3 = 9 ± 0.3, s = 9.2 ± 0.5, and ( m − x th )/ ln(2) = (9.0− 3)/0.69315 = 8.7 ± 0.6.Hence the identity is fulfilled when allowing for uncertainties introduced by statistical errors.We conclude that lidar MBS-data are very likely exponentially distributed and follow a g-function, respectively.

Analysis of ice mass density, ice radius, and ice number density data
Now we investigate other ice parameters from the ALOMAR data set with respect to exponential distributions, namely the frequency distributions of ice mass density (IMD) in units of mg • m −3 (threshold 20, bin size of 2), ice radius r in units of nm (threshold 20, bin size of 1) and ice number density n in units of cm −3 (threshold 30, bin size of 10).We will show that these parameters do not follow an exponential distribution (g-function) as expected.In Figure 2a we plot the frequency distribution for y = IMD data in a semi-logarithmic scale.Obviously, the data points have no dominant linear shape.There exist systematic deviations between data and theoretical exponential fit.In comparison to the fit curve, data points are systematically smaller at y = 20 -40.Vice versa, data points exceed substantially fit values in the range y = 40 -90.Also, frequencies in all classes below the threshold are significant smaller than a proposed exponential fit.Indeed, the frequency histogram in the nonlogarithmic frame shows these systematic deviations between data and exponential fit even more pronounced, see Figure 2b.
With a relative error of about 19 % the exponential curve fails to fit satisfactorily the data.Also significant differences exist between fit and data parameter of mean, median, mode, and standard deviation (Figure 2b).Finally, we apply the exponential   fits with e.g.relative errors about 29 %, indicating that also both r and n are very likely not exponentially distributed.This is supported by the fact that the test of mean, median, and variance fails again and shows large inequalities.
We summarize that ice mass density, ice radius, and ice number density do not follow an exponential distribution in contrast to maximum backscatter.In the following section we will show that this is reasonable and is based on the fact that a functional link between MBS to the other data sets of IMD, r and n does miss a linear relationship.

Test on linearity between maximum backscatter and ice mass density, ice radius, ice number density data
Linearity between maximum backscatter (MBS) and ice mass density (IMD), ice radius r and ice number density n data is a necessary and sufficient condition that also IMD, r and n data samples are exponentially distributed, see also next section.In the following we will test this constraint.Figure 3a shows a scatter plot in a logarithmic frame for simultaneously measured MBS and IMD data.In order to test a linear relationship between x=MBS and y=IMD we introduce a general fit function described by a power law condition as with the two constants c (linear constant) and d (power constant).Only for d = 1 we expect a perfect linear dependence between x and y.

A new probability density function for PMC parameters
In this section we will present the major part of the new statistical approach in order to describe frequency distributions of different PMC/NLC parameters.
There exists a general mathematical method ('integration by substitution') that provides the opportunity to transform between probability density functions (pdf) with different statistical variables.This is done by the following procedure: Assuming a given pdf P (x) with variable x, then the transformation from x to a new variable y(x) with a new pdf Q(y) is specified by with x(y) being the inverse function of y(x).Here the absolute value of the derivative ∂x/∂y has to be calculated so that the new pdf Q is defined positively everywhere.In order to apply this approach one needs generally two requirements: (1) Any transformation between the two pdfs P and Q needs an initial guess in one of the two pdfs, either P or Q.
(2) An analytic formula of a forward and backward model must be available that describes the functional dependence between the two statistic ice variables x and y.In the following we discuss how we satisfy these two requirements.
We apply this method for two ice parameters, namely MBS with variable x and an unknown ice parameter named u (e.g. this unknown ice parameter might be ice particle radius).For condition (1) we use the hypothesis that the distribution of maximum backscatter data (MBS) is perfectly represented by an exponential pdf and its cumulative distribution is described by a gfunction according to Eq. (1).For condition (2) we assume a power form of a fit function used in Eq. ( 5) that also allows to calculate analytically the inverse function.We discuss a suitable justification of this assumption in Sect.6.2.Hence the forward model is u(x) = cx d and the backward model is x(u) = (u/c) 1/d .Then the new distribution U for the arbitrary ice parameter u using Eq. ( 6) is given by  7) can be simplified to a more general form with In a next step we introduce in an arbitrary manner a third ice parameter named z, for which we assume again the same power law (Eq.5) now valid between z and u as Again we apply Eq. ( 6) and calculate the unknown pdf Z(z): At first glance the algebraic expression for Z looks particulary complex, but Z can be transformed to a general form with Equation ( 9) represents our final result.The pdf Z(z) describes the general form of the new statistical distribution.Note that the algebraic expressions of Eq. ( 8) and Eq. ( 9) formally coincide.This means that any probability distribution of a new ice parameter that is connected to other ice parameters through our functional power law (Eq.5), can be described by the general pdf given by Eq. ( 9).The constants a and b represent two free parameters in the Z-distribution which we name the scale parameter a and the shape parameter b.Obviously, the Z-pdf is identical with an exponential pdf (or g-function) in the limit b = 1.This shows the close interconnection of the new Z-pdf to the commonly used exponential (g-function) approach.We will show in the following that any distribution from so different ice parameters as maximum backscatter, ice mass density, ice radius, and number density of ice particles can be described on a uniform basis with a high accuracy by Z. Vice versa this indicates that these ice parameters are connected depending on each other by the uniform power law relation (Eq.5), more details are discussed in Sect.6.2.
5 Application of the Z-distribution to real data

General properties of the Z-distribution
In this section we first show some general characteristics of the new Z-distribution.From these properties we derive conditions and constraints that will allow to estimate the specific values of the two free constants in Z, the scale parameter a and shape parameter b, for a given data sample.
First we show that Z is a correct pdf satisfying the normalization condition (10a) For a negative b the distribution Z is described by The cumulative form of Z for b > 0 is given by For b < 0 we have to choose the cumulative calculation in reverse order starting the integration at zero.Naming the reverse cumulative with index zero as Z 0 cum we get Only, the cumulative descriptions from Eq. (11a,b) allow in principle to estimate roughly the constants a and b for a given data sample using the double logarithmic functional dependence, whereas the direct logarithm of Z (Eq 10a,b) offers no possibility to solve for a and b.However, the method calculating the double logarithmic cumulative is not recommended.
Several numerical tests showed that a stable estimation of a and b from noisy data applying this double logarithmic approach is an almost impossible task.Instead, we propose two different methods that rely on much more powerful principles (see next Sect.5.2).Additionally, we have to take care about a possible negative value of b that can be only identified using Eq.(11b).
In fact such a case occurs in the analysis of ALOMAR data.We will give in Sect.5.3 an example that only a negative slope parameter describes the distribution of number density of ice particles.
Generally, the Z-distribution has the ability to characterize many different types of distributions, see   10).(d-f) same but for ln(Z) from Eq. ( 10).
It is interesting to note that our new Z-distribution is closely related to a more general Weibull-distribution (Wilks, 1995).
Nevertheless there is a difference concerning the shape parameter b which in our case is not only defined for positive values but also for negative values.Such a case is disregarded by a classical 2-d Weibull-distribution.
Now we shortly summarize the mathematical descriptions of median, mode, mean, variance, and standard deviation parameters of Z for the case of a zero threshold.The calculations are described in detail in Appendix B for the general case of a Variance and standard deviation: The expressions of mean and variance use the Gamma-function Γ(t) = ∞ 0 x t−1 e −x dx.Notice that the Gamma-function is defined for all real values of t except t = 0 and all negative integer values of t.Note also that median, mode, mean, variance, and standard deviation parameters of Z coincide with those of an exponential distribution in the limit as b equals one.
5.2 Two computational methods to estimate the free parameters a and b of Z from a given data sample In this section we present two numerical methods to calculate the scale parameter a and shape parameter b describing the new Z-distribution.First of all, since any measurement depends on a specific instrumental sensitivity, we have to introduce a threshold which we name z th .The remaining data sample consists of N observations z i with z i > z th .Then we calculate the mean m and standard deviation s of data z i using Eq. ( 3), and also the median value m from data z i .
Method (1): we investigate the corresponding theoretical moments from Z.In Appendix B we derive the theoretical mean µ (Eq.B5) and median ν (Eq.B3) for the Z-distribution with a threshold constraint.Taking the estimates of mean m and median m from the sample as best proxies for the theoretical mean µ and median ν values of Z, we get the following equations: Note that the use of a threshold constraint involves the introduction of a scaling factor A = exp(az b th ) which is present in Eq. (13b).Inserting the algebraic term of a (right side of Eq. (13a) into the right side zero-equation (Eq.13b) and using the threshold value of z th yields an equation only for b which has to be computed iteratively.Once a numerical value of b has been estimated with a sufficient accuracy, we insert this b value into the upper right equation to get the numerical value for a.
We note that in classical statistics the method of moments determines a and b from the mean and variance equations.
In principle this approach should be here possible too, but in practise the algebraic structure of the variance equation is too complicated, see Eq. (B6) in Appendix B. This means that the variance equation, if at all, is only iteratively solvable whereas the Atmos.Chem.Phys.Discuss., https://doi.org/10.5194/acp-2018-642Manuscript under review for journal Atmos.Chem.Phys.Discussion started: 20 July 2018 c Author(s) 2018.CC BY 4.0 License.use of the median equation offers an analytical transformation to a. Generally, we recommend to apply the proposed method using the mean and median equations.This straight-forward method is easy to program and produces reliable estimates of parameters a and b.
Method (2): we also present a second method using a maximum likelihood approach, see Appendix C. The parameters are again calculated from two equations (Eq.C5) with Interestingly, the left equation includes a term 1/N z b i which is the mean of the sample values weighted by power b whereas the right equation includes the mean 1/N ln z i of logarithmic data and 1/N z b i ln z i .This shows a similarity to the computation of regression points used in Figure 3.We insert a into the right equation which yields a unique equation for b which again can be solved iteratively.Once b is fixed, the left equation allows to determine a.In the following we will test our lidar data samples with these two procedures and we will show that both methods produce almost identical results.

Z-distributions applied to ALOMAR data
Applications of the Z-distribution to ALOMAR data of maximum backscatter (x), ice mass density (y), ice particle radius (r) and ice number density (n) are shown in Figure 5.Note that thresholds have been computed from the regression functions (Eq.5) described in Sect.3.2 on the basis of x th = 3 •10 −10 m −1 sr −1 , and resulting y th = 22 mg/m 3 , r th = 22.3 nm and n th = 662 cm −3 .The values of scale parameter a and shape parameter b have been calculated with the method of mean and median equations (method 1).Then the theoretical curves of Z and theoretical values of mean, median, mode and standard deviation have been calculated by inserting the values of a, b and threshold z th into Eq.B1-B6.Obviously the pdf Z has sometimes no simple exponential shape which is the case for ice mass density, ice radius and ice number density.As we see in Figure 5 all Z-pdf curves (in blue) match the original data histograms with a high accuracy.The relative error is in a range about 6-10 percent except that ice number density has a relative error of 15 percent.When we compare the mean, median, mode, and standard deviation derived from the theoretical distribution and corresponding estimates from data samples, we see a precise coincidence of mean and median values.Not surprising this is due to the fact that the parameters a and b have been computed by the mean and median method which guarantees the preservation of mean and median values.Nevertheless standard deviation and mode also show always a good agreement within the error range.A closer look to the maximum backscatter distribution shows that MBS data are almost perfectly exponentially distributed with b = 0.931 which is not too far away from b = 1 for an exact exponential pdf.As we had already shown, see Sect.3.1.1,MBS data are very likely exponentially distributed, now the Z-distribution analysis confirms this result.Hence we conclude that the commonly used exponential (g-function) analysis might be only a reasonable statistical method in case of analyzing MBS lidar data.
In contrast to MBS, the Z-distribution of IMD shows a function that converges rapidly to zero for small IMD values.The distribution is described with b = 1.355 which significantly deviates from b = 1 for a precise exponential function.Note that the mode of the data sample at 40 mg/m 3 differs from the theoretical mode of 23 mg/m 3 because of a relatively large statistical noise in the data.But mean, median and standard deviation values agree almost perfectly.Similar to IMD, the ice radius The sample of ice number density shows a completely different behavior with a slope parameter that is negative with b = -0.819.The physical meaning is that the parameter ice number density is negatively correlated with all other ice parameters.For example, large ice numbers n correspond to small ice radii, IMD and MBS values.As a consequence this leads to a threshold of n in the reverse direction, that is from large values to small values defined by n < n th = 662 cm −3 .One can see this feature in the right tale of Z(n) plotted as a dashed curve, see Figure 5d.The reversal behavior is also present for small values of n.
Small values of n are measured for very bright PMC events with large MBS that have small occurrence rates.Therefore, the number of small ice particles has a relatively high uncertainty due to their low occurrence frequency, and it is this statistical  error which produces some deviations from the fit curve to the data in the range of n = 0 -80 cm −3 .We note that the numerical procedure computing the pair (a, b) from the method of mean/median (Eq.13a,b) has automatically detected the existence of a negative slope parameter b without any a priori information.Now we repeat the analysis using method 2. Figure 6 summarizes the (a,b)-values and statistical moments calculated from the method of maximum likelihood estimators.As can be seen the maximum likelihood approach computes almost identical results for all ice parameters.We have added in Figure 6a-c

Construction of artificial data
In the derivation of the Z-distribution we used the assumption that all ice parameters of maximum backscatter x=MBS, ice mass density y=IMD, ice particle radius r and ice number density n are connected with one another by the power law given in Eq. ( 5).In Sect.5.3 we showed that the Z-pdf describes with a high accuracy each distribution of these ice parameters which, in turns, means that indeed there exists at least an approximative power law between ice parameters.We discuss a suitable justification of this power law relation in more detail in Sect.6.2.In the following we will show that the use of a Z-distribution allows to construct artificial unknown data samples of various ice parameters which approximate true data to a high degree.
We think that such an application is one of the most beneficial outcome from the new Z-distribution approach.We explain the numerical procedure by the help of an practical example.
We already showed a linear dependance in the logarithmic frame using linear regression (LR) for maximum backscatter and ice particle radius, see Figure 3b.Hence we can compute artificial ice radius proxies ri , named as LR-proxy of true data r i , as a function of MBS-data x i from the regression power law function (Eq.5) with ri = cx d i and with power law coefficients c = 13.509 and d = 0.497.Figure 7a shows a comparison between LR-proxy and original ice radius data where we test the identity of the two data samples.The correlation coefficient is the same as shown in Figure 3b with R = 0.55.Mean and median values of proxy and original data are almost identical, and a regression analysis shows a perfect identity (c = 1.000 and d = 1.000).Now we calculate the frequency histogram of LR-proxy ice radii, see Figure 7b, and compare the histogram with the original Z-distribution of ice radii already shown in Figure 5c.We find that the LR-proxy approximates the mean, median, mode and standard deviation values of the original Z-distribution with an relative error of 9.5% comparable to the original error of 9.1%.We conclude that a linear regression analysis of logarithmic data offers a good opportunity to approximate data provided that a pair of data samples exists that allows the calculation of power law coefficients c and d from regression methods.
Now the Z-distribution approach offers a more general possibility to derive artificial data samples without any knowledge of correlation and regression coefficients.Indeed we will show that results from the Z-approach are very close to results from a regression analysis.Again, our goal is to approximate ice radius data from a given maximum backscatter data sample.But now we suppose that no data of ice particle radii r exist, hence any correlation and regression analysis is not possible.Have in mind that such information about scale and shape parameters of the ice radius distribution could be also provided from independent satellite measurements that are capable to measure ice particles radii, e.g.AIM-SOFIE.Our new proxy method (Z-proxy) requires the following transformations.We first transform the x i -values (i = 1, ..., N ) into the z-domain with x i → z i : z = a x x bx i followed by a second transformation with z i → r i : r i = (z i /a r ) 1/br resulting in Note that the derivation of c and d in Eq. ( 15) is based on the same mathematical steps when we developed the Z-distribution from Eq. ( 8) to Eq. ( 9).Inserting the a x , b x , a r and b r values into Eq.( 15) determines the power law coefficients for Z-proxy r with c = 12.994 and d = 0.508.These values do not exactly coincide with c and d values obtained from the regression method, see above, but the identity test between Z-proxies and true ice radii shows a very good coincidence, see Figure 7c.Again, mean and median values of proxy and original data are practically identical, and a regression analysis shows almost a perfect identity (c = 1.095 and d = 0.980).Finally we calculate a frequency histogram of Z-proxies, see Figure 7d, and find a good agreement between proxies and true pdf.Mean, median, and standard deviations of Z-proxy data correspond perfectly to original ice radius data, and the relative error has now even decreased to 9.2%.
We summarize that we present a new method in order to construct artificial data samples provided Z-descriptions of these data sets exist.By means of a consecutive arranging of ice parameters starting at one given data sample this method allows to construct any artificial data sample within (x, y, r, n).This method can be also applied to other data sets, e.g.ice parameter measurements from satellite observations.For example, a data sample of ice water content (IWC) obtained from satellite measurements might be analyzed in terms of a Z-distribution estimating the scale and shape parameters a and b of the IWC distribution.This would allow to establish a connection of satellite IWC data to lidar data samples (x, y, r, n) through Eq.
(15), hence the satellite IWC data could be transferred to lidar maximum backscatter, ice mass density, ice particle radius and ice number density.Vice versa the knowledge of an satellite IWC Z-distribution would allow to transform lidar observations into IWC proxies and compare these with the original IWC observed by the satellite.We think that our proposed transformation method could be very helpful to connect different ice parameter data from different instruments, either from satellite observations or ground based measurements.We also think that this new approach might be important in trend analysis of NLC/PMC.
In the next section we will discuss the power law assumption (Eq.5) and the physical meaning of the shape parameter b which might be introduced as a new trend variable in the analysis of NLC/PMC long-term changes.

Discussion of the power law assumption between PMC parameters
In this section we discuss some theoretical aspects of the power law dependence on ice parameters in order to validate the justification of Eq. ( 5).We use again the assumption as already discussed in Sect. 2 that at the altitude of maximum brightness (MBS) and ice mass density (IMD) there exists in the real atmospheric background an ice particle distribution that is perfectly Gaussian (N i ) distributed as We also assume that the geometric shapes of these ice particles are spheres with ice radii τ with mean radius r i and variance σ 2 i .Again index i = 1, ..., N relates to the i-th measurement in a given data sample of number N .N i is normalized to ∞ 0 N i (τ )dτ = 1.When we assume an ice number density of n i particles per cm 3 we get the expression ∞ 0 n i • N i (τ )dτ = n i .Note that mean ice radii r i and ice number densities n i are elements of our lidar data climatology which we have introduced in Sect. 2. Furthermore, we assume from the analysis of lidar observations (3-color measurements) that the relation of mean radius and variance is according to σ i = 0.4 r i for r i > 37.5 nm and σ i = 15 nm for r i ≥ 37.5 nm (Baumgarten et al., 2010).This assumption has been also applied in the analysis of AIM/SOFIE-CIPS PMC satellite data (Lumpe et al., 2013;Hervig and Stevens, 2014).In order to simplify calculations we apply this relation σ i = 0.4 r i also for r i larger than 37.5 nm.We now investigate the question which backscatter lidar and ice mass signal result from such an ice distribution?
We compute the mass of an spherical ice particle with radius τ as 4/3πρ ice τ 3 with density of ice ρ ice = 932 kg/m 3 .The backscatter signal from a single ice particle is calculated as a L τ 5.8 with the lidar constant a L = 1.5 • 10 −11 m 2 .Then the maximum backscatter x i and ice mass density y i are estimated by an integration of the radius distribution from zero to infinity as The integral for x i includes the term τ 5.8 that arises from Mie-scatter theory for light scattering of a wavelength of 532 nm (ALOMAR RMR-lidar) at spheres in a range of radii with 1-100 nm.The exponential value of 5.8 approximates exact Miescatter calculations with an relative error less than 0.5 % in this radii range.Unfortunately, the integral can only be solved analytically if the exponent is an integer number as 5 or 6, respectively.Nevertheless, we are able to solve this integral by means of numerical methods with the specific exponent of 5.8.In a next step, we construct analytical approximations f x (r i ) and f y (r i ) for both integral solutions using a typical value of n = 200 cm −3 with f x (r i ) = x i = a 1 a L nr 5.8 i , f y (r i ) = y i = a 2 4 3 π ρ ice nr 3 i .
The linear constants a 1 and a 2 with values a 1 = 4.20 and a 2 = 1.47 are optimal dimensionless parameters.f y approximates the analytical solution of y i with a relative error less than 0.7 percent in the range r i = [0 nm;45 nm], and less than 1.

Figure 1 .
Figure 1.(a) Logarithm of frequency distribution of maximum backscatter (x=MBS) in units of 10 −10 m −1 sr −1 (gray points x > 3; black circles 0 < x < 3).The bin size is ∆x = 1.The straight line (solid red) has been derived from a least squares fit to MBS data with x > 3. (b) Same as panel a, but original, non-logarithmic frequency distribution (gray bars x > 3; black bars 0 < x < 3).The exponential fit derived from panel a is shown as a red curve.Values of mean, median, standard deviation are given to compare fit and original data taking into account a threshold of x th = 3.The relative error given in percent describes the quality of exponential fitting.
(g-function) test for IMD data and get the following results: finding the mean ( m = 62.5 ± 1.3), median ( m = 53.5 ± 1.4), and standard deviation (s = 35.2± 1.2) directly calculated from the data sample, we get m − y th = 62.5 − 20 = 42.5 unequal s = 35.2unequal ( m−y th )/ ln(2) = (53.5−20)/0.69315= 48.3.Hence the condition of identity is not satisfied even allowing for uncertainties introduced by statistical errors again calculated from bootstrap methods.That is why we have to conclude that the lidar IMD data are very likely not exponentially distributed.When we investigate a possible exponential distribution for ice radius r and ice number density n data, see Figure 2c-f, we even see larger discrepancies between data and exponential Atmos.Chem.Phys.Discuss., https://doi.org/10.5194/acp-2018-642Manuscript under review for journal Atmos.Chem.Phys.Discussion started: 20 July 2018 c Author(s) 2018.CC BY 4.0 License.

Figure 2 .
Figure 2. (a) Logarithm of frequency distribution of ice mass density (y=IMD) in units of mg/m 3 (gray points y > 20; black circles 0 < y < 20).The bin size is ∆y = 2.The straight line (solid red) has been derived from a least square fit to IMD data with y > 20.(b) Same as panel a, but original, non-logarithmic frequency distribution (gray bars y > 20; black bars 0 < y < 20).The exponential fit derived from panel a is shown as a red curve.Values of mean, median, standard deviation are given to compare fit and original data taking into account a threshold of y th = 20.The relative error given in percent describes the quality of exponential fitting.(c) and (d) Same, but for ice radius r in units of nm with bin size ∆r = 1 and threshold r th = 20.(e) and (f) Same, but for ice number density n in units of 1/cm 3 with bin size ∆n = 10 and threshold n th = 30.

Figure 3 .
Figure 3. (a) Maximum backscatter (x=MBS) versus ice mass density (y=IMD) in a logarithmic frame for all data with correlation coefficient R and regression parameters c and d, see text for more details.Regression points p 1/2 = [m ± ∆m; n ± ∆n] are calculated with m = 1/N N i ln xi, n = 1/N N i ln yi, ∆m = 1/N N i (ln xi − m) 2 , and ∆n = 1/N N i (ln yi − n) 2 .Mean and median are calculated from original, non-logarithmic data.(b) Same for x=MBS versus ice radius r.
First of all, the logarithmic values of x and y do not yet have a high linear correlation (R = 0.54), see Figure3a.Since the correlation coefficient R is unequal one, the two regression lines resulting from y(x) : ln x → ln y and x(y) : ln y → ln x differ from each other.This means that the best choice of a regression fit is determined by a straight line through the two regression points which are defined by the means plus/minus standard deviations of logarithmic x and y data.Note that the Atmos.Chem.Phys.Discuss., https://doi.org/10.5194/acp-2018-642Manuscript under review for journal Atmos.Chem.Phys.Discussion started: 20 July 2018 c Author(s) 2018.CC BY 4.0 License.positions of regression points also relate to the half width of the angle which is spanned by the two regression lines y(x) and x(y).For our mean regression line we estimate d = 0.873.The statistical error for d is ∆d = ±0.012with a confidence level of 95 % which indicates a significant non-linearity.Hence we conclude that the pdf describing the distribution of IMD data is very likely not an exact exponential function and its cumulative distribution does not follow precisely a g-function description because the criteria of 'linearity' is violated.In Figure3bwe show a second example for the correlation between MBS and ice radius n.Again the correlation is about R = 0.55, but now the power constant is even much smaller with d = 0.497 which is far away from unity.Finally, we investigated the linearity between MBS and ice number density n where we find a weak negative correlation of R = −0.15(not shown here).A best fit analysis yields a power value of d = -0.534which again fails significantly the constraint of unity.Hence we conclude that also ice radius and ice number densities distributions should not follow exponential (g-function) distributions.

Figure 4 .
Especially, the shape parameter b determines the shape of the Z-distribution describing non-linear exponential, exponential, right-skewed, leftskewed or symmetric curves.For 0 < b < 1 the pdf increase is non-linear exponentially accelerated to infinity as z approaches zero.For b = 1 the pdf is exactly an exponential distribution having a positive finite value for z equal zero.For b > 1, the function tends to zero as z approaches zero.When b is between one and two, the function is right-skewed and rises to a peak quickly, then decreases for large z.When b has approximately a value between three and four, the function becomes symmetric and bell-shaped like a normal distribution.Note that exact symmetry is given for a skewness equal to zero which is true at z = 3.60232.For b values larger than approximately five, the function becomes again asymmetric changing the skewness to the left.For b < 0, the function is skewed to the right and decreases steeply towards zero as z approaches zero.Note that Z is never negative and owns a local maximum described by the mode whenever b is negative or larger than one.Finally we see that a double logarithmic presentation of cumulative functions describes linear shapes with slope b, see Figure4j-l.Atmos.Chem.Phys.Discuss., https://doi.org/10.5194/acp-2018-642Manuscript under review for journal Atmos.Chem.Phys.Discussion started: 20 July 2018 c Author(s) 2018.CC BY 4.0 License.

Figure 5 .
Figure 5. Frequency distributions and Z-function analysis of ALOMAR data.Parameter a and b have been estimated with the mean and median method.The relative error given in percent describes the quality of the Z-function fit.(a) Maximum backscatter data x.(b) Ice mass density y.(c) Ice particle radius r.(d) Ice number density n, see text for more details.

Figure 6 .
Figure 6.Same as Figure 5, but parameters a and b have been estimated from the maximum likelihood method.(a-c) Maximum backscatter, ice mass density and ice radius: gray bars indicate values larger than the threshold, black bars indicate values smaller than the threshold.(d) Ice number density: gray bars indicate values smaller than the threshold, black bars indicate values larger than the threshold.
also the histogram bars (in black) for all data being smaller than the threshold.Have in mind that the calculation of theoretical distribution curves is based exclusively on data larger than the threshold.Hence, decreasing or increasing a threshold will change the specific values of a and b. Figure 6d shows the ice number density distribution where we have added in the histogram (in black) all data being larger than the threshold.Again, Atmos.Chem.Phys.Discuss., https://doi.org/10.5194/acp-2018-642Manuscript under review for journal Atmos.Chem.Phys.Discussion started: 20 July 2018 c Author(s) 2018.CC BY 4.0 License.also the maximum likelihood method has automatically detected the existence of a negative slope parameter b for the ice number density distribution.
First,    we assume that a data sample of x = MBS of number N exists and also its Z-distribution Z(x, a x , b x ) with a x = 0.140 and b x = 0.931 is well known, see Figure5a.Secondly, we assume that we know a priori the form of the Z-distribution Z(r, a r , b r ) of ice radius r, e.g. with values of parameters a r and b r from Figure5c(a r = 1.269•10 −3 , b r = 1.833).

Figure 7 .
Figure 7. (a) Proxy p of ice radius versus original ice radius data without any threshold.The proxy has been derived from maximum backscatter data using the fit function that has been estimated by linear regression (LR-proxy) between original logarithmic MBS and ice radius data, see Figure 3b.(b) Frequency distribution of LR-proxy (gray and black histogram) with a threshold r th = 23.3nm.For comparison we also plot the original Z-pdf curve (blue) from the analysis of original ice radius data, see Figure 5c.The relative error describes the accuracy between LR-proxy data and original Z-function fit.(c) Same as a, but for Z-proxy data resulting from the Z-pdf analysis of MBS data, see text for more details.(d) Same as b with Z-proxy data.
number density n of ice particles.Only the integral of y i is analytically computable with a solution in which the error function defined by the integral erf (x) = 2/ √ π x 0 exp(−t 2 )dt is part of the solution: 2 percent in the range r i = [45 nm; 70 nm].A precise solution of x i resulting from numerical methods of integration is approximated by f x with a relative error less than 0.3 percent in the range r i = [0 nm;45 nm], then the relative error increases linearly to a maximum ∂L(a, b) ∂b = N • a • ln z th • z b th + the derivatives equal to zero yields for a and b maximum-likelihood estimators for scale parameter a and shape parameter b.Competing interests.The authors declare that they have no conflict of interest.Atmos.Chem.Phys.Discuss., https://doi.org/10.5194/acp-2018-642Manuscript under review for journal Atmos.Chem.Phys.Discussion started: 20 July 2018 c Author(s) 2018.CC BY 4.0 License.sosphere and Lower Thermosphere: A Review of Experiment and Theory, edited by R. M. Johnson and T. L. Killeen, Geophys.Monogr.Ser., Vol.87, pp.185-200, 1995.AGU, Washington, D. C. von Cossart, G., Fiedler, J., and von Zahn, U.: Size distributions of NLC particles as determined from 3-color observations of NLC by ground-based lidar, J. Geophys.Res., 26, 1513-1516, 1999.Wilks, D. S.: Statistical Methods in the Atmospheric Sciences, edited by R. Dmowska and J. R. Holton, International Geophysics Series,