A new description of probability density distributions of polar mesospheric clouds

In this paper we present a new description of statistical probability density functions (pdfs) of polar mesospheric clouds (PMCs). The analysis is based on observations of maximum backscatter, ice mass density, ice particle radius, and number density of ice particles measured by the ALOMAR Rayleigh–Mie–Raman lidar for all PMC seasons from 2002 to 2016. From this data set we derive a new class of pdfs that describe the statistics of PMC events that is different from previous statistical methods using the approach of an exponential distribution commonly named the g distribution. The new analysis describes successfully the probability distributions of ALOMAR lidar data. It turns out that the former g-function description is a special case of our new approach. In general the new statistical function can be applied to many kinds of different PMC parameters, e.g., maximum backscatter, integrated backscatter, ice mass density, ice water content, ice particle radius, ice particle number density, or albedo measured by satellites. As a main advantage the new method allows us to connect different observational PMC distributions of lidar and satellite data, and also to compare with distributions from ice model studies. In particular, the statistical distributions of different ice parameters can be compared with each other on the basis of a common assessment that facilitates, for example, trend analysis of PMC.


Introduction
First studies of probability distributions of polar mesospheric clouds (PMCs) were reported by Thomas (1995) using data from the UVS (ultraviolet spectrograph) instrument on board the Solar Mesosphere Explorer (SME) satellite and from the Solar Backscatter Ultraviolet (SBUV) instrument on the Nimbus-7 satellite over the period 1978-1986, measuring scattered limb albedo at 265 nm and nadir albedo at 273.5 nm, respectively. Thomas (1995) introduced empirical measures in the statistical analysis of PMC brightness distributions. He showed that the frequency distribution of PMC albedo derived from both SME and SBUV satellite data can be approximated by an (normalized) exponential probability function, see Fig. 3 in Thomas (1995). Secondly, the author also proposed to use cumulative frequency numbers (the socalled g function) of clouds, g(A), exceeding a certain albedo A in order to better represent the exponential populations. Examples of g distributions are plotted on a semilogarithmic scale in Fig. 4 in Thomas (1995), clearly indicating an approximately linear behavior of cumulative frequencies in a logarithmic format.
In the following years many observational PMC analyses of seasonal statistics have been published frequently using the g function, e.g., reports from Wind Imaging Interferometer (WINDII) and Polar Ozone and Aerosol Measurement II data (Shettle et al., 2002), SBUV data (Deland et al., 2003), Student Nitric Oxide Explorer (SNOE) data (Bailey et al., 2007), ice water content data derived from SBUV (DeLand and , or ALOMAR lidar data (Fiedler et al., 2017). Also model analyses have used the g function investigating trends and long-term changes in PMC parameters (Lübken et al., 2013;Berger and Lübken, 2015).
The g-function approach has been relatively successfully applied to many kinds of different PMC parameters as brightness, albedo, maximum backscatter ratio, integrated backscatter, ice water content, ice mass densities, ice particle size, or ice particle number density since frequency histograms of all these parameters have sometimes a nearly, at least piecewise, exponential shape. Furthermore, sometimes PMC data seem to fit almost perfectly to exponential distributions, particularly when using cumulative standardizations of data (Thomas, 1995). An example of a good exponential fit is the frequency distribution of ALOMAR backscatter data that are discussed in Sect. 3.1.1. On the other hand, in some statistical applications it is obvious that the exponential approach describes the data rather insufficiently, see examples of ice mass density, ice radius, and ice number density in Sect. 3.1.2. Therefore it is a desirable task to provide some more aspects on the theory of PMC statistics.
This paper makes an attempt to investigate in more detail the statistics of probability density functions (pdfs) of PMC climatology for various ice parameters. In the following we analyze a PMC data record of maximum backscatter, ice mass density, ice particle radius, and number density from the period 2002-2016 measured by the ALOMAR Rayleigh-Mie-Raman (RMR) lidar. From the analysis of these ALO-MAR data, we derive a new class of pdfs of PMC distributions that, as we will show, modifies and improves the exponential (g-function) approach as introduced by Thomas (1995).

Description of ALOMAR lidar data
The data set obtained by the ground-based RMR lidar, located at the Arctic station ALOMAR (69 • N, 16 • E), consists of occurrence frequency, brightness, and altitude of PMC. The RMR lidar is in operation on a routine basis during the summer seasons (PMC season: 20 May to 20 August) since 1997. Since summer 2002 the lidar system has the general capability to run in a multiple wavelength (3-color) mode. We briefly summarize the 3-color lidar technique: laser pulses at three separated wavelengths (355, 532, 1064 nm) are emitted, scattered back by air molecules and ice particles in the atmosphere, and collected by telescopes. The received light is recorded by single photon counting detectors with an integration time of 15 min. After separation of the ice particle and molecular backscatter signal, we extract three vertical profiles of so-called backscatter coefficients, which are a measure of height-dependent brightness of the ice cloud. At the height of maximum backscatter (MBS) at 532 nm we calculate three MBS values. We assume that at the altitude of MBS, typically located near 83 km, the actual shape of the ice particle distribution can be described by a normal distribution. Then we derive from the three measured MBS values the characteristics of the normal distribution with mean ice radius, ice number density, and variance (Baumgarten et al., 2007). Finally, we also estimate from these ice parameters the actual ice mass density (IMD) at the MBS height. Such a Gaussian assumption has been widely used in PMC data processing of lidar and satellite data, e.g., ALO-MAR lidar (Baumgarten et al., 2010) and AIM satellite with SOFIE/CIPS instruments (Hervig and Stevens, 2014;Bailey et al., 2015). Also microphysical model studies show strong evidence of Gaussian-distributed ice particles at the height of maximum brightness of PMC, e.g., Berger and von Zahn (2002) and Rapp and Thomas (2006).
In this paper we will analyze the climatology of all ice seasons from 2002 until 2016 merging all 15 seasons to one data record. Within this combined data set we then get a total number N of 8597 observations, which is sufficiently numerous in order to avoid excessive statistical irregularities in a frequency histogram of the data.
3 The exponential probability distribution (g function) In general, the seasonal climatology of PMC events with measured ice parameters, such as integrated backscatter, maximum backscatter, column ice mass, albedo, or ice mass density, has been supposed to follow an exponential distribution that we name E(x) with ice parameter variable x. In the following we summarize the general characteristics of the exponential distribution that allows us to compute a numerical test for exponentially distributed data. The properties of the exponential probability distribution will be also compared with the characteristics of our new probability distribution approach introduced in Sect. 4.
The general form of the exponential distribution E(x) with scale parameter α > 0 is defined as a pdf given by E(x) = α exp(−αx) that fulfills the normalization condition of a pdf with ∞ 0 E(x)dx = 1. Thomas (1995) defined the g function g(x) as the cumulative probability E cum with (1) Taking the logarithm of E yields a straight line ln(E) = ln α − αx. For a given class of values [x 1 ; x 2 ] the likeliness of this class is proportional to the area enclosed by the continuous probability distribution and is obtained by integrating E on the segment length (bin size) x = x 2 − x 1 as x 2 x 1 Edx = −e −αx 2 + e −αx 1 . A statistical analysis of ice parameters has to take into account the aspect of specific sensitivities of different instruments. For example the ALOMAR lidar is generally sensitive to a backscatter signal larger than a threshold about 2-3 × 10 −10 m −1 sr −1 (Fiedler et al., 2017). When considering a threshold (x th ) the exponential pdf E(x) is normalized according to A ∞ x th E(x)dx = 1 with a scaling factor A = exp(αx th ). We summarize the properties of the exponential distribution taking into account a threshold in Appendix A.
For a threshold of zero (x th = 0) we get the regular exponential distribution E(x) that has the mean µ = 1/α; median ν = ln(2)/α; mode η = 0, which is the value that occurs most frequently in the data sample; variance σ 2 = 1/α 2 ; and standard deviation σ = 1/α. Note that the exponential distribution has the unique property that the mean µ and standard deviation σ are identical, see also Eqs. (A1) and (A4). In combination with the median (Eq. A2), these equations form a simple statistical constraint, namely (2) This allows us to test whether a given observational data sample shows good conformity with an exponential (g-function) distribution.
For a given data sample x i (i = 1, . . ., N ) assuming a threshold x i > x th we use the common estimates of mean m and variance s 2 (standard deviation s) with In addition we also calculate the median m and modem.
Hence testing a data sample to be exponentially distributed means that mean, median, and standard deviation of the sample have to fulfill the following identity: We will use this condition to analyze the ALOMAR data with respect to possible exponential (g-function) distributions.
3.1 Analysis of ALOMAR data on exponential distributions (g function)

Analysis of maximum backscatter data
We investigate the frequency distribution of MBS at 532 nm in units of 10 −10 m −1 sr −1 . We assume a threshold of 3 that corresponds to the instrumental sensitivity of the ALOMAR lidar. Then we sort the x = MBS data to a bin size of one per class starting from the threshold value and calculate a frequency histogram. Finally, we normalize the histogram so that the sum of all frequency classes equals one. Figure 1a shows the frequency distribution of x = MBS data in a semilogarithm diagram. The first impression is that the data points are almost perfectly approximated by a linear regression besides some statistical noise. This indicates that an exponential function describes the distribution of data with a high accuracy. Figure 1b shows the distribution histogram in an original nonlogarithmic representation. We see that the exponential fit matches the data histogram with a high precision. The good quality of the fit is characterized by a small relative error of 6.5 % that is calculated as a sum of 100 % · M j |E j − X j | for x > 3 with theoretical exponential frequencies E j and normalized frequencies X j of data x per class j with a total of M classes. The high quality of the fit is also supported by the fact that theoretical mean, median, mode, and standard deviation (µ, ν, η, σ ) using Eqs. (A1)-(A4) and estimates of mean, median, mode, and standard deviation (m, m,m, s) from the data sample derived from Eq. (3) all coincide within their error bars. Now we perform the proposed exponential (g-function) test with m−x th = s = ( m−x th )/ ln(2), see Eq. (4), and insert the values from the data sample of mean (m = 12.0 ± 0.3), median ( m = 9.0 ± 0.4), and standard deviation (s = 9.2 ± 0.5). The error uncertainties have been estimated with bootstrap methods. We find that m−x th = 12.0−3 = 9±0.3, s = 9.2±0.5, and ( m − x th )/ ln(2) = (9.0 − 3)/0.69315 = 8.7 ± 0.6. Hence the identity is fulfilled when allowing for uncertainties introduced by statistical errors. We conclude that lidar MBS data are very likely exponentially distributed and follow a g function.
3.1.2 Analysis of ice mass density, ice radius, and ice number density data Now we investigate other ice parameters from the ALOMAR data set with respect to exponential distributions, namely the frequency distributions of IMD in units of mg m −3 (threshold 20, bin size of 2), ice radius r in units of nm (threshold 20, bin size of 1), and ice number density n in units of cm −3 (threshold 30, bin size of 10). We will show that these parameters do not follow an exponential distribution (g function). In Fig. 2a we plot the frequency distribution for y = IMD data in a semilogarithmic diagram. We show in the following that the data points have no dominant linear shape. There exist systematic deviations from data and the theoretical exponential fit. In comparison to the fit curve, data points are systematically smaller at y = 20-40. Vice versa, data points substantially exceed fit values in the range y = 40-90. Also, frequencies in all classes below the threshold are significantly smaller than a proposed exponential fit. Indeed, the frequency histogram in the nonlogarithmic frame shows these systematic deviations between data and exponential fit even more pronounced, see Fig. 2b. With a relative error of about 19 % the exponential curve fails to satisfactorily fit the data. Also, significant differences exist between fit and data parameters of mean, median, mode, and standard deviation (Fig. 2b). Finally, we apply the exponential (g-function) test for IMD data and get the following results: finding the mean (m = 62.5 ± 1.3), median ( m = 53.5 ± 1.4), and standard deviation (s = 35.2±1.2) directly calculated from the data sample, we get m − y th = 62.5 − 20 = 42.5 not equal to s = 35.2 not equal to ( m − y th )/ ln(2) = (53.5 − 20)/0.69315 = 48.3. Hence the condition of identity is not satisfied even allowing for uncertainties introduced by statistical errors again calculated from bootstrap methods. That is why we have to conclude that the lidar IMD data are very likely not exponentially distributed. When we investigate a possible exponential distribution for ice radius r and ice number density n data, see Fig. 2c-f, we even see larger discrepancies between data and exponential fits with, for example, relative errors of about 29 %, also indicating that both r and n are very likely not exponentially distributed. This is supported by the fact that the test of mean, median, and variance fails again and shows large inequalities. . The bin size is x = 1. The straight line (solid red) has been derived from a least-squares fit to MBS data with x > 3.
(b) Same as (a), but original nonlogarithmic frequency distribution (gray bars x > 3; black bars 0 < x < 3). The exponential fit derived from panel (a) is shown as a red curve. Values of mean, median, and standard deviation are given to compare fit and original data taking into account a threshold of x th = 3. The relative error given in percent describes the quality of exponential fitting, see text for details.
We summarize that ice mass density, ice radius, and ice number density do not follow an exponential distribution in contrast to maximum backscatter. In the following section we will show that this is reasonable and is based on the fact that a functional link between MBS and the other data sets of IMD, r and n does miss a linear relationship.

Test of linearity between maximum backscatter
and ice mass density, ice radius, ice number density data Linearity between MBS and IMD, ice radius r and ice number density n data, is a necessary and sufficient condition that also shows that IMD, r and n data, samples are exponentially distributed, see next section. In the following we will test this constraint. Figure 3a shows a scatter plot in a logarithmic frame for simultaneously measured MBS and IMD data. In order to test a linear relationship between x = MBS and y = IMD we introduce a general fit function described by a power law condition as with the two constants c (linear constant) and d (power constant). Only for d = 1 we expect a perfect linear dependence between x and y. First of all, the logarithmic values of x and y do not yet have a high linear correlation (R = 0.54), see Fig. 3a. Since the correlation coefficient R is unequal one, the two regression lines resulting from y(x) : ln x −→ ln y and x(y) : ln y −→ ln x differ from each other. This means that the best choice of a regression fit is determined by a straight line through the two regression points, which are defined by the means plus/minus standard deviations of logarithmic x and y data. Note that the positions of regression points also relate to the half-width of the angle that is spanned by the two regression lines y(x) and x(y). For our mean regres-sion line we estimate d = 0.873. The statistical error for d is d = ±0.012 with a confidence level of 95 %, which indicates a significant nonlinearity. Hence we conclude that the pdf describing the distribution of IMD data is very likely not an exact exponential function and its cumulative distribution does not follow precisely a g-function description because the criterion of "linearity" is violated. In Fig. 3b we show a second example for the correlation between MBS and ice radius n. Again the correlation is about R = 0.55, but now the power constant is even much smaller with d = 0.497, which is far away from unity. Finally, we investigated the linearity between MBS and ice number density n where we find a weak negative correlation of R = −0.15 (not shown here). A best fit analysis yields a power value of d = −0.534, which again fails significantly the constraint of unity. Hence we conclude that ice radius and ice number density distributions should also not follow exponential (g-function) distributions.

A new probability density function for PMC parameters
In this section we will present the major part of the new statistical approach in order to describe frequency distributions of different PMC parameters. There exists a general mathematical method ("integration by substitution") that provides the opportunity to transform between pdfs with different statistical variables. This is done by the following procedure: assuming a given pdf P (x) with variable x, then the transformation from x to a new variable y(x) with a new pdf Q(y) is specified by with x(y) being the inverse function of y(x). Here the absolute value of the derivative ∂x/∂y has to be calculated so that the new pdf Q is defined positively everywhere. In order to apply this approach one needs generally two requirements (1) Any transformation between the two pdfs, P and Q, needs an initial guess in one of the two pdfs, either P or Q.
(2) An analytic formula of a forward and backward model must be available that describes the functional depen-dence between the two statistical ice variables x and y. In the following we discuss how we satisfy these two requirements.
We apply this method for two ice parameters, namely MBS with variable x and an unknown ice parameter named u (e.g., this unknown ice parameter might be ice particle radius). For condition (1), we use the hypothesis that the distribution of maximum backscatter data (MBS) is perfectly rep-  resented by an exponential pdf and its cumulative distribution is described by a g function according to Eq. (1). For condition (2), we assume a power form of a fit function used in Eq. (5) that also allows us to analytically calculate the inverse function. We discuss a suitable justification of this assumption in Sect. 6.2. Hence the forward model is u(x) = cx d and the backward model is x(u) = (u/c) 1/d . Then the new distribution U for the arbitrary ice parameter u using Eq. (6) is given by Equation (7) can be simplified to a more general form with In a next step we introduce, in an arbitrary manner, a third ice parameter named z for which we assume again the same power law (Eq. 5) now valid between z and u as Again we apply Eq. (6) and calculate the unknown pdf Z(z): At first glance the algebraic expression for Z looks particularly complex, but Z can be transformed to a general form Equation (9) represents our final result. The pdf Z(z) describes the general form of the new statistical distribution. Note that the algebraic expressions of Eqs. (8) and (9) formally coincide. This means that any probability distribution of a new ice parameter that is connected to other ice parameters through our functional power law (Eq. 5) can be described by the general pdf given by Eq. (9). The constants a and b represent two free parameters in the Z distribution, which we name the scale parameter a and the shape parameter b. Obviously, the Z pdf is identical with an exponential pdf (or g function) in the limit b = 1. This shows the close interconnection of the new Z pdf to the commonly used exponential (g-function) approach. We will show in the following that any distribution from so different ice parameters, such as maximum backscatter, ice mass density, ice radius, and number density of ice particles, can be described on a uniform basis with a high accuracy by Z. Vice versa this indicates that these ice parameters are connected depending on each other by the uniform power law relation (Eq. 5), more details are discussed in Sect. 6.2.
5 Application of the Z distribution to real data

General properties of the Z distribution
In this section we first show some general characteristics of the new Z distribution. From these properties we derive conditions and constraints that will allow to estimate the specific values of the two free constants in Z, the scale parameter a and shape parameter b, for a given data sample.
First we show that Z is a correct pdf satisfying the normalization condition For a negative b the distribution Z is described by The cumulative form of Z for b > 0 is given by For b < 0 we have to choose the cumulative calculation in reverse order starting the integration at zero. Naming the reverse cumulative with index zero as Z 0 cum we get Only the cumulative descriptions from Eqs. (11a) and (11b) allow us, in principle, to roughly estimate the constants a and b for a given data sample using the double-logarithmic functional dependence, whereas the direct logarithm of Z (Eqs. 10a, 10b) offers no possibility to solve for a and b. However, the method calculating the double-logarithmic cumulative is not recommended. Several numerical tests showed that a stable estimation of a and b from noisy data applying this double-logarithmic approach is an almost impossible task. Instead, we propose two different methods that rely on much more powerful principles (see next Sect. 5.2). Additionally, we have to take care of a possible negative value of b that can be only identified using Eq. (11b). In fact such a case occurs in the analysis of ALOMAR data. In Sect. 5.3 we will give an example where only a negative slope parameter describes the distribution of number density of ice particles. Generally, the Z distribution has the ability to characterize many different types of distributions, see Fig. 4. Especially the shape parameter b determines the shape of the Z distribution describing nonlinear exponential, exponential, rightskewed, left-skewed, or symmetric curves. For 0 < b < 1 the pdf increase is nonlinear and exponentially accelerated to infinity as z approaches zero. For b = 1 the pdf is exactly an exponential distribution having a positive finite value for z equal zero. For b > 1, the function tends to zero as z approaches zero. When b is between 1 and 2, the function is right-skewed and rises to a peak quickly, then decreases for large z. When b has an approximate value between 3 and 4, the function becomes symmetric and bell-shaped like a normal distribution. Note that exact symmetry is given for a skewness equal to zero, which is true at z = 3.60232. For b values larger than approximately 5, the function becomes again asymmetric changing the skewness to the left. For b < 0, the function is skewed to the right and decreases steeply towards zero as z approaches zero. Note that Z is never negative and owns a local maximum described by the mode whenever b is negative or larger than 1. Finally we see that a double-logarithmic presentation of cumulative functions describes linear shapes with slope b, see Fig. 4j-l.
It is interesting to note that our new Z distribution is closely related to a more general Weibull distribution (Wilks, 1995). Nevertheless there is a difference concerning the shape parameter b, which in our case is not only defined for positive values but also for negative values. Such a case is disregarded by a classical 2-D Weibull distribution. Now we shortly summarize the mathematical descriptions of median, mode, mean, variance, and standard deviation parameters of Z for the case of a zero threshold. The calculations are described in detail in Appendix B for the general case of a nonzero threshold. Median: Mode: Variance and standard deviation: The expressions of mean and variance use the gamma func-  defined for all real values of t except t = 0 and all negative integer values of t. Note also that median, mode, mean, variance, and standard deviation parameters of Z coincide with those of an exponential distribution in the limit as b equals 1.
5.2 Two computational methods to estimate the free parameters a and b of Z from a given data sample In this section we present two numerical methods to calculate the scale parameter a and shape parameter b describing the new Z distribution. First of all, since any measurement depends on a specific instrumental sensitivity, we have to introduce a threshold that we name z th . The remaining data sample consists of N observations z i with z i > z th . Then we calculate the mean m and standard deviation s of data z i using Eq. (3), and also the median value m from data z i . Method (1): we investigate the corresponding theoretical moments from Z. In Appendix B we derive the theoretical mean µ (Eq. B5) and median ν (Eq. B3) for the Z distribution with a threshold constraint. Taking the estimates of mean m and median m from the sample as best proxies for the theoretical mean µ and median ν values of Z, we get the following equations: Note that the use of a threshold constraint involves the introduction of a scaling factor A = exp(az b th ), which is present in Eq. (13b). Inserting the algebraic term of a (right side of Eq. 13a into the right side zero-equation Eq. 13b) and using the threshold value of z th yields an equation only for b, which has to be computed iteratively. Once a numerical value of b has been estimated with a sufficient accuracy, we insert this b value into the upper right equation to get the numerical value for a.
We note that in classical statistics the method of moments determines a and b from the mean and variance equations.
In principle this approach should be possible here too, but in practise the algebraic structure of the variance equation is too complicated, see Eq. (B6) in Appendix B. This means that the variance equation, if at all, is only iteratively solvable, whereas the use of the median equation offers an analytical transformation to a. Generally, we recommend to apply the proposed method using the mean and median equations. This straight-forward method is easy to program and produces reliable estimates of parameters a and b.
Method (2): we also present a second method using a maximum likelihood approach, see Appendix C. The parameters are again calculated from two equations (Eq. C5) with Interestingly, the left equation includes a term 1/N z b i , which is the mean of the sample values weighted by power b, whereas the right equation includes the mean 1/N ln z i of logarithmic data and 1/N z b i ln z i . This shows a similarity to the computation of regression points used in Fig. 3. We insert a into the right equation that yields a unique equation for b, which again can be solved iteratively. Once b is fixed, the left equation allows us to determine a. In the following we will test our lidar data samples with these two procedures and we will show that both methods produce almost identical results.

Z distributions applied to ALOMAR data
Applications of the Z distribution to ALOMAR data of maximum backscatter (x), ice mass density (y), ice particle radius (r), and ice number density (n) are shown in Fig. 5. Note that thresholds have been computed from the regression functions (Eq. 5) described in Sect. 3.2 on the basis of x th = 3 × 10 −10 m −1 sr −1 , resulting in y th = 22 mg m −3 , r th = 22.3 nm, and n th = 662 cm −3 . The values of scale parameter a and shape parameter b have been calculated with the method of mean and median equations (method 1). Then the theoretical curves of Z and theoretical values of mean, median, mode, and standard deviation have been calculated by inserting the values of a, b, and threshold z th into Eqs. (B1)-(B6). Obviously the pdf Z sometimes has no simple exponential shape, which is the case for ice mass density, ice radius, and ice number density. As we see in Fig. 5 all Zpdf curves (in blue) match the original data histograms with a high accuracy. The relative error is in a range of about 6 %-10 % except that ice number density has a relative error of 15 %. When we compare the mean, median, mode, and standard deviation derived from the theoretical distribution and corresponding estimates from data samples, we see a precise coincidence of mean and median values. Not surprising this is due to the fact that the parameters a and b have been computed by the mean and median method, which guarantees the preservation of mean and median values. Nevertheless standard deviation and mode also always show a good agreement within the error range. A closer look to the maximum backscatter distribution shows that MBS data are almost perfectly exponentially distributed with b = 0.931, which is not too far away from b = 1 for an exact exponential pdf. As we had already shown, see Sect. 3.1.1, MBS data are very likely exponentially distributed, now the Z-distribution analysis confirms this result. Hence we conclude that the commonly used exponential (g-function) analysis might only be a reasonable statistical method in the case of analyzing MBS lidar data.
In contrast to MBS, the Z distribution of IMD shows a function that converges rapidly to zero for small IMD values. The distribution is described with b = 1.355, which significantly deviates from b = 1 for a precise exponential function. Note that the mode of the data sample at 40 mg m −3 differs from the theoretical mode of 23 mg m −3 because of a relatively high statistical noise in the data. But mean, median, and standard deviation values agree almost perfectly. Similar to IMD, the ice radius distribution indicates a significant nonexponential behavior with b = 1.833. The distribution converges to zero as the radius approaches zero. The curve is skewed to the right and has a maximum at r = 25.8 nm, which differs only slightly from the mode of the data sample at r = 27.8 nm. Again mean, median, and standard deviation values agree almost perfectly.
The sample of ice number density shows a completely different behavior with a slope parameter that is negative with b = −0.819. The physical meaning is that the parameter ice number density is negatively correlated with all other ice parameters. For example, large ice numbers n correspond to small ice radii, IMD and MBS values. As a consequence this leads to a threshold of n in the reverse direction, that is from large values to small values defined by n < n th = 662 cm −3 . One can see this feature in the right tail of Z(n) plotted as a dashed curve, see Fig. 5d. The reverse behavior is also present for small values of n. Small values of n are measured for very bright PMC events with large MBS that have small occurrence rates. Therefore, the number of small ice particles has a relatively high uncertainty due to their low occurrence frequency, and it is this statistical error that produces some deviations from the fit curve to the data in the range of n = 0-80 cm −3 . We note that the numerical procedure computing the pair (a, b) from the method of mean/median (Eqs. 13a, 10b) has automatically detected the existence of a negative slope parameter b without any a priori information. Now we repeat the analysis using method 2. Figure 6 summarizes the (a, b) values and statistical moments calculated from the method of maximum likelihood estimators. As can be seen the maximum likelihood approach computes almost identical results for all ice parameters. We have also added in Fig. 6a-c the histogram bars (in black) for all data being smaller than the threshold. Please keep in mind that the calculation of theoretical distribution curves is based exclu- sively on data larger than the threshold. Hence, decreasing or increasing a threshold will change the specific values of a and b. Figure 6d shows the ice number density distribution where we have added in the histogram (in black) all data being larger than the threshold. Again, also the maximum likelihood method has automatically detected the existence of a negative slope parameter b for the ice number density distribution.

Construction of artificial data
In the derivation of the Z distribution we used the assumption that all ice parameters of maximum backscatter x = MBS, ice mass density y = IMD, ice particle radius r, and ice number density n are connected with one another by the power law given in Eq. (5). In Sect. 5.3 we showed that the Z pdf describes with a high accuracy each distribution of these ice parameters, which, in turn, means that indeed there exists at least an approximative power law between ice parameters. We discuss a suitable justification of this power law relation in more detail in Sect. 6.2. In the following we will show that the use of a Z distribution allows us to construct ar-tificial unknown data samples of various ice parameters that approximate true data to a high degree. We think that such an application is one of the most beneficial outcomes from the new Z-distribution approach. We explain the numerical procedure by the help of a practical example.
We already showed a linear dependance in the logarithmic frame using linear regression (LR) for maximum backscatter and ice particle radius, see Fig. 3b. Hence we can compute artificial ice radius proxies r i , named as LR proxy of true data r i , as a function of MBS-data x i from the regression power law function (Eq. 5) with r i = cx d i and with power law coefficients c = 13.509 and d = 0.497. Figure 7a shows a comparison between LR proxy and original ice radius data where we test the identity of the two data samples. The correlation coefficient is the same as shown in Fig. 3b with R = 0.55. Mean and median values of proxy and original data are almost identical, and a regression analysis shows a perfect identity (c = 1.000 and d = 1.000). Now we calculate the frequency histogram of LR proxy ice radii, see Fig. 7b, and compare the histogram with the original Z distribution of ice radii already shown in Fig. 5c. We find that the LR proxy approximates the mean, median, mode, and standard deviation values of the original Z distribution with an relative error of 9.5 % comparable to the original error of 9.1 %. We conclude that a linear regression analysis of logarithmic data offers a good opportunity to approximate data, provided that a pair of data samples exists that allows for the calculation of power law coefficients c and d from regression methods. Now the Z-distribution approach offers a more general possibility to derive artificial data samples without any knowledge of correlation and regression coefficients. Indeed we will show that results from the Z approach are very close to results from a regression analysis. Again, our goal is to approximate ice radius data from a given maximum backscatter data sample. But now we suppose that no data of ice particle radii r exist, hence any correlation and regression analysis is not possible. First, we assume that a data sample of x = MBS of number N exists and also its Z distribution Z(x, a x , b x ) with a x = 0.140 and b x = 0.931 is well known, see Fig. 5a. Secondly, we assume that we know a priori the form of the Z distribution Z(r, a r , b r ) of ice radius r, e.g., with values of parameters a r and b r from Fig. 5c (a r = 1.269 × 10 −3 , b r = 1.833). Please keep in mind that such information about scale and shape parameters of the ice radius distribution could be also provided from independent satellite measurements that are capable of measuring ice particle radii, e.g., AIM-SOFIE.
Our new proxy method (Z proxy) requires the following transformations. We first transform the x i values (i = 1, . . ., N ) into the z domain with x i −→ z i : z = a x x b x i followed by a second transformation with z i −→ r i : r i = (z i /a r ) 1/b r resulting in Note that the derivation of c and d in Eq. (15) is based on the same mathematical steps when we developed the Z distribution from Eqs. (8) to (9). Inserting the a x , b x , a r , and b r values into Eq. (15) determines the power law coefficients for Z proxy r with c = 12.994 and d = 0.508. These values do not exactly coincide with c and d values obtained from the regression method, see above, but the identity test between Z proxies and true ice radii shows a very good coincidence, see Fig. 7c. Again, mean and median values of proxy and original data are practically identical, and a regression analysis shows an almost perfect identity (c = 1.095 and d = 0.980). Finally we calculate a frequency histogram of Z proxies, see Fig. 7d, and find a good agreement between proxies and true Figure 7. (a) Proxy p of ice radius versus original ice radius data without any threshold. The proxy has been derived from maximum backscatter data using the fit function that has been estimated by linear regression (LR proxy) between original logarithmic MBS and ice radius data, see Fig. 3b. (b) Frequency distribution of LR proxy (gray and black histogram) with a threshold r th = 23.3 nm. For comparison we also plot the original Z-pdf curve (blue) from the analysis of original ice radius data, see Fig. 5c. The relative error describes the accuracy between LR-proxy data and original Z-function fit. (c) Same as (a), but for Z-proxy data resulting from the Z-pdf analysis of MBS data, see text for more details; (d) same as b with Z-proxy data.
pdf. Mean, median, and standard deviations of Z-proxy data correspond perfectly to original ice radius data, and the relative error has now even decreased to 9.2 %. We summarize that we present a new method in order to construct artificial data samples provided Z-descriptions of these data sets exist. By means of a consecutive arranging of ice parameters starting at a given data sample, this method allows us to construct any artificial data sample within (x, y, r, n). This method can also be applied to other data sets, e.g., ice parameter measurements from satellite observations. For example, a data sample of ice water content (IWC) obtained from satellite measurements might be analyzed in terms of a Z distribution estimating the scale and shape parameters a and b of the IWC distribution. This would allow us to establish a connection between satellite IWC data to lidar data samples (x, y, r, n) through Eq. (15), hence the satellite IWC data could be transferred to lidar maximum backscatter, ice mass density, ice particle radius, and ice number density. Vice versa the knowledge of a satellite IWC Z distribution would allow us to transform lidar observations into IWC proxies and compare these with the original IWC observed by the satellite. We think that our proposed transformation method could be very helpful to connect different ice parameter data from different instruments, either from satellite observations or ground-based measurements. We also think that this new approach might be important in trend analysis of PMC.
In the next section we will discuss the power law assumption (Eq. 5) and the physical meaning of the shape parameter b, which might be introduced as a new trend variable in the analysis of PMC long-term changes.

Discussion of the power law assumption between PMC parameters
In this section we discuss some theoretical aspects of the power law dependence on ice parameters in order to validate the justification of Eq. (5). We use again the assumption as already discussed in Sect. 2 that at the altitude of maximum brightness (MBS) and ice mass density (IMD) there exists in the real atmospheric background an ice particle distribution that is perfectly Gaussian-distributed (N i ) as We also assume that the geometric shapes of these ice particles are spheres with ice radii τ with mean radius r i and variance σ 2 i . Again index i = 1, . . ., N relates to the ith measurement in a given data sample of number N . N i is normalized to ∞ 0 N i (τ )dτ = 1. When we assume an ice number density of n i particles per cubic centimeter we get the expression ∞ 0 n i · N i (τ )dτ = n i . Note that mean ice radii r i and ice number densities n i are elements of our lidar data climatology, which we have introduced in Sect. 2.
Furthermore, we assume from the analysis of lidar observations (three-color measurements) that the relation of mean radius and variance is according to σ i = 0.4 r i for r i < 37.5 nm and σ i = 15 nm for r i ≥ 37.5 nm (Baumgarten et al., 2010). This assumption has also been applied in the analysis of AIM/SOFIE-CIPS PMC satellite data (Lumpe et al., 2013;Hervig and Stevens, 2014). In order to simplify calculations we apply this relation σ i = 0.4 r i also for r i larger than 37.5 nm. We now investigate the question of which backscatter lidar and ice mass signals result from such an ice distribution.
We compute the mass of a spherical ice particle with radius τ as 4/3π ρ ice τ 3 with density of ice ρ ice = 932 kg m − 3. The backscatter signal from a single ice particle is calculated as a L τ 5.8 with the lidar constant a L = 1.5 × 10 −11 m 2 . Then the maximum backscatter x i and ice mass density y i are estimated by an integration of the radius distribution from zero to infinity as assuming a constant number density n of ice particles. Only the integral of y i is analytically computable with a solution in which the error function defined by the integral erf(x) = 2/ √ π x 0 exp(−t 2 )dt is part of the solution: The integral for x i includes the term τ 5.8 that arises from Mie-scatter theory for light scattering of a wavelength of 532 nm (ALOMAR RMR lidar) at spheres in a range of radii with 1-100 nm. The exponential value of 5.8 approximates exact Mie-scatter calculations with a relative error less than 0.5 % in this radii range. Unfortunately, the integral can only be solved analytically if the exponent is an integer number as 5 or 6. Nevertheless, we are able to solve this integral by means of numerical methods with the specific exponent of 5.8. In a next step, we construct analytical approximations f x (r i ) and f y (r i ) for both integral solutions using a typical value of n = 200 cm −3 with The linear constants a 1 and a 2 with values a 1 = 4.20 and a 2 = 1.47 are optimal dimensionless parameters. f y approximates the analytical solution of y i with a relative error less than 0.7 % in the range r i = [0, 45 nm], and less than 1.2 % in the range r i = [45, 70 nm]. A precise solution of x i resulting from numerical methods of integration is approximated by f x with a relative error less than 0.3 % in the range r i = [0, 45 nm], then the relative error increases linearly to a maximum error of 5 % at r i = 70 nm. We find that the solutions f x and f y approximate the general power law condition p = cq d (Eq. 5) inside a small error range. Hence these analytical examples show that MBS is a function of ice radius proportional to ∼ r 5.8 (d = 5.8), the same is also true for IMD (∼ r 3 , d = 3). It also follows that MBS and IMD are consequently connected through a power law condition with MBS ∼ IMD 5.8/3 (d = 5.8/3 = 1.93). But the new form of the Z-distribution technique opens up whole new perspectives for the validation of the analytical examples based on the ALOMAR lidar data samples. We transform the z distribution of IMD into the MBS domain using Eq. (15) with y i −→ z i : z i = a y y b y i and z i −→ x i : with c = a y a x

U. Berger et al.: Probability density distributions of PMC
We insert into Eq. (16) the values of a x = 0.140, b x = 0.931, a y = 4.321 × 10 −3 , and b y = 1.355 from Fig. 5a and b and get c = 0.023 and d = 1.46. The power constant (d = 1.46) derived from the shape parameters b x and b y of the Zdistribution analysis of real ALOMAR IBS and IMD data is significantly different from the power estimate (d = 1.93) belonging to the analytical example that necessitates various assumptions, e.g., Gaussian-distributed ice particles at the height of maximum backscatter, constant ice particle number, or spherical shape of ice particles. Hence, we conclude that the determination of shape parameters b from a Zdistribution analysis of observational data therefore provides a qualitative indication of the actual microphysical state that controls real ice formation processes. This leads to the idea that as a future task long-term changes in PMC formation might be characterized by potential long-term changes in b that indicate long-term changes in atmospheric background conditions and microphysical ice constraints of ice formation.

Summary and conclusions
In this study we present a new method to describe statistical probability density functions (pdfs) for different ice parameters of PMC. We analyze a climatology of ice seasons from 2002 until 2016 as measured by the ALOMAR lidar. From this data set we derive ice cloud parameters of maximum backscatter, ice mass density, ice radius, and ice number density whose occurrence frequencies are investigated with respect to exponential distributions. We show that only maximum backscatter follows an exponential distribution, whereas ice mass density, ice radius, and ice number density frequencies fail to fit satisfactorily to an exponential distribution. The reason for these deviations from exponential behavior is based on the fact that these ice parameters are not linearly dependent on each other.
We introduce a new probability density distribution (Z function, see Eq. 9) that instead assumes a general power law relation among ice parameters, see Eq. (5). The new Z distribution is described by two free constants with scale parameter a and the shape parameter b. We point out that the new distribution is closely related to a more general Weibull distribution. The new distribution has been applied to maximum backscatter, ice mass density, ice radius, and ice number density data from the ALOMAR data set. As a result all data distributions are described with a high accuracy by Z. We discuss that the exponential distribution (g function) is a special case of the more general Z function with shape parameter b = 1. We present two numerically stable methods (method of mean and median, method of maximum likeliness) that allow to derive the values of free constants a and b describing the actual Z-function shape for a given data sample.
Perhaps the most important application of the new method is the possibility to construct unknown data sets for different ice parameters that approximate true data to a high degree. We show in Sect. 6.1 that a linear regression analysis in a logarithmic data frame offers a good opportunity to approximate data provided that a pair of data samples exists that allows for the calculation of power law coefficients c and d from regression methods. The Z-distribution approach offers a more general possibility to derive artificial data samples without any knowledge of correlation and regression coefficients. This allows for the connection of different observational PMC distributions of lidar and satellite data, and also with distributions resulting from ice model studies. In particular, the statistical distributions of different measured ice parameters can be compared with each other on the basis of a common assessment that again should be helpful in combining trend analysis of PMC long-term time series from different observational data sets.
Appendix A: Properties of the exponential distribution (g function) When considering a threshold (x th ) the exponential pdf E(x) is normalized according to A ∞ x th E(x)dx = 1 with a scaling factor A = exp(αx th ). It follows that the mean µ is then given by This yields for the mean µ = x th + 1/α.
The median ν denotes the boundary of separating the higher half from the lower half of the distribution with The equation is solved for the median with ν = x th + ln(2)/α.
Inserting µ = x th + 1/α simplifies the algebraic expression and shows that the variance σ 2 (standard deviation σ ) is independently from a given threshold:

Appendix B: Properties of Z distribution
In the following all quantities take into account a threshold z th . We introduce a scaling factor A = exp(az b th ). Setting the threshold to zero means a scaling factor A = 1 and gives the regular expressions for cumulative pdf, median, mode, mean, and variance, see Eqs. (12a)-(12d).
Probability density function Z(z ≥ z th , a > 0, b = 0): Cumulative form of Z cum : The derivative with respect to parameter a is and for parameter b Setting each of the derivatives equal to zero yields for a and b.
These are the maximum-likelihood estimators for scale parameter a and shape parameter b.