The behavior of high-CAPE summer convection in large-domain large-eddy simulations with ICON

Abstract. Current state-of-the-art regional numerical weather prediction (NWP) models employ kilometre-scale horizontal grid resolutions, thereby simulating convection within its grey zone. Increasing resolution leads to resolving the 3D motion field and has been shown to improve the representation of clouds and precipitation. Using a hectometer-scale model in forecasting mode on a large domain therefore offers a chance to study processes that require the simulation of the 3D motion field at small horizontal scales, such as deep summertime moist convection, a notorious problem in NWP. We use the Icosahedral Nonhydrostatic weather and climate model in large-eddy simulation mode (ICON-LEM) to simulate deep moist convection, distinguishing between scattered, large-scale dynamically forced and frontal convection. We use different ground- and satellite-based observational data sets, which supply information on ice water content and path, ice cloud cover and cloud top height on a similar scale as the simulations, in order to evaluate and constrain our model simulations. We find that the timing and geometric extent of the convectively generated cloud shield agree well with observations, while the lifetime of the convective anvil was, at least in one case, significantly overestimated. Given the large uncertainties of individual ice water path observations, we use a suite of observations in order to better constrain the simulations. ICON-LEM simulates a cloud ice water path that lies in between the different observational data sets, but simulations appear to be biased towards a large frozen water path (all frozen hydrometeors). The bias in frozen water path and the longevity of the anvil are little affected by modifications of parameters within the microphysical scheme.
In particular, one of our convective days appeared to be very sensitive to the initial and boundary conditions, which had a large impact on the convective triggering but little impact on the high frozen water path and long anvil lifetime bias. Based on this limited set of sensitivity experiments, the evolution of locally forced convection appears to depend more on the uncertainty of the large-scale dynamical state based on data assimilation than on that of microphysical parameters. Overall, we judge ICON-LEM simulations of deep moist convection to be very close to observations regarding timing, geometrical structure and cloud ice water path of the convective anvil, but other frozen hydrometeors, in particular graupel, are likely overestimated. Therefore, ICON-LEM supplies important information for weather forecasting and forms a good basis for parameterization development based on physical processes or machine learning.


shape, to explore the sensitivity to model error. In the literature one can find numerous studies of the sensitivity of convective storms and tropical cyclones to cloud microphysics (Wang, 2002; Milbrandt and Yau, 2006; Li et al., 2009; Van Weverberg et al., 2012; Bryan and Morrison, 2012, among many others). Most of them report significant sensitivity, especially through the impact of evaporation and melting on the strength of the cold pool. Those sensitivity experiments are important for understanding the uncertainty connected with convectively generated precipitation and climate-relevant aspects such as the longer-term impact of convection on the upper tropospheric water budget.
To investigate the uncertainty of convection in high-CAPE weather situations, we first select several summer convective events over Germany that feature (i) strong and deep convective cells with little advection (e.g. 4 July 2015 extending into  Table 1 for a list of all considered days. To evaluate the performance of the control and sensitivity simulations of summer continental convection, we use ground-based and satellite observations from polar orbiting and geostationary sensors. To assess the quality of the high-resolution simulations we rely on a suite of satellite ice water path (IWP) products representing the range of uncertainty in state-of-the-art retrievals. Furthermore, cloud ice water content (IWC), cloud top height (CTH) and an instrument-like ice cloud cover (ICC) complete the evaluation of deep convective clouds.
The challenge of providing a meaningful comparison of cloud ice related quantities with spaceborne observations was reported in Waliser et al. (2009). Several follow-up studies (Eliasson et al., 2011; Waliser et al., 2011; Stein et al., 2011; Li et al., 2012; Eliasson et al., 2013; Li et al., 2016; Duncan and Eriksson, 2018) discussed the importance of considering the uncertainties in satellite IWP observations and their limitations for model evaluation. In order to analyze simulated cloud ice, it is necessary to know the unavoidable constraints of satellite observations. These range from retrieval sensitivities to microphysical assumptions (Yang et al., 2013), spatial and temporal sampling characteristics (Eliasson et al., 2013) and ultimately limitations that are determined by instrument type (active or passive sensors). This study uses a suite of observational data sets that reflects a realistic range of retrieval uncertainties for constraining the simulated cloud ice. These data sets encompass passive optical observations with high temporal resolution by the Meteosat Second Generation (MSG) satellite as well as with high spatial resolution by polar orbiting platforms. To explicitly show uncertainties of satellite ice products, different retrieval results are shown. In addition, a passive microwave sensor is considered to complement the optical instruments.
The structure of the paper is as follows. Section 2 gives a synoptic overview of the selected cases to describe the meteorological background of the convective events. We describe the model simulations and the observations used for verification in Sect. 3 and 4. The evaluation of the ICON-LEM against observations is detailed in Sect. 5, while Sect. 6 describes the sensitivity studies for varying boundary and initial conditions and model physics before we conclude in Sect. 7.
types. In Fig. 1 snapshots of SEVIRI satellite images are juxtaposed with synthetic SEVIRI images for the respective days. The synthetic SEVIRI (Spinning Enhanced Visible and Infrared Imager) images were produced with RTTOV (Radiative Transfer for TOVS; Saunders et al., 1999, 2018), using as input ICON-LEM profiles of temperature, specific humidity, cloud liquid water content (LWC) and cloud ice water content (IWC), as well as simulated surface skin temperature and 10 m wind speed.
The ice optical properties come from the Baran parametrization (Vidot et al., 2015) and trace gas profiles were set to the RTTOV reference profiles. The RGB composites use the 0.6 micron reflectance for the red channel, the 0.8 micron reflectance for the green channel, and the average of the 0.6 micron and 0.8 micron reflectance for the blue channel. In addition, simulated CAPE values of ICON-LEM are displayed in the lowermost row in Fig. 1 for the respective time slices, indicating atmospherically unstable regions.
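The channel assignment of the RGB composites can be illustrated with a minimal Python sketch. The function names, the assumption that reflectances are given as fractions between 0 and 1, and the clipping to that range are illustrative choices and are not taken from the processing chain used for Fig. 1.

```python
def rgb_pixel(refl_06, refl_08):
    """Return (red, green, blue) for one pixel.

    refl_06, refl_08: 0.6 and 0.8 micron reflectances, assumed to be
    fractions in [0, 1]; values outside are clipped (an assumption).
    """
    red = min(max(refl_06, 0.0), 1.0)      # 0.6 micron -> red
    green = min(max(refl_08, 0.0), 1.0)    # 0.8 micron -> green
    blue = 0.5 * (red + green)             # channel average -> blue
    return (red, green, blue)


def rgb_composite(field_06, field_08):
    """Apply rgb_pixel to two equally shaped 2-D lists of reflectances."""
    return [
        [rgb_pixel(a, b) for a, b in zip(row_06, row_08)]
        for row_06, row_08 in zip(field_06, field_08)
    ]


# Hypothetical 2 x 2 scene.
field_06 = [[0.10, 0.80], [0.40, 1.20]]
field_08 = [[0.30, 0.60], [0.40, 0.90]]
rgb = rgb_composite(field_06, field_08)
```

Because blue is the mean of the two other channels, bright clouds with similar reflectance in both channels appear white to grey in such a composite.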
The first selected day covers the evolution of a frontal zone on 20 June 2013. Germany lay between a ridge of an anticyclone spanning from the central Mediterranean Sea to the Baltics and a low pressure system in France. Organized convection developed all day along a convergence zone, predominantly in the western and northern part of Germany, favored by hot surface temperatures above 35°C under unstable atmospheric conditions. Heavy rainfall, accompanied by large hailstones exceeding 5 cm, was reported for this day. Comparing the real and synthetic satellite images for 20 June 2013 in Fig. 1 (top and middle rows in column (a)) shows similar cloud structures around noon. The simulated CAPE field reflects the large potential of highly unstable regions (CAPE values over 3000 J kg−1) above Germany. Based upon this single metric it can be seen that once convective inhibition is overcome, the potential to produce strong updrafts is given almost everywhere.
Furthermore, a 48 hour period starting at 0 UTC on 4 July 2015 has been chosen, which witnessed multiple local explosive convection cells on the first day and convection connected with a more synoptic-scale frontogenesis on the second day (columns (b) and (c) in Fig. 1). For both days temperatures of nearly 40°C have been registered, which support localized triggering of convection under unstable atmospheric conditions. Both criteria (high surface temperatures and unstable conditions in the lower and mid-troposphere) were fulfilled on 4 July, leading to the formation of a couple of convective cells over the northern part of Germany. The development of these cells was quite explosive, resulting in a strong upward transport of moisture.
Despite the convective region being highly localized, upper tropospheric detrainment of moisture and ice by deep convection created an extensive cirrus shield covering the complete northeastern part of Germany by the evening (not shown). Although the comparison of the observed and simulated cloud fields in Fig. 1b reveals structural differences, the overall ability of the model to simulate confined convective cells is clearly visible in the CAPE field. Circular white areas of consumed CAPE are located in the northern part of Germany surrounded by regions of higher CAPE.
The situation on 5 July is characterized in the morning by the decay of the large-scale convective system of the previous day and later by the passage of a front aided by dynamical lifting induced by an upper-air trough located over the North Sea. The satellite image in Fig. 1c shows the passage of the frontal system. The model produces an excessively large cloud structure that

For this study we focus on three days which feature strong convection characterized by high CAPE (convective available potential energy). The three days, representing different types of convective development, are: 20 June 2013 and 4-5 July 2015.
A more detailed synoptic description for these days is given in Sect. 2. We further analyze five additional high-CAPE summer convection days, including small-scale scattered convection (Table 1). These cases are analyzed in a statistical manner together with the three focus days in Sect. 5.2, which summarizes the overall performance of ICON-LEM in representing atmospheric ice quantities in connection with deep convection.

Several sensitivity experiments have been conducted. The first set of additional simulations investigates the dependence of model performance on the initial and lateral boundary conditions (lbc). Two additional analyses from ICON-NWP (using the forecast system of DWD based on ICON) and IFS (cycle 41r1) models with lower spatial resolution (Table 2) have been remapped onto the ICON-LEM grid in order to initialize and force the high-resolution model during runtime. Using different, coarser analyses allows us to address the sensitivity of ICON-LEM to large-scale forcing. Because ICON was made operational at DWD in 2015, this analysis has only been performed for the 4-5 July 2015 case (Sect. 6.1).
A second set of sensitivity experiments deals with changes to the two-moment microphysics scheme of Seifert and Beheng (2006a, b) (Appendix A3; Tab. A1). We focus on the sensitivity simulations connected with ice crystal properties. In order to account for different ice crystal geometries and associated fall velocities based on Heymsfield and Kajikawa (1987), two separate simulations have been performed specifying cloud ice as hexagonal plates (simulation: 'hexPlate') or dendrites (simulation: 'dendrite'), both of which have lower terminal fall velocities compared to the default setup. A further sensitivity experiment, named 'stickLFOhigh', explores the impact of increased sticking efficiencies during ice hydrometeor collisions (snow-snow, ice-ice, snow-ice and graupel-snow) using parameters from Lin et al. (1983). The modified coefficients for the different sensitivity experiments are shown in Table 3. These simulations have been performed on the coarsest model grid of 625 m (DOM01). All microphysical sensitivity studies correspond to the 5 July 2015 case and are discussed in Sect. 6.2. Only for these microphysical sensitivity studies do we make use of an explicit coupling of the two-moment microphysics scheme with radiation by calculating the effective radii of cloud ice and cloud droplets based on the predicted mass and number densities and the assumed particle size distribution.

Observational methods and data sets
We use ground-based as well as satellite-based observations to evaluate our simulations. Several previous studies have noted the differing magnitude and sampling characteristics of satellite-observed IWP or IWC (Waliser et al., 2009; Eliasson et al., 2011; Hong and Liu, 2015; Duncan and Eriksson, 2018). In evaluating the vertical and temporal distribution of simulated atmospheric ice in terms of IWP or IWC it is crucial to use multiple observational data sets representing a range of algorithms in order to estimate retrieval errors and uncertainties. For that reason, model simulations are compared to eight different observational methods, each of which has its own advantages and limitations.

For a vertically resolved point-to-point evaluation of the simulations at different sites, two ground-based observations have been taken into account:

https://doi.org/10.5194/acp-2020-635 Preprint. Discussion started: 14 July 2020. © Author(s) 2020. CC BY 4.0 License.

- SEVIRI CiPS (Cirrus Properties from SEVIRI, Strandgren et al., 2017a)
- SEVIRI SatCORPS (The Satellite ClOud and Radiation Property retrieval System, Minnis et al., 2011, Trepte et al., 2019)
- SEVIRI APICS (Algorithm for the Physical Investigation of Clouds with SEVIRI, Bugliaro et al., 2011)
- SEVIRI CPP (Cloud Physical Properties from SEVIRI, Roebeling et al., 2006)
- SPARE-ICE (Synergistic Passive Atmospheric Retrieval Experiment-ICE, Holl et al., 2014)

Four of them provide ice cloud properties with 15 min temporal resolution from the 12-channel SEVIRI imager aboard the geostationary MSG satellites (Schmetz et al., 2002), while two of them are from polar orbiting satellites (see next sections for details). The different methods and characteristics of the observational data sets are described in the following.

RAMSES
RAMSES is the operational high-performance multi-parameter Raman lidar at the Lindenberg Meteorological Observatory (Reichardt et al., 2012). It is equipped with a water Raman spectrometer (Reichardt, 2014) that facilitates direct measurements of cloud water content (CWC) on a routine basis. It is thus well suited for cloud microphysical studies, or for evaluating cloud models or the cloud data products of other instruments. However, such CWC measurements are only possible at night, under favorable atmospheric conditions and often only in the lower cloud ranges, because the Raman return signals from clouds are extremely weak, which makes them particularly vulnerable to background light and light extinction. For cirrus clouds it was possible to overcome this limitation by developing a retrieval technique which allows IWC to be estimated under all measurement conditions (see Appendix A1 and Fig. A1 for more details). The new method was applied to the case study of 4-5 July 2015 in Sect. 5.1.

Cloudnet
The ground-based data set of Cloudnet provides synergistic products from 35 GHz cloud radar, ceilometer, and multi-frequency microwave radiometer measurements. These products are derived for the observation sites Jülich, Leipzig, and Lindenberg using the same retrieval package developed in Cloudnet (Illingworth et al., 2007). Measurements are performed day and night, and data are provided with a temporal and vertical resolution of 30 s and 60 m, respectively. Due to the low attenuation of the radar signals at these wavelengths, the clouds are detected over almost their entire vertical extent, depending on the radar sensitivity.
As the first step, the retrieval performs a target classification including the determination of cloud base and top. Radar profiles of reflectivity, Doppler velocity, and ceilometer backscatter profiles are used for this purpose, as well as temperature and humidity profiles provided by a NWP model (e.g. COSMO-DE for Lindenberg) or radiosoundings. Vertical profiles of LWC and IWC are derived subsequently. For echoes classified as ice, IWC is calculated from radar reflectivity and temperature using an empirical formula, which was derived on the basis of a large mid-latitude aircraft data set (Hogan et al., 2006). The random error of the IWC retrieval is approximately between +50 % and -33 % for IWC values in the range of 0.03 and 1 g m−3.
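To make the quoted error figures concrete, the following Python sketch converts the asymmetric random error into an interval around a retrieved IWC value and expresses a reflectivity offset given in dB as a linear factor. The function names are hypothetical, and the sketch does not reproduce the empirical IWC formula of Hogan et al. (2006); it only applies the percentages stated above and the standard decibel definition.

```python
def iwc_random_error_bounds(iwc, rel_up=0.50, rel_down=0.33):
    """Return (lower, upper) bounds in g m-3 for an IWC value in g m-3.

    The default +50 % / -33 % figures are those quoted in the text and are
    only applied inside the stated validity range of 0.03 to 1 g m-3.
    """
    if not 0.03 <= iwc <= 1.0:
        raise ValueError("quoted error range applies to 0.03-1 g m-3 only")
    return iwc * (1.0 - rel_down), iwc * (1.0 + rel_up)


def db_offset_to_linear_factor(offset_db):
    """Convert a reflectivity offset in dB to a multiplicative factor."""
    return 10.0 ** (offset_db / 10.0)


lower, upper = iwc_random_error_bounds(0.1)   # example IWC of 0.1 g m-3
cal_factor = db_offset_to_linear_factor(2.0)  # 2 dB calibration offset
```

Note that +50 % and -33 % are nearly symmetric in logarithmic space, since 1/1.5 is approximately 0.67. A 2 dB offset corresponds to a factor of about 1.58 in linear reflectivity; how this maps onto IWC depends on the retrieval formula, which is not evaluated here.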
A potential systematic error in IWC, which is mainly caused by systematic errors in radar reflectivity, is of the same order of magnitude assuming a radar calibration error of 2 dBZ. It should also be noted that due to the limited sensitivity of the cloud radar, very thin clouds (with small ice crystals) may not be detected.

Orthogonal Polarization (CALIOP) instrument (Winker et al., 2009) are used. Day and night coverage, a temporal resolution of up to 5 min, and a spatial resolution of 3 km at nadir make the algorithm ideal for evaluating the temporal evolution of high cloud fields. CiPS targets thin cirrus clouds, detecting, compared to CALIOP, about 50, 60, and 80 % of cirrus clouds with an ice optical thickness of at least 0.05, 0.08, and 0.14 (Strandgren et al., 2017a), which corresponds to an IWP of roughly 0.6, 1.0, and 3.0 g m−2, respectively. The CTH retrieved by CiPS has an average error of 10 % or less for cirrus clouds with a top height greater than 8 km, again with respect to CALIOP. The high sensitivity of CiPS to thin cirrus does, however, lead to a quick saturation of the IWP and τ retrievals in thicker cirrus clouds. Maximum IWP and τ amount to approximately 100 g m−2 and 4, respectively. This makes the algorithm unsuitable for the evaluation of modeled IWP in this paper, where thick convective clouds are analysed, but CiPS is an ideal tool to study e.g. the spatial extent of anvil cirrus from the convective outflow including the optically thinner cloud edges.

The Algorithm for the Physical Investigation of Clouds with SEVIRI (APICS, Bugliaro et al., 2011) computes optical thickness τ and ice crystal effective radius r_eff for pixels identified as cirrus by CiPS, by means of the Nakajima-King method (Nakajima and King, 1990) using two SEVIRI solar channels centred at 0.6 and 1.6 µm. IWP is derived from these two quantities (τ, r_eff) under the assumption of a vertically homogeneous cloud layer using the relationship IWP = (2/3) ρ_ice r_eff τ, where ρ_ice = 917 kg m−3 is the density of ice. The algorithm assumes the general ice crystal shape mixture from Baum et al. (2011).
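The bulk relationship for a vertically homogeneous cloud can be written as a short Python sketch. The function name, the unit handling (effective radius in µm, IWP returned in g m−2) and the example values are assumptions made for illustration; only the formula and the ice density of 917 kg m−3 come from the text.

```python
RHO_ICE = 917.0  # density of ice in kg m-3, as used by APICS


def iwp_from_tau_reff(tau, r_eff_um, rho_ice=RHO_ICE):
    """Ice water path in g m-2 from IWP = (2/3) * rho_ice * r_eff * tau.

    tau:      cloud optical thickness (dimensionless)
    r_eff_um: ice crystal effective radius in micrometres
    rho_ice:  density of ice in kg m-3
    """
    r_eff_m = r_eff_um * 1.0e-6                       # micrometres -> metres
    iwp_kg_m2 = (2.0 / 3.0) * rho_ice * r_eff_m * tau
    return iwp_kg_m2 * 1000.0                         # kg m-2 -> g m-2


# Hypothetical cirrus pixel: tau = 4, r_eff = 30 micrometres.
iwp_example = iwp_from_tau_reff(4.0, 30.0)
```

For this hypothetical pixel the sketch gives an IWP of roughly 73 g m−2. Since IWP is linear in both τ and r_eff, relative errors in either retrieved quantity carry over directly to the IWP estimate.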

SEVIRI SatCORPS
The Satellite ClOud and Radiation Property retrieval System (SatCORPS) is a comprehensive set of algorithms designed to retrieve cloud micro- and macrophysical information day and night from meteorological satellite imager data. These algorithms were originally developed for the NASA Clouds and the Earth's Radiant Energy System (CERES) project (Minnis et al., 2011, Trepte et al., 2019) and adapted for application to other polar-orbiting and geostationary imagers, including SEVIRI. Using radiances in the 0.6 µm (visible), 3.9 µm (shortwave-infrared), 10.8 µm (infrared), and 12.0 µm (split-window) bands, three different methods are employed depending on time of day and cloud opacity to retrieve cloud optical thickness (τ), ice crystal effective diameter (D_eff = 2 r_eff), and cloud effective temperature (T_c).

During daytime, the Visible Infrared Shortwave-infrared Split-window Technique (VISST) uses the visible, shortwave-infrared, and infrared radiances to determine τ, D_eff, and T_c, respectively, by an iterative process that also exploits the split-window band to aid in phase determination. The VISST is similar in essence to the classic Nakajima and King (1990) bispectral method.
For thin non-opaque cirrus (τ < 8) during nighttime, the Shortwave-infrared Infrared Split-window Technique (SIST) retrieves the same parameters from brightness temperature differences between the shortwave-infrared and infrared bands and those between the infrared and split-window bands. The VISST/SIST reflectance lookup tables (LUTs) and emittance parametrizations are calculated for smooth solid hexagonal ice crystals. Assuming that the retrieved ice crystal effective diameter represents the average over the entire cloud thickness, IWP is computed from the following cubic equation:

For thick opaque ice clouds (τ > 8) during nighttime, the Ice Cloud Optical Depth from Infrared using a Neural network (ICODIN) method is used (Minnis et al., 2016), complementing the SIST applicable to semitransparent cirrus. The ICODIN retrieves τ and IWP by training shortwave-infrared, infrared, and split-window radiances against the CloudSat radar-only 2B-CWC-RO product (Austin et al., 2009), which includes vertical profiles of IWC and ice particle effective radius. The method can be used to derive ice cloud τ up to 150; however, τ and thus IWP for the deepest convective clouds are still frequently underestimated. According to equation (1), with a maximum τ of 150 and a maximum effective diameter of 150 µm, the maximum IWP that can be derived using this approach is ≈ 8100 g m−2. Also note that near the terminator, the weak solar component in the 3.9 µm band increases the uncertainty in the opaque vs. semitransparent cloud classification and can result in the use of default values for τ (16 or 32), which are significant underestimates in deep convective clouds (see the sudden dip in IWP around 18 UTC in Fig. 4). Nevertheless, SatCORPS is the only geostationary retrieval used here that provides IWP during both day and night for thin and thick ice clouds.
The pixel-level 15-minute temporal resolution SEVIRI SatCORPS data were obtained from NASA Langley Research Center (satcorps.larc.nasa.gov).

SEVIRI CPP
The Cloud Physical Properties (CPP) algorithm (Roebeling et al., 2006) is a bispectral method (Nakajima and King, 1990), which uses SEVIRI 0.6 µm and 1.6 µm solar reflectance measurements to retrieve cloud optical thickness and ice particle effective radius during daytime. The retrievals are based on LUTs of top-of-atmosphere reflectances calculated for plane-parallel layers of randomly oriented monodisperse roughened hexagonal ice crystals (Hess et al., 1998). Assuming no vertical variation in ice crystal size, the IWP is calculated as for APICS, although the density of ice is assumed to be ρ_ice = 930 kg m−3.
Specifically, we use data from the CLoud property dAtAset using SEVIRI, edition 2 (CLAAS-2) archive provided by the EUMETSAT Satellite Application Facility on Climate Monitoring (Benas et al., 2017). The pixel-level IWP retrievals are available every 15 minutes at a spatial resolution of ≈ 6 km over Germany. For this algorithm, maximum retrieved optical thickness and effective radius are 100 and 62.5 µm, respectively, which result in a maximum IWP of ≈ 3900 g m−2.

MODIS
MODIS is a 36-channel imager with spatial resolutions of 250, 500 or 1000 m at nadir and with a swath width of 2330 km. It is the key instrument aboard the Terra and Aqua NASA satellites and provides global coverage every 1 or 2 days. The MODIS cloud microphysical products are also obtained by the Nakajima and King (1990) bispectral method and provide daytime estimates of cloud optical thickness and ice particle effective radius from solar reflectances measured in a non-absorbing visible band and a water-absorbing near-infrared band (Platnick et al., 2017). Three different spectral cloud retrievals are performed by combining the 0.66 µm channel separately with the 1.6 µm, 2.1 µm, and 3.7 µm channel, although here we only use the primary 0.66 µm and 2.1 µm channel pair. In the latest Collection 6 algorithm, the plane-parallel reflectance LUTs are calculated for a single ice shape of severely roughened compact aggregates composed of eight solid columns. Assuming a vertically homogeneous cloud, the IWP is derived as for SEVIRI APICS and SEVIRI CPP. The 1 km resolution IWP retrievals are available twice a day from the Terra and Aqua satellites, which are in a 1030 Local Solar Time (LST) descending node and 1330 LST ascending node sun-synchronous polar orbit, respectively. Maximum retrieved optical thickness and effective radius are 100 and 60 µm, yielding a maximum retrieved IWP of ≈ 3700 g m−2.
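The maximum retrievable IWP values quoted for SEVIRI CPP and MODIS can be checked against the bulk formula IWP = (2/3) ρ_ice r_eff τ with a few lines of Python. The function and variable names are illustrative. The MODIS value is computed here with ρ_ice = 917 kg m−3, which is an assumption, since the text only states that the IWP is derived as for APICS and CPP.

```python
def iwp_cap(tau_max, r_eff_max_um, rho_ice):
    """Maximum IWP in g m-2 implied by the retrieval limits.

    tau_max:      largest reported optical thickness
    r_eff_max_um: largest reported effective radius in micrometres
    rho_ice:      assumed density of ice in kg m-3
    """
    return (2.0 / 3.0) * rho_ice * (r_eff_max_um * 1.0e-6) * tau_max * 1000.0


# SEVIRI CPP: tau <= 100, r_eff <= 62.5 micrometres, rho_ice = 930 kg m-3.
cpp_cap = iwp_cap(100.0, 62.5, 930.0)

# MODIS: tau <= 100, r_eff <= 60 micrometres; rho_ice = 917 kg m-3 is assumed.
modis_cap = iwp_cap(100.0, 60.0, 917.0)
```

Under these assumptions the sketch yields about 3875 g m−2 for CPP and about 3668 g m−2 for MODIS, consistent with the rounded values of ≈ 3900 and ≈ 3700 g m−2 given in the text.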
The exclusion of solar reflectances from SPARE-ICE allows retrievals both day and night; however, the reliance on microwave measurements results in fairly large footprints varying from 16 km in diameter at nadir to 52 × 27 km2 at the edge of the scan. The lower and upper sensitivity limits of SPARE-ICE are 10 g m−2 and O(10^4) g m−2, respectively, with the median fractional error between SPARE-ICE and 2C-ICE IWP being a factor of 2. For the current study, data are available from the MetOp-A/B (0930 LST descending node) and NOAA-18/19 (1500-1630 LST and 1330-1400 LST ascending node) satellite overpasses.

Interpretation of satellite IWP retrievals
Despite the wide variety of available satellite instruments (imagers, sounders, lidar, radar) and retrieval methods exploiting the information obtained with these instruments, determining atmospheric ice mass has been recognized as a great challenge for remote sensing (Waliser et al., 2009; Eliasson et al., 2011), which has seen only limited progress in the past decade as large discrepancies in IWP remain among satellite data sets (Duncan and Eriksson, 2018). In this context, "ice" represents all frozen hydrometeors, including the smaller suspended (or floating) cloud ice as well as the larger precipitating forms such as snow, graupel, and hail (in the following referred to as habits). Current satellite retrieval methods are unable to truly distinguish suspended ice from precipitating ice, which makes estimates from these techniques rather uncertain in thick, multi-layer, mixed-phase and mixed-habit cloud fields. The measured signal, and hence the derived ice mass, is a weighted sum of the individual contributions from the different ice habits. Habit weighting, however, varies by retrieval method and is poorly characterized if at all, which complicates model-satellite comparisons because the various satellite products all refer to "ice water path", without any qualifying caveats about their differing sensitivities. In turn, this also means that different instruments are sensitive to different ice cloud types (Eliasson et al., 2011) such that several spaceborne sensors are needed to cover the full range of ice clouds.
Passive VIS-NIR methods can derive IWP only indirectly, from optical thickness and effective particle size. However, they infer particle size from cloud-top measurements and usually provide an estimate of cloud-top ice particle size. Thus, they are unable to obtain information about ice particle sizes in lower layers inside vertically thick clouds, and the bulk IWP formulas used, which assume vertical homogeneity (see Sect. 4.4, 4.6 and 4.7), cannot a priori account for vertical variations in extended clouds.
Furthermore, these methods are subject to saturation effects (affecting normally a few percent of pixels in our analyzed scenes, mainly the convective cores; in situations with large scale convective activity many pixels may be affected e.g. 20% of pixels on the 20th June 2013), because visible reflectance loses sensitivity to optical thickness in thick clouds. As a result, the maximum reported optical thickness is truncated at a threshold value varying between 100-200 depending on the data product.
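The share of saturated pixels in a scene can be estimated by counting retrievals that sit at the product-specific optical thickness cap. The following Python sketch is only an illustration: the function name, the criterion of treating values at or above the cap as saturated, and the example field are assumptions and do not describe how the percentages in this study were obtained.

```python
def saturated_fraction(tau_field, tau_max):
    """Fraction of pixels whose optical thickness is at or above tau_max.

    tau_field: 2-D list of retrieved optical thickness values
    tau_max:   truncation threshold of the data product (assumed known)
    """
    values = [tau for row in tau_field for tau in row]
    if not values:
        return 0.0
    saturated = sum(1 for tau in values if tau >= tau_max)
    return saturated / len(values)


# Hypothetical scene of 10 cloudy pixels with a cap of 100.
tau_field = [
    [5.0, 100.0, 100.0, 12.0, 7.0],
    [3.0, 40.0, 60.0, 8.0, 1.0],
]
fraction = saturated_fraction(tau_field, tau_max=100.0)
```

In this hypothetical scene two of ten pixels are at the cap, giving a fraction of 0.2, which is of the same size as the share mentioned above for 20 June 2013.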

The maximum reported ice particle effective radius also varies among data sets, although in a narrower range, depending on the ice optical properties used. In addition, the retrieved optical thickness and particle effective radius strongly depend on the assumed ice particle shape (smooth or roughened, solid or hollow, hexagonal columns or aggregates etc.), even for unsaturated input reflectances. For instance, Eichler et al. (2009) show that for thin ice clouds with an optical thickness between 3 and 5, the choice of ice particle shape leads to uncertainties of up to 70% for optical thickness and 20% for effective radius. Retrievals in deep convective clouds have uncertainties of similar magnitude or even larger. As a last source of uncertainty one has to mention that the passive optical retrievals assume the cloud to consist of either ice or liquid water according to the cloud top phase. When both phases are present in convective clouds (liquid water in the lower and ice in the upper part, with a mixed-phase layer in between), the retrieved IWP accounts in part for the liquid water layers and thus tends to overestimate the real

Nevertheless, the combination of all the above effects can easily lead to a factor of 2-3 variation in the estimated domain-mean IWP. In our VIS-NIR satellite data, SEVIRI CPP shows the smallest IWPs and SEVIRI SatCORPS the largest ones, with SEVIRI APICS and MODIS values being in between (see Fig. 4), providing a broad range of estimates reflecting the current state of the art.
In contrast, the CloudSat/CALIPSO active radar-lidar measurements, which were used to train the SPARE-ICE algorithm, have better vertical profiling capability, but their sensitivity is markedly shifted to the larger ice hydrometeors. Therefore, the mass contribution from the smaller suspended cloud ice particles, albeit relatively small in convective clouds (Waliser et al., 2009), is likely underestimated in these IWP retrievals. This might for instance be related to high IWCs produced by the presence of numerous small ice crystals (e.g. Lawson et al., 2010; Gayet et al., 2012) that cannot satisfactorily be accounted for by the radar (Gayet et al., 2014). Furthermore, deep convection also represents a challenge for CloudSat/CALIPSO for various reasons. On the one hand, heavy precipitation can lead to full attenuation of the radar signal, while multiple scattering plays a relevant role which is difficult to consider properly (Matrosov et al., 2008). On the other hand, typical assumptions in ice cloud retrievals for CloudSat/CALIPSO are not valid inside dense hail and graupel produced in deep convection (Delanoë and Hogan, 2010). Finally, passive microwaves can also hardly distinguish the ice from the liquid water fraction, especially in cloud layers with mixed phase. As a last issue, the different spatial resolutions of the satellite measurements must be mentioned. Since MODIS provides the finest resolution, SEVIRI an intermediate resolution and SPARE-ICE the coarsest, MODIS is able to catch peaks of high IWP that are smoothed out in the other two observational data sets. However, the differences in instantaneous pixel-level estimates due to different spatial resolutions are largely reduced in domain-mean IWP. In summary, we expect SPARE-ICE to provide the largest IWPs due to the inclusion of graupel and hail, although the SatCORPS passive VIS-NIR retrieval can also produce IWPs of comparably large magnitude, depending on the specific ice shape and IWP parametrization formula used.
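The effect of spatial resolution on pixel-level peaks and on the domain mean can be illustrated with a small block-averaging sketch in Python. The field values, the coarsening factor and the function names are hypothetical, and real footprints are of course not square blocks of equal size; the sketch only demonstrates the averaging argument made above.

```python
def block_mean(field, factor):
    """Coarsen a 2-D list by averaging non-overlapping factor x factor blocks."""
    ny, nx = len(field), len(field[0])
    if ny % factor or nx % factor:
        raise ValueError("field dimensions must be multiples of factor")
    coarse = []
    for j in range(0, ny, factor):
        row = []
        for i in range(0, nx, factor):
            block = [field[jj][ii]
                     for jj in range(j, j + factor)
                     for ii in range(i, i + factor)]
            row.append(sum(block) / len(block))
        coarse.append(row)
    return coarse


def domain_mean(field):
    """Mean over all pixels of a 2-D list."""
    values = [v for row in field for v in row]
    return sum(values) / len(values)


def domain_max(field):
    """Largest pixel value of a 2-D list."""
    return max(v for row in field for v in row)


# Hypothetical fine-resolution IWP field in g m-2 with one convective core.
fine = [
    [0.0, 0.0, 100.0, 0.0],
    [0.0, 4000.0, 100.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 200.0],
]
coarse = block_mean(fine, 2)
```

In this example the peak drops from 4000 to 1000 g m−2 after coarsening, whereas the domain mean is 275 g m−2 at both resolutions, because equal-sized block averages preserve the mean.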
In our model validation effort, we follow a somewhat qualitative rule of thumb recommended by Waliser et al. (2009)  including the larger agglomerates such as snow (qs), graupel (qg), and hail (qh) within the two-moment microphysics. Please refer to Sect. 4.9 for a discussion about the sensitivity of the single satellite retrievals to different ice classes.

Evaluation of ICON-LEM simulations against observations
We focus on ice cloud properties in the ICON-LEM simulations, which have until now been only evaluated in simulations over the equatorial Atlantic. More specifically, the impact of deep summertime convection on ice cloud properties is investigated over Germany. We focus on a few case studies (Sect. 2) and study the evolution of the convective outflow making use of data from geostationary and polar orbiting satellites (Sect. 4).

Evaluation of simulated ice cloud properties
In this section, we evaluate the ability of ICON-LEM to simulate the convective outflow and its temporal evolution for the three large-scale summertime convective events over Germany that were introduced in Sect. 2. It should be noted that no perfect agreement is expected in IWC development when comparing individual model grid points against ground-based observations. Nevertheless, the modeled cloud ice development, especially for Lindenberg and Leipzig, reveals a good description of the observed temporal evolution, including the representation of the cirrus layer over Lindenberg between 6:00 and 14:00 UTC.

Comparison to ground based measurements
For the second convective episode on 4-5 July 2015, no validation data are available from most of the Cloudnet stations.
Instead, the simulation is compared with RAMSES measurements at the Lindenberg Meteorological Observatory (Fig. 3).    other shortly after 12 UTC, thus indicating that the threshold selection does not induce a strong variability in the VIS-NIR retrievals at this stage, possibly because the spatial extent of the convective cell is still small. The modeled total ice amount is biased high even compared to SPARE-ICE retrievals, which are not affected by saturation issues and are generally considered more representative of total ice as opposed to cloud ice. Instead, all observational data sets provide IWP values similar to the simulated tqi estimate, which consists of small cloud ice particles only. The largest IWP discrepancy between the observations is found during the strong convective phase between 12 UTC and 18 UTC, when the percentage of saturated VIS-NIR retrievals is the highest. As discussed in Sect. 4.9, the maximum reported optical thickness and, to a lesser degree, the maximum reported ice crystal effective radius vary significantly between the different data sets, resulting in a large scatter in domain-mean IWP when the scene is dominated by deep convective clouds. Also note that the SatCORPS and SPARE-ICE retrievals indicate a faster IWP decay, i.e. cloud thinning, after sunset than simulated by the model, while the modeled and observed cloud fractions agree well. The underestimation of tqi before 12 UTC is consistent with the underestimation of ICC in the morning.

Please note again that MODIS data are always close to the APICS curve or between the APICS and CPP values. SPARE-ICE IWP is close to the APICS line or between APICS and SatCORPS during the day, despite its enhanced sensitivity to larger ice hydrometeors as explained in Sect. 4.9. SatCORPS is almost always larger than the other VIS-NIR retrievals, even in non-convective situations (e.g. in the morning hours of 20 June 2013) where hydrometeor types other than cloud ice should not be relevant, thus indicating a slightly different approach to IWP than the other algorithms. During the night, SPARE-ICE IWP is larger than SatCORPS IWP on this day. In general, CPP seems to retrieve less thick clouds, and its increase in IWP after convective initiation at around 11 UTC is also slower.

The analysis for 4 July 2015 (Fig. 4b) shows larger differences with regard to ICC. The area coverage of simulated cirrus cloud fields in the morning is strongly underestimated compared to CiPS. This is due to the outer edge of a front consisting mainly of thin cirrus passing over Central Europe that is not captured by the model but is observed by CiPS thanks to its high sensitivity to thin ice clouds. An increase in ICC after 10 UTC (before convective initiation) is noticeable within the ICON-LEM simulation, partly compensating for the lack of ICC. The start of the convective activity in the ICON-LEM simulations (∼ 13 UTC) and observations (∼ 15 UTC) is roughly the same. However, convective triggering in the simulations appears to continue well into the night, which is not supported by the satellite observations. ICC is comparable with CiPS after the main convective event and consists of a larger cirrus system connected with the convective outflow. The maximum ICC values are similar for both ICON-LEM and CiPS (approx. 60 %), but CiPS reaches its maximum ICC at around 18 UTC while ICC from ICON-LEM steadily increases from 10 to 24 UTC. In a simulation that was run for 2 consecutive days we found that the life time of the anvil was significantly overestimated. The width of the shaded area in ICC implies that approximately 10 % of the total ICC consists of clouds with very low optical depths (around 0.05 to 0.14), which also introduces a large uncertainty in the determination of simulated cloud top heights depending on the assumed IWP CiPS−sim thresholds (see Fig. 5). In combination with the development of ICC, the IWP strongly increases after initiation of convection around 14 UTC, but reaches lower peak values, both in simulation and observation, than on 20 June 2013 (Fig. 4a) and 5 July 2015 (Fig. 4c). On this day (4 July 2015) the tendency of IWP in the observations is very steep and resembles the increase in tqf rather than in tqi.
However, at 16 UTC the maximum IWP is reached in the observations and its value agrees very well with the model tqi.
The IWP estimates of SPARE-ICE and the SEVIRI retrievals agree well for 4 July 2015. In the morning almost no cloud ice is simulated, despite the fact that ice clouds (with ICC ≈ 40 %) are apparent, indicating that the cirrus field is optically very thin. The comparison between simulated and observed IWP during the convective phase shows similar results as for 20 June 2013: considering only cloud ice particles and neglecting snow, graupel and hail, tqi agrees well with satellite estimates. Please note that in this case the SEVIRI retrievals were hardly affected by saturation, with only a few percent of pixels reaching the maximum optical thickness. Overall, the explosive convection triggered around 14 UTC occurs in a much more complicated synoptic situation for the model to represent, as will be shown in Sect. 6.1, resulting in a poorer match between observed and modeled IWP than for the 20 June 2013 case.
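The dependence of ICC on the detection threshold, which produces the shaded range discussed above, can be sketched as follows. The function `ice_cloud_cover`, the sample IWP values and the two thresholds are hypothetical and only illustrate the counting; they are not the IWP CiPS−sim thresholds used in the paper.

```python
import numpy as np

def ice_cloud_cover(iwp, threshold):
    """Fraction of columns whose IWP exceeds a detection threshold."""
    iwp = np.asarray(iwp, dtype=float)
    return float(np.mean(iwp > threshold))

# Hypothetical column IWP values (g m^-2) for ten model columns.
iwp = np.array([0.0, 0.2, 0.8, 3.0, 40.0, 250.0, 0.0, 1.2, 0.05, 900.0])

# Two assumed detection thresholds bracket the ICC estimate.
icc_sensitive = ice_cloud_cover(iwp, 0.1)     # counts very thin cirrus
icc_conservative = ice_cloud_cover(iwp, 1.0)  # ignores very thin cirrus

print(f"ICC range: {icc_conservative:.2f} to {icc_sensitive:.2f}")
```

The gap between the two estimates comes entirely from optically thin columns, which is why a scene with widespread thin cirrus shows a wide shaded band.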

Satellite estimates are subject to saturation effects (see Sect. 4.9), so it is advisable to apply an upper threshold to the model results when using them for evaluation. Applying an IWP cut-off threshold of 10,000 g m⁻² (upper limit of SPARE-ICE) reduces simulated tqf by approximately 15-20 % during all three convective events.
Applying a saturation threshold to ICON-LEM tqi leads to negligibly changed estimates. Even when using the lowermost cut-off threshold of 3700 g m⁻² (representing the saturation limit of MODIS), the maximum reduction amounts to 0.2 %. Around shows that CiPS underestimates CTHs in the range 11 to 13 km by approx. 1 km on average for the geographical location analyzed in this paper, which is in line with the difference between observation and model. Nevertheless, lower cloud top heights of up to 10 or 11 km are likely underestimated in the simulation. On 4 July 2015, the modelled CTH again peaks at approx. 1 km higher altitudes than in the observations. Furthermore, the distribution of the modelled CTH is skewed towards higher CTH, whereas the distribution of observed CTH is skewed towards lower CTH. Those differences do not merely result from the fact that the early morning cirrus cover was not reproduced by ICON-LEM. Instead, low ice clouds are additionally missed by the model later in the day. CiPS indicates that CTHs are lower as one moves away from the convective core, whereas ICON-LEM simulates more homogeneous cloud top heights over the whole cirrus shield (Appendix A2). The modeled cloud top heights therefore result in a more distinct CTH peak in the histograms. A rather uniform distribution of observed CTHs is apparent for 5 July 2015, which is not reproduced by ICON-LEM. The large probability of high CTHs and the corresponding lower probability of lower CTHs in the simulation may partly be due to the model predicting an excessively long life time of the outflow cirrus, which maintained high CTH. Again, ICON-LEM seems to miss the decrease in cloud top heights at the edges of the convective cloud field. For all days, the maximum simulated CTH agrees well with the observed maximum height of 14 km, which is important in order to capture the effect of the cloud field on longwave radiation.
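The saturation cut-off applied to the model fields at the start of this paragraph amounts to clipping each column value at the sensor limit before averaging. A minimal sketch, assuming a hypothetical handful of column values and the helper name `capped_mean_reduction` (neither taken from the ICON-LEM output):

```python
import numpy as np

def capped_mean_reduction(iwp, cap):
    """Percent reduction of the domain mean after clipping values at `cap`."""
    iwp = np.asarray(iwp, dtype=float)
    mean_raw = iwp.mean()
    mean_capped = np.minimum(iwp, cap).mean()
    return 100.0 * (mean_raw - mean_capped) / mean_raw

# Hypothetical frozen water path values (g m^-2); two columns exceed 10,000.
tqf = np.array([50.0, 400.0, 2500.0, 12000.0, 30000.0, 800.0])

red_spare_ice = capped_mean_reduction(tqf, 10000.0)  # cap quoted for SPARE-ICE
red_modis = capped_mean_reduction(tqf, 3700.0)       # cap quoted for MODIS

print(f"reduction with 10,000 g m^-2 cap: {red_spare_ice:.1f} %")
print(f"reduction with  3,700 g m^-2 cap: {red_modis:.1f} %")
```

A field whose values all lie below the cap is unchanged by clipping, which matches the negligible effect reported for tqi.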

Statistics of several convective days
In order to provide an analysis of ICON-LEM performance over a broader range of convective situations, we have collected eight convective days in the time period 2013-2016 (Table 1). This selection, which also includes the three days evaluated in the previous sections, encompasses different kinds of meteorological conditions, from convection embedded in fronts to scattered convection. For all these days we evaluate statistics of CTH, ICC, and IWP.
The simulated CTH distribution shows good agreement with the observed one (Fig. 6a). As mentioned above, the slight rightward shift of the simulated CTHs to higher values compared to observations is partly explained by the known negative bias of CiPS underestimating unusually high CTHs at mid-latitudes (see Sect. 5.1). The model, however, underestimates the frequency of clouds with CTHs at the lower end of the distribution between 8 and 10 km. This is partly caused by the overestimation of the height of anvil edges, which is present in all convective simulations and is particularly strong in the convective situations on 4-5 July 2015.
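The comparison of CTH distributions described above can be sketched as a normalized histogram plus the fraction of cloud tops between 8 and 10 km. The sample values, bin edges and function name `cth_pdf` below are hypothetical and serve only to show the bookkeeping.

```python
import numpy as np

def cth_pdf(cth_km, edges):
    """Normalized histogram (relative frequency per bin) of cloud top heights."""
    counts, _ = np.histogram(cth_km, bins=edges)
    return counts / counts.sum()

edges = np.arange(6.0, 15.0, 1.0)  # 1 km bins from 6 to 14 km

# Hypothetical cloud top heights (km), ten pixels each.
obs = np.array([8.2, 8.9, 9.5, 10.4, 11.1, 11.8, 12.3, 12.9, 13.2, 9.1])
sim = np.array([9.7, 11.2, 11.9, 12.4, 12.6, 12.8, 13.1, 13.3, 13.5, 12.2])

pdf_obs = cth_pdf(obs, edges)
pdf_sim = cth_pdf(sim, edges)

# Bins 2 and 3 cover 8-9 km and 9-10 km.
low_obs = pdf_obs[2:4].sum()
low_sim = pdf_sim[2:4].sum()
print(f"fraction of CTH in 8-10 km: observed {low_obs:.1f}, simulated {low_sim:.1f}")
```

A smaller simulated fraction in the 8 to 10 km bins is the signature of the missing low cloud tops discussed in the text.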

The interpretation of the ICC and IWP histograms is more difficult, because our ensemble of simulations consists of a few large scale convective events partly connected with frontal systems and a few cases of scattered small scale convection.
Therefore, the convective activity does not always lead to the largest ICC and IWP when averaged over the simulation domain.
The histogram of ICC (Fig. 6b) shows a relatively flat distribution with maxima in observed ICC around 50% and 90% cloud coverage. In the simulations the highest probability is for ICC between 50% and 80%, but a large part of those ice clouds are optically thin. The differences in the observed and simulated ICC histogram may have different causes. They could be related to an underestimation of the convective cell extension, even though the opposite seems to be true for the 4-5 July case, to an underestimation of ice clouds originating from other meteorological systems that remain unresolved in ICON (see Sect. 5.1, discussion of ICC for the morning of 20 June 2013), to spatial shifts of the convective spots that partly evolve outside the ICON domain, or to errors stemming from the initialization.

Concerning the IWP histogram (Fig. 6c) While the distribution of ice in the model is generally similar to satellite observations, the distinction between tqi and tqf can be considerably different between model and satellite retrievals, and also between the various retrieval algorithms (Sect. 4.9), due to the different sensor sensitivities and assumptions made on partitioning the total ice into the various ice habits. in particular modified cloud microphysics (Sect. 6.2), and with changing initial and boundary data used to drive the model (Sect. 6.1), giving a measure for the predictability of the synoptic situation.
Note that the sensitivity studies were performed at 625 m resolution with no further nesting in order to save computing time and storage space, as opposed to the 150 m resolution used for the simulations discussed above. As Stevens et al. (2020) pointed out, the improvement going from 625 m to 150 m is modest, so we expect the results of our sensitivity study to carry over to the higher resolution domain. A comparison of the two control simulations at 625 m and 150 m resolution confirmed this; for example, cloud water path (tqc) and tqi only changed by 1.5% and 6.0%, respectively.

Sensitivity to initial and boundary conditions
For 4-5 July 2015 additional simulations were performed using different initial and lateral boundary conditions. Instead of using initial and boundary data from the COSMO-DE analysis fields (in the following referred to as "default simulation"), data from ICON-NWP (lbc1) and the IFS (lbc2) have been used (see Table 2). The sensitivity simulations using IFS (cycle 41r1) and ICON-NWP data were analyzed regarding the evolution of IWP, ICC and the distribution of CTHs and compared to the default simulation and observations (Fig. 7 and Fig. 8).
In all three simulations strong convective events are located over northern Germany on 4 July 2015. However, both the timing and the amplitude of the increase in IWP and ICC (Fig. 7a) appear to be very sensitive to the initial and boundary data. Using the ICON-NWP data for initialization, convective activity starts too early and is too vigorous. This appears to be connected with a moist bias in the boundary layer in the ICON-NWP analysis for those days. Consistent with the excessive convection, CTHs are overestimated. Using COSMO-DE or IFS data for the initialization and boundary conditions, ICON-LEM captures the temporal evolution of the IWP over Germany well. The SatCORPS IWP estimate agrees well with simulated tqi in the default and lbc2 simulations, whereas in the lbc1 simulation tqi is much larger than observed. The decrease of tqi at the end of the day is not captured. The evolution of ICC is slightly less successfully simulated. ICC is underestimated in the morning and decreases only slightly (lbc1 and lbc2) or does not decrease at all (default) during the night, indicating that the cirrus field connected with the convective outflow remains too large for many hours after the main convective event. In the ICON-LEM default simulation for 4 July 2015, tqi remains constant and ICC continues to increase through the night. As pointed out before, this continued increase in the modeled cirrus shield appears to be caused by the numerous small convective events simulated in the vicinity of the convective cirrus shield throughout the afternoon and night, which are in contrast with the single big convective event observed in the afternoon. CTHs in those two simulations are lower than in the ICON-NWP forced simulation, but the fraction of clouds reaching 13 km is significantly too high when compared to observations (Fig. 8a).
For all three simulations tqf is significantly higher than tqi. The difference is particularly large at the time of convection and for several hours afterwards, pointing to a large number of larger hydrometeors. Whereas the difference between tqi and tqf strongly decreases at night in the lbc1 and lbc2 simulations, this is not the case for our default simulation, indicating a continuing large optical depth of the ice clouds resulting from the convective event.
The spread in the simulations for 5 July 2015 (Fig. 7a) is slightly smaller than for the previous day. The ICON-NWP initial and boundary data appear to be in better agreement with the COSMO-DE and IFS data. The start of convective activity in the ICON-LEM lbc1 run is too early, which is likely connected with a premature transition of the frontal system in the morning In general, CTH distributions do not vary strongly with initial and boundary data for these two days, except for the overestimation of the CTH on 4 July 2015 when using ICON-NWP data. Furthermore, simulated CTHs underestimate the frequency of lower cirrus clouds on both days (Fig. 8). While the observed distribution of CTH appears wide or even bimodal, the model prefers single-peaked distributions centered on high CTH between 11 and 14 km, capturing little of the lower level cirrus fields that CiPS detects between 8 and 10 km. The absence of lower CTHs is caused by the overestimation of CTH in clouds not directly connected with the convective systems and also by the overestimation of CTH at the edges of the convective cirrus shield (see Appendix A2).

Sensitivity to microphysics
To investigate the representation of cloud microphysical processes as a possible cause of model deficiencies, we have performed Here we discuss these experiments, which all lead to a reduction of IWP; recall that an over-pronounced anvil cloud has previously been identified as a likely model bias. A short description of the experiment setups and their outcome is given in Appendix A3. We concentrate on three experiments in particular. The experiments 'hexPlate', 'dendrite', and 'stickLFOhigh' (Tab. 3 and experiments 3, 4, and 10 in Tab. A1) replace the original mass-size and velocity-size relations for cloud ice by a different particle geometry. The corresponding relations in the control run are for irregular crystals derived from in-situ measurements collected during CRYSTAL-FACE (A. Heymsfield, pers. comm.). These irregular crystals have rather high terminal fall velocity more typical of column-like particles. This has been replaced by a plate-like geometry in experiment 'hexPlate' and a dendrite-like geometry in experiment 'dendrite'. Both of these crystal geometries have rather low fall speeds, but they differ in the exponent of the mass-size relation (see Tab. 3), leading to the dendrite-like geometry growing more quickly in maximum dimension than the plate-like crystals. Both experiments result in a significant decrease in cloud ice water path (tqi, Tab. A2) of 18 % and 16 %, respectively. Figure 9 displays the actual time series of the condensate path and the vertical profiles of the in-cloud water content for each water species. This shows clearly that experiments 'hexPlate' and 'dendrite' lead to a decrease of tqi during the day when deep convection develops.
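The role of the mass-size exponent mentioned above can be illustrated with a generic power law. The sketch assumes a relation of the form m = alpha * D**beta with purely hypothetical coefficients; it does not reproduce the values of Tab. 3 or the exact form used in the two-moment scheme.

```python
def max_dimension(mass, alpha, beta):
    """Invert an assumed power law m = alpha * D**beta for the size D."""
    return (mass / alpha) ** (1.0 / beta)

# Hypothetical coefficients (SI units); only the exponents differ.
alpha = 0.05
beta_compact = 3.0    # assumed compact, plate-like habit
beta_branched = 2.0   # assumed open, dendrite-like habit

m0 = 1.0e-10          # initial particle mass in kg (illustrative)
m1 = 2.0 * m0         # mass after doubling

growth_compact = max_dimension(m1, alpha, beta_compact) / max_dimension(m0, alpha, beta_compact)
growth_branched = max_dimension(m1, alpha, beta_branched) / max_dimension(m0, alpha, beta_branched)

print(f"size growth for doubled mass, beta = 3: {growth_compact:.3f}")
print(f"size growth for doubled mass, beta = 2: {growth_branched:.3f}")
```

For the same relative mass gain the size grows by a factor 2**(1/beta), so under this assumed form a smaller exponent means faster growth in maximum dimension.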
The decrease of tqi corresponds to an increase in the amount of graupel. Note that the graupel category should be interpreted more broadly as partially rimed ice and graupel for the SB scheme. This shift is also reflected in the vertical profiles which clearly show a reduced vertical extent of the cloud ice layer for the 'hexPlate' and 'dendrite' experiments, which is easily explained by the reduced sedimentation velocity. The increase in graupel is most probably caused by the increased collection of cloud ice by graupel due to the increased velocity difference between the two categories and, hence, an increased collection kernel. This behaviour differs from the case of isolated cirrus or anvil clouds for which an increased sedimentation velocity leads to a faster fall out of ice into the drier layers below and, hence, a faster dissipation of the ice cloud and consequently a reduced tqi. For the studied mature mesoscale convective system (MCS) our simulations show the opposite behavior, because of the presence of deep condensate layers with snow and graupel below the cloud ice layer. Unfortunately, the experiments 'hexPlate' and 'dendrite' were unable to significantly reduce the areal extent of the anvil clouds and, hence, did not improve the performance of the ICON-LEM model in that regard. In fact, the slower falling cloud ice particles lead to an increase in ICC and CTH, in disagreement with the CiPS satellite retrievals (Fig. 11).
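The link between fall-speed difference and collection described above can be made concrete with the textbook gravitational collection kernel for two particles, K = (pi/4) (D1 + D2)^2 |v1 - v2| E. This is a single-particle sketch with hypothetical sizes and fall speeds; a bulk scheme integrates such kernels over size distributions, which is not attempted here.

```python
import math

def collection_kernel(d_collector, d_collected, v_collector, v_collected, efficiency=1.0):
    """Gravitational collection kernel (m^3 s^-1) for two falling particles."""
    cross_section = math.pi / 4.0 * (d_collector + d_collected) ** 2
    return cross_section * abs(v_collector - v_collected) * efficiency

# Hypothetical particle properties (SI units), for illustration only.
d_graupel, v_graupel = 2.0e-3, 3.0   # graupel diameter (m) and fall speed (m/s)
d_ice = 2.0e-4                       # cloud ice maximum dimension (m)

k_fast_ice = collection_kernel(d_graupel, d_ice, v_graupel, 0.8)  # faster-falling ice
k_slow_ice = collection_kernel(d_graupel, d_ice, v_graupel, 0.3)  # slower-falling ice

print(f"kernel ratio slow/fast ice: {k_slow_ice / k_fast_ice:.3f}")
```

Slowing the cloud ice increases the velocity difference to graupel and therefore the kernel, and the kernel scales linearly with the assumed collection (sticking) efficiency.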
The strongest decrease in the ice water path tqi is shown by the experiment 'stickLFOhigh', featuring a significantly increased sticking efficiency between ice, snow, and graupel. An increase of the sticking efficiency directly leads to increased collection rates and, hence, to the faster formation of large precipitation-sized particles, which in turn enhances the depletion of cloud ice by faster conversion to graupel. This is clearly visible in the time series and the vertical profiles shown in Fig. 10. The graupel content in mid-levels, however, is actually decreasing for 'stickLFOhigh', which can be explained by the formation of larger and therefore faster falling graupel particles. Compared to the satellite observations of cloud top height and ICC, however, there is no significant improvement. The change in sticking efficiency affects mostly the vertical structure of the MCS and less so its horizontal extent. Overall, the 'stickLFOhigh' (Table 3) simulations produced inconclusive results. We also note that the sticking efficiencies used are rather high in light of more recent laboratory measurements (Connolly et al., 2012). exceeds the simulated cloud ice water path and the observed IWP by a large degree as soon as convection is triggered.

Evaluating our ensemble of 8 simulated days regarding CTH, ICC and IWP we find the PDFs of the cloud variables to be reasonably well simulated by ICON-LEM. Whereas CTH is relatively well simulated regarding its variability and its estimate for clouds of convective origin, the evaluation of ICC is challenging since it is very sensitive to the existence of optically very thin ice clouds. The horizontal structure of the CTH of convective anvils appears to be too homogeneous in the simulations and anvil cloud edges are too high (see Appendix A2), which likely hints at deficiencies in the microphysical scheme. ICON simulations exhibit a higher probability of very large tqf values than observations. Since observations vary in their sensitivity to different ice habits and cannot detect all ice, a certain overestimation of tqf in the model relative to observations would be expected. However, the model estimate of tqf is in extreme cases, such as 4-5 July 2015, larger than all observed IWPs by a factor of 3 or 4. Therefore the question arises whether ICON can be said to overestimate tqf.
Current state-of-the-art satellite retrievals provide a rather weak constraint on bulk ice mass in the atmosphere. Satellite retrievals employing different remote sensing methods, e.g. involving active and passive instruments, span a large range of IWP estimates. By using remote sensing data in the microwave spectral region, SPARE-ICE is also sensitive to ice hydrometeors other than cloud ice whereas the VIS-NIR retrievals (SEVIRI and MODIS) are not. The VIS-NIR retrievals alone span quite a broad range of IWP that does not appear to be tied to differences in sensitivity to hydrometeors. Furthermore, when comparing only estimates based on SEVIRI (APICS, SatCORPS and CPP) the spread of retrieved IWP is still significant, up to a factor of 2-3 being typical, due to differences in inherent assumptions. While in many situations SPARE-ICE is close to APICS and/or SatCORPS, particularly in convective situations, it often exceeds all other retrievals. Nevertheless, SPARE-ICE is likely to underestimate tqf partly due to the presence of small ice crystals in convective clouds that cannot be reliably accounted for.
The sensitivity of the existing passive and active methods to the different ice habits (small cloud ice versus large precipitating ice) is poorly quantified, complicating the interpretation of the reported IWP values.

What emerges from our model-satellite comparison with confidence is that the simulated tqi is within the current, relatively wide, range of satellite estimates. The model tqf, however, is biased high even compared to satellite estimates based on active radar/lidar retrievals (SPARE-ICE), implying an overestimation of elevated graupel.
Evaluating the ICON-LEM simulations in detail against observations in terms of biases in ice clouds and anvil evolution allows us to go one step further and examine the uncertainty of the associated forecasts at hectometer resolution. Given recent work (see introduction) that points to moist processes, initial conditions and large-scale weather as key players in the predictability of convection as well as of larger scale weather phenomena, we aimed at exploring those sensitivities on cases specifically selected as potentially most unpredictable (high CAPE, yet low large-scale advection).
For the investigation of uncertainty we selected the explosive convective event over Germany of 4-5 July 2015 for which the model struggled to simulate the evolution of convection realistically. Looking at high cloud properties in the three sensitivity experiments with COSMO, ICON and IFS initial and boundary conditions we found an impact on convective triggering and strength and, to a lesser degree, on the life time of the convective outflow. The sensitivity in terms of ICC and IWP is of similar order of magnitude as the diurnal cycle. Note that the variability is larger for the more locally forced 4 July 2015 and smaller for 5 July 2015, which was embedded in a front, pointing to the importance of convective instability.
Second, we investigated the sensitivity to microphysics as it represents a large part of the non-linearities and uncertainty in the model physics. Given a tendency of over-prediction of cloud ice in ICON-LEM, we selected modifications focused on the hydrometeor geometry aiming to reduce cloud ice. It is striking to note that these substantial physics changes result in a large reduction in cloud ice (up to a factor of 5) and smaller changes to cloud top height, but the critical timing of convection including the diurnal cycle, in contrast, changed little. The considered changes in the microphysical parametrization did not reduce the water path of the other frozen hydrometeors either or shorten the life time of the convective outflow cloud field.

In summary, the work we present demonstrates the usability of an O(100 m) resolution model for forecasting studies or parameterization development of convection including anvil evolution and its uncertainty. Given the fact that a major source of non-linearity in cloud-resolving models originates from cloud physics, the surprising result of our case study of 4-5 July 2015 was the relatively small impact of microphysics on the uncertainty of convective development. We therefore recommend future work to focus on a wider set of cases of locally forced continental summer convective days. Another direction of research to strengthen the understanding of the interplay of large-scale forcing and local physics in the uncertainty of the prediction of continental convection would be to investigate other parts of the description of clouds in models relating to the liquid phase and including lateral mixing in convective cores at sub-grid scales. The current work highlights the existing limits in using observations to evaluate high ice clouds from O(100 m) forecast models, which originate from both data and algorithms. The arrival of the new spaceborne radar/lidar system EarthCARE in 2022 will provide a driving force in both aspects. This will be followed by the Ice Cloud Imager (ICI) in 2023 on EUMETSAT's second generation polar system, giving significantly tighter observational constraints by exploiting sub-millimeter wavelengths and promising a much reduced (50%) uncertainty in IWP retrievals (Eriksson et al., 2020).
RAMSES is a spectrometric water Raman lidar that can measure water in all three of its phases. However, because of the extremely weak inelastic scattering by clouds, the condensed phases can only be obtained directly under favorable conditions.
To widen the range of applicability, the RAMSES data set of cloud water content (CWC) measurements was searched for a proxy variable that would be easier to measure than CWC directly but would still provide reasonable estimates of CWC at all times. It was found that in the case of cirrus clouds the cross-polarized backscatter coefficient (BSCs) serves this purpose, and an analytic expression for deriving IWC profiles and, by extension, IWP from BSCs and atmospheric temperature was developed [Reichardt; manuscript in preparation]. To validate the RAMSES IWC retrieval technique, a comparative study was conducted in which RAMSES IWP was contrasted with IWP results retrieved from satellite-borne radiometers (CiPS, SPARE-ICE). First results have been presented by Strandgren (2018). Generally, good agreement is found when the observed cirrus system can be assumed to be ergodic. As an example, Fig. A1

A2 Underestimation of the probability of low cloud top heights
The analysis in Sect. 5.1.2 shows that the probability of low (below 11 km) CTHs is underestimated in the simulations (see Fig. 5b and c). To elucidate the causes, a snapshot of observed CTHs is compared with the default simulation in Fig. A2. The anvil over northeastern Germany is clearly visible in the evening of 4 July 2015. Whereas the observations show a systematic decrease of convective anvil height towards cloud edges, the simulations lack such spatial gradients in CTH. This model deficiency can be seen on most convective days and is the main reason for the underestimation of low CTHs in the simulations.
The effect is strongest on 4 July 2015, when it might be exacerbated by an increased convective activity continuing into the night in the ICON-LEM simulation. Furthermore, the band of low ice clouds in the northwest of the domain (Fig. A2) is not captured by the model, which adds to the relative lack of simulated low CTHs.

A3 Additional microphysical sensitivity simulations
The results of all microphysical experiments (Tab. A1) are summarized in condensed form in Tab. A2. Here we highlight only the values of the domain- and time-averaged liquid or ice water path for cloud water (tqc), cloud ice (tqi), snow (tqs), graupel (tqg) and hail (tqh). Such simple statistics do nevertheless provide some insights. For example, the narrow ice particle size distribution leads to a slower ice sedimentation and, hence, a higher cloud ice water path (29 % increase compared to the control). The increased number of CCN leads to smaller cloud droplets, a suppression of warm rain formation and an increased lofting of water mass above the freezing level. Hence, cloud water is increased, rain water decreased, and cloud ice shows a strong increase of 46 % and 99 %, respectively. Interestingly, the precipitating ice categories of graupel and hail also show a significant reduction for increased CCN in these simulations. For a more detailed investigation and discussion of the impact of CCN in large-domain large-eddy simulations over Germany we refer to Costa-Surós et al. (2020). Compared to the other experiments, the assumptions regarding ice nuclei (IN) of experiments 12 to 14 have only a moderate impact on the simulation results, but the present-day aerosols (PDA) scheme leads to a significant increase in snow, graupel and hail, most notably in experiment 15, which assumes a significant contribution from organic IN. In the main text we focus on those microphysical experiments that lead to a decrease in cloud ice amount, which are experiments 3 and 4 with a modification of the cloud ice geometry, and experiment 10 with the increased sticking efficiency.
Data availability. Access to the observational and model data sets used within this publication is provided through the Zenodo archive (Rybka, 2020).
Author contributions. UB, MK and HR created the conceptual design of this study. MK and LB selected the cases for suitability. Sensitivity