Employing airborne radiation and cloud microphysics observations to improve cloud representation in ICON at kilometer-scale resolution in the Arctic

. Clouds play a potentially important role in Arctic climate change but are poorly represented in current atmospheric models across scales. To improve the representation of Arctic clouds in models, it is necessary to compare models to observations to consequently reduce this uncertainty. This study compares aircraft observations from the Arctic CLoud Observations Using airborne measurements during polar Day (ACLOUD) campaign around Svalbard, Norway, in May– June 2017 and simulations using the ICON (ICOsahedral Non-hydrostatic) model in its numerical weather prediction (NWP) setup at 1.2 km horizontal resolution. By comparing measurements of solar and terrestrial irradiances during ACLOUD ﬂights to the respective properties in ICON, we showed that the model systematically overestimates the transmissivity of the mostly liquid clouds during the campaign. This model bias is traced back to the way cloud condensation nuclei (CCN) get activated into cloud droplets in the two-moment bulk microphysical scheme used in this study. This process is parameterized as a function of grid-scale vertical velocity in the microphysical scheme used, but in-cloud turbulence cannot be sufﬁciently resolved at 1.2 km horizontal resolution in Arctic clouds. By parameterizing subgrid-scale vertical motion as a function of turbulent kinetic energy, we are able to achieve a more realistic CCN activation into cloud droplets. Additionally, we showed that by scaling the presently used CCN activation proﬁle, the hydrometeor number concentration could be modiﬁed to be in better agreement with ACLOUD observations in our revised CCN activation parameterization. This consequently results in an improved representation of cloud optical properties in our ICON simulations.


Introduction
In recent decades, the Arctic has proven to be especially susceptible to global climate change (Screen and Simmonds, 2010), as several positive feedback mechanisms strengthen the warming in high latitudes of the Northern Hemisphere (Serreze and Barry, 2011;. Among those feedback mechanisms that influence the Arctic climate, the cloud feedback -even though being small in magnitude compared to other feedback mechanisms like the surface albedo or temperature feedbacks -exhibits a relatively large uncertainty (Pithan and Mauritsen, 2014;Block et al., 2020). This uncertainty can be related to the general complexity of the Arctic climate system and to misrepresented microphysical processes in global climate models (GCMs) that are used to quantify the cloud feedback. Typical issues associated with the simulation of clouds in the Arctic are incorrectly simulated amount and distribution of clouds (English et al., 2015;Boeke and Taylor, 2016), which often can be linked to an erroneous representation of mixed-phase clouds (Cesana et al., 2012;Pithan et al., 2014;Kretzschmar et al., 2019). This consequently affects the quantification of the effect of Arctic clouds on the (surface) energy budget in GCMs (Karlsson and Svensson, 2013).
To identify processes within the microphysical parameterization that are misrepresented in models, it is inevitable to compare them to appropriate observations (Lohmann et al., 2007). As pointed out by Kay et al. (2016), any comparison between modeled and observed quantities can easily be misleading if it is not scale and definition aware. For GCMs, observations from satellite remote sensing are well suited, being on similar scales as those large-scale models. A comparison to satellite-derived quantities can further be made definition aware by using instrument simulators like those provided within the Cloud Feedback Model Intercomparison Project's (CFMIP) Observation Simulator Package (COSP; Bodas-Salcedo et al., 2011). The benefit of using COSP for evaluating clouds in GCMs in the Arctic has been shown in several studies (Barton et al., 2012;Kay et al., 2016;Kretzschmar et al., 2019).
Even though satellite observations provide valuable information on the atmospheric state in the Arctic, they often suffer from instrument-dependent idiosyncrasies like ground clutter for a spaceborne cloud radar or attenuation of the beam of a spaceborne lidar by optically thick clouds (Cesana et al., 2012). Those problems can be, in part, overcome by using ground-based or aircraft observations. Due to much smaller temporal and spatial scales, those observations only have limited suitability for the evaluation of large-scale models. To this end, the use of storm-resolving models with grid sizes on the order of kilometers or large eddy models is necessary, as they are able to better capture features and variability present in those rather smaller-scale observations . Due to the relatively large computational effort that is needed for large eddy simulations, they are limited in spatial extent and are often used for comparison with groundbased observations at individual locations in the Arctic (e.g., Loewe et al., 2017;Sotiropoulou et al., 2018;Schemann and Ebell, 2020). Furthermore, large eddy simulations have been used to study and evaluate microphysical processes (e.g., Fridlind et al., 2007;Ovchinnikov et al., 2014;Solomon et al., 2015), as well as aerosol-cloud interactions (e.g., Possner et al., 2017;Solomon et al., 2018;Eirund et al., 2019) in the Arctic. To avoid the need for large computational resources but still be able to resolve many processes that act on scales that cannot be captured by GCMs, limitedarea simulations with grid sizes on the order of a few kilometers, where (deep) convection does not need to be explicitly parameterized, can offer a good compromise. Simulations at such resolutions on relatively large domains have received increased interest in recent years .
This study makes use of such a setup using the ICOsahedral Non-hydrostatic (ICON) model  at kilometer-scale horizontal resolution. Studies, mainly focusing on the tropical Atlantic, have reported that the model at storm-resolving resolutions is able to simulate the basic structure of clouds and precipitation in that region Stevens et al., 2020). In the present study, ICON is used in a similar setup and is compared to observations that have been derived from the Arctic CLoud Observations Using airborne measurements during polar Day (ACLOUD) campaign around Svalbard, Norway,  and to observations derived during the Physical feedbacks of Arctic planetary boundary layer, Sea ice, Cloud and AerosoL (at P2L58f and P2L76f) (PASCAL; Flores and Macke, 2018) shipborne observational campaign in the sea-ice-covered ocean north of Svalbard in May and June 2017. This study mainly compares observations of solar and terrestrial irradiances during ACLOUD flights to our ICON simulations to obtain a first estimate of whether the model is able to correctly simulate general cloud optical properties. Based on the results of this comparison, it is further explored to what extent cloud macro-and microphysical properties might be misrepresented in this setup and how to improve the simulation of clouds in ICON at the kilometer scale.  Flores and Macke, 2018) shipborne observational study. The airborne measurements during ACLOUD were conducted with the two research aircraft Polar 5 and Polar 6 (Wesche et al., 2016) that were based in Longyearbyen (LYR), Norway. While Polar 5 focused on remote-sensing observations of mainly low-level clouds and surface properties from higher altitudes (2-4 km), Polar 6 concentrated on in situ observations of cloud microphysical and aerosol properties in and below the clouds. Ground-based observations from the ship and an ice floe in the sea-ice-covered ocean north of Svalbard were performed during PASCAL using the German research vessel (R/V) Polarstern (Alfred-Wegener-Institut Helmholtz-Zentrum für Polar-und Meeresforschung, 2017). Additionally, a tethered balloon was operated on an ice floe camp during PASCAL .
The synoptic development during both campaigns is separated into three phases (Knudsen et al., 2018). A period with advection of cold and dry air from the north in the beginning (23-29 May 2017) was followed by a warm and moist air intrusion into the region where the two campaigns took place (30 May-12 June 2017). During the final 2 weeks of the campaigns (13-26 June 2017), a mixture of warm and cold air masses prevailed. Especially during the last two phases, clouds in the domain close to Polarstern, where the bulk of the measurements took place, mainly consisted of (supercooled) liquid clouds with only a small amount of cloud ice being present .
In the following, a brief description of the instrumentation and data used in this study is given (for a comprehensive overview, we refer the reader to . Two pairs of upward-and downwardlooking CMP22 pyranometers for the solar spectral range (0.2-3.6 µm) and CGR4 pyrgeometers for major parts of the terrestrial spectral range (4.5-42 µm) were installed on board Polar 5 and Polar 6 to measure the upward and downward broadband (solar and terrestrial) irradiances on both aircraft . We also utilize microphysical data that have been derived from in situ measurements on Polar 6. We use data of the particle size number distribution obtained from the Small Ice Detector mark 3 (SID-3) , covering a size range of 5-45 µm divided into 16 size bins (2-5 µm resolution). For more information on the SID-3 and processing of the measurements, the reader is referred to Schnaiter et al. (2016) and. For comparison of the bulk liquid water content, we exploit data from a Nevzorov probe (Korolev et al., 1998) that was installed on Polar 6 (Chechin, 2019). Furthermore, we use observations of cloud base height as observed by the laser ceilometer and cloud-top height derived from a 35 GHz cloud radar  on board R/V Polarstern to derive geometrical cloud depth in the sea-ice-covered ocean north of Svalbard.

ICON simulations
In this study, data measured during ACLOUD and PAS-CAL are compared to the output of the ICOsahedral Nonhydrostatic model (ICON; Zängl et al., 2015). ICON is a unified modeling system that allows for simulations on several spatial and temporal scales, spanning from simulation of the global climate on the one end (Giorgetta et al., 2018) to high-resolution large eddy simulations (LESs) on the other (Dipankar et al., 2015;Heinze et al., 2017). ICON is also employed as a numerical weather prediction (NWP) model at the German Meteorological Service (Deutscher Wetterdienst, DWD). For each application (GCM, NWP, LES), a dedicated package of physical parameterizations is provided to satisfy the specific needs for each setup. For our simulations, the applied set of physical parameterizations is similar to that used in Klocke et al. (2017). However, we use the two-moment bulk microphysical scheme developed by Seifert and Beheng (2006) instead of the single-moment scheme by Baldauf et al. (2011) used in Klocke et al. (2017. Furthermore, we apply an all-or-nothing cloud-cover scheme that allows for gridscale clouds only as this facilitates the comparison with the observations. At the resolutions used in this study, an all-ornothing cloud-cover scheme might miss some clouds as the necessary saturation humidity might not be reached. A comparison to simulations with a fractional cloud-cover scheme showed only little differences compared to the all-or-nothing cloud-cover scheme used, which made us confident that resolving clouds at the grid scale only is sufficient for our setup. The Rapid Radiation Transfer Model (RRTM; Mlawer et al., 1997) is applied to derive the radiative fluxes. Due to the rather fine horizontal resolution of our simulations, we only parameterized shallow convection using the Tiedtke (1989) shallow convection parameterization with modifications by Bechtold et al. (2008), whereas deep convection is considered resolved (albeit not relevant for the Arctic case considered here). In the following, the used setup will be simply denoted as ICON. However, findings in this study are specific to our chosen setup (spatial scale and parameterizations used) and should not be seen as generally representative of ICON.
We deploy ICON in a limited-area setup with one local refinement (nest) in the region where the research flights and ship observations were performed (Fig. 1). The outer domain has a horizontal resolution of approximately 2.4 km (R2B10 in the triangular refinement), while the inner nest has a refined resolution (R2B11) of approximately 1.2 km. For both domains, we use 75 vertical levels spanning from the surface to 30 km altitude with a vertical resolution of 20 m at the lowest model level that gradually gets coarser towards model top. We initialize the model using the analysis of the European Centre for Medium-Range Weather Forecasts (ECMWF) Integrated Forecasting System (IFS). The respective IFS forecast is used as boundary data to which we nudge our model every 3 h. We do not continuously run the model for the whole period of the campaign but reinitialize the model from the 12:00 UTC analysis of the previous day in the case of a subsequent day with flight activities. This gives the model a spin-up time of more than 12 h even for takeoffs in the early morning.
During the initial comparison of ICON and the ACLOUD observations, we found that the albedo of sea ice in the model is substantially lower compared to values observed during ACLOUD . The reason for this underestimation of the surface albedo in ICON is caused by how our simulations are initialized using the IFS analysis. As the IFS sea ice albedo is not used during the initialization of ICON, the parameterization of the sea ice albedo performs a cold start. For such a cold start, the sea ice albedo is a function of the sea ice surface temperature only, as given by Mironov et al. (2012) (their Eq. 5). This formulation was slightly adapted in ICON by setting the maximum sea ice albedo (α max ) to 0.70 and the minimum sea ice albedo (α min ) to 0.48. For surface temperatures close the freezing point (as has been observed during ACLOUD, especially in the second half of the campaign), such a cold start results in albedo values that are considerably lower compared to the observations. This underestimation of the sea ice albedo could be avoided by increasing the spin-up of the model to a few weeks or by using DWD ICON analysis instead of the IFS analysis. In the latter case, the albedo is initialized from the initial data and no spin-up is required . As one of the main aims of this study is the comparison of irradiances, an accurate representation of surface albedo is crucial; therefore, we chose to take yet another approach. Due to the fact that the simulated period falls on the onset of the melting period, the sea ice albedo significantly reduces in that period. To accurately represent this reduction in sea ice albedo, we prescribe the sea ice albedo as a function of time to be consistent with the observed sea ice albedo. For this purpose, from the observations, only scenes with homogeneous sea ice are selected using a fish-eye-camera-derived sea ice concentration threshold of 95 %. This approach by construction results in a SD of as little as 0.024 between daily modeled and observed albedo. In the case of fractional sea ice cover in the model, the surface albedo is a surface fraction-weighted average between the prescribed value and the albedo of open water (taken as 0.07).
For the comparison of our ICON simulations to the ACLOUD data, we temporally and spatially colocate the model output to be consistent with the actual position and altitude of the aircraft. We use a multidimensional binary search tree (also known as k-d tree; Bentley, 1975) to sample the model output along the flight track in space and time directly on its native unstructured, triangular grid. The temporal frequency of the observational data is 1 Hz. Additionally, we averaged the (sampled) datapoints from the observations and the simulations into 20 s intervals. This ensures that the observational data are on a similar spatial scale as the simulation on the 1.2 km grid of the inner domain (considering an average velocity of the aircraft of 60 m s −1 ). Due to storage constraints, we chose to output the model state only every 30 min, which reduces temporal variability in the model output. As the planes are not static and "fly" through the model grid, temporal variability is, to some extent, replaced by spatial variability when sampling a large enough area along the flight track. Additionally, the 30 min output frequency introduces inconsistencies in the top-of-atmosphere incoming solar irradiance, as the solar zenith angle is constant in the model output, while it varies with time in the observations. This implies that the largest temporal difference between an observational datapoint and the output time step of ICON is ± 15 min, causing a bias of up to ± 14 W m −2 for incoming solar irradiation at the top of the atmosphere in the early morning and late evening when the temporal derivative of incoming solar radiation is the largest. As most flights took place during noon and we mostly focus on cloudy conditions, we expect this bias to be on the order of a few watts per square meter at most, giving us confidence that this issue will not significantly influence the overall findings in this study. Even though being on similar scales, spatial and temporal variability in both datasets prohibit a one-to-one comparison. We will, therefore, use histograms in the comparison.
3 Surface radiative quantities as simulated with ICON and measured during ACLOUD In the following, the simulations are compared to data for several surface radiative variables that have been observed during low-level flight sections. Some flights were excluded due to relatively short flight times to save computational resources. Additionally, some flights with cloudless conditions towards the end of the campaign were not analyzed as the main focus of this study is a comparison of cloud properties. An overview of the flights used for the comparison is given in Table 1. In the observations and in the model, we define low-level flight sections such that no cloud is present below the present altitude of the aircraft.
3.1 Spatial structure of the radiative field of the Arctic atmospheric boundary layer In the Arctic, two distinct radiative states have been reported: a radiatively clear state with no (or only radiatively thin) clouds and a cloudy state with opaque clouds (Shupe and Intrieri, 2004;Stramler et al., 2011). This two-state structure was also observed during ACLOUD, but compared to spatially fixed observations with almost constant surface albedo, observations during ACLOUD were further decomposed into a cloudy and cloudless state over sea ice and open ocean, which consequently results in a four-state structure (Wendisch et al., 2019). As in Wendisch et al. (2019), we compiled two-dimensional histograms of surface albedo and surface net terrestrial and net solar irradiances, defined as the difference between downward and upward radiative energy flux densities, for the ACLOUD observations and the ICON simulations (Fig. 2). The general difference to Wendisch et al. (2019) (their Fig. 14) is explained by the prescribed surface albedo approach applied in this study, which results in higher sea ice albedo values compared to the previously used model setup.
In general, the structure of the modeled net terrestrial irradiance (F net,terr ) close to the surface ( Fig. 2a and b) is in agreement with the observed one. Only for surface albedo  values between 0.6 and 0.7 will noticeable differences between the ACLOUD observations and the ICON simulations become obvious. Those albedo values are related to days towards the end of the campaign (mid to late June 2017) when the melting season had begun and sea ice albedo was reduced. For this period, the model overestimates the presence of cloudy conditions, whereas cloudless conditions were present in the ACLOUD observations. Conversely, for situations with sea ice albedo greater than 0.7, ICON overestimates the presence of cloudless conditions. The lack of cloudless conditions for surface albedo values between 0.6 and 0.7 in the ICON simulations is also visible from the histograms of surface albedo and net solar irradiance ( Fig. 2c and d). For surface albedo larger than 0.7, the net solar irradiance (F net,sol ) close to the surface seems, on average, in agreement with the observations, even though the observed variability in surface albedo is not simulated by the model. The reported discrepancies can be influenced by the input used to force our limited-area simulations. This can be seen in the underestimation of the albedo of sea-ice-covered surface despite the prescribed surface albedo in the model that is in accordance with the observed sea ice albedo. This bias is, therefore, related to differences in sea ice fraction in the model and in the observations and indicates that the sea ice fraction in the ECMWF input data is too small.

Surface net irradiances and cloud radiative effect over sea ice and below clouds
This section explores the effect of clouds on the surface radiative budget in the ACLOUD observations and in our ICON simulations over sea ice. For that purpose, we, at first, look at net surface irradiance, which we further split into its solar and terrestrial components. To ensure comparability, despite obvious differences between the ICON simulations and ACLOUD observations described in Sect. 3.1, we will restrict our comparison to situations where the model and the observations are within the same cluster of the twodimensional histograms of surface albedo and surface net terrestrial irradiance at the same time. To distinguish between those clusters, a situation is defined as cloudy if the net terrestrial irradiance at the surface is larger than −50 W m 2 . Furthermore, a surface is classified as sea ice covered if the surface albedo is larger than 0.7 but less than 0.85, which is equivalent to the daily averaged maximum albedo value used in our adapted albedo parameterization. As we are interested in cloud (radiative) properties over sea-ice-covered surface, we will focus our evaluation on those situations. Furthermore, this cluster is appealing as most low-level flight sections were performed under these conditions. In Fig. 3, we compare observed and simulated net nearsurface irradiances using histograms. From Fig. 3a, it becomes obvious that the model systematically overestimates net surface irradiances below clouds and over sea ice. This variable also shows a quite strong variability for both the model and the observations, which is related to varying sea ice albedo during the campaign. Additionally, the incoming solar radiation varied between research flights as they took place at different times of the day, which also introduces further variability. Looking at median values of the spectral components, we find that differences between simulated and observed net surface irradiances are mainly mediated by its solar component, while the median of net terrestrial surface irradiances are well simulated by ICON; also the shapes of their histograms match better. Besides the above reported underestimated surface albedo for sea-ice-covered surface in ICON, misrepresented cloud optical properties can also contribute to the positive bias in net solar irradiances at the surface.
Furthermore, we investigate the surface cloud radiative effect (CRE) during ACLOUD, which is defined as the difference between net surface irradiance for cloudy and cloudless conditions. In the model, cloudy and cloudless irradiances can easily be derived by a double call to the radiation routines: one with clouds and one without clouds, leaving all variables not related to clouds constant. For observations, it is impossible to simultaneously observe both cloudy and cloudless conditions. Therefore, irradiances of cloudless conditions were obtained from dedicated radiative transfer simulations that used observations of atmospheric (i.e., temperature and humidity profiles) and surface properties (albedo). The one-dimensional plane-parallel DIScrete Ordinate Radiative Transfer solver DISORT (Stamnes et al., 1988) included in the libRadtran package (Emde et al., 2016) was applied for this purpose. The molecular absorption parameterizations from Kato et al. (1999) for the solar spectral range (0.28-4 µm) and from Gasteiger et al. (2014) for the terrestrial wavelength range (4-100 µm) were chosen. For calculating the observation-based CRE, the observed all-sky albedo was used, which is also used to create the prescribed functional dependency of the sea ice albedo that has been applied in the ICON model. Potential inconsistencies regarding The overwhelming majority of the observed and modeled total (solar plus terrestrial) surface CRE values are positive over sea ice, which indicates that clouds have a warming effect on the surface (Fig. 4a). This is consistent with the relatively high surface albedo values at the onset of the melting period during ACLOUD , which decreases the cooling effect of clouds in the solar spectral range. Similar to the net surface irradiance, ICON overestimates the total surface CRE (Fig. 4a), which is mainly caused by less cooling due to solar CRE (Fig. 4b), while the modeled terrestrial CRE again matches the observed surface terrestrial CRE (Fig. 4c). The way that the surface solar CRE is defined allows us to narrow down which effect is the main cause for the overestimated net solar surface irradiances. If clouds were perfectly simulated by the model, the negatively biased surface albedo would cause a too strongly negative surface solar CRE. As this is not the case for ICON, it is inferred that the main reason for the overestimated net solar surface irradiances is related to overestimated transmissivity of the cloud layer, which is defined as the ratio of downward transmitted solar irradiance at cloud base to downward incident solar irradiance at cloud top. Therefore, underestimated cooling effects in the solar spectral range are most likely related to incorrect simulations of microphysical or macrophysical properties of Arctic clouds in ICON. Therefore, in the following section, we compare those properties as they were simulated (ICON) and measured (ACLOUD) in more detail.

Comparison of macro-and microphysical cloud properties in ICON to ACLOUD observations
Transmissivity T of a cloud layer is directly related to its optical thickness τ c : where τ c is defined as the volumetric cloud particle extinction coefficient β ext , vertically integrated from cloud base z base to cloud top z top : During ACLOUD and PASCAL, clouds were mostly in the liquid water phase with only a small amount of ice present, which allows us to express the extinction coefficient as a function of liquid water content q c and cloud droplet number concentration N d (Grosvenor et al., 2018): Equations (3) and (2) show that τ c depends on geometrical depth (z top − z base ), as well as on q c and N d . In this study, we will denote the geometrical depth as a cloud macrophysical property and denote q c and N d as cloud microphysical properties. Nevertheless, we are aware that liquid water content, especially in a model that employs a saturation adjustment, cannot be considered to be solely a microphysical property as it strongly depends on the thermodynamical state of the atmosphere, thus making it a macrophysical variable that is adjusted by microphysical processes.
To identify potential sources explaining the modelmeasurement differences discussed in the previous section, we compare geometrical cloud thickness and microphysical properties of clouds in ICON to observations collected during ACLOUD and PASCAL. We decided to focus on the period from 2 to 5 June 2017, when flights were possible on 3 out of 4 days. Here, only a brief summary of the meteorological conditions during that period is given. For a comprehensive overview of this period, we refer the reader to Knudsen et al. (2018) and . During this period, a southerly to easterly inflow of warm and moist air into the region where research flights took place was observed. Average near-surface temperatures and integrated water vapor at R/V Polarstern during that period were −3 • C and 6 kg m −2 , respectively. A relatively shallow inversioncapped atmospheric boundary layer (Knudsen et al., 2018) with cloud-top heights of less than 500 m in the vicinity of R/V Polarstern was observed. During those 4 d, the low-level cloud field was relatively homogeneous and mostly stratiform, with almost no high clouds being present in the domain where the research flights took place. Mostly liquid water and mixed-phase clouds were observed during this period (Wendisch et al., 2019). The relatively stable meteorological conditions during this period facilitated the statistical aggregation of the measurements on all the research flights that took place during that period, which was not as straightforward for other parts of the campaign. Especially during mid June 2017, broken multilayer clouds were present, which made a consistent comparison between the model and the observations harder to achieve. This can be seen in the limited amount of simultaneously cloudy and sea-ice-covered scenes in the period from 16 to 18 June (see Table 1). Additionally, in situ observations of cloud microphysical properties were performed on all flight days during that period. Another important point on why this period was chosen is the fact that R/V Polarstern was within the sea-ice-covered region and provided another source of observations that we can use for the comparison with our ICON simulations.

Geometrical cloud depth
We compare geometrical cloud depth as simulated by ICON to that observed during PASCAL. We choose PASCAL cloud radar and ceilometer observations instead of ACLOUD observations as they provide a continuous dataset in time, which facilitates the comparison of geometrical cloud depth. To better compare the simulations to ground-based observations, we use ICON's meteogram output. It provides profiles of model variables at a certain location at every model time step compared to the 30 min output frequency when outputting the whole model domain. For each day simulated, we chose to output the profiles at Polarstern's 12:00 UTC location. While its position was rather constant from 3 June onward (Wendisch et al., 2019, their Fig. 2), the ship was still in transit to the ice floe on 2 June. This might introduce some inconsistencies in the comparison to the spatially fixed ICON profiles. As the ship was already relatively far into the marginal sea ice zone, the cloud field should be homogeneous and representative of sea ice covered conditions.
For the model output, a layer within a profile is considered cloud covered if the total cloud condensate (liquid and ice) is larger than a threshold of 0.05 g m −3 . We only assess clouds close to the surface, namely, from the ground to 2 km altitude. In this altitude range, we define cloud base (top) as the lowest (highest) model level a cloud is being simulated within a profile. To derive the observed geometrical cloud depth, we use cloud base height as observed by the laser ceilometer on board R/V Polarstern, while cloud-top height was derived by using the 35 GHz cloud radar . Both modeled and observed cloud depths have been temporally interpolated to be on identical time steps. We acknowledge that such a comparison of geometrical cloud thickness is not a definition-aware comparison as it depends on instrument sensitivities and on the chosen threshold of total cloud condensate for diagnosing clouds in the model. Additionally, the rather simple approach is not able to correctly diagnose cloud depth for multilayer clouds, but as stated above, mostly single-layer clouds were observed and simulated during the period of interest.
The difference in geometrical cloud depth simulated by ICON and as observed from R/V Polarstern during the period from 2 to 5 June is shown in Fig. 5. In general, the geometrical cloud depth is slightly negatively biased in our ICON simulations with a mean bias of 65 m and a SD of 110 m. In offline radiative transfer simulations, we explored the effect of this bias in cloud geometrical thickness on the solar component of the surface CRE (see the Supplement). For that, we used profiles of liquid water that have been observed during the period from 2 to 5 June and interpolated those profiles in the vertical. For all those profiles, a bias of 65 m in cloud vertical extent led to a change in solar CRE of approximately 5 W m −2 , which is not sufficient to explain the reported model bias of more than 20 W m −2 . Therefore, we will now focus on how cloud microphysical properties are represented in ICON compared to the observations and to what extent they contribute to the ascertained biases in cloud optical properties. Figure 6. Spatiotemporal average particle number size distribution (a) and relative frequency of total particle number in the diameter range from 5 to 40 µm (b), as well as liquid water content (c). All data are averaged over the flights from 2 to 5 June over sea-ice-covered region. Filtering for sea-ice-covered ACLOUD flight sections is done using simulated albedo from ICON.

Cloud microphysical properties
To investigate how cloud microphysical properties contribute to the underestimated cloud optical thickness in ICON, we make use of the suite of in situ instruments that were part of the instrumentation of Polar 6 (Ehrlich et al., 2019). From 2 to 5 June, research flights with Polar 6 were performed on 3 out of 4 days (no flight on 3 June). We focus on particle size distribution of hydrometeors and the respective moments, which have been observed by the Small Ice Detector mark 3 (SID-3), covering a size range of cloud droplets or ice crystals from 5 to 40 µm. As particle size distributions derived from SID-3 agree well with those from other sensors (such as the cloud droplet probe, CDP) for days when both probes were available (Ehrlich et al., 2019), we are confident that particle size distributions from the SID-3 are best suited for our comparison. In the following, we compare simulated and observed particle size distributions as well as the total particle number concentration (N d ), mainly consisting of droplets in the size range presented in Fig. 6. Furthermore, the liquid water content (q c ) is shown. To be comparable to the particle size distribution from the SID-3, we integrate the size distribution of the two-moment microphysical scheme implemented in ICON within the size bins of the SID-3 for cloud droplets and ice crystals and add them. Due to relatively warm temperatures in the region of the research flights in early June 2017, only a small amount of ice was present in clouds during that period. While we derive the particle number concentration directly from particle size distribution by integrating over the size bins of the SID-3, we use measurements from the Nevzorov probe on Polar 6 to obtain information on q c . Figure 6 shows particle number size distributions and the particle number concentration and liquid water content (q c ) for the period from 2 to 5 June. Looking at the particle size distributions, we find that ICON underestimates the amount for hydrometeors smaller than 25 µm, while it overestimates the amount of cloud particles larger than that threshold in comparison to the measurements. As the number concentration of hydrometeors is mainly influenced by the number of small particles, the total amount of hydrometeors is also underestimated in the model. Averaged over all bins, q c is underestimated by ICON relative to q c derived by the Nevzorov probe, as the model overestimates the frequency of occurrence for relatively small q c values.

Representation of cloud microphysical parameters in ICON
According to Eq. (3), the underestimated hydrometeor number concentration and q c can both lead to lower cloud optical thickness in ICON. As not all microphysical schemes in ICON do provide number concentration of cloud droplets and ice crystals, the calculation of cloud optical properties is simplified in the radiation scheme. As an input for the radiation routines for liquid water clouds in ICON, a constant profile of N d , which decreases exponentially with altitude, and q c is used for the calculation of optical properties of liquid clouds. For open water or sea ice, the assumed surface N d within the radiation scheme is 80 cm −3 , which is close to the observed cloud hydrometeor number concentrations (Fig. 6). Nevertheless, this value is slightly lower than the observed mean of 85 cm −3 for the three flight days from 2 to 5 June. Assuming that the model is able to correctly simulate q c , this underestimation would imply lower cloud optical thickness, which would further contribute to the overestimated amount of downward solar irradiance that reaches the surface. Calculation of optical properties of ice clouds is even further simplified as they depend solely on the ice water content. To evaluate the effect of cloud ice on radiative properties in the model, we performed a sensitivity analysis in which we turned off any radiative effect of cloud ice. This analysis revealed only a minor impact of cloud ice on radiation properties like surface CRE and net irradiance at the surface, which were both on the order of 1 W m −2 compared to the basic setup. This low impact is due to the already low cloud ice fraction in the model, which causes the radiative effect of cloud ice to be low. Due to the limitations of the observational dataset with a small amount of cloud ice being observed, it is hard to constrain the model from the observational side. Therefore, any estimation of the impact of cloud ice on the radiative balance has to be interpreted with some caution. Additionally, q c in the model is underestimated compared to the observations, which also contributes to the bias in cloud optical thickness in ICON. We attribute the lower q c to an underestimated number concentration of relatively small cloud droplets (diameters < 25 µm), which are commonly observed for this region and season . The model also overestimates the number of hydrometeors with diameters larger than 25 µm. Thus, too few cloud droplets are generated; therefore, condensational growth and coalescence of the available cloud droplets shifts the size distribution towards larger droplets. Looking at the phase state of precipitation reaching the surface in the region around R/V Polarstern (81-85 • N and 5-15 • E), where most of the research flights from 2 to 5 June took place, we find that rain rate at the surface (8.57 g m −2 h −1 ) is almost an order of magnitude larger than that of snow (2.95 g m −2 h −1 ). As temperatures in the atmospheric boundary layer over sea ice were mostly below freezing during the 3 d analyzed, this rain must stem from "warm" rain processes, indicating a relatively active autoconversion process in our setup. Therefore, autoconversion further contributes to the underestimated q c by ICON as it acts as a sink for cloud liquid water.
Interestingly, the here reported systematic underestimation of hydrometeors is different from the findings by Schemann and Ebell (2020). They conducted simulations for the Ny-Ålesund research station using the ICON model in the large eddy setup (ICON-LEM) and compare ground-based cloud radar observations with their ICON-LEM simulations by applying a radar forward operator. Besides a different scheme for turbulent transport and activated parameterization of shallow convection in our setup, as well as corresponding initial and boundary conditions from DWD's operational ICON forecast (instead of ECMWF forecast), the basic setup is similar to our simulations. Comparing radar reflectivities using contoured frequency by altitude diagrams in mid June 2017 (see Fig. 6 in Schemann and Ebell, 2020), they found that for their 75 m domain, the model strongly overestimates the frequency of occurrence for low radar reflectivities and small hydrometeors. They argue that this finding can be related to the way cloud condensation nuclei (CCN) are activated into cloud droplets in the default Seifert-Beheng two-moment microphysical scheme. This was confirmed by ICON-LEM simulations in an Arctic domain by Mech et al. (2020), who implemented different CCN activation scheme (Phillips et al., 2008) within the Seifert-Beheng two-moment microphysics.

Revised activation of CCN in ICON
In the following, we will focus on the issue of the nonmatching particle number size distribution compared to ACLOUD observations and how it affects total droplet number and q c of clouds in our simulations. As has been pointed out by Schemann and Ebell (2020), this process might presently be misrepresented in the model. In its present implementation in ICON, the activation of CCN is parameterized as a function of grid-scale vertical velocity w and pressure p as described in Hande et al. (2016): (4) where the parameters A(p) to D(p) contain information on the vertical profile of CCN and on the activation of CCN with respect to grid-scale vertical velocity w. The profile presently used in the two-moment microphysical scheme is a temporally and spatially constant profile taken over Germany for a day in April 2013 as in Heinze et al. (2017). This CCN activation profile is not representative of the amount of CCN activation in the Arctic domain, as the CCN concentration in the Arctic is much lower. As stated in Schemann and Ebell (2020), the overestimated frequency of occurrence for low radar reflectivities and small hydrometeors in their simulations can be related to this unsuitable CCN profile.
Despite this unsuited CCN activation profile for an Arctic domain, we find an underestimated number concentration of hydrometeors in our simulations. Therefore, it is plausible that the relatively low hydrometeor number concentration is related to the coarser resolution in our ICON simulations. A realistic simulation of turbulence and cloud-scale vertical motion is crucial for Arctic mixed-phase clouds (Rauber and Tokay, 1991;Korolev and Field, 2008;Shupe et al., 2008). As the number of activated CCN is a function of grid-scale vertical velocity, it is likely that our simulations at 1.2 km resolution do not sufficiently resolve in-cloud vertical motion and turbulence (Tonttila et al., 2011). This is consistent with the fact that characteristic eddy sizes in Arctic mixed-phase clouds are less than 1 km (Pinto, 1998). Fan et al. (2011) suggested that only horizontal model resolutions of less than 100 m are able to resolve major dynamic features that contribute to vertical motion in Arctic mixed-phase clouds. Not being able to resolve those features consequently affects particle size distributions and its moments like number concentration as too few droplets are activated (Morrison and Pinto, 2005).
To account for subgrid-scale vertical motion, vertical velocity in the aerosol activation in larger-scale models is often parameterized as a function of specific turbulent kinetic energy (TKE; Ghan et al., 1997;Lohmann et al., 1999), which is defined as where u , v , and w are the subgrid-scale deviations from grid-scale velocity, and the overbar denotes grid-box average. To explore the effects of including subgrid-scale vertical velocity in the Hande et al. (2016) CCN activation parameterization, we chose to follow a similar approach as proposed in Ghan et al. (1997), who assume the subgrid vertical velocity in a grid box to follow a Gaussian distribution, i.e., P (w | w, σ w 2 ). The grid-box-averaged number of activated CCN can, therefore, be written as the integral over positive vertical velocities: To numerically solve the integral in Eq. (6), a simple trapezoidal integration is employed using 50 equally spaced bins in a ± 3σ w range around w.
If it is assumed that subgrid-scale motion in low-level Arctic mixed-phase clouds is isotropic (u 2 = v 2 = w 2 ), as proposed by Pinto (1998), the variance of vertical velocity can be expressed as a function of TKE as follows (Morrison and Pinto, 2005): Using turbulence measurements on a tethered balloon during the PASCAL ice floe operations, Egerer et al. (2019) showed that isotropic turbulence is a valid assumption for a subset of days during PASCAL that have been analyzed in their study. We, nevertheless, are aware that isotropic subgrid-scale motion in Arctic clouds cannot be assumed for all conditions (Curry et al., 1988;Finger and Wendling, 1990).
The effects of this revised CCN activation for the period from 2 to 5 June are shown in Fig. 7. Compared to the original activation parameterization, the model shows a much closer agreement with the measurements, although an overestimation of hydrometeors with diameters less than 20 µm is simulated, while it underestimates the number of hydrometeors larger than 30 µm. As the number of small hydrometeors governs the total number of hydrometeors, their overestimation leads to an overestimated number of total hydrometeors in the whole diameter range between 5 and 40 µm. The particle size distribution now is in better agreement with the findings by Schemann and Ebell (2020), as we find an overestimation of smaller hydrometeors and underestimated number concentration of larger hydrometeors compared to in situ observations. The shift of the particle size distribution towards smaller hydrometeors can be related to the unsuited CCN profile within the activation parameterization. As discussed above, autoconversion is the predominant sink for cloud water in the absence of precipitation formation via the ice phase. The fact that the revised activation of CCN increases N d eventually leads to a reduction in the size of cloud droplets (see Fig. 7a). This reduces the collection efficiency of cloud droplets, which leads to a less efficient autoconversion process, which can be seen in the shift in the histogram of q c towards higher values in Fig. 7c. Compared to the ACLOUD observations, small values of liquid water content less then 0.3 g m −3 are underestimated, while values larger than that threshold are simulated more frequently in the revised CCN activation.
The presently used CCN activation profile was originally derived for spring conditions in Germany, where one would expect a much higher load of CCN compared to the Arctic. To have a more realistic representation of CCN, a dedicated simulation with a model that is able to represent the formation and transport of aerosols would be necessary. We opt against this approach and instead scale the number of activated CCN from the default profile using a scaling factor of 0.4. A more elaborate description why this scaling factor was used is given in Sect. A. The chosen scaling factor results in an underestimated number of hydrometeors smaller than 22 µm as is shown in Fig. 8, while hydrometeors with larger diameters are overestimated by the model. Looking at the hydrometeors number concentration, the chosen scaling factor shifts the simulated distribution towards smaller hydrometeor concentrations that consequently results in a slight underestimation of hydrometeors compared to the observations. This indicates that the chosen scaling factor is slightly too effective in reducing the number of activated CCN. Compared to Fig. 7, high values of liquid water content larger than 0.3 g m −3 occur less frequently when scaling the number of activated CCN, but there is still a slight underestimation in the frequency of occurrence for q c values between 0.1 g m −3 and 0.3 g m −3 . Even though scaled, the overall shape of the profile of activated CCN as a function of vertical velocity remains unchanged. A different aerosol composition or just a different vertical profile of aerosols alters the shape of the profile, which might also lead to biases in the number of activated CCN. This emphasizes the need for an CCN activation profile that is better suited for an Arctic environment, which has also been proposed by Schemann and Ebell (2020).
The effect of the different CCN activation setups on the CRE for all flights from 2 to 5 June is shown in Fig. 9a-c. We would like to point out that the cloud fields between the respective CCN activation setups vary. For that reason, the number of available datapoints for which the threshold for sea ice coverage and cloudy conditions are fulfilled at the same time differ between the runs due to the filtering that is employed. Similar to the histograms in Fig. 4, which cover all flights used in this comparison, the warming effect of clouds at the surface is overestimated when looking at the period from 2 to 5 June. For the revised CCN activation, the increase in q c reflects the surface CRE, which now has a small negative bias compared to the ACLOUD observations. Because of the aforementioned constant profile of cloud droplet number concentrations in the calculation of the effective radius within the radiation scheme, this negative bias would be more strongly expressed if the actual cloud droplet number concentration from the microphysical scheme were to be used (see Sect. 5.3). When scaling the activated number of CCN by a factor of 0.4 using the revised CCN activation, the CRE is still overestimated by ICON compared to observations even though the positive bias in the median could be reduced by approximately 5 W m −1 . As downscaling the number of activated CCN by a factor of 0.4 was already slightly too effective in reducing the hydrometeor number, a larger scaling factor might be able to further decrease the CRE in the model.
From the previously conducted sensitivity study employing a more effective CCN activation, it is not clear whether the above-reported biases in cloud microphysical properties is a source (inefficient CCN activation) or a sink issue (autoconversion that is too effective). To this end, we conducted a further sensitivity study with unchanged CCN profile and in which autoconversion was turned off entirely (see the Supplement). While the effect on q c is comparable to the revised activation, but not yet scaled CCN activation (see Fig. 7), the cloud droplet number concentration is still underestimated. Furthermore, the shape of the size distribution does not match the shape of the observed one. Since the CCN profile used in the activation of CCN into cloud droplets within the cloud microphysical scheme is not suited for an Arctic domain as it overestimates the availability of CCN, the underestimated amount of cloud droplets in the simulations with autoconversion turned off is indicative for a source rather then a sink problem of cloud droplets in our simulations.

Coupling of hydrometeor number concentration to radiation
As already discussed above, there is an inconsistency between the hydrometeor number concentration derived in the two-moment microphysics and that used in the radiation routines. Therefore, in the following, we explore the effect of making the hydrometeor concentrations consistent between the two parameterizations. As input for the calculation of optical properties, ICON uses cloud droplet and ice crystal effective radius, which is defined as the ratio of the third to the second moment of the size distribution. Previously, effective radii were computed solely as a function of specific masses. To ensure consistency with the size distributions in the Seifert-Beheng two-moment scheme, we calculate the effective radii from the used gamma distribution (see Sect. B for the derivation). This new implementation has already been used in Costa-Surós et al. (2020). In Fig. 9d-f, the biggest difference to the uncoupled hydrometeor number concentrations (Fig. 9a-c) can be seen in the histograms for the revised CCN activation (Fig. 9e). In this setup, the CRE is underestimated compared to observations due to higher hydrometeor concentration, which is now also considered in the radiation parameterization. For the revised and scaled CCN activation, only little differences are simulated between coupled and uncoupled hydrometeor concentration. As stated above, the fixed cloud droplet number concentration in the default radiation routines is already relatively close to the hydrometeor concentration observed for the flights from 2 to 5 June. Nevertheless, compared to the observations, the median value of the CRE in ICON in Fig. 9f is closest to the observed values, even though they are still slightly overestimated. Altogether, the revised CCN activation with a scaled CCN activation and coupled hydrometeor now results in a positive bias of only approximately 6 W m −2 . The effect on surface CRE of the coupling of hydrometeor number concentration to radiation for this period is relatively low (1 W m −2 ; see Fig. 9c and f), as the assumed number concentration in the default setup and the number concentrations from the two-moment microphysical scheme in the revised and scaled CCN activation are in a similar range. As can be seen from Fig. 9b and e, if the N d profile in the microphysics deviates from the profile in the radiation, there can be quiet substantial differences due to a more realistic representation of the Twomey effect (Twomey, 1977), which can be important for relatively clean or polluted situations. As can be seen in Fig. 4, the differences in the CRE for the respective sensitivity experiments are again primarily mediated by its solar component, whereas the ter-restrial components are in good agreement with the observationally derived terrestrial CRE components (see the Supplement).

Conclusions
In this study, we use observational data from the ACLOUD and PASCAL campaigns  to compare them to limited-area simulations with the ICON atmospheric model at kilometer-scale resolution. While the model compares well to the observations in its ability to simulate the four cloud-surface radiation regimes in the Arctic, it severely underestimates cloud radiative effects in the solar spectral range. This is despite a slight underestimation of the geometrical cloud thickness and attributable to droplet number concentrations that are too small and liquid water content that is too little when simulated by the model. We showed that it is crucial to correctly represent in-cloud turbulence in Arctic clouds, which is essential to correctly simulate hydrometeor number concentration and liquid water content. The findings of this study are mainly representative in the case of turbulence-driven stratiform and optically thin single-layer clouds that contain liquid water but are, to some extent, also valid for multilayer clouds, which was confirmed by an analysis of days in mid June 2017, where such conditions prevailed. Furthermore, similar improvements were obtained at lower horizontal and vertical resolutions (2.4 km and 50 vertical levels) when including subgrid vertical motion in the activation of CCN into clouds droplets, which gives us confidence that such an approach can also be beneficial for simulations with coarser spatial resolutions.
As reported by Stevens et al. (2020), the representation of clouds in atmospheric models benefits from higher-resolved simulations. Nevertheless, long-term global simulations at the hectometer scale will not be feasible in the foreseeable future , whereas climate projections at the kilometer scale can be achieved . It is, therefore, especially important to improve models on such scales to enable them to make realistic simulations. As shown in this study, aircraft observations are a valuable source of information and can be used for evaluating and improving the representation of physical processes for models at the kilometer scale. The results presented in our study might also be beneficial to the representation of clouds in ICON in other regions, where clouds are driven by turbulence.  (Benedetti et al., 2009). We computed the number of activated CCN for various vertical velocities and also supersaturation for a sea-ice-covered domain north of Svalbard during the period from 2 to 5 June following the approach of Block (2018). Close to the surface, the number of activated CCN at a supersaturation of 0.5 % in this dataset is approximately 45 cm −3 . This value is on the lower end of the observed number concentrations of activated CCN during PASCAL, which were in a range of 40 to 80 cm −3 during this period (Wendisch et al., 2019, their Fig. 10).
To decide which scaling factor to use, we looked for a scaling factor (in steps of 0.05) that minimizes the mean squared error of the scaled profile and the profile derived from CAMS for several vertical velocities in an altitude band from the surface to 700 hPa. From Table A1, we find that a scaling factor of 0.4 is a good compromise for relatively low vertical velocities in Arctic clouds. Even though scaled to best mating the CAMS profile, the overall shape of the profile of activated CCN in ICON remains unchanged. Figure A1 shows that the default profile strongly overestimates the number of activated CCN close to the surface while nicely matches the CAMS profile for altitudes higher than 800 hPa. As almost all clouds from 2 to 5 June were below that altitude, it is more important to correctly represent the number of activated aerosol particles close to the surface. The number of activated CCN is almost constant up to 850 hPa, whereas the number of activated CCN in the CAMS profile increases with altitude. Even though we cannot match the shape of the activation profile, a scaling factor of 0.4 should represent an approximate average up to 850 hPa.
Appendix B: Derivation of effective radius from gamma distribution To describe the particle size distributions of all hydrometeor categories in the Seifert-Beheng two-moment microphysical scheme (Seifert and Beheng, 2006), a modified gamma distribution is used: where x is the particle mass, and ν and µ are the parameters of the distribution for the respective hydrometeor category. Coefficients A and λ can be expressed by the number and mass densities and the parameters ν and µ (Eq. 80, Seifert and Beheng, 2006). Following Petty and Huang (2011), the kth moment M k of such a modified gamma distribution can be expressed as follows: The ratio between the third and second moment can, therefore, be written as To obtain the effective radius, Eq. (B1) has to be first converted into a function of radius. According to Eq. (54) in Petty and Huang (2011), the particle size distribution as a function of radius f (r) can be written as A r r ν r exp −λ r r µ r = A x(r) ν exp −λ r µ dx dr .
The particle mass as a function of radius x(r) in the Seifert-Beheng two-moment microphysical scheme is defined as follows: which differs from the functional relationship given in Table 1 in Petty and Huang (2011), as the values for a and b are defined differently (see Table 1 in Seifert and Beheng, 2006). Therefore, Inserting Eqs. (B5) and (B6) into Eq. (B4) and comparing the respective parameters for radius and mass in Eq. (B1), we find the following conversion relationships for the parameters in the particle size distribution: By inserting those parameters into Eq. (B3) and applying the functional dependencies for A and λ from Eq. (80) in Seifert and Beheng (2006), the effective radius r eff can be written as follows: where q and N are the mass and number densities for the respective hydrometeor category.
Data availability. The ICON model output data used in this study are stored at the German Climate Computing Center (DKRZ) and are available upon request from the corresponding author. The observational data from the ACLOUD and PASCAL campaigns are archived on the PANGAEA repository and can be accessed from the following DOIs: broadband (solar and terrestrial) irradiances ( Supplement. The supplement related to this article is available online at: https://doi.org/10.5194/acp-20-13145-2020-supplement.
Author contributions. JK, JS, MW, and JQ conceived this study. DK helped setting up the input data for the ICON runs and gave valuable expertise on how to run the model in a limited-area setup. JK and JS prepared and analyzed the model and observational data, respectively. All of the authors assisted with the interpretation of the results. JK prepared the article with contributions from all coauthors.
Competing interests. The authors declare that they have no conflict of interest.
Special issue statement. This article is part of the special issue "Arctic mixed-phase clouds as studied during the ACLOUD and PASCAL campaigns in the framework of (AC) 3 (ACP/AMT/ESSD inter-journal SI)". It is not associated with a conference.