Understanding the model representation of clouds based on visible and infrared satellite observations

Satellite observations provide a wealth of information on atmospheric clouds and cover almost every region of the globe with high spatial resolution. The measured radiances constitute a valuable data set for evaluating and improving clouds and radiation representation in climate and numerical weather prediction (NWP) models. An accurate, bias-free representation of clouds and radiation is crucial for data assimilation and the increasingly important solar photovoltaic (PV) power production prediction. The present study demonstrates that visible (VIS) and infrared (IR) Meteosat SEVIRI observations contain valuable 5 and complementary cloud information for these purposes. We analyse systematic deviations between satellite observations and convection-permitting, semi-free ICON-D2 hindcast simulations for a 30-day period with strong convection. Both visible and infrared satellite observations reveal significant deviations between the observations and model equivalents. The combination of infrared brightness temperature and visible solar reflectance allowed to attribute individual deviations to specific model shortcomings. Furthermore, we investigate the sensitiv10 ity of model-derived VIS and IR observation equivalents to modified model and visible forward operator settings to identify dominant error sources. The results reveal that model assumptions on subgrid-scale water clouds are the primary source of systematic deviations in the visible spectrum. Visible observations are, therefore, well-suited to advance this essential model assumption. The visible forward operator uncertainty is lower than uncertainties introduced by model parameter assumptions by one order of magnitude. In contrast, infrared satellite observations are very sensitive to ice cloud model assumptions. Fi15 nally, we show a strong negative correlation between VIS solar reflectance and global horizontal irradiance. This implies that improvements in VIS satellite reflectance prediction will coincide with improvements in the prediction of surface irradiation and PV power production.

turn, are mainly determined by atmospheric clouds (Zack, 2011). According to Köhler et al. (2017), the main shortcomings of  VII Two-moment scheme with strongly reduced asymmetry factor for subgrid-liquid clouds (A = 1.5, B = 2.5) and no subgrid ice-clouds. This simulations was motivated because the two-moment scheme reflected too much radiation, and temperature but does not allow for retrieving the TCW/TCI. In contrast, the solar reflectance is only 0.35 at these threshold values and can still provide information on the water/ice content up to TCW/TCI values of about 1 kg/m 2 . As the flux of the 160 reflected radiation is anticorrelated with the flux of the transmitted radiation (which will be discussed more quantitatively in Sect. 3.3), solar reflectances are sensitive to changes in cloud water/ice content in the range where such changes have also the strongest impact on the incoming solar radiation at the surface.
These different and complementary sensitivities show that model evaluation with solar and thermal channels has the potential to provide more information on the nature of the systematic errors and to possibly identify specific shortcomings that 165 would not be visible by only examining a single channel.

Satellite forward operators
To compute model equivalents for visible satellite images from the ICON model state, we employ the VISible satellite image Forward OPerator (VISOP) that uses the fast 1D radiative transfer (RT) method MFASIS (Scheck et al., 2016). MFASIS is 170 based on a compressed lookup table (LUT), computed using the DISORT solver. It is possible to consider aerosols or different kind of ice habits for the computation of the MFASIS LUT (results in section 4.2). VISOP takes the slant satellite viewing angle into account (tilted independent column approximation; (Wapler and Mayer, 2008)) and accounts for the most important 3D RT effect by using the cloud top inclination correction (CTI) described in Scheck et al. (2018). The surface albedo values required as input for MFASIS are taken from the RTTOV-BRDF Atlas .

175
As we aim to achieve consistent assumptions in both the operator and the NWP model, we decided to use effective radii from the model output for water clouds directly. This is based on the consideration that radiative transfer, micro-physics and possibly operators should deal with the same optical properties.
However, some adjustments are required for the ice clouds, as will be motivated in the following. The micro-physics scheme in the simulation predicts six hydrometeor categeories: cloud water, cloud ice and precipitating liquid water, snow, hail, and 180 graupel. Rain droplets, hail and graupel particles are assumed to be much larger than cloud droplets and cloud ice particles in the model. Therefore, for the same mass they are also much less effective in scattering radiation and are thus neglected in the forward operators. However, the distinction between snow and cloud ice particles in the model is rather artificial. Model snow particles can be small enough to cause non-negligible scattering effects (see discussion in Hogan et al., 2001). Hence, as a (first) approximation we construct a frozen phase whose total mass, Q tot i , is the sum of the diagnosed ice water content (grid 185 and subgrid-scale) and snow content (only grid scale available) and whose "effective effective radius", is defined using the simulated effective radii of cloud ice r i,eff and snow r s,eff . This approximation assumes that the optical thickness of the frozen phase is equal to the sum of the optical thicknesses of the ice and snow phases. The approximation becomes exact in case of wavelengths much smaller than the hydrometors size (optical limit), and therefore it is quite appropriate 190 for visible channels. we use the approach to find a suitable calibration offset by minimising the average histogram difference between the observed and simulated solar reflectance distribution. Through this, we found an offset of -13 % between observations and our reference simulation.
To derive SEVIRI infrared brightness temperature from the model state, we use the efficient methods implemented in the RT-

200
TOV 12.1 package (Saunders et al., 2018), which is used by many weather services. The spatial resolution of MSG SEVIRI VIS006 and IR108 observations is 3 km x 3 km at subsatellite point and reduces to 6 km x 3 km in the ICON-D2 domain, with a temporal resolution of 15 min.
For the evaluation, we applied both operators at the full model resolution and interpolated solar reflectances and brightness temperature to observation space afterwards to avoid additional representativeness errors. (Marseille and Stoffelen, 2017).

Global horizontal irradiance observation and forward operator
Global horizontal irradiance (GHI) is the total amount of shortwave radiation and includes both direct normal irradiance (DNI) and diffuse horizontal irradiance (DHI). Deutscher Wetterdienst operates a dense network of GHI observations across Germany.
GHI is an hourly average and is evaluated at 122 pyranometer stations (Fig. 1)

Verification metrics
A combination of metrics is applied to evaluate synthetic satellite imagery at 12 UTC with observations. The verification domain (red rectangle in Fig. 1) is smaller than the ICON-D2 domain to exclude nesting effects at the domain boundaries and signals from snow-covered surfaces in the Alps that exhibit reflectances similar to clouds. We show VIS006 and IR108 probability density functions (PDF) of our simulations P sim and calibrated observations P obs , without coarsening or thinning.

215
The number of bins N of the PDFs is 50, with R∈[0,1] and BT∈ [200,300] K. From that, we define the cloudiness (C) as the fraction of pixels in which the solar reflectance is higher than a threshold value R c of 0.2. This value is an upper limit for the clear-sky reflectance in the considered verification domain (see discussion in Scheck et al. (2018)). Violin plots are used to visualize the daily bin-by-bin deviation of the PDF (deviation computed for each day d and bin n) from the reference run and model/operator sensitivity experiments: hist n,d = P obs n,d − P sim n,d . This allows for a consistent comparison of VISOP and 220 model uncertainty, by examining the median deviation (the mean is always zero), the interquartile range (difference between 75th and 25th percentile) as a measure for variability and the range as the extent of deviations. We further analyze clouds by constructing contoured 2D probability density function (PDF) plots of brightness temperature and solar reflectance, comparable to contoured frequency by altitude diagrams (CFADs) of radar observations. We use the US. Standard Atmosphere 1962   (Sissenwine et al., 1962) to classify brightness temperatures into three cloud categories (low, middle and high clouds) as 225 defined in the International Cloud Atlas (Cohn, 2017). In the US Standard Atmosphere, the surface temperature is 288 K and the (wet) temperature lapse rate is 0.65 K/100 m, leading to temperature ranges of T > 275 K for the surface and low clouds, 275 K ≤ T ≤ 243 K for middle clouds and T < 243 K for high clouds.

Synoptic overview and cloudiness
A 30-day period from 26 May to 24 June 2016 is analysed, which is dominated by strong summer-time convection in Germany.

230
In the beginning, large parts of Europe were affected by high-impact weather events over almost two weeks. Atmospheric blocking and interaction of low thermal stability and weak mid-tropospheric winds were the ingredients for this exceptional sequence of thunderstorms and related flash floods (Piper et al., 2016). Many authors have discussed these two weeks (see e.g. Atlantic and the Mediterranean to Germany and supporting cloud formation (Fig. 3). In general, the simulated cloudiness (defined in section 2.5) is predominantly overestimated, leading to a period mean observed and simulated cloudiness of 0.73 and 0.76, respectively. This convective period with high cloud cover at different levels is well suited to examine the cloud climatology and its sensitivity to cloud-related parameterisations.

Selected cases
In this section, we exemplarily discuss two days of the period to illustrate the methodology of evaluating clouds using visible and infrared satellite channels. On the first one (29 May), deep convection and severe thunderstorms occurred leading to a leads to a strong clear-sky signal arising from the North Sea (R ocean ⇡ 0.05) (Fig. 7e).

479
The solar reflectance distribution further reveals a relatively good agreement between 480 observed and simulated cloudy reflectances. That is also true for the BT-distribution, 481 however the number of middle to high clouds are systematically underpredicted.

482
These two days, with clouds on di↵erent levels and optical thicknesses already high- uniform distribution of observed solar reflectances (Fig. 4e). Overall, the agreement between observed and simulated visible histograms is relatively good given that the model is forced towards the current weather only through the boundary conditions.
The vortex structure of the cyclogenesis is also apparent in the IR108 observation (Fig. 4b), but the simulation shows clear systematic errors. In the simulation, the cloud pattern is dominated by relatively high ice clouds (Fig. 4d), which are less fre-  quent in the observations. The histogram confirms this picture: The signal of high clouds is overestimated in the simulations,

255
whereas the signal of medium clouds is underestimated by 40 %.
On 02 June 2016, boundary layer clouds dominated in both the observation and simulation (Fig. 5b&d). Additionally, superimposed ice clouds are observed in some regions. The simulated IR108 distribution fits the observed one relatively well on this day (Fig. 5f). In the visible satellite image (Fig. 5a&c), a high cloudiness is apparent, with 87 % in the observation and 91 % in the simulation. Different to 29 May, however, the distribution (Fig. 5e) reveals an overestimation of medium-thick clouds, The examples discussed above show that the examination of a single channel (VIS or IR) can lead to opposite conclusions with respect to forecast quality. The agreement of the histograms for 29 May is good in the visible range but not in the IR.
The opposite is observed for the 02 June. This shows that both channels provide complementary information. In the following, we show that further information can be obtained by using the combined information of both channels in 2D PDF plots of 265 brightness temperature and reflectance. We have already discussed how the IR histogram shows an overestimation of high     clouds on the 29 May. The combined histograms (Fig. 6a & 6c) provide the additional information that this overestimation of clouds mostly happens for thick clouds (R>0.6). This indicates that the model produced too strong deep convection. On 02 June, where lower clouds dominated the scene, the observation and simulations agree on the vertical location of the shallow cumulus clouds (Fig. 6b & 6d). However, solar reflectances are primarily distributed around 0.7 in the observation and around 270 0.5 in the simulation. Compared to the 1D reflectance histogram, the 2D PDF provides the additional information that the systematic reflectance errors are related to low clouds. These two days with predominantly deep convective clouds (29 May) and low clouds (02 June) are exemplarily for different cloud types and formation processes. Their analysis therefore illustrates the benefit of combining a solar and an infrared channel.
The analysis of individual cases presented above illustrates certain characteristics, but longer periods are required to identify systematic model deficiencies. To address this, we now present results for the 30-day period. The observed mean VIS006 solar reflectance distribution at 12 UTC reveals a clear-sky peak at low reflectance values (R ∈ [0, 0.2]), a nearly uniform distribution for higher reflectances (R ∈ [0.2,0.8]) and a sharp decrease for reflectances higher than 0.8 (Fig. 7a). The distribution of the reference simulation overall looks similar, but shows some deviations from the flat plateau seen for the observations, with a 280 surplus of clouds around a reflectance 0.5. Fig. 7b presents a histogram of the 30-day mean IR108-BT at 12 UTC and overall confirms findings from previous studies using convection-permitting models. There are generally too many clouds with low brightness temperatures (BT<240K). This, together with an underestimation of mid-level clouds in our ICON simulations is a well known issue that has also been found in other studies (e.g. Illingworth et al., 2007;Pfeifer et al., 2010;Böhme et al., 2011;Keller et al., 2016). The distribution further reveals a clear-sky bias, where the model underpredicts high BT values.

285
In general, the cloud climatology based on 2D PDF plots indicates that the model and observation distributions have similar structures (Fig. 7c & 7d). Noticeable differences in the distribution occur in boundary-layer clouds. The increase in solar reflectance with decreasing brightness temperature (increasing height) is noticeably steeper in the observations (indicated by dashed white lines in the plots). This means that thick boundary-layer clouds consistently reach higher levels in the observations, and suggests that shallow convection is too weak in the model. The 2D PDFs further indicate that the surplus of clouds 290 around a reflectance of 0.5 in the model is related to boundary layer clouds, revealing a deficiency in the model representation of liquid water clouds. In addition, the simulation lacks in producing enough mid-level clouds at all solar reflectances. Finally, a secondary maximum at low BTs and high solar reflectance (R≈0.8) is apparent in the simulations but not in the observations. This maximum mainly corresponds to deep convective and precipitating clouds, which are either too active or produce too much ice, similar to 29 May. High-level clouds (cirrus as well as iced cloud tops) and low-level clouds are generally overestimated.

295
The combined histograms clearly show important shortcomings in shallow and deep convection. Combined histograms can thus provide additional information on the nature of the systematic errors evident in the 1d histograms, and very valuable information for model development, showing which model configuration produces more realistic clouds.

Global horizontal irradiance
In the previous section, we examined systematic deviations between observed and simulated VIS006 solar reflectance and 300 IR108 brightness temperature. Now we investigate if improvements in the forecast of satellite solar reflectence are related to improvements of irradiance forecasts at the surface, which are crucial for the prediction of PV power production. For an effective cloud albedo (CAL, Mueller et al., 2011), the basic relation between surface solar irradiance (SSI) and CAL is predominantly linear, assuming energy conservation (Cano et al., 1986;Beyer et al., 1996). CAL is based on a broadband visible channel and SSI is the downwelling broadband solar irradiance. However, using one narrowband visible channel the 305 dependency can not be exactly linear because of several reasons: A high amount of incoming solar energy is absorbed in the atmosphere by water vapour and ozone, which is not represented in VIS006 solar reflectance, since its wavelength is centered

Contributions of different clouds to the reflectance distribution
For understanding the sensitivity of the synthetic visible satellite images to changes in operator settings and model modifications, it is helpful to determine the contribution of different hydrometeor types and subgrid-scale clouds to the reflectance histogram of the reference run (Fig. 7a). Figure 9 shows the observed and simulated VIS006 solar reflectance distribution (OBS and REF are the same as in Fig. 7a), the distribution that results from taking only grid-scale clouds into account (REF-grid) 330 and several distributions obtained by using only certain types or combinations of hydrometeors.
Grid-scale clouds only lead to a distribution with a nearly flat plateau between reflectances 0.3 and 0.7, a feature that is also found in the distribution of the observed reflectances. However, the fraction of cloud pixels would decrease from C = 0.76 to 0.5 if only grid-scale clouds were present. Adding subgrid clouds results in much better agreement with the observed value of C = 0.73. It is thus essential to take these additional subgrid clouds into account. However, the imperfect parameterisation of  Figure 10. Differences between reflectance histograms obtained for the reference run with modified operator settings and standard settings.
The modified settings are switching off the cloud top inclination, using maximum-random instead of random subgrid cloud overlap, including aerosols with an optical depth of 0.1 and changing the cloud ice particle habit to solid columns. For comparison also the difference between observation and reference run histogram is shown (dashed curve).
ily, reflectances larger than 0.5 become slightly less frequent. In contrast, taking only ice clouds (including snow) into account 340 (REF-IS) has a more substantial impact on the histogram and results in much smaller cloudiness of C = 0.29. Water clouds thus play a much more substantial role for the reflectance distribution than ice clouds. This result is not surprising as the total column ice content is much smaller than the water content (Fig. 2c) and additionally larger ice particles are less effective in scattering light than smaller water droplets (Fig. 2a).
In both the water-only and the ice-only cases, the corresponding subgrid clouds are included. The water-only curve (REF-W) 345 shows the same deviation from the plateau-like shape of the observed distribution as the curve computed for all clouds (REF), but the ice-only curve (REF-IS) does not. Thus, it seems that the subgrid water cloud parameterisation needs to be improved to get better agreement in the histogram shapes. Finally, ignoring the simulated snow content (REF-WI) has a small, but detrimental effect. This emphasizes the need for including snow in the computation of the RT input variables as discussed in section 2.3.

Estimated uncertainty of VISOP
Forward operators use fast, approximate RT methods and rely on the limited information that is available from the NWP model.
Due to missing RT effects and missing information (e.g. on subgrid-scale cloud properties) their output is to some extent uncertain. While forward operators for thermal infrared channels have been available for some time and their uncertainties have been investigated in several studies (e.g. Senf and Deneke, 2017;Saunders et al., 2017Saunders et al., , 2018, no such information is 355 available for visible channels. In the following, the uncertainty related to what we regard as the most critical error sources will be estimated by varying the corresponding operator settings. The potential sources of uncertainty to be investigated are related to missing 3D RT effects, unknown or inconsistent overlap statistics of subgrid-scale clouds, the spatial and temporal variation of aerosols and the shape of cloud ice particles. To estimate upper limits of the uncertainty in the reflectance distribution related to these sources, we repeated the computation of visible 360 reflectances applying VISOP to the reference simulation with deactivated cloud top inclination (CTI) parameterisation, random instead of random-maximum subgrid cloud overlap, and aerosols or a different kind of ice habit included in the MFASIS LUT.
The deviations in the reflectance distribution for the reference run caused by changing these operator settings are shown in Fig. 10.
The subgrid cloud overlap assumptions would not be a source of operator uncertainty if the assumptions in the NWP model 365 and the operator were entirely consistent. However, the near-operational version of ICON employed to perform the model runs for this study uses inconsistent overlap assumptions in the infrared and visible part of the spectrum. This inconsistency will likely be corrected in future versions, but at the moment, it means that the operator cannot be entirely consistent with the model. The deviation in the reflectance distribution caused by changing the assumption from maximum-random to random in the operator (orange line in Fig. 10) can be regarded as an upper limit for the impact. Changing the assumption shifts the peak 370 around R=0.5 (which is related to subgrid clouds, as discussed in Sect. 4.1) to higher reflectances, but has not much influence on reflectances larger than 0.7.
Missing or imperfectly modelled 3D RT effects are likely the source of uncertainty that is most difficult to quantify. According to Scheck et al. (2018) the most important 3D effect is related to the inclination of the cloud top surface, which influences the observed reflectance. The parts of the cloud top surface tilted towards the sun appear brighter and those tilted away from 375 the sun darker. The cloud top inclination correction (CTI, see Scheck et al., 2018) accounting for this effect has been shown to reduce the error with respect to full 3D RT calculations and is included in the reference run. The main effect of the CTI on the reflectance histogram is to reduce the slope at the high reflectance end of the distribution and to bring it in better agreement with observations. Switching off the CTI leads to a too steep decline of the distribution at high reflectances, which is visible as a double peak structure at R > 0.8 in Fig. 10. Other 3D RT effects like cloud shadows may also play a role, in particular for 380 larger zenith angles. However, by focusing on observations near local noon, their influence should be minimized.
According to retrievals based on measurements at AERONET stations (see Giles et al., 2019) in Germany, the mean AOD in June 2016 was in the range 0.06 to 0.12 at a wavelength of 675 nm, which is similar to the wavelength of the visible channel considered here. To estimate the impact of these aerosols on the reflectance histogram, an MFASIS LUT was computed that includes aerosols (the "continental clean" aerosol mixture available in libRadtran, see Emde et al. 2016) with an optical 385 depth of 0.1. Including aerosols in the MFASIS LUT, i.e. taking direct aerosols effect into account, influences the reflectance histogram in two ways. Aerosols can scatter photons out of their path towards the satellite, which is the dominant effect at high reflectances, or scatter photons towards the satellite that would otherwise not have reached it, which is important for low reflectances. In the presence of aerosols the high reflectance end of the distribution is thus shifted towards lower reflectances and the low reflectance end towards higher reflectances. Shifting the pronounced ground peak in the distribution causes a 390 double peak structure at low reflectances in Fig. 10, whereas shifting the flat high reflectance end only causes a single negative peak. In general, the error introduced by direct aerosol effects for events like (Saharan) dust outbreaks can be higher, and could potentially lead to significant errors in solar reflectances. Days affected by such events, which did not occur during our test period, should thus be excluded from model evaluation studies.
Finally, the shape of cloud ice particles is also an uncertain factor that influences the reflectances distribution. Changing 395 the shapes quite strongly from the baum_v36 general habit mixture (Baum et al., 2014) to solid columns (using the optical properties by Yang et al. 2005) basically only affects the highest reflectances, which are slightly reduced. The ice habit is thus not likely to cause large uncertainties in the reflectance distribution. Figure 11 shows the deviations of the reflectance and BT distributions computed for model runs using modified settings (see 400 Sect. 2.1) with respect to the reference run. In general, these deviations are of similar magnitude as the systematic deviations between the observations and the model equivalents for the reference run discussed in section 3 (see dashed curve in Fig. 11). In section 3, we identified several reasons for systematic deviations between the simulation and observations: An underestimation of thick clouds (R in [0.6,0.8]), a too low boundary layer height, too many high clouds and an insufficient representation of low-level water clouds. As further analysed in Sect. 4.1, we found that the discrepancy in low-level clouds mainly arises from manuscript submitted to Atmospheres  The results shown in Fig. 11a indicate that a modified version of experiment VII with weaker modifications or a combination of II, III and IV could be able to achieve the corrections required for the reference run, i.e. to reproduce the dashed line (OBS-

REF).
In both cases the subgrid water clouds play an important role. To correct systematic errors in the reflectance distributions it therefore seems particularly important to tune or advance the subgrid cloud scheme. While the reflectance distribution is not 445 sensitive to changes in subgrid ice clouds, these are clearly important for the BT distribution (compare experiments V and VI in Fig. 11a,b). The combined information from the two parts of the spectrum can thus provide guidance on optimizing the subgrid cloud scheme.
In contrast to visible reflectances, there is no obvious way to scale or combine the model modifications in order to eliminate the errors of the reference run, i.e. to reproduce the dashed line in Fig. 11b. Additional or different model modifications appear 450 to be required for this purpose, but the results presented here already indicate that particular modifications leading to less grid-scale ice clouds are required.

Sensitivity intercomparison for visible reflectances
The comparison of Fig. 10 and Fig. 11a Fig. 12 show these daily bin-by-bin deviations of the reflectance distribution caused by changes in the operator settings and model modifications. Figure 12 indicates that also for individual days of the test period the changes due to model modifications are much larger than the ones related to operator uncertainty. inclination or changes in the cloud ice optical properties could mitigate some of the deviations at high reflectances and using the correct aerosol optical depth can particularly improve the low-reflectance end of the distribution (see Fig. 10). However, for a broad range of reflectances between 0.2 and 0.8 it is only the inconsistency in the overlap assumption that makes the operator results uncertain. As discussed above, this is actually only a temporary issue related to the current versions of the ICON model. As soon as the overlap assumptions in the the model are consistent, the correct choice of the overlap assumption 470 can be regarded as a model setting and model evaluation using visible reflectances can provide information on suitable choices.

Conclusions
We investigated systematic differences between satellite observations and corresponding synthetic observations from the preoperational ICON-DE model to understand better the representation of clouds and radiation in NWP models. For this purpose, a semi-free 30-day convection-permitting hindcast simulation was conducted that is only forced by low-resolution analysis 475 boundary conditions for a highly convective period in May/June 2016. Besides, some additional simulations with modified model settings were conducted to identify dominant error sources and identify potential approaches for improving the representation of clouds in ICON-D2.
The evaluation facilitates a novel approach based on both solar and infrared satellite observations. The combination of observations in these two spectral ranges provides significantly more and complementary information than the use of only infrared 480 observations pursued in previous studies. While infrared observations provide information on cloud top height, their signal quickly saturates in the presence of clouds. This means that infrared observations can only distinguish a small range of cloud water contents and information on water clouds may be obscured by cirrus clouds above. In contrast, solar channels are relatively insensitive to ice clouds and can distinguish a much more extensive water content range.
As solar satellite observations are novel for model evaluation, we conducted a number of sensitivity experiments with modified 485 operator settings to investigate the recently developed forward operator's uncertainty transforming from model to observations space. The comparison revealed that the operator uncertainty is roughly one order of magnitude smaller than the sensitivity of the results to modified model settings. This further emphasises the usefulness of solar channels for model evaluation and improvement.
In addition, we investigated the correlation of these two satellite channels with irradiance in view of improving forecasts of