Evaluation of aerosol and cloud properties in three climate models using MODIS observations and its corresponding COSP simulator, and their application in aerosol-cloud interaction

. The evaluation of modeling diagnostics with appropriate observations is an important task that establishes the capabilities and reliability of models. In this study we compare aerosol and cloud properties obtained from three different climate models ECHAM-HAM, ECHAM-HAM-SALSA, and NorESM with satellite observations us- 5 ing MOderate Resolution Imaging Spectrometer (MODIS) data. The simulator MODIS-COSP version 1.4 was implemented into the climate models to obtain MODIS-like cloud diagnostics, thus enabling model to model and model to satellite comparisons. Cloud droplet number concentrations (CDNC) are derived identically from MODIS-COSP simulated and MODIS-retrieved values of cloud optical depth and effective ra- 10 dius. For CDNC, the models capture the observed spatial distribution of higher values typically found near the coasts, downwind of the major continents, and lower values over the remote ocean and land areas. However, the COSP-simulated CDNC values are higher than those observed, whilst the direct model CDNC output is signiﬁcantly lower than the MODIS-COSP diagnostics. NorESM produces large spatial biases for 15 ice cloud properties and thick clouds over land. Despite having identical cloud mod-ules, ECHAM-HAM and ECHAM-HAM-SALSA diverge in their representation of spatial and vertical distribution of clouds. From the spatial distributions of aerosol optical for AOD over bright land surfaces, while discrepancies between ECHAM-HAM and ECHAM-HAM-SALSA can be observed mainly over oceans. Overall, the AIs from the different models are in good agreement globally, with higher negative biases on the Northern Hemisphere. We evaluate the aerosol-cloud interactions by computing the sensitivity parameter ACI CDNC = dln(CDNC)/dln(AI) on a global scale. However, 5 one year of data may be considered not enough to assess the similarity or dissimilarities of the models due to large temporal variability in cloud properties. This study shows how simulators facilitate the evaluation of cloud properties and expose model deﬁciencies, which are necessary steps to further improve the parametrization in climate models.


Introduction
A climate model is a powerful tool for investigating the response of the climate system to various forcings, enabling climate forecasts on seasonal to decadal time scales, and therefore can be used for estimating projections of the future climate over the coming centuries based on future greenhouse gas and aerosol forcing scenarios (Flato, 15 2011). Based on physical principles, climate models reproduce many key aspects of the observed climate and aid to understand the dynamics of the physical components of the climate systems.
The evaluation of modeling diagnostics is an important task that establishes the capabilities and reliability of models. When key properties of the atmosphere (e.g., 20 clouds, aerosols) are considered, the model assessment is relevant to assure that the climate model correctly captures key features of the climate system. The interest in the reliability of climate models reaches outside the scientific community, as these simulations will form the basis for future climate assessments and negotiations. Therefore, understanding the level of reliability is a necessary step to strengthen the robustness 25 of climate projections and, if necessary, improve the model parametrizations for the relevant processes.
For the evaluation of parametrizations of aerosol indirect effects in global models, satellite data have been proven to be useful (Quaas et al., 2009;Boucher et al., 2013) as they provide large spatial coverage at suitable temporal resolution. Satellite 30 instruments measure the intensity of radiation coming from a particular direction in a selected wavelength range. From the observed radiances, the geophysical quantities are then inferred by inverse modeling using a retrieval algorithm.
The compensation of modeling errors, the intrinsic uncertainties of observational data, and the possible discrepant definitions of variables between models and observational data are major issues affecting the crucial task of model evaluation. For that, satellite simulators have been developed to mimic the retrieval of observational data and to avoid ambiguities in the definition of variables mentioned above. Simulators 5 recreate what the satellite would retrieve when observing the modeled atmosphere. By reprocessing model fields using radiative transfer calculations, they generate physical quantities fully consistent with the satellite retrievals. By including microphysical assumptions, which usually differ between models, inconsistencies in the simulators are avoided. Hence, simulators represent a robust and consistent approach not only for the 10 application of satellite data to evaluate models, but also for model-to-model comparisons. Simulators have been widely used, and their implementation in several models enables intercomparison studies on atmospheric variables, such as clouds, aerosols (Quaas et al., 2009;Williams and Bodas-Salcedo, 2017;Zhang et al., 2010;Luo et al., 2017), and upper atmospheric humidity (Bodas-Salcedo et al., 2011). 15 Two prominent examples of simulators are the simulator developed as part of the International Satellite Cloud Climatology Project, ISCCP, (Klein and Webb, 2009;Yu et al., 1996)

and the CFMIP (Cloud Feedback Model Intercomparison Project)
Observation Simulator Package, COSP (Bodas-Salcedo et al., 2011). CFMIP is part of The Coupled Model Intercomparison Project (CMIP) (Eyring et al., 2016b;Webb 20 et al., 2017), which is a framework providing the modeling community with guidelines for the development, tuning and evaluation of models (Eyring et al., 2016a, c). COSP is a software tool developed within the CFMIP (Webb et al., 2017), which extracts parameters for several spaceborne active sensors, such as the Cloud-Aerosol Lidar with Orthogonal Polarization (CALIOP) and the Cloud Profiling Radar (CPR), as well 25 as for passive sensors, such as the Multi-angle Imaging SpectroRadiometer (MISR) and the Moderate Resolution Imaging Spectroradiometer(MODIS).
In this study the COSP version 1.4 was implemented in three climate models, namely ECHAM-HAM, ECHAM-HAM-SALSA and NorESM, and the diagnostic outputs of the MODIS simulator were compared to MODIS observational data col- 30 lected during the year 2008. The main goal of this study is to evaluate the models' capability to realistically represent clouds by employing MODIS satellite observations and its corresponding COSP simulator. A secondary goal of the study is to estimate the aerosol-cloud interactions through the application of the cloud droplet number concentration (CDNC) derived from observed and COSP simulated values of cloud optical 35 thickness and effective radius. Aerosol-cloud interactions are based on the role of aerosol particles in changing cloud properties, which involves several processes (Bellouin et al.). In this study we focus one of these processes, known as the first aerosol indirect effects or Twomey effect, which can be quantified by an indicator parameter, called ACI, defined as the change in an observable cloud property (e.g., cloud optical 5 depth, cloud effective radius, cloud droplet number concentration) to a change in a cloud condensation nuclei proxy (e.g. aerosol optical depth, aerosol index, or aerosol particle number concentration). The analysis of aerosol-cloud interactions has been reported in literature by a variety of methods: studies presenting results from global scales (Feingold et al., 2001;Quaas et al., 2010) to regional scales (e.g. Saponaro 10 et al., 2017;Ban-Weiss et al., 2014;Liu et al., 2017Liu et al., , 2018 and in-situ observations (e.g. Sporre et al., 2014), using different approaches, i.e. observations from satellites, airborne and ground based instrumentation, or modelling. Nonetheless, the quantification of the aerosol-cloud interactions is still a major uncertainty in understanding climate change (e.g. Lohmann et al., 2007;Quaas et al., 2009;Storelvmo, 2012;Flato 15 et al., 2013;Lee et al., 2016;Seinfeld et al., 2016;Bellouin et al.).
The choice of observations and spatial scale of a study presents intrinsic uncertainties when quantifying aerosol-cloud interactions, and some of them relate to spatial or temporal limitations or artifacts (McComiskey and Feingold, 2012). When considering satellite observations, cloud and aerosols properties are provided at a quite 20 comprehensive spatial and temporal coverage; however several aspects bring challenges in the analysis of these observations. The primary artifacts known to affect satellite estimation of aerosol-cloud interactions are related to (1) the inability of untangling aerosol and cloud retrievals from meteorology (e.g. aerosol humidification, entrainment, cloud regimes dependency), (2) inaccuracies in the retrieval algorithms 25 (e.g. near-cloud impacts on radiative transfer, contamination, statistical aggregation) and (3) assumptions in the retrieval algorithms (Koren et al., 2007;Oreopoulos et al., 2017;Christensen et al., 2017;Wen et al., 2007).
In this work, the Cloud Feedback Model Intercomparison Project (CFMIP) Observation Simulator Package (COSP) (Bodas-Salcedo et al., 2011) is implemented in 30 three climate models to obtain satellite-like diagnostics that enable a direct comparison with satellite retrieval fields. In particular, we focus on liquid cloud properties, which are used to derived CDNC. Cloud droplet number concentration is computed for both satellite observations and satellite-simulated values in a consistent way using an algorithm presented in Bennartz (2007). Aerosol-cloud interactions are quantified over 35 ocean by parameter ACI CDNC = dln(CDNC)/dln(AI). By considering the changes in CDNC, it is possible to isolate the microphysical component of the ACI without the need for constraining the liquid water path.
In Section 2 we provide details of the MODIS data, the models, and the COSP simulator. Section 3 presents the methods used in the analysis of the data. The evaluation 5 of the simulator cloud diagnostics with MODIS satellite data on a global scale is presented in subsections 4.1 and 4.2, while the ACI results are shown in subsection 4.3.
Conclusions are summarised in Section 5. views the entire Earth's surface every 1 to 2 days, thus representing an extensive data set of global Earth observations. MODIS delivers a wide range of atmospheric products including aerosol properties, water vapour, cloud properties, and atmospheric sta-15 bility variables.
We consider data for the year 2008 from MODIS-Aqua since its equatorial crossing time (13:30 local time) ensures a more complete development of the cloud during its daily cycle. MODIS Level-1 (L1) products are geo-located brightness and temperature values, which are elaborated into geophysical data products at Level-2 (L2), and ag-20 gregated onto a uniform space-time grid at Level-3 (L3). We used the latest Collection 6.1 daily MODIS/Aqua MYD08L3, which is a regular gridded Level-3 daily global product (Hubanks et al., 2016). The dataset is limited to observations made during daytime, because these contain a richer set of retrievals and better accuracy in cloud detection. As the L3 1 • x 1 • gridded average values of atmospheric properties, along 25 with a suite of statistical quantities, are derived from the corresponding L2 atmosphere data product, a brief description of Level-2 MODIS aerosol and cloud products is now presented.
The Level-2 MODIS aerosol products provide information regarding the aerosol loading and aerosol properties over cloud-, snow-, and ice-free land and ocean sur- 30 faces at a spatial resolution of 10 km x 10 km. The primary aerosol product is the aerosol optical depth (AOD), derived globally at the wavelength of 550 nm, while the other parameters accounting for the aerosol size distribution, such as the Ångström exponent (AE) or fine-mode aerosol optical depth, are only derived over ocean (Levy et al., 2013). Additionally, the aerosol index (AI) can be derived by multiplying AOD by AE. The MODIS aerosol products have been extensively validated using highlyaccurate observations made by the Aerosol Robotic Network (AERONET) (Sayer et al., 2014) showing good agreement with in-situ measurements. The uncertainty in 5 MODIS retrievals of AOD from validation studies (Levy et al., 2007) was quantified at 0.03 + 0.05×τ A over ocean and 0.05 + 0.15×τ A over land, where τ A is the reference AOD value from AERONET. In this study we primarily focus on the analysis of liquid cloud properties. However, MODIS aerosol data (Levy et al., 2013) is needed to assess aerosol-cloud interactions. 10 The Level-2 MODIS physical and optical cloud properties are derived trough a combination of infrared emission and shortwave reflectance techniques at a spatial resolution varying from 1 km to 5 km, depending on the parameter (Platnick et al., 2017). Collection 6.1, which is used in this work, provides cloud optical parameters divided into different products accordingly to the cloud phase and retrieved at wave-15 lengths of 2.1 µm, at 1.6 µm and 3.7 µm (Hubanks et al., 2016;Platnick et al., 2017).
As the COSP simulator simulates cloud properties at 2.1 µm, the same wavelength is selected in the MODIS observations for both ice and liquid clouds. MODIS offers two scientific L3 cloud fractions datasets, namely the cloud mask cloud fraction and the cloud optical properties cloud fraction (datasets with prefix 'Cloud Fraction' and 20 'Cloud Retrieval Fraction', respectively). From now on we refer to the cloud mask cloud fraction as CF, and to the cloud optical properties cloud fraction as COP CF.
While the CF counts the proportion of the pixels classified by the cloud mask as cloudy or partly cloudy, the COP CF counts the proportion of the pixels for which cloud optical properties have been successfully derived. The main difference between 25 these two definitions roots in the approach of handling partly cloudy pixels. As the task of the cloud mask is to identify fully clear pixels, partly cloudy pixels are counted as cloudy in CF, while in the COP CF they are counted as clear because the retrieval algorithm aims to include only fully cloudy pixels. The different treatment of partly cloudy pixels directly impacts the number of cloud pixels, and consequently many 30 other retrieved cloud properties. Therefore differences are expected in our results and as already reported by Pincus et al. (2012). MODIS observations are here used as a reference dataset. However, MODIS data contains its own errors and limitations. Many studies compared MODIS liquid cloud microphysical properties with in-situ and airborne campaign measurements finding strong correlations for COT but a systematic 35 significant overestimation of MODIS cloud-top droplet effective radius (CER) for marine stratus and stratus cumulus clouds due to possible instrument limitation and algorithm retrieval assumptions (e.g. Noble and Hudson, 2015;Painemal and Zuidema, 2011;Min et al., 2012). A good CER correlation between MODIS and in-situ data was however observed by e.g. Preißler et al. (2016) for marine warm stratiform clouds at 5 higher latitudes. A bias in MODIS CER is propagated into the derivation of MODIS LWP, which also shows a positive bias with respect to the observations (e.g. King et al., 2013;Noble and Hudson, 2015;Painemal and Zuidema, 2011;Min et al., 2012).
Overestimated MODIS LWP were also found over a high-latitude measurement land site (e.g. Sporre et al., 2016) for clouds from all altitudes in the atmosphere. Marchant 10 et al. (2016) showed that the C6 cloud phase discrimination algorithm is significantly improved over C5 but some situations continue to be problematic over regions located at higher latitudes (i.e., polar areas, Greenland, and large desert areas).
In this study, we derive CDNC following the method presented in Bennartz (2007) and this additional cloud parameter is used in the computation of ACI. More informa-15 tion is provided in Sect. 3.2.

COSP -The CFMIP Observation Software Package
The simulator COSP (Bodas-Salcedo et al., 2011) is a publicly available software package (https://www.earthsystemcog.org/projects/cfmip/) developed by the CMIP community (Webb et al., 2017). It consists of a module coded in FORTRAN90, which 20 simulates cloud properties and can be implement in any model. The simulator's working principle is based on using climate model fields to mimic radiances to which a retrieval algorithm is applied to obtain satellite-like fields for the comparison with satellite observations. This process is summed up in three main phases. As model grids are very coarse 25 (∼100 km), the model fields are first down-scaled: each model gridbox mean profile is broken into subcolumns, whose size is more representative of a satellite retrieval area (∼10 km). Details of this downscaling process and corresponding assumptions are provided in Pincus et al. (2012). Next, each sub-column profile is processed by a forward radiative transfer model to create synthetic radiances at the satellite retrieval

ECHAM-HAM
ECHAM-HAMMOZ (echam6.3-ham2.3-moz1.0) is a global aerosol-chemistry climate model (Schultz et al., 2018;Kokkola et al., 2018;Tegen et al., 2019;Neubauer et al., 2019) where ECHAM refers to the atmospheric model of the model configura- 5 tion, HAM to the aerosol model, and MOZ to the chemistry model. In this study only the global aerosol-climate model part of ECHAM-HAMMOZ is used. Instead of the comprehensive MOZ chemistry model, sulphate chemistry is calculated in HAM for which the details have been given by Zhang et al. (2012) and references therein.
ECHAM-HAMMOZ, referred to here as ECHAM-HAM, consists of the general 10 circulation model ECHAM  coupled to the latest version of the aerosol module HAM (Tegen et al., 2019) and uses a two-moment cloud microphysics scheme that includes prognostic equations for the cloud droplet and ice crystal number concentrations as well as cloud water and cloud ice (Lohmann and Diehl, 2006;Lohmann et al., 2007Lohmann et al., , 2008Lohmann and Hoose, 2009). Aerosol microphysical pro-15 cesses such as nucleation, coagulation, condensational growth are computed by the modal scheme M7 (Vignati et al., 2004). HAM computes further processes such as emissions, sulfur chemistry (Feichter et al., 1996), dry depositon, wet deposition, sedimentation, aerosol optical properties, aerosol-radiation and aerosol-cloud interactions.
Next to the two-moment cloud microphysics scheme the stratiform cloud scheme 20 includes an empirical cloud cover scheme (Sundqvist et al., 1989).
The cirrus scheme is based on Kärcher and Lohmann (2002) and described in Lohmann et al. (2008), cloud droplet activation uses the Abdul-Razzak and Ghan (2000) parameterization, the autoconversion of cloud droplets to rain follows the method from Khairoutdinov and Kogan (2000), immersion and contact freezing in mixed- 25 phase clouds follows the scheme from Lohmann and Diehl (2006), and cumulus convection is represented by the parameterization of Tiedtke (1989) with modifications developed by Nordeng (1994) for deep convection.
Simulations were performed at T63 (1.9 • × 1.9 • ) spatial resolution using 31 vertical levels (L31) and COSP v1.4. Horizontal winds and surface pressure were nudged 30 towards the ERA-Interim (Dee et al., 2011) reanalysis for 2008, and observed sea surface temperatures and sea ice cover for 2008 were used (Taylor et al., 2000). Threehourly instantaneous output was used. The output of the COSP satellite simulator is also three-hourly. The implementation of the COSP satellite simulator in ECHAM-8 HAM does not allow instantaneous output. The COSP satellite simulator is called every radiation time step (i.e. every two hours) and the output of the COSP satellite simlator is averaged over the three hourly output period. This means that on average 50% of the values in the output of the COSP satellite simulator are instantenous values (i.e. from only one time step) and 50% of the values are an average over two radiation 5 time steps (i.e. an average over two instantaneous values which are two hours apart).

ECHAM-HAM-SALSA
ECHAM-HAM-SALSA is identical to the ECHAM-HAM setup (echam6.3-ham2.3-moz1.0), with the difference that the sectional aerosol module SALSA (Kokkola et al., 2008(Kokkola et al., , 2018 is used instead of the modal model M7 used in the ECHAM-HAM 10 setup. SALSA calculates the aerosol microphysical processes: nucleation, coagulation, condensation, and hydration. In this setup, the aerosol model HAM applies also the sectional scheme for the rest of the aerosol processes, i.e. emissions, removal, aerosol radiative properties, and aerosol-cloud interactions. In addition to differences in the aerosol size distribution scheme, also the wet deposition schemes differ between 15 the ECHAM-HAM and ECHAM-HAM-SALSA setups. In addition, while ECHAM-HAM uses the cloud activation parameterization for modal models (Abdul-Razzak and Ghan, 2000), SALSA uses the activation parameterization for the sectional representation (Abdul-Razzak and Ghan, 2002). Along with the details of these differences, the implementation and the evaluation of SALSA with the ECHAM-HAMMOZ model 20 version which is used in this study has been presented by Kokkola et al. (2018).

NorESM
The Norwegian Earth System Model (NorESM) (Kirkevåg et al., 2013;Bentsen et al., 2013;Iversen et al., 2013) is largely based on the Community Earth System Model The aerosol microphysics scheme in the NorESM version of CAM, called CAM-Oslo, consists of 12 log-normally shaped background modes which are tagged according to emission source and chemical composition (Kirkevåg et al., 2018). The shape of these modes can change due to condensation and coagulation.
In the current simulations, the NorESM model was run with the CAM-Oslo version 5 5.3 (Kirkevåg et al., 2018), which is configured with the microphysical two moment scheme MG1.5 (Morrison and Gettelman, 2008;Gettelman et al., 2015) for stratiform clouds. The scheme includes prognostic equations for liquid (mass and number) and ice (mass and number) and a version of the Khairoutdinov and Kogan (2000) autoconversion scheme where subgrid variability of cloud water (Morrison and Get-10 telman, 2008) has been included. The aerosol activation into cloud droplets is based on Abdul-Razzak and Ghan (2000) and the heterogeneous freezing in CAM5.3-Oslo is based on Wang et al. (2014) with a correction applied to the contact angle model (Kirkevåg et al., 2018). Moreover, CAM5.3-Oslo has a shallow convection scheme (Park and Bretherton, 2009) and a deep convection scheme (Zhang and McFarlane, 15 1995  The resolution for the simulation was 0.9 • × 1.25 • and the surface pressures as well as horizontal winds were nudged against ERA-Interim reanalysis data (Berrisford et al., 2011) from 2008. CAM-Oslo was run with COSP version 1.4 producing three-25 hourly instantaneous outputs.

Post-processing of the datasets
The comparison of satellite retrievals and model variables is not always straightforward. Satellite-retrieved physical quantities may be derived slightly differently than 30 the corresponding parameters in the model, and differences can be attributed to discrepancies in the retrieved quantities viewed from space versus model fields (i.e. retrieval assumptions, sensor limitations, spatial resolution) (Bodas-Salcedo et al., 2011). In this study we aim at highlighting the differences between observations and models, which stem from different aerosol and cloud physical parametrization by using the COSP satellite simulator. Satellite simulators, such as COSP, represent a compromise between model fields and retrieved fields. Simulators use model fields to reproduce what the satellite sensor would see if the atmosphere had the clouds of a 5 climate model. By taking the characteristics of the MODIS instrument into account COSP generates simulated fields of cloud parameters, which can be quantitatively compared to MODIS observations. The COSP diagnostics are then successively aggregated to the simulator outputs and are provided at the original model resolution.
Prior to their intercomparison, post-processing of the COSP diagnostics and satellite Note that all discussed cloud parameter are diagnosed using satellite simulators and are compared to the corresponding MODIS satellite observations. However, we use two direct model diagnostics in the study: -AOD, which is used to derive the AI, a proxy for cloud condensation nuclei for the computation of ACI -CDNC direct , which is compared with COSP-simulated and MODIS-derived estimates 3.2 Cloud droplet number concentration (CDNC) 5 The CDNC were derived from CER and COT from MODIS observations and COSP simulations by combining Eqs. (6) and (9) from Bennartz and Rausch (2017) in the following equation: where COT is cloud optical thickness, CER is the cloud droplet effective radius and 10 γ = 1.37 · 10 −5 m 0.5 (Quaas et al., 2006). The assumption of not accounting for temperature effect and setting γ as a bulk costant applies rather well to the stratiform clouds in the marine boundary layer but less so for convective clouds (Bennartz, 2007;Rausch et al., 2010). The assumption of not accounting for temperature effect and setting γ as a bulk costant applies rather well to the warm stratiform clouds in the marine 15 boundary layer but less for convective clouds (Bennartz, 2007;Rausch et al., 2010;Grosvenor et al., 2018). The equation represents the "Idealized Stratiform Boundary Layer Cloud" (ISBLC) model (Bennartz and Rausch, 2017) which is based on the following assumptions: the cloud is horizontally homogeneous;

20
the LWC increases linearly from the cloud base to the cloud top; the CDNC is constant throughout the vertical extent of the cloud.
While the ISBLC model describes important aspects of stratiform boundary layer clouds, its assumption will never be fully valid for a real cloud. Issues related to the IS-BLC model assumptions are extensively elaborated in Bennartz (2007) and Bennartz 25 and Rausch (2017) and references therein. Despite the above mentioned approach for deriving CDNC is less reliable over land ares, we will provide MODIS CDNC values globally (land and ocean). These estimates will be used as a reference dataset, rather than "truth" data, for enabling the comparison of COSP-derived CDNC and the model direct output of CDNC.

Assessing the aerosol-cloud-interactions
The aerosol-cloud-interactions is quantified here trough the parameter ACI CDNC defined as the change in the selected cloud property, CDNC, to a change in AI, which is used here as a proxy for cloud condensation nuclei (CCN):   (Grandey and Stier, 2010). This step was included in the post-processing of the 20 datasets.

Global bias distributions
In this section we compare on a global scale aerosol and cloud properties from the three models by subtracting MODIS retrievals from the modeled COSP diagnostics. 25 From now on we will refer to the difference between the simulated parameters and MODIS retrieved values using the term bias.
Overall, the spatial distributions of the biases (Fig. 1-5) show large discrepancies around the polar and ice-covered areas, such as Greenland and Antarctica. Over these areas large discrepancies are expected due to the inaccuracy of the MODIS retrieval 30 13 algorithm due to viewing geometry (i.e. large zenith or viewing angles) and to correctly classify opaque clouds, snow/ice surfaces and optically thin clouds over really bright or warm surfaces (Marchant et al., 2016). Figure 1 presents the differences between the MODIS-COSP cloud fraction diagnostics and COP CF for ice clouds CF ice (Fig. 1b-d), and liquid clouds CF liq (Fig. 1f-5 h), as well as the differences between MODIS total COP CF ( Fig. 1j-l), and CF ( Fig.   1n-p). Additionally, for each comparison the MODIS spatial distribution is presented as reference (Fig. 1a,e,i,m). It was already highlighted in section 2.1 that the cloud fraction retrieved from the optical properties (CF ice , CF liq and COP CF) excludes partly cloudy pixels, representing a limitation in the comparison of the data. Thus, The total cloud fraction bias shows a positive bias between the MODIS-COSP CF simulated by the three models and MODIS COP CF (Fig. 1j-l) and a negative bias when MODIS CF is considered (Fig. 1n-p). Consequently, MODIS CF is higher than the MODIS COP CF product. This outcome is to be expected, and possibly originates from the different treatment in the MODIS algorithm of partly cloudy pixels in the 25 computation of CF and COP CF, as discussed in section 2.1. Additionally, all models underestimate CF in marine subtropical stratocumulus regions.
The spatial distribution of the cloud physical and optical properties is remarkably similar among the model datasets with the exception of CER ice , IWP (Fig. 2 d and l) and COT (Fig. 3g,k) for NorESM. These strong biases are explained by the fact that This is likely caused by the cirrus scheme which does not account for heterogenous 35 nucleation or pre-existing ice crystals during formation of cirrus clouds (Neubauer et al., 2019;Lohmann and Neubauer, 2018). Interestingly, dissimilarities can also be observed between ECHAM-HAM and ECHAM-HAM-SALSA, despite the fact that the models share the same cloud module. ECHAM-HAM CER liq is on average 5 µm smaller than in ECHAM-HAM-SALSA in the mid-latitude belt, and ECHAM-HAM-5 SALSA CER liq is larger around the polar areas (Fig. 2g) and shows a large positive bias for LWP over ocean (Fig. 2o) in comparison to ECHAM-HAM. LWP is also overestimated by NorESM but over land areas (Fig. 2p), while ECHAM-HAM shows a good agreement with MODIS (Fig. 2n).
The evaluation of COT shows homogeneous results and comparable values of root    Other localized distinctions in aerosol loading distribution can be observed over regions which are typically strongly affected by primary emissions (such as the Sahara, India, Southeast Asia, Russia, Canada, central Africa, and South America). The different representation of size distribution, microphysical processing of aerosols and sink processes has a significant effect on the modelled AOD as shown for the aerosol mod-5 ule SALSA2.0 by Kokkola et al. (2018). The overestimation of AOD in the tropical oceans and underestimation of AOD at higher latitudes and over land in ECHAM-HAM has also been found by Tegen et al. (2019).

Joint histogram
The analysis of the CTP-COT joint histogram enables to determine how well the data 10 sources represent the vertical cloud structures and regimes. Figure 6 shows the comparison of the simulated and observed global mean cloud fraction as a function of cloud top pressure and cloud optical thickness. ECHAM-HAM and ECHAM-HAM-SALSA ( Fig. 6a,b) show a nearly identical result by concentrating a large fraction of clouds at low level (CTP ≤ 680 hPa) and in the interval 3.6 ≤ COT ≤ 23. NorESM 15 (Fig. 6c) also concentrates its largest amount of clouds at low levels in the same COT interval as in Fig. 6a and Fig. 6b, but detects also a higher fraction (about 2-2.5%) of optically thick clouds 9.4 ≤ COT ≤ 60 throughout the atmosphere. A second cloud fraction peak is observed for optically thin clouds (COT ≤ 1.3) at very high levels

Aerosol-cloud interactions
The daily mean values of CDNC and AI were used to assess how clouds are affected by the changes of the CCN proxy. Uncertainties were computed as the 95% confidence intervals using daily averages. Positive estimates of ACI CDNC indicate an increase of CDNC as a function of AI, which could be an indication of the aerosol indirect effects.
The potential limitations to this approach are discussed in Sect.refsummary. Figure 7 shows estimates of ACI CDNC on a global scale, although over ocean areas only, for each season and, separately, for the entire period under study as 'All'. The 5 same analysis is iterated on a regional scale, including both land and ocean areas, and presented in the supplementary material (Fig.S4) Error bars are representative of the boundaries of the 95% confidence interval. The positive ACI CDNC values resulting from MODIS and the three models suggest that changes in AI are connected with an increase of CDNC and the trend seems to be independent of the time of the year. 10 The modeling ACI CDNC estimates are similar in the models; however, the results are statistically indistinguishable owing to fully overlapping confidence bars (Cumming et al., 2007). MODIS ACI CDNC estimates show negative values for the winter months (DJF), especially over the Northern Hemisphere (Fig.S4). As these negative values are derived over land regions, it could be indicative of retrieval biases over bright sur-15 faces (i.e. snow or ice). Furthermore, it is important to inform the readers that MODIS aerosol size parameters over land (i.e. AE or fine-AOD) are no longer official products directly provided by the MODIS aerosol team. The publication of these variable were discontinued due to low quantitative MODIS skill (Mielonen et al., 2011;Levy et al., 2013). Using spectral AOD, we derived AE over land and derived AI on a global 20 scale to allow estimates of ACI CDNC on a global scale (Fig.S4). However, the AE values over land were not evaluated. However, negative ACI CDNC values may be associated with the presence of different types of aerosol (i.e. hydrophobic aerosol such as dust, black carbon) and their proximity to clouds, which may affect or inhibits the growth of cloud droplets (Chen et al., 2018;Jiang et al., 2018;Costantino and Bréon, 25 2013 air by entrainment (Ackerman et al., 2004). While both processes affect LWP, CDNC is not necessarily changing. This indicates limits in the derivation of CDNC from retrieved quantities for MODIS. Also water uptake by aerosol particles and effects of meteorology can have a significant impact on the estimation of ACI CDNC derived from the relationship between CDNC and AI (Neubauer et al., 2017). 35 Cloud properties (Fig. 2 and Fig. 3) are more similar for ECHAM-HAM and ECHAM-HAM-SALSA, which share the same atmospheric model, rather than between the two and NorESM. Nevertheless, the ACI CDNC estimates show good agreement between the three models and, even more important, with ACI CDNC derived from MODIS observations. 5 5 Summary, discussion and conclusions The differences between observed and modeled aerosol and cloud properties can be related to many factors, among which are the different parametrizations of aerosol and cloud physical processes in the models, or differences in observation characteristics by satellite, as well as meteorological influences on aerosol-cloud interactions. In The results presented here indicate that the cloud droplet number concentration appears to be more sensitive to changes in aerosols in models than observations and these results are in agreement with many previous studies found in the literature (e.g. Ban-20 Weiss et al., 2014;Quaas et al., 2004;McComiskey and Feingold, 2012;Penner et al., 2011). Some of the challenges and limitations in assessing ACI CDNC are now highlighted. AOD retrievals are limited to cloud-free conditions, which creates challenges to studying the ACI CDNC where the intention is to study collocated aerosol and cloud observations. Unless height-resolving instruments (i.e. lidars) are considered, the ver-25 tical location of the AOD level is unknown. Aerosol and cloud measurements may contain retrieval errors, which are further propagated to ACI CDNC estimates, as well as they reciprocally may bias the respective retrievals (Jia et al., 2019) . The interpretation of the observed aerosol-cloud relationships is complicated by the effect of meteorology (Quaas et al., 2010;Gryspeerdt et al., 2014Gryspeerdt et al., , 2016Brenguier et al., 2003). 30 As cloud formation happens in high humidity conditions, aerosol humidification can severely affect the assessment of ACI by causing positive correlation between AOD and cloud properties (Myhre et al., 2007;Quaas et al., 2010;Grandey et al., 2013;Gryspeerdt et al., 2014). Additionally to aerosol particles, water vapour also affects precipitation (Boucher et al., 2013), obviously linked to the presence of clouds, and 35 consequently causes spurious correlations between aerosols and clouds (Koren et al., 2012). Therefore, some of the differences in the ACI CDNC estimates from satellites and models could be associated with limitations in satellite measurements. For example, the estimates of ACI CDNC might suffer from an averaging effect due to the large spatial averages of satellite aerosol and cloud properties. L3 data can introduce spuri-5 ous relationships between aerosols and cloud properties (e.g. McComiskey and Feingold, 2012;Christensen et al., 2017), and provide a rather limited pool of data samples enabling the analysis only over large regions. This was not explored in this study be- crepant results from the comparison of COSP-derived CDNC and the modeled direct output of CDNC, where the MODIS data represents solely the reference dataset rather than "truth" data, stem from the aerosol model set-up and this discrepancy has potentially important implications for the modelling community, using satellite observations to evaluate standard (not satellite-simulated) model output. 10 Finally, we point out that the model deficiencies identified here may lead to an improvement of model parametrization and to more robust results. As future work, a regional-based analysis would enable a better understanding of the physical processes responsible for the model biases. Additional research should be conducted to evaluate the aerosol-cloud-interaction following the approach suggested by Neubauer  Uncertainties estimates are calculated as 95% confidence interval from the daily values.