Evaluating MODIS cloud retrievals with in situ observations from VOCALS-REx

Microphysical measurements collected during eleven profiles by the UK BAe-146 aircraft through marine stratocumulus, as part of the Variability of the American Monsoon Systems (VAMOS) Ocean-Cloud-Atmosphere-Land Study Regional Experiment (VOCALS-REx), are compared to collocated overpasses of the Moderate Resolution Imaging Spectroradiometer (MODIS) aboard the Aqua and Terra satellite platforms. The full depth of the cloud is sampled in each case using a Cloud Droplet Probe (CDP) and a Two-Dimensional Stereo Probe (2DS), which together size cloud and precipitation droplets in the diameter range 2–1260 μm. This allows the total optical depth (τc) of the cloud and the effective radius (re) of the droplet size distribution to be compared to MODIS cloud retrievals of the same quantities, along with the secondarily derived total liquid water path. When compared to the effective radius at cloud top, the MODIS-retrieved re using the 2.1 μm wavelength channel overestimates the in situ measurements on average by 13 %, with the largest overestimations coinciding with the detection by the 2DS of drizzle-sized droplets. We show through consideration of the full vertical profile and penetration depths of the wavelengths used in the retrieval that the expected retrieved values are less than those at cloud top, thus increasing the apparent bias in re retrievals, particularly when using the 1.6 and 2.1 μm channels, with the 3.7 μm channel retrievals displaying the best agreement with in situ values. Retrievals of τc also tend to overestimate in situ values which, coupled with a high bias in re retrievals, leads to an overestimation of liquid water path. There is little apparent correlation between the variation of the three near-infrared re retrievals and the vertical structure of the cloud observed in situ.
Retrievals are performed using measured profiles of water vapour and temperature along with an accurate knowledge of the width of the droplet size distribution, which improves agreement between in situ and retrieved values but cannot completely explain the observed biases. Additionally, we show that cloud heterogeneity and three-dimensional radiative effects may skew the mean high when averaging over comparison domains but cannot explain all of the apparent high bias. An intercomparison between in situ measurements from the BAe-146 and C-130 platforms is also presented, highlighting the uncertainties associated with in situ observations.


Introduction
Low-level marine stratocumulus clouds cover large regions of the global oceans on a quasi-permanent basis and are therefore critical modulators of the global radiation budget (Klein and Hartmann, 1993).Their complex interactions with aerosols (Lohmann and Feichter, 2005) and their spatial and temporal variability mean that their accurate representation in global models remains one of the largest uncertainties in modelling future climate (Forster et al., 2007).Accurate observations of cloud frequency, spatial distribution and microphysical properties over large spatial and temporal scales are therefore an important requirement for improved model representation.Two of the most important cloud parameters governing their radiative properties are cloud optical thickness τ c and droplet effective radius r e .Together they specify the total liquid water path of the cloud and are used as indicators of aerosol-cloud-precipitation interactions (Twomey, 1991;Albrecht, 1989;Lohmann and Feichter, 2005).
Operational measurements of r e and τ c from space can be made by inverting observations of reflected solar radiation at two or more wavelengths (Nakajima and King, 1990).
Published by Copernicus Publications on behalf of the European Geosciences Union.
Typically, reflection at a wavelength which does not experience absorption by liquid water droplets is used to estimate the total optical thickness of the cloud whilst a wavelength which does experience absorption, and is thus sensitive to the droplet sizes in the cloud, provides information about r e .A series of radiative transfer calculations for different values of r e and τ c are undertaken to construct libraries of two-band reflectance lookup tables for specific solar and viewing geometries.Measurements of reflection are then used to search the lookup tables for the r e , τ c pair which minimises the difference between the measurement and lookup table values.The retrieval of r e and τ c can then be used to estimate the total liquid water path of the cloud, which is proportional to the product of r e and τ c .This technique or similar is utilised on data from a variety of satellite platforms, most notably the Moderate Resolution Imaging Spectroradiometer (MODIS) instruments aboard the Aqua and Terra satellites (Platnick et al., 2003).
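The lookup-table search described above can be sketched in a few lines of Python. This is only an illustration: `toy_reflectances` is a hypothetical stand-in for a real radiative transfer calculation (it is not the MODIS forward model), and the parameters `g` and `k_abs`, the grid ranges and the function names are all assumptions made for the example.

```python
import numpy as np

def toy_reflectances(tau, re_um, g=0.85, k_abs=0.04):
    """Hypothetical two-band forward model (NOT a radiative transfer code).

    The non-absorbing band depends only on optical depth; the absorbing
    band is additionally darkened as droplet size increases.
    """
    tau = np.asarray(tau, dtype=float)
    re_um = np.asarray(re_um, dtype=float)
    scaled = (1.0 - g) * tau
    r_vis = scaled / (2.0 + scaled)          # two-stream-like, conservative scattering
    r_nir = r_vis * np.exp(-k_abs * re_um)   # assumed size-dependent absorption
    return r_vis, r_nir

def build_lut(tau_grid, re_grid):
    """Tabulate both reflectances on a (tau, r_e) grid."""
    tau2d, re2d = np.meshgrid(tau_grid, re_grid, indexing="ij")
    return toy_reflectances(tau2d, re2d)

def retrieve(r_vis_obs, r_nir_obs, tau_grid, re_grid, lut):
    """Return the grid (tau, r_e) pair minimising the squared reflectance misfit."""
    lut_vis, lut_nir = lut
    cost = (lut_vis - r_vis_obs) ** 2 + (lut_nir - r_nir_obs) ** 2
    i, j = np.unravel_index(np.argmin(cost), cost.shape)
    return float(tau_grid[i]), float(re_grid[j])

tau_grid = np.linspace(1.0, 40.0, 79)   # optical depth grid, step 0.5
re_grid = np.linspace(4.0, 30.0, 53)    # effective radius grid in µm, step 0.5
lut = build_lut(tau_grid, re_grid)

# Synthetic "measurement" generated from the same toy model
obs_vis, obs_nir = toy_reflectances(12.0, 10.0)
tau_ret, re_ret = retrieve(obs_vis, obs_nir, tau_grid, re_grid, lut)
```

An operational retrieval would typically interpolate between table entries and use tables specific to the solar and viewing geometry; the nearest-grid-point search here only shows the principle.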
Between them Aqua and Terra provide near global coverage every 1 to 2 days and therefore provide an important global dataset of cloud properties for use in climate and cloud process studies.A clear understanding of the quality of the MODIS cloud products therefore has important implications and can be explored by comparison with collocated in situ measurements.Such comparisons are however difficult due to the difference in spatial and temporal sampling of an overpassing satellite and an in situ aircraft.The situation is complicated further by the fact that a satellite retrieval assumes a vertically uniform cloud with one single bulk value of effective radius retrieved for the entire cloud.Numerous in situ profiling studies of droplet size and liquid water content (many of which are conveniently summarised in Miles et al. (2000)) have found considerable vertical variation throughout real clouds.In the context of a cloud with vertically varying droplet size it is therefore unclear to what level in the cloud the satellite retrieved effective radius should correspond, thus making comparison to an in situ aircraft flying at a single level problematic.Platnick (2000) showed that the effective radius retrieved from a satellite reflectance measurement is dependent on the vertical penetration of reflected photons into the cloud.The vertical penetration of photons is a function of the profile of droplet size and liquid water content of the cloud and most notably the absorption properties of the wavelength which is being used for the retrieval.The MODIS instrument employs one of three channels in the near-infrared for droplet size retrievals, centred around 1.6, 2.1 and 3.7 µm (channels 6, 7 and 20 respectively), with the degree of absorption increasing with wavelength.With increasing absorption the probability of a photon being reflected back out of the cloud without first being absorbed decreases.The 3.7 µm channel is therefore expected to be scattered largely by droplets 
in the uppermost layer of the cloud and the retrieval of effective radius utilising this channel (hereafter referred to as r 3.7 ) corresponds to the droplet sizes found near to the cloud top. The 2.1 and 1.6 µm channels are less absorbing and their respective retrievals of effective radius (hereafter r 2.1 and r 1.6 ) are determined by droplet sizes deeper in the cloud.
The three retrievals of r e from the MODIS algorithm utilising the three different near-infrared wavelength bands in theory represent r e at different heights in the cloud.Comparison of these retrievals should therefore offer information on the vertical variation of r e .Parcel theory predicts that in non-precipitating liquid water clouds r e monotonically increases from cloud base to cloud top, a theory supported by numerous in situ observations.Given the relative penetration depths of photons at the three wavelengths this should in general lead to r 3.7 > r 2.1 > r 1.6 .Studies of the differences between the three retrievals however have found that r 3.7 is most frequently less than r 2.1 (Seethala and Horváth (2010), Zhang and Platnick (2011), Nakajima et al. (2010b)).Methods have also been proposed to combine measurements in all three near-infrared channels to retrieve a vertical profile of r e (Chang and Li (2002), Chang and Li (2003), Kokhanovsky and Rozanov (2012)).However attempts to verify profile retrievals on real data have thus far been limited and the information content of MODIS channels alone may not be sufficient to gain a clear picture of the vertical variation of droplet size (King and Vaughan, 2012).Nakajima et al. (2010a) showed that the presence of small droplets near to cloud top, presumably caused by evaporation, and/or the presence of a drizzle mode lower in the cloud, could explain the tendency in real data for r 2.1 to be larger than r 3.7 .By comparison to simultaneous observations from the CloudSat Cloud Profiling Radar Nakajima et al. (2010b) showed that r 2.1 could be used to infer droplet growth processes in warm oceanic clouds and the relationship between r 2.1 and r 3.7 can be explained by the stage of droplet growth occurring in the cloud.In particular Nakajima et al. 
(2010b) postulated that when r 2.1 < 14 µm condensation processes dominate and, as r 2.1 increases, so too does the gradient of r e with height in the cloud, resulting in an increase in the ratio r 3.7 /r 2.1 with r 2.1 due to the differing penetration depths of the two wavelengths. However, r 3.7 /r 2.1 values of < 1 still dominate in this range, which Nakajima et al. (2010b) explain by the presence of small droplets near to cloud top reducing r 3.7 . When r 2.1 exceeds 14 µm, Nakajima et al. (2010b) showed that r 3.7 /r 2.1 decreases with increasing r 2.1 as droplet growth by coalescence dominates, with large drizzle and precipitation sized droplets forming lower in the cloud influencing r 2.1 to a greater extent than r 3.7 .
A global study of the differences in MODIS r e retrievals for marine water clouds by Zhang and Platnick (2011) showed that when r 2.1 < 15 µm the difference in retrievals (r 3.7 − r 2.1 ) remains small but becomes increasingly negative when r 2.1 increases.This further supports the idea that formation of drizzle lower in the cloud can have a large influence on retrievals.Zhang and Platnick (2011) also found a large dependence on the homogeneity of the pixel in question suggesting that sub pixel inhomogeneities can influence r e retrievals differently in different channels causing (r 3.7 −r 2.1 ) to be negative for inhomogeneous scenes.They tested these concepts by using cloud fields generated by a large eddy model and 3-D radiative transfer simulations to generate realistic synthetic measurements on which to perform cloud retrievals.Zinner et al. (2010) found that for overcast stratocumulus scenes, biases induced by introducing a drizzle mode and 3-D radiative effects were typically only of the order of a few tenths of a micron.
A clear comparison between satellite retrievals and in situ measurements can therefore only be made when the full vertical extent of the cloud is measured by an aircraft profiling throughout the cloud.One such recent study was presented by Painemal and Zuidema (2011) where in situ cloud measurements were taken from the NSF/NCAR C-130 research aircraft during the Variability of the American Monsoon Systems (VAMOS) Ocean-Cloud-Atmosphere-Land Study Regional Experiment (VOCALS-REx) in the South East Pacific region during October and November in 2008 (Wood et al., 2011).Painemal and Zuidema (2011) compared the effective radius measured near to cloud top in several cases of in situ profile measurements with collocated MODIS overpasses.They found that the MODIS retrievals of r 1.6 , r 2.1 and r 3.7 all systematically overestimated the effective radius values measured near to cloud top by between 15 and 20 %.This result agrees with previous studies (Bréon and Doutriaux-Boucher (2005), Nakajima et al. (1991), Nakajima and Nakajima (1995)) which also suggest a high bias in MODIS r e retrievals in marine stratocumulus regions.Painemal and Zuidema (2011) also found no apparent link between the relative variation of the r e retrievals using the three different near-infrared channels with the vertical variation of r e measured within the cloud.
In this study we present a similar data set from the UK BAe-146 aircraft also taken during the VOCALS-REx campaign.We use eleven cases of in situ profile measurement of the stratocumulus cloud deck with collocated MODIS overpasses to test the ability of MODIS retrievals to accurately represent the droplet sizes within the cloud.In particular we assess whether r 1.6 , r 2.1 and r 3.7 correspond to the droplet sizes at levels within the cloud predicted by radiative transfer theory and use a more accurate knowledge of key ancillary model parameters in an effort to improve the retrievals.In Sect. 2 we introduce the data, instruments and comparison techniques employed before presenting the results in Sect.3. In Sect. 4 we explore potential sources of retrieval error and attempt to improve agreement between in situ and retrieved parameters by performing our own retrievals using a more accurate knowledge of ancillary model parameters.Finally a summary and conclusions are presented in Sect. 5.

In situ measurements
The UK BAe-146 flew a total of 13 research flights during VOCALS-REx carrying a large suite of instruments measuring a range of microphysical and thermodynamic variables. This study focuses on measurements of cloud microphysics using a Droplet Measurement Technologies Cloud Droplet Probe (CDP) and a Stratton Park Engineering Company Two-Dimensional Stereo Probe (a 1-D array version was used in VOCALS, henceforth referred to as the 2DS). The CDP is an optical probe that determines the size of droplets by measuring the forward scattering intensity as droplets pass through the sample area of a focussed laser (Lance et al., 2010), measuring the number concentration of cloud droplets in 30 size bins ranging from 2-50 µm diameter. The 2DS is an Optical Array Probe (OAP, Lawson et al., 2006) which utilises a photo-diode array illuminated by a laser. As particles pass through the sample volume they cast shadows on the photo-diode array which are sampled to generate 2-D images and subsequently processed to generate the size distribution (Crosier et al., 2011). The 2DS detector array generates images with 10 µm pixels ranging from 10-1260 µm.
Together the CDP and 2DS measure the full droplet size distribution. We consider the CDP to be the most accurate instrument within its measurement range, so the two instruments are combined by utilising all CDP size bins and 2DS bins >50 µm. The size bins of the CDP were calibrated before the campaign and checked before each flight (at two sizes) by injecting glass beads of known, narrow size distributions through the sample area. The calibration used six different sizes across the range of the instrument, and a linear fit between the manufacturer specified bin locations and the response of the instrument to these known size distributions was performed to ascertain the bin edges. The calibration beads have a given uncertainty which is propagated through the calibration process to calculate an uncertainty on each bin location. The 2DS pixel size was checked using a spinning disc unit and was found to be stable at 10 µm. Images from the 2DS were corrected for mis-sizing (due to being out of focus) using the method of Korolev (2007).
During the campaign the BAe-146 flew a number of flight patterns for a variety of science purposes. The cases studied here are limited to those where the aircraft profiled continuously through the entire depth of the cloud, passing through both the cloud base and cloud top. In-cloud measurements were defined when the total liquid water content (LWC) measured by the CDP rose above a threshold value of 0.02 g m −3 . In this way the cloud top and base were defined as the highest and lowest points in the profile at which the LWC was greater than this threshold. The effective radius at each level in the cloud was calculated from the combined CDP and 2DS measurements of number concentration n(r) using

$$ r_e = \frac{\int_0^\infty r^3\, n(r)\, \mathrm{d}r}{\int_0^\infty r^2\, n(r)\, \mathrm{d}r}. \qquad (1) $$

In total eleven complete profiles through cloud taken within one hour of a MODIS overpass were selected; a summary of the profiles used is shown in Table 1. The geographical positions of the profiles used are shown in Fig. 1. Given the emphasis placed in the literature on the role of large drizzle-sized droplets in altering the retrieved effective radii, we aim to identify cases where drizzle may play a role by inspection of the 2DS liquid water path (LWP), calculated from droplets of diameter >50 µm. Of the 11 profiles, two clouds had a 2DS LWP >10 g m −2 , indicating the presence of larger drizzle or precipitation sized droplets which cause r e to increase towards the base of these clouds. In subsequent plots these profiles are plotted in red. These clouds had total LWPs of around 55 g m −2 and effective radii at cloud top >8 µm. A further three clouds had a 2DS LWP in the range 2-6 g m −2 and increasing r e with height (these profiles are plotted in green). These clouds had total LWPs of >90 g m −2 and optical depths >15. The remaining six clouds had a 2DS LWP < 1 g m −2 and tended to consist only of effective radii <8 µm (these profiles are plotted in blue). These clouds all had optical depths < 10 and total LWPs <50 g m −2 . Unfortunately no cloud radar was available on the BAe-146 to confirm the presence of precipitation, and instead we assume that droplets of diameter > 50 µm are precipitating and use the 2DS size distribution to calculate the rain rate (RR) of these droplets,

$$ \mathrm{RR} = \frac{1}{\rho_w} \sum_i \mathrm{LWC}_i\, V_i, \qquad (2) $$

where ρ w is the density of water, LWC i is the liquid water content in each size bin and V i is the fall velocity of droplets in each size bin, calculated from a droplet diameter dependent parametrisation of the data presented in Gunn and Kinzer (1949). For the remainder of the paper we denote clouds with a 2DS LWP in the range 2-6 g m −2 as light drizzling and those with a 2DS LWP > 10 g m −2 as heavy drizzling. This does not necessarily correlate with the peak rain rates displayed in Table 1, but in terms of the influence on radiation the total vertically integrated liquid water of drizzle sized droplets is more important than the highest value of rain rate seen in a cloud.
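As an illustration of how these two quantities can be evaluated from binned probe data, the sketch below replaces the integrals over n(r) by sums over size bins. The function names, the use of bin-centre radii and the unit conversions are assumptions made for the example; the fall speeds are taken as given inputs rather than computed from the Gunn and Kinzer (1949) parametrisation, which is not reproduced here.

```python
import numpy as np

RHO_W = 1000.0  # density of liquid water, kg m-3

def effective_radius(radius_um, number_conc):
    """Ratio of the third to the second moment of a binned size distribution.

    radius_um   : bin-centre radii (µm)
    number_conc : number concentration in each bin (any consistent unit)
    """
    r = np.asarray(radius_um, dtype=float)
    n = np.asarray(number_conc, dtype=float)
    return np.sum(n * r**3) / np.sum(n * r**2)

def rain_rate_mm_per_hr(lwc_g_m3, fall_speed_m_s):
    """Rain rate as the water-volume flux summed over size bins.

    lwc_g_m3       : liquid water content per bin (g m-3)
    fall_speed_m_s : fall velocity per bin (m s-1), supplied by the caller
    """
    lwc_kg_m3 = np.asarray(lwc_g_m3, dtype=float) * 1e-3
    v = np.asarray(fall_speed_m_s, dtype=float)
    rr_m_per_s = np.sum(lwc_kg_m3 * v) / RHO_W
    return rr_m_per_s * 1000.0 * 3600.0  # m s-1 -> mm h-1

# Two-bin example: eight droplets of radius 5 µm and one of radius 10 µm
re_example = effective_radius([5.0, 10.0], [8.0, 1.0])

# One drizzle bin holding 0.01 g m-3 of water falling at 1 m s-1
rr_example = rain_rate_mm_per_hr([0.01], [1.0])
```

In the two-bin example the single larger droplet carries as much volume as the eight smaller ones combined, which pulls r e above the number-weighted mean radius.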

Comparison methodology
MODIS cloud products are delivered at a resolution of 1 × 1 km at nadir whereas the in-cloud horizontal extent of the aircraft profiling measurements ranged from 2-7 km. To compare the aircraft profile to MODIS retrievals, the cloud is assumed to be horizontally homogeneous over the extent of the profile and, in order to remove any biases resulting from comparisons to individual MODIS pixels, MODIS data are averaged over a 5 × 5 km region for comparison with the in situ profile. The validity of this comparison and the assumption of horizontal homogeneity was tested by analysing sections of in-cloud flight where the aircraft flew a straight and level trajectory. In total, 280 segments of 5 km length were selected from throughout the VOCALS-REx campaign when the aircraft was in cloud throughout the segment and its altitude did not deviate by more than 20 m. As a measure of the variability of r e and liquid water content (LWC) over the 5 km segments we divided the standard deviation by the mean of the measurements contained within each segment to calculate the fractional standard deviation. Histograms of the fractional standard deviations for r e , LWC and number concentration are displayed in Fig. 2 and show that on a 5 km scale the effective radius experiences a smaller degree of variability than the liquid water content. The mean standard deviation of r e during these segments was 0.79 µm. These results suggest that the droplet effective radii measured during VOCALS-REx displayed a high degree of horizontal homogeneity. The variability of LWC however is significantly larger, suggesting that the total liquid water path and optical depth of the clouds were less homogeneous over the 5 × 5 km domain size used. The variation of LWC is driven by changes in number concentration, whose variability closely matches that of LWC in Fig. 2 and results from inhomogeneous updrafts.
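The fractional standard deviation used here is simply the standard deviation of a segment divided by its mean. A minimal sketch of the per-segment calculation follows, assuming along-track distance is available for each sample; the function name, the use of the population standard deviation and the synthetic data are assumptions for illustration only.

```python
import numpy as np

def segment_fractional_std(distance_km, values, segment_km=5.0):
    """Fractional standard deviation (std / mean) for consecutive segments.

    distance_km : along-track distance of each sample (km), increasing
    values      : measured quantity (e.g. r_e or LWC) at each sample
    segment_km  : segment length; 5 km matches the comparison scale in the text
    """
    d = np.asarray(distance_km, dtype=float)
    v = np.asarray(values, dtype=float)
    seg_index = np.floor((d - d[0]) / segment_km).astype(int)
    return np.array([
        v[seg_index == k].std() / v[seg_index == k].mean()
        for k in np.unique(seg_index)
    ])

# Synthetic 10 km leg sampled every 100 m:
# a perfectly uniform first segment and an alternating second segment
distance = np.arange(100) * 0.1
values = np.concatenate([np.full(50, 8.0), np.tile([6.0, 10.0], 25)])
frac_std = segment_fractional_std(distance, values)
```

Because the measure is normalised by the mean, it allows the variability of quantities with different units, such as r e and LWC, to be compared on the same histogram axis.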
In order to adjust for any time differences between the MODIS image and aircraft profile, the mean wind vectors recorded during the in-cloud aircraft profile were used to adjust the central position of the profile. This adjusted profile position was then matched to the MODIS image and pixels in the 5 × 5 km region centred on the adjusted aircraft position were selected for comparison. In all cases used the MODIS cloud fraction in all pixels within the comparison regions was reported as 1, so clear sky contamination should have no impact on retrievals. The MODIS algorithm includes flags for cases of high cirrus and multi-layered clouds. Neither multi-layered clouds nor high cirrus were flagged for any of the scenes used.
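A simple way to apply such a wind adjustment is to displace the profile position by the mean wind multiplied by the time offset. The sketch below assumes a locally flat Earth, a constant wind over the offset, and a sign convention in which a positive time offset means the overpass occurred after the aircraft profile; none of these details are specified in the text, and the function name is hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def advect_position(lat_deg, lon_deg, u_m_s, v_m_s, dt_s):
    """Shift a position by the mean wind over a time offset.

    u_m_s, v_m_s : eastward and northward wind components (m s-1)
    dt_s         : overpass time minus profile time (s); assumed convention
    """
    dx_km = u_m_s * dt_s / 1000.0
    dy_km = v_m_s * dt_s / 1000.0
    dlat = math.degrees(dy_km / EARTH_RADIUS_KM)
    dlon = math.degrees(dx_km / (EARTH_RADIUS_KM * math.cos(math.radians(lat_deg))))
    return lat_deg + dlat, lon_deg + dlon

# A 10 m s-1 southerly wind acting for 30 minutes moves the cloud 18 km north
lat_adj, lon_adj = advect_position(-20.0, -75.0, 0.0, 10.0, 1800.0)
```

For offsets of up to one hour and typical boundary-layer winds the displacement can be several MODIS pixels, so the adjustment can change which 5 × 5 km block is selected.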
To compare in situ measured and satellite-retrieved r e an in situ r e from the profile must be selected for comparison to the retrieved value.Painemal and Zuidema (2011) averaged over the 4 measurements closest to cloud top and in order to take a comparable approach to Painemal and Zuidema (2011) we assigned an in situ cloud top effective radius r t by averaging the in situ measurements taken within an optical depth of 1 from the cloud top.Nakajima et al. (2010a) proposed that mixing at cloud top can reduce r e in this region and showed through two layered cloud simulation that this can influence the retrievals of r e .In the cases presented here however the effective radius near cloud top was almost constant with no observed reduction in r e in any of the cases.The r t presented here therefore represents the largest value of r e found throughout the vertical extent of the cloud apart from the two heavy-drizzling cases where r e was found to increase towards cloud base.
Cloud liquid water path and optical depth are vertically integrated quantities. In order to calculate these quantities from the in situ profiles, each in situ measurement is used to define a layer of the cloud with a depth defined by the vertical separation of the measurements. Vertical integration is then carried out by summing the relevant quantities multiplied by the depth of each layer. In situ values of total cloud liquid water path were computed by integrating the combined CDP and 2DS liquid water content over the vertical extent of the cloud. The calculation of in situ τ c was carried out by integration of the volume extinction coefficient calculated for each droplet size using the Mie scattering code included in the Plane-Parallel Spherical Harmonic Discrete Ordinate Method (SHDOMPP) radiative transfer model (Evans, 2007).
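The layer-by-layer integration can be sketched as follows. Note the substitution: instead of the Mie calculation used in the paper, this example uses the large-droplet approximation for the extinction coefficient, β = 3 Q e LWC / (4 ρ w r e ) with Q e ≈ 2, purely to keep the example self-contained. The function names, the use of `np.gradient` to assign layer depths, and the rule for selecting samples within an optical depth of 1 of cloud top are assumptions.

```python
import numpy as np

RHO_W = 1000.0  # density of liquid water, kg m-3

def extinction_per_m(lwc_g_m3, re_um, q_ext=2.0):
    """Approximate volume extinction coefficient (m-1); not a Mie calculation."""
    lwc = np.asarray(lwc_g_m3, dtype=float) * 1e-3   # kg m-3
    re_m = np.asarray(re_um, dtype=float) * 1e-6     # m
    return 3.0 * q_ext * lwc / (4.0 * RHO_W * re_m)

def integrate_profile(alt_m, lwc_g_m3, re_um, q_ext=2.0):
    """Return (LWP in g m-2, optical depth) by summing over measurement layers."""
    z = np.asarray(alt_m, dtype=float)
    dz = np.abs(np.gradient(z))                      # layer depth per sample (m)
    lwp = np.sum(np.asarray(lwc_g_m3, dtype=float) * dz)
    tau = np.sum(extinction_per_m(lwc_g_m3, re_um, q_ext) * dz)
    return float(lwp), float(tau)

def cloud_top_re(alt_m, lwc_g_m3, re_um, tau_depth=1.0, q_ext=2.0):
    """Mean r_e of samples within `tau_depth` optical depths of cloud top."""
    z = np.asarray(alt_m, dtype=float)
    order = np.argsort(z)[::-1]                      # cloud top first
    z = z[order]
    lwc = np.asarray(lwc_g_m3, dtype=float)[order]
    re = np.asarray(re_um, dtype=float)[order]
    dz = np.abs(np.gradient(z))
    tau_from_top = np.cumsum(extinction_per_m(lwc, re, q_ext) * dz)
    mask = tau_from_top <= tau_depth
    mask[0] = True                                   # always keep the topmost sample
    return float(re[mask].mean())

# Idealised uniform cloud: 10 samples 20 m apart, LWC 0.3 g m-3, r_e 10 µm
alt = np.arange(10) * 20.0
lwp_g_m2, tau_c = integrate_profile(alt, np.full(10, 0.3), np.full(10, 10.0))
r_top = cloud_top_re(alt, np.full(10, 0.3), np.full(10, 10.0))
```

For this idealised cloud the sums give an LWP of 60 g m −2 and an optical depth of 9, values in the range of the non-drizzling profiles described earlier.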

Results
The comparison between MODIS-retrieved effective radii and the in situ measured cloud top r e is shown separately for the three MODIS retrievals in Fig. 3.The dashed lines in Fig. 3b represent where the 15 % to 20 % high MODIS bias reported by Painemal and Zuidema (2011) would lie.Whilst the majority of points are found to lie above the 1:1 line for all three MODIS retrievals and a few points lie in the 15-20 % region the majority of r 2.1 points lie below the 15 % bias line.The horizontal error bars in Fig. 3 represent the propagated bin location calibration error of the cloud probes.MODIS r 2.1 retrievals are accompanied by a corresponding retrieval uncertainty product which for the scenes studied varied from 0.7 to 1.5 µm with a mean of 0.94 µm; the r 1.6 and r 3.7 retrievals do not have an uncertainty product.A more appropriate error estimate on the MODIS retrieval was found to arise from the variability of the 25 pixels averaged for this comparison, and the vertical error bars in Fig. 3 are the corresponding standard deviations.In all but two cases this is < 0.5 µm for r 2.1 and r 3.7 .The variability of r 1.6 is slightly larger which is consistent with the fact that small variations in reflectance cause the largest change in r e retrieval in this channel.Both the largest values of r e and the largest biases between retrieved and in situ values are seen in the profiles with 2DS LWPs >10 g m −2 thus indicating that the presence of a drizzle mode in the size distribution may have influenced the retrieval.
The comparison of in situ and retrieved optical depth shown in Fig. 4 displays generally good agreement with the exception of the two drizzling points marked in red.The horizontal error bars again represent the sizing calibration errors propagated to the calculation of optical depth whilst the vertical error bars represent the standard deviations of the 25 τ c retrievals that were used to calculate the mean in each case.That the vertical error bars shown in Fig. 4 are relatively large supports our findings that in situ LWC tended to show a greater degree of horizontal variability than r e .This variability makes the comparison of in situ and retrieved values problematic especially as an aircraft profile always has a significant horizontal extent and therefore several satellite pixels with varying optical depths may be flown through to collect a single profile.
It might appear from Fig. 4 that the presence of drizzle in the two red points causes a large bias in the retrieval of optical depth. The optical depth retrieval uses a wavelength channel (0.86 µm over ocean) which is dominated by scattering and experiences negligible absorption by liquid water. The scattering properties of non-absorbing wavelengths remain relatively unchanged with the introduction of a drizzle mode to the size distribution (Zinner et al., 2010) and the retrieval of optical depth is largely invariant to changes in the vertical profile of droplet size. It is therefore unexpected that drizzle should influence the retrieval of τ c on the scale suggested by Fig. 4. Precipitating marine stratocumulus has been associated with open cellular organisation (Feingold et al., 2010), which is inherently less homogeneous than a closed cellular regime, and this may contribute either to a bias in optical depth retrieval through 3-D radiative effects or to a mismatch between the aircraft observation and satellite pixels. Inhomogeneity may also result from the horizontal extent of the aircraft profile, meaning that the aircraft samples differing regions of cloud throughout its profile and thus some of the liquid water path imaged by the satellite may have been missed by the in situ profile. However, the minimum MODIS τ c retrievals in the regions are still considerably greater than the suggested in situ values. It is not clear exactly what the reason is behind the large mismatches in these two cases, but it may impact on the comparison of r e , which also displays a larger than average high bias.
Despite the poor agreement in the two drizzling cases, Fig. 4 presents no compelling evidence of any systematic bias in the MODIS retrieval. Whilst all but one of the points lie above the 1:1 line, in most cases the offset is within the standard deviation of the MODIS τ c retrievals used for comparison. This is similar to the findings of Painemal and Zuidema (2011).
The effective radius and optical depth products are used by the MODIS algorithm to calculate the total liquid water path of the cloud according to

$$ \mathrm{LWP} = \frac{4 \rho_w}{3 Q_e}\, \tau_c\, r_e, \qquad (3) $$

where ρ w is the density of water and Q e ≈ 2 is the extinction efficiency. This relation stems from the assumption that there is no vertical variation of r e and can therefore lead to both over and underestimates of LWP depending on the droplet profile. In a previous study into the possibility of retrieving a vertical profile of droplet size (King and Vaughan, 2012) we used all the cases of BAe-146 cloud profiles throughout the VOCALS-REx campaign to calculate the LWP retrieval bias, using Eq. (3) and assuming no error in the retrieval of r e and τ c . The majority of profiles observed during VOCALS-REx displayed increasing droplet size from cloud base to top and the theoretical retrievals of r 2.1 tended to correspond to droplet sizes in the upper half of the cloud. This led to an overestimate of LWP ranging from 5-25 % using the MODIS method. An alternative approach to calculating the LWP from r e and τ c , based on an adiabatic assumption of linear increase of LWC with height (Wood, 2006), leads to

$$ \mathrm{LWP} = \frac{10 \rho_w}{9 Q_e}\, \tau_c\, r_t, \qquad (4) $$

and has been shown to improve the estimation of LWP in overcast stratocumulus (King and Vaughan, 2012; Seethala and Horváth, 2010). We employed both Eqs. (3) and (4) (substituting r 3.7 for r t , since r 3.7 corresponds to the retrieval of droplet size nearest to the top of cloud) to calculate the LWP from MODIS retrievals, which we then compared with the in situ measured values (Fig. 5).
In the marine stratocumulus cases studied here the droplet size was found to increase from cloud base to cloud top in all but two instances and the retrieval of r 2.1 usually overestimated r e at cloud top.The use of Eq. ( 3) to estimate LWP therefore results in an overestimation compared to the in situ measured values.The use of the adiabatic assumption to calculate LWP brings the in situ and retrieved LWPs into closer agreement but the majority of points still lie above the 1:1 line.This is likely to be in part because r 3.7 tended to overestimate r t and the retrieval of τ c tended to be slightly larger than the in situ measured optical depth, which will be carried through to the calculation of LWP.The drizzling cases (red points) again correspond to large overestimations due to the large offset between in situ and retrieved τ c for these cases.
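The size of the difference between the two LWP estimates can be seen from a short calculation. The sketch below assumes the commonly used forms LWP = 4 ρ w τ c r e / (3 Q e ) for a vertically uniform cloud and LWP = 10 ρ w τ c r t / (9 Q e ) for an adiabatic cloud, with Q e = 2; the function names and example inputs are hypothetical.

```python
RHO_W = 1000.0  # density of liquid water, kg m-3

def lwp_vertically_uniform(tau_c, re_um, q_ext=2.0):
    """LWP (g m-2) assuming r_e is constant with height."""
    return 4.0 * RHO_W / (3.0 * q_ext) * tau_c * re_um * 1e-6 * 1000.0

def lwp_adiabatic(tau_c, re_top_um, q_ext=2.0):
    """LWP (g m-2) assuming LWC increases linearly with height (adiabatic)."""
    return 10.0 * RHO_W / (9.0 * q_ext) * tau_c * re_top_um * 1e-6 * 1000.0

# Example retrieval: optical depth 9 and effective radius 10 µm
lwp_uniform = lwp_vertically_uniform(9.0, 10.0)
lwp_adiab = lwp_adiabatic(9.0, 10.0)
ratio = lwp_adiab / lwp_uniform
```

For identical inputs the adiabatic estimate is 5/6 of the vertically uniform estimate, so switching relations lowers the retrieved LWP by about 17 % but cannot remove a bias that originates in the retrieved r e or τ c themselves.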

Aircraft intercomparison
The comparison of in situ and retrieved r e carried out using an analogous method to Painemal and Zuidema (2011) (Fig. 3) suggested a smaller systematic bias in r e retrievals than they reported. Given the consistency of the evidence shown by Painemal and Zuidema (2011) and the fact that the study was conducted in the same region and time period as the data presented here, it is important to assess whether any differences exist between the measurement techniques and instrumentation used by the two aircraft. During the campaign the two aircraft flew an inter-comparison leg where the C-130 flew approximately 4 minutes ahead of the BAe-146 along flight paths which deviated from each other by less than 800 m in the horizontal and 20 m in the vertical. Painemal and Zuidema (2011) used a 2D-C to measure larger droplets where we use a 2DS. Given that in the majority of cases the contribution of droplets larger than the range of the CDP to the effective radius was extremely small, we only compared the CDPs of the two aircraft.
The CDP-measured effective radii as a function of longitude and the size distributions averaged over the flight section from the two aircraft are shown in Fig. 6.In order to account for any possible coincidence counting Painemal and Zuidema (2011) compared the CDP LWC measurements to LWC values from a King hot-wire probe on a flight to flight basis and adjusted the size bins of the CDP so that there was no mean bias between the two instruments.For the flight shown in Fig. 6 the C-130 CDP was found to slightly underestimate the LWC compared to the King probe.The CDP bin sizes were therefore increased according to the method of Painemal and Zuidema (2011) to bring the LWC measurements into agreement.The corresponding change in r e for the section displayed in Fig. 6 was always <0.1 µm.The CDP was considered the best instrument for LWC measurements aboard the BAe-146 and therefore no adjustments to size bins were made by comparison to alternative LWC measurements.
The CDP comparison displayed in Fig. 6 shows a systematic bias between the two aircraft, with the BAe-146 measuring larger r e than the C-130 with a mean difference of 1.2 µm. The aircraft did not fly exactly the same flight paths and a time difference of between 4 and 5 minutes could allow some change in the cloud between sampling. Similarly the path of the C-130 flying ahead of the BAe-146 could have induced changes in the cloud before measurement by the BAe-146. However the r e measurements in Fig. 6 appear well correlated by longitude (r = 0.96) and Fig. 6b shows a uniform bias in the size distributions measured by the instruments. This indicates a size bin calibration difference of the order of around one bin width. The calibration 1σ uncertainty of the BAe-146 derived r e is shown by the shaded region in Fig. 6; whilst the calibration uncertainty of the C-130 CDP is not known, it is likely to be of a similar magnitude and therefore the two platforms agree within their error bounds. There are however additional sources of unknown uncertainty in the potential sampling differences of the aircraft. Lance et al. (2010) showed that the optical model used to adjust the calibration of glass beads to water droplets meant that calibration water droplets were oversized by around 1 µm (when measuring radius) by using a glass bead calibration. Coincidence counting was also shown by Lance et al. (2010) to be important at number concentrations >200 cm −3 ; however, this was not typically the case during VOCALS-REx. The cloud probe sizing calibration uncertainties shown in this study can therefore be considered to be a lower bound on the true uncertainty, which may involve contributions from a number of sources.
The apparent difference of ≈1 µm between the CDP-measured effective radii is consistent with the MODIS bias displayed in Fig. 3 being smaller than that found by Painemal and Zuidema (2011). It is not our aim to claim the superiority of one data set over another, but it is important to note the scale of discrepancy that can occur between instruments flying on different platforms as well as the inherent uncertainties which exist in both data sets. Although the difference between C-130 and BAe-146 CDP-measured $r_e$ is significant, it is still smaller than the mean bias of 2.08 µm relative to MODIS reported by Painemal and Zuidema (2011), and the measurements from the BAe-146 shown in Fig. 3 still display a high bias. These results highlight the need to devise calibration methods to reduce systematic discrepancy between probes during field campaigns.

Accounting for the vertical variation of droplet size
Satellite-retrieved effective radii are often interpreted as the $r_e$ at cloud top, and this is the assumption made in the analysis of Painemal and Zuidema (2011) and in that presented in Fig. 3. However, this interpretation is not aligned with the predictions of radiative transfer simulations such as those performed by Platnick (2000). Platnick (2000) showed that the retrieved $r_e$ depends on the vertical profile of the cloud in question as well as the viewing and solar geometries and the wavelength used for the retrieval; whilst $r_{3.7}$ should correspond closely to $r_t$, the values of $r_{2.1}$ and $r_{1.6}$ should be influenced by droplet sizes deeper in the cloud. Platnick (2000) suggested that the retrieved $r_e$ is best calculated from the profile by defining a weighting function $w_\lambda$ such that the retrieved effective radius $r_{\mathrm{ret}}$ is estimated from

$$r_{\mathrm{ret}} = \int_0^{\tau_c} r_e(\tau)\, w_\lambda(\tau, \tau_c)\, \mathrm{d}\tau. \qquad (5)$$

The form of $w_\lambda$ found to best agree with synthetic retrievals by Platnick (2000) was a weighting by maximum vertical photon penetration, given by

$$w_\lambda(\tau, \tau_c) = \frac{1}{R(\tau_c)} \frac{\mathrm{d}R(\tau)}{\mathrm{d}\tau}, \qquad (6)$$

where $R(\tau_c)$ is the reflectance of the cloud of total optical thickness $\tau_c$ and $w_\lambda(\tau, \tau_c)$ is calculated by starting at cloud top and calculating the change in reflectance $\mathrm{d}R$ that results from incrementally adding layers of optical thickness $\mathrm{d}\tau$.
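The weighting described above can be sketched numerically as follows. This is a minimal illustration, assuming that the reflectance of the cloud truncated at each optical depth level has already been computed by a radiative transfer model; the function name and the layer-midpoint discretisation are choices made for this example, not the implementation used in the study.

```python
def platnick_weighted_re(tau, re_profile, reflectance):
    """Reflectance-weighted effective radius from a vertical profile.

    tau: optical depth levels measured downward from cloud top (tau[0] = 0).
    re_profile: effective radius at each level.
    reflectance: reflectance of the cloud truncated at each level, i.e. the
        reflectance after adding all layers down to tau[i]. These values are
        assumed to come from an external radiative transfer calculation.
    """
    if not (len(tau) == len(re_profile) == len(reflectance)):
        raise ValueError("all inputs must have the same length")
    # Normalising by the total reflectance change makes the weights sum to one;
    # this equals R(tau_c) when the reflectance at tau = 0 is zero.
    total_change = reflectance[-1] - reflectance[0]
    weighted_sum = 0.0
    for i in range(1, len(tau)):
        d_reflectance = reflectance[i] - reflectance[i - 1]  # dR from adding one layer
        re_layer = 0.5 * (re_profile[i] + re_profile[i - 1])  # layer-mean r_e
        weighted_sum += re_layer * d_reflectance
    return weighted_sum / total_change
```

For a profile in which $r_e$ decreases with depth below cloud top, the weighted value falls below the cloud top value, and more so the deeper the reflectance increments extend into the cloud.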
A true comparison of in situ measured profiles and satellite retrieval products must therefore account for the vertical variation of droplet size in the profile rather than simply compare the retrievals to the value at cloud top. To calculate a vertically weighted in situ value with which to compare the satellite retrievals we employ two methods. The first is to use the full in situ measured cloud profile and Eqs. (5) and (6) to compute a Platnick-weighted retrieval of $r_{1.6}$, $r_{2.1}$ and $r_{3.7}$. This is performed using the SHDOMPP radiative transfer model with a black surface, no atmospheric absorption or Rayleigh scattering, and the viewing and solar geometries of the satellite pixel in question.
The second method is to use the in situ measurements and radiative transfer calculations to simulate a synthetic MODIS measurement in the channels required. This synthetic measurement can then be used to perform a retrieval that represents what a cloud retrieval scheme would retrieve if the in situ measurement were representative of the true state of the atmosphere and the forward model in the retrieval scheme were consistent with the real three-dimensional radiative transfer. The retrieval scheme used to retrieve the cloud parameters is similar to that introduced in King and Vaughan (2012) for retrieving a vertical profile of droplet size. The scheme uses an iterative Bayesian optimal estimation approach to retrieve the parameters that minimise a cost function between the measurement and simulated radiative transfer. The retrieval method used in this study differs from that presented in King and Vaughan (2012) in that the profile of $r_e$ is not allowed to vary as a function of optical depth in the cloud; instead a single $r_e$ and $\tau_c$ are retrieved to mimic the set-up of the MODIS algorithm. All three near-infrared retrievals ($r_{1.6}$, $r_{2.1}$, $r_{3.7}$) are calculated by simulating the measurement and retrieval, integrating the reflection across the MODIS spectral response function in each MODIS band.
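For orientation, the cost function minimised in an optimal estimation retrieval is commonly written in the generic form below. The exact form and covariances used in the scheme are not given here, so all symbols are assumptions introduced for illustration: $\mathbf{x} = (r_e, \tau_c)$ is the state vector, $\mathbf{y}$ the measured reflectances, $F$ the forward model, $\mathbf{x}_a$ the prior state, and $\mathbf{S}_\epsilon$ and $\mathbf{S}_a$ the measurement and prior error covariances.

```latex
J(\mathbf{x}) =
  \left[\mathbf{y} - F(\mathbf{x})\right]^{T} \mathbf{S}_{\epsilon}^{-1}
  \left[\mathbf{y} - F(\mathbf{x})\right]
  + \left(\mathbf{x} - \mathbf{x}_a\right)^{T} \mathbf{S}_a^{-1}
  \left(\mathbf{x} - \mathbf{x}_a\right)
```

The first term penalises mismatch between measured and simulated reflectances and the second penalises departure from the prior; an iterative scheme updates $\mathbf{x}$ until $J$ stops decreasing.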
A comparison of the results of the two methods of calculating a profile-weighted in situ value of $r_e$ showed very good agreement (to within 0.1 µm) in all cloud cases and serves as a validation of the Platnick (2000) weighting functions for computing theoretically retrieved values. For the remainder of this study, however, we use the simulated retrieval method as this more closely replicates the true physics of the atmosphere and the retrieval process.
The MODIS-retrieved effective radii are compared with the in situ simulated retrievals in Fig. 7, which displays a larger MODIS high bias than the equivalent comparison of values at cloud top in Fig. 3. This is because in the majority of cases accounting for the vertical profile of $r_e$ reduces the in situ values below that at cloud top, with mean differences between weighted and cloud top values of 0.65, 0.6 and 0.34 µm for the 1.6, 2.1 and 3.7 µm channels respectively. This is a subtle difference, but it increases the apparent MODIS high bias and suggests that a similar approach using the data from the C-130 would yield a larger high bias than that reported by Painemal and Zuidema (2011).

Resolving the vertical variation of droplet size
Given the difference in vertical weightings between the three near-infrared $r_e$ retrievals, one could potentially glean information on the vertical variation of droplet size from a comparison of these three retrievals. In the case of monotonically increasing droplet size from cloud base to cloud top, comparisons of $r_{1.6}$, $r_{2.1}$ and $r_{3.7}$ should reveal an estimate of the magnitude of variation below cloud top, and indeed it has been shown that measurements in these bands can be combined to retrieve a linear or adiabatic fit to the droplet size profile (Chang and Li, 2002; Kokhanovsky and Rozanov, 2012). Studies examining the differences in near-infrared retrievals have however found unexpected relationships between the retrievals, particularly since $r_{3.7}$ is overwhelmingly found to be < $r_{2.1}$ (Seethala and Horváth, 2010; Nakajima et al., 2010a; Zhang and Platnick, 2011). Nakajima et al. (2010a) and Nakajima et al. (2010b) showed that the presence of small droplets near to cloud top caused by evaporation and/or the presence of a large drizzle mode in the droplet size distribution could explain the observations. However, comparisons between observed cloud profiles and simultaneous retrievals have thus far been limited to the study of Painemal and Zuidema (2011), who found that the relative relationships between the retrievals did not relate to the vertical structure observed in the clouds.
Figure 8 shows the differences between $r_e$ retrievals as a function of MODIS $r_{2.1}$. Dots represent the differences between MODIS retrievals and crosses the differences calculated from the synthetic retrievals carried out in Sect. 3.2 using the in situ profiles. MODIS retrievals of $r_{3.7}$ were < $r_{2.1}$ in all but three cases, whereas the synthetic retrievals from the in situ profiles suggest that $r_{3.7}$ should only be < $r_{2.1}$ in the two drizzling cases. The expected values of $r_{3.7} - r_{2.1}$ are however relatively small (typically < 1 µm), which is of the same order as the $r_{2.1}$ uncertainty product reported by MODIS. This suggests that in these cases the vertical variation of $r_e$ in the cloud is not large enough to produce a signal in the differences between $r_e$ products larger than the uncertainties associated with the retrievals.
The fact that for the majority of points $r_{3.7} < r_{2.1}$ when the opposite is expected does however indicate a consistent underlying bias in one or both of the $r_{2.1}$ and $r_{3.7}$ retrievals. Nakajima et al. (2010b) propose that small droplets at cloud top are the cause of $r_{3.7} < r_{2.1}$ values even in non-precipitating clouds whose droplet size increases towards cloud top throughout the rest of the cloud. However, in none of the clouds sampled were small droplets measured near to cloud top, which appears to rule out this hypothesis. The only profiles where the in situ values suggest that $r_{2.1}$ should exceed $r_{3.7}$ are the drizzling cases, which would appear to suggest a potential for diagnosing drizzle from differences between $r_e$ retrievals, but this is not borne out in the MODIS data. Nakajima et al. (2010b) suggested that $r_{2.1}$ alone could be used to diagnose drizzle, with values greater than 14 µm correlating with the presence of collision-coalescence processes. Painemal and Zuidema (2011) suggested that this threshold was closer to 12 µm in their data, and this correlates well with the $r_{2.1}$ retrievals for the cases that we have diagnosed as drizzling, although the cases we diagnosed as light drizzle have $r_{2.1}$ retrievals as low as 10 µm. However, given the limited number of data points it is difficult to draw conclusions on the use of $r_{2.1}$ to diagnose drizzle.
The differences between $r_{2.1}$ and $r_{1.6}$ expected from the in situ data are generally smaller than the corresponding differences between $r_{3.7}$ and $r_{2.1}$ due to the similar vertical weightings of the 1.6 and 2.1 µm bands. The MODIS data are however slightly more scattered, largely because of the scattered nature of the $r_{1.6}$ product. The majority of MODIS points show that $r_{2.1} > r_{1.6}$, which is the relationship expected from the vertical structure of the cloud in all but the drizzling cases.
Fig. 8. Difference between $r_e$ retrievals plotted as a function of the MODIS $r_{2.1}$ retrievals. Dots represent the differences between MODIS retrievals and crosses represent the differences between synthetic weighted retrievals using the in situ profiles.
Fig. 9. Differences between all MODIS $r_e$ retrievals used in the comparison analysis. Red represents those pixels which are diagnosed as drizzling by being matched to an in situ profile with a 2DS LWP > 2 g m$^{-2}$. Blue represents non-drizzling pixels.
Figure 7 suggests that $r_{3.7}$ more closely resembles the expected vertically weighted in situ effective radii than $r_{2.1}$, which appears to systematically overestimate its corresponding in situ values. This apparent overestimation of $r_{2.1}$ could explain the dominant trend for $r_{2.1} > r_{3.7}$; however, assuming that any such bias is consistent, the ratios between the retrievals should change with changes in vertical structure or the presence of drizzle. This is not immediately obvious from Fig. 8, but each of the points in Fig. 8 is calculated from the averages of the 25 points that make up each 5 × 5 km region. We therefore have a total of 275 individual MODIS retrievals which we can bin into drizzling and non-drizzling cases using the in situ data and examine whether there is a change in the differences between $r_e$ retrievals for each case. This assumes that, for those in situ profiles where large droplets were detected, this is consistent across the scene. We defined drizzling MODIS pixels as those that were matched to profiles which had 2DS LWP > 2 g m$^{-2}$.
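The binning described above can be sketched as follows. The dictionary keys (`r16`, `r21`, `r37`, `lwp_2ds`) and function names are hypothetical and chosen only for this illustration; the threshold of 2 g m$^{-2}$ on the 2DS liquid water path is the one stated in the text.

```python
DRIZZLE_LWP_THRESHOLD = 2.0  # g m^-2, threshold on the 2DS liquid water path


def classify_and_difference(pixels, threshold=DRIZZLE_LWP_THRESHOLD):
    """Split pixels into drizzling / non-drizzling and compute r_e differences.

    pixels: list of dicts with hypothetical keys 'r16', 'r21', 'r37'
        (retrieved effective radii in micrometres) and 'lwp_2ds' (the 2DS
        liquid water path of the matched in situ profile, in g m^-2).
    """
    groups = {"drizzling": [], "non_drizzling": []}
    for pixel in pixels:
        differences = {
            "r37_minus_r21": pixel["r37"] - pixel["r21"],
            "r21_minus_r16": pixel["r21"] - pixel["r16"],
        }
        key = "drizzling" if pixel["lwp_2ds"] > threshold else "non_drizzling"
        groups[key].append(differences)
    return groups


def mean_difference(group, name):
    """Mean of one named difference over a group of pixels."""
    return sum(entry[name] for entry in group) / len(group)
```

Each pixel inherits the drizzle classification of its matched profile, which mirrors the assumption that drizzle detected in situ is representative of the whole scene.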
Histograms of the differences between $r_e$ retrievals are shown in Fig. 9 and show that for the majority of retrievals $r_{3.7} < r_{2.1}$ and $r_{2.1} > r_{1.6}$. The expectation is that the presence of larger droplets lower in the cloud in the drizzling clouds will serve to increase the $r_{2.1}$ retrievals to a greater extent than $r_{3.7}$, and that $r_{1.6}$ will be increased even further. Some evidence of this can be seen in a very slight shift in the histograms of the differences between $r_{3.7}$ and $r_{2.1}$ retrievals in Fig. 9, with $r_{3.7} - r_{2.1}$ tending to be slightly more negative in the drizzling cases. There appears to be very little difference between drizzling and non-drizzling cases in the difference between $r_{2.1}$ and $r_{1.6}$.
The calculated peak rain rates in the drizzling cases are of the same order as in the typical stratocumulus clouds used in a modelling study by Zinner et al. (2010), where peak rain rates of 0.05 mm hr$^{-1}$ were used. Zinner et al. (2010) reported changes in $r_{3.7} - r_{2.1}$ of around 0.2 µm when introducing a drizzle model into the cloud, which broadly agrees with the small shift between the drizzling and non-drizzling histograms in Fig. 9. This small shift, along with the small differences between retrievals, shows the difficulty in diagnosing any useful information on the vertical structure or presence of drizzle from the MODIS retrievals for these clouds. The small shift in drizzling cases also appears to rule out the presence of a drizzle mode in the droplet size distribution as the main reason behind the high bias in MODIS $r_e$ retrievals.

Sources of error
Given some of the discrepancies between in situ and retrieved values noted in Sect. 3, it is important to analyse these in the context of known potential sources of cloud retrieval errors in an effort to ascertain the possible contributions from these errors. Painemal and Zuidema (2011) identified the variability of droplet size distributions, above-cloud water vapour absorption and viewing-geometry-dependent biases as potential contributors to the discrepancies that they observed. The relative effects of most of these error sources are well documented (e.g. Platnick and Valero, 1995; Kato and Marshak, 2009; Nakajima et al., 2010b; Zinner et al., 2010). We therefore aim not to repeat these analyses and instead test whether a more accurate knowledge of the width of the droplet size distribution and profiles of water vapour and temperature can improve agreement between retrievals and in situ data. We also assess any dependence of retrieval biases on the homogeneity of the cloud scene in an attempt to ascertain whether three-dimensional radiative effects could explain the apparent bias in $r_e$ retrievals.

Improved retrievals using in situ profiles
It has been noted by several authors (e.g. Garay et al., 2008; Harshvardhan et al., 2009; Painemal and Zuidema, 2011) that the MODIS algorithm places the cloud top too high in the presence of large cloud top inversions. The location of the cloud top of stratus clouds is estimated by MODIS from a cloud top temperature retrieved using band 31 (11.1 µm) combined with a corrected temperature profile computed from GDAS data (Richard Frey, private communication). The retrieved cloud top temperature is then used to calculate the thermal emission in the 3.7 µm band, which must be removed before retrieving $r_{3.7}$ from the solar reflection. The cloud top location is used to derive the above-cloud water vapour amount from reanalysis data, which is then used to adjust the reflection product for the effects of vapour absorption.
Figure 10 shows the MODIS cloud top pressure and temperature compared to the in situ values. MODIS underestimates cloud top temperature in 7 out of 11 cases with a maximum difference of 3.7 degrees. The resulting estimate of cloud top pressure is consistently around 250 hPa too high in the atmosphere. Depending on the accuracy of the water vapour profile from the reanalysis data used, this has the potential to underestimate the above-cloud water vapour used to correct for absorption. Any biases in cloud top temperature assignment may also influence the thermal correction in the 3.7 µm channel.
Whilst $r_e$ is the most important parameter of the droplet size distribution governing its radiative properties, deviations of the width of the size distribution from that assumed in the construction of lookup tables can influence the retrieval of $r_e$ (Platnick and Valero, 1995). The MODIS algorithm assumes a uniform lognormal size distribution with a standard deviation ($\sigma$) of 0.32 when calculating lookup tables. In general, if the actual $\sigma$ of the size distribution is smaller than the lookup table value, the resulting retrieval of $r_e$ increases (Chang and Li, 2001; Painemal and Zuidema, 2011). By performing least-squares fits of a lognormal distribution to that measured by the CDP (therefore representing the width of the cloud droplet distribution and not accounting for any drizzle mode) we found that $\sigma$ was mostly < 0.32, with a mean value from all the profiles analysed of 0.26.
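One simple way to carry out such a fit is sketched below. It is not the authors' fitting code: it assumes a lognormal in radius with $\sigma$ defined as the standard deviation of $\ln r$, and it uses a brute-force grid search over $\sigma$ and the median radius (with the amplitude solved analytically at each grid point) so that only the Python standard library is needed. The grid limits and step sizes are arbitrary choices for this example.

```python
import math


def lognormal_shape(r, r_median, sigma):
    """Unit-amplitude lognormal dn/dr with median radius r_median and width sigma."""
    exponent = -(math.log(r / r_median)) ** 2 / (2.0 * sigma ** 2)
    return math.exp(exponent) / (math.sqrt(2.0 * math.pi) * sigma * r)


def fit_lognormal(radii_um, dn_dr):
    """Least-squares lognormal fit by grid search.

    Returns (amplitude, r_median, sigma). The grid (sigma from 0.05 to 0.60 in
    steps of 0.01, median radius from 2 to 20 um in steps of 0.1 um) is an
    assumption made for this sketch.
    """
    best = None
    for i in range(56):
        sigma = 0.05 + 0.01 * i
        for j in range(181):
            r_median = 2.0 + 0.1 * j
            shape = [lognormal_shape(r, r_median, sigma) for r in radii_um]
            norm = sum(s * s for s in shape)
            if norm == 0.0:
                continue
            # For a fixed shape, the least-squares amplitude has a closed form.
            amplitude = sum(y * s for y, s in zip(dn_dr, shape)) / norm
            sse = sum((y - amplitude * s) ** 2 for y, s in zip(dn_dr, shape))
            if best is None or sse < best[0]:
                best = (sse, amplitude, r_median, sigma)
    _, amplitude, r_median, sigma = best
    return amplitude, r_median, sigma
```

Fitting only the CDP size range, as in the text, means the recovered $\sigma$ describes the cloud droplet mode and ignores any drizzle mode.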
Given the available data from the in situ cases, a more accurate estimate of the profiles of water vapour and temperature, the position of the cloud in the atmosphere and the shape of the droplet size distribution is available for use in a retrieval. We use these ancillary data in combination with the MODIS measured reflectance (Level 1B product) to perform our own retrieval for each MODIS pixel used in the comparison process. For each in situ profile case we have 25 MODIS pixels on which to perform a cloud retrieval using estimates of ancillary parameters from in situ data. To perform the retrievals we employ the same optimal estimation retrieval scheme described in Sect. 3.2. Whilst computationally more expensive, the use of an optimal estimation scheme allows us to specify a unique atmospheric state for each retrieval case rather than having to adjust a pre-calculated library of standard lookup tables. For each retrieval case the forward model is set up assuming a vertically uniform cloud with a lognormal droplet size distribution and $\sigma$ equal to the mean in situ measured value for the given profile.
A profile of water vapour mixing ratio is estimated using the dew point temperature measured by a General Eastern Chilled Mirror Hygrometer instrument aboard the BAe-146. Simultaneous measurements of temperature and pressure are then used to convert the dew point temperature measurements to a mass mixing ratio of water vapour. Whilst this gives a reasonably accurate knowledge of water vapour amount in the region of the atmosphere profiled by the aircraft, estimates must be used for the rest of the vertical extent of the atmosphere. The VOCALS region is characterised by a strong temperature and humidity inversion at cloud top with a longitudinal gradient in both the inversion height and the moisture above the inversion (Bretherton et al., 2010). For many of the profiles the aircraft ascended or descended significantly beyond the cloud top/base, in which case we used these measurements to specify the water vapour profile for the full extent of the aircraft profile. Beyond the extent of the measurements available in each case we used longitude-binned mean measurements from profile cases where the aircraft ascended significantly above the cloud layer. These profiles cover the region of the atmosphere which contains the vast majority of the water vapour, and above this we assumed a standard tropical atmosphere. To calculate the absorption from the water vapour profile, the Reference Forward Model (RFM) (http://www.atm.ox.ac.uk/RFM/) is used to calculate the optical depth due to vapour absorption of several layers of the atmosphere at the wavelength required. These layers are then added to the forward model of the retrieval scheme so that for each retrieval the best water vapour absorption information available is used.
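The dew point conversion mentioned above can be illustrated with the short sketch below. The conversion formula used in the study is not stated, so this example assumes a Magnus-type approximation for the saturation vapour pressure over liquid water; the coefficients and function names are assumptions of this sketch. In this simplified form only dew point and pressure are needed.

```python
import math

EPSILON = 0.622  # ratio of the molar masses of water vapour and dry air


def saturation_vapour_pressure_hpa(temp_c):
    """Magnus-type approximation over liquid water (assumed coefficients), in hPa."""
    return 6.112 * math.exp(17.62 * temp_c / (243.12 + temp_c))


def mixing_ratio_from_dew_point(dew_point_c, pressure_hpa):
    """Water vapour mass mixing ratio (kg per kg of dry air).

    The actual vapour pressure equals the saturation vapour pressure
    evaluated at the dew point temperature.
    """
    vapour_pressure = saturation_vapour_pressure_hpa(dew_point_c)
    return EPSILON * vapour_pressure / (pressure_hpa - vapour_pressure)
```

Applying the function level by level to the aircraft dew point and pressure measurements gives a mixing ratio profile over the sampled altitude range.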
In a method analogous to the MODIS algorithm we performed three retrievals for each pixel by combining the reflectance product in the 0.86 µm channel with that in the 1.6, 2.1 and 3.7 µm channels. The 3.7 µm channel however does not have an associated reflectance product and is reported as a calibrated radiance value because the MODIS instrument does not feature a solar diffuser in this channel. The radiance in the 3.7 µm channel includes significant contributions from both solar reflectance and thermal emission. Using an optimal estimation technique the thermal and solar components do not need to be separated; instead a surface temperature and temperature profile are used by the forward model to include thermal contributions to simulated measurements. In order to model the solar component of the radiance, a reference value of the top-of-atmosphere solar irradiance is calculated from the parametrisation of Platnick and Fontenla (2008) and used as input to the forward model.
Including an accurate temperature profile in the forward model hinges largely on diagnosing cloud temperature and the location and magnitude of the inversion at cloud top. Since all the profiles used include measurements of the whole cloud and cloud top region, this is accurately measured. Above the inversion the measured lapse rate is extrapolated to higher altitudes and then a standard tropical temperature profile is used. The surface temperature used in the forward model is set according to the Reynolds sea surface temperature (Reynolds et al., 2002), which was shown by Bretherton et al. (2010) to be in close agreement with C-130 downward-looking radiometric temperature measurements during VOCALS-REx.
The results of these retrievals averaged over each comparison domain are plotted against the corresponding MODIS retrievals in Fig. 11 and show a decrease in $r_e$ compared to the MODIS values. The mean decrease in $r_{1.6}$, $r_{2.1}$ and $r_{3.7}$ is 0.11, 0.23 and 0.30 µm respectively. Figure 11 shows that the reduction in retrievals is to some extent correlated with $r_e$, which is consistent with changes in the retrieval due to assigning a narrower droplet size distribution (Chang and Li, 2001). This effect is also expected to increase with wavelength, which could explain the larger difference in the $r_{3.7}$ and $r_{2.1}$ retrievals. Equally, this could be explained by an increase in the water vapour absorption used in the retrieval, as the effects of vapour absorption are greatest in the 3.7 µm channel and least in the 1.6 µm channel. Whilst these reductions in $r_e$ retrievals improve the comparison to in situ values to some extent (Fig. 12), the changes are within the approximate retrieval uncertainties and they do not remove the apparent biases, particularly in $r_{1.6}$ and $r_{2.1}$. Additionally, the larger reduction in $r_{3.7}$ compared to the other channels increases the tendency for $r_{3.7}$ to be < $r_{2.1}$.
We also used the MODIS measurements to perform profile retrievals using the full retrieval algorithm described in King and Vaughan (2012). These retrievals use the measurements in all the MODIS cloud retrieval bands and knowledge of their penetration depths to estimate the effective radius at cloud top and cloud base assuming an adiabatic profile of linearly increasing liquid water content. The retrieval system relies on a monotonically increasing or decreasing droplet size with optical depth. Whilst this assumption holds in the in situ profiles, as we have previously shown, in most cases the MODIS retrievals give $r_{3.7} < r_{2.1}$ and $r_{2.1} > r_{1.6}$. The signal from the reflectance measurements therefore suggests that droplet size increases and then decreases from cloud base to cloud top, a profile shape which the retrieval algorithm is unable to retrieve. This results in retrievals which either fail to converge on a solution or produce results which bear little resemblance to the in situ profiles, thus providing further evidence that the information content of the MODIS bands is not sufficient to retrieve a vertical profile of $r_e$.

Cloud homogeneity
Another important source of uncertainty in the cloud retrieval problem stems from the contribution of three-dimensional radiative effects that result from deviations of the cloud field from the standard plane-parallel geometry assumed in the forward model. Given the lack of knowledge of the full 3-D structure of the cloud this is very difficult to account for in the retrieval process, but its possible influence on retrievals should be considered when interpreting comparisons to in situ data. Several studies have examined the influence of both sub-pixel and surrounding-pixel inhomogeneities on effective radius retrievals (e.g. Marshak et al., 2006; Vant-Hull et al., 2007; Zinner et al., 2010; Zhang and Platnick, 2011; Zhang et al., 2012) and have highlighted the potentially large influence of 3-D effects. Marshak et al. (2006) showed that non-linearities between reflectance and $r_e$ can mean that averaging quantities over a certain domain size does not necessarily return the true average. In particular it was shown that pixels which due to 3-D effects were dimmed compared to their plane-parallel equivalents increase $r_e$ retrievals more than pixels brightened by the same amount decrease them. When averaging $r_e$ retrievals over a region such as the 5 × 5 km area chosen in each comparison case this can therefore lead to a high bias in the domain average. To examine the possible influence of this effect we compared each of the 275 MODIS pixels used in the comparison analysis directly to their matched in situ values (Fig. 13), comparing both to the cloud top and vertically weighted in situ values. If we assume that all MODIS pixels see the in situ effective radius profile to which they are matched but that 3-D effects serve to dim half the pixels below their plane-parallel equivalent and brighten the other half by equal and opposite amounts, then the analysis of Marshak et al. (2006) would suggest that the resulting $r_e$ retrievals would be positively skewed around the true in situ value. The asymmetric shape of the histograms in Fig. 13 appears to support this idea in that $r_e$ retrievals larger than the peak of the distribution tend to be further from the peak than those that are smaller. This serves to high bias the mean differences but it cannot explain the entire discrepancy, as the fraction of points where the MODIS retrieval was less than the vertically weighted in situ value was only 2, 0.4 and 8 % for the $r_{1.6}$, $r_{2.1}$ and $r_{3.7}$ retrievals respectively. The mean difference between MODIS and vertically weighted in situ values was 1.73, 1.81 and 1.31 µm for $r_{1.6}$, $r_{2.1}$ and $r_{3.7}$, whilst the median values were 1.57, 1.64 and 0.87 µm. When compared to the in situ values at cloud top the mean differences were 1.04, 1.23 and 0.97 µm and the medians 0.98, 0.92 and 0.56 µm.
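The averaging argument of Marshak et al. (2006) described above can be illustrated with a deliberately simple toy model. The exponential relation between reflectance and $r_e$ used here is purely an assumption for demonstration and is not a physical forward model; it only shares the relevant property that the retrieved $r_e$ is a convex, decreasing function of reflectance in an absorbing band.

```python
import math

ABSORPTION_SCALE = 0.1  # per micrometre; purely illustrative, not a physical value


def toy_reflectance(re_um):
    """Toy absorbing-band reflectance that decreases with effective radius."""
    return math.exp(-ABSORPTION_SCALE * re_um)


def toy_retrieval(reflectance):
    """Inverse of the toy relation: effective radius implied by a reflectance."""
    return -math.log(reflectance) / ABSORPTION_SCALE


def domain_mean_retrieval(true_re_um, perturbation):
    """Mean retrieval when half the pixels are dimmed and half brightened.

    Both sets of pixels share the same true r_e; their reflectances are
    shifted by equal and opposite amounts to mimic 3-D effects.
    """
    reference = toy_reflectance(true_re_um)
    dimmed = toy_retrieval(reference - perturbation)
    brightened = toy_retrieval(reference + perturbation)
    return 0.5 * (dimmed + brightened)
```

In this toy setting the domain mean always exceeds the true value for any non-zero perturbation, which is the sense in which symmetric brightening and dimming can bias the averaged $r_e$ high.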
The MODIS 0.86 µm channel reflectance product ($R_{0.86}$) used for optical depth retrievals is collected on 250 × 250 m pixels at nadir resolution and aggregated to 1 × 1 km for use in cloud retrievals. Each pixel used for cloud retrievals therefore consists of 16 $R_{0.86}$ pixels which can be used to assess the sub-pixel homogeneity of the 1 × 1 km pixel. Zhang and Platnick (2011) found that the $r_{1.6}$ and $r_{2.1}$ retrievals increase sharply with a sub-pixel homogeneity index defined as

$$\sigma_{\mathrm{sub}} = \frac{\mathrm{stdev}(R_{0.86,\,250\,\mathrm{m}})}{\mathrm{mean}(R_{0.86,\,250\,\mathrm{m}})}, \qquad (7)$$

whereas $r_{3.7}$ displayed no such behaviour. It was shown by Zhang and Platnick (2011) that because the retrieval of $r_e$ and $\tau_c$ is not orthogonal, combining regions of different optical depth and uniform $r_e$ into a 1 km pixel can result in a bias in the retrieval of $r_e$. This problem is at its worst for $r_{1.6}$ and $r_{2.1}$ retrievals of optically thin clouds. The more absorbing properties of the 3.7 µm band however result in a near-orthogonal retrieval, thus minimising the influence of sub-pixel inhomogeneity. We showed in Fig. 2 that the variability of LWC (and by extension LWP and optical depth) over 5 km regions was larger than that of $r_e$ and therefore sub-pixel inhomogeneities of $\tau_c$ could be an influential mechanism in the clouds used in this study. We note however that the same measure of homogeneity presented in Fig. 2 taken over a 1 km scale as opposed to a 5 km scale is reduced and therefore inhomogeneities on a scale larger than a 1 km MODIS pixel may equally be influential. Horizontal photon transport across pixel boundaries can influence $r_e$ retrievals when surrounding pixels are significantly different to the pixel under observation. To diagnose any possible effects from surrounding pixels and to add a further metric by which cloud heterogeneity can be assessed, we define a surrounding-pixel homogeneity metric $\sigma_{\mathrm{sur}}$ equivalent to $\sigma_{\mathrm{sub}}$ except that the reflectance values used to calculate the index are the nine 1 km pixels that make up the 3 × 3 pixel box centred on the pixel in question.
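Both indices are a standard deviation divided by a mean, so they can be computed with the same helper, as in the sketch below. Whether the population or sample standard deviation was used is not stated in the text; the population form is assumed here, and the function names and grid layout are choices made for this illustration.

```python
import statistics


def heterogeneity_index(reflectances):
    """Standard deviation of the reflectances divided by their mean.

    Pass the sixteen 250 m values inside a 1 km pixel for the sub-pixel index,
    or the nine 1 km values of a 3 x 3 box for the surrounding-pixel index.
    The population standard deviation is an assumption of this sketch.
    """
    mean_reflectance = statistics.mean(reflectances)
    if mean_reflectance <= 0.0:
        raise ValueError("mean reflectance must be positive")
    return statistics.pstdev(reflectances) / mean_reflectance


def surrounding_box(grid, row, col):
    """Return the nine values of the 3 x 3 box centred on (row, col).

    grid: list of rows of 1 km reflectances; edge pixels are not handled.
    """
    if row < 1 or col < 1 or row > len(grid) - 2 or col > len(grid[0]) - 2:
        raise ValueError("pixel must not lie on the edge of the grid")
    return [grid[i][j] for i in range(row - 1, row + 2) for j in range(col - 1, col + 2)]
```

A perfectly uniform scene gives an index of zero, and values above about 0.3 correspond to the strongly heterogeneous regime discussed by Zhang and Platnick (2011).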
Histograms of $\sigma_{\mathrm{sub}}$ and $\sigma_{\mathrm{sur}}$ for all 275 MODIS pixels that went into the comparison analysis are shown in Fig. 14. Zhang and Platnick (2011) found little influence on the differences between $r_e$ retrievals until $\sigma_{\mathrm{sub}} > 0.3$. Only seven points used in our comparison analysis had values of $\sigma_{\mathrm{sub}} > 0.3$ and 10 had $\sigma_{\mathrm{sur}} > 0.3$. Interestingly, these pixels with either high $\sigma_{\mathrm{sub}}$ or $\sigma_{\mathrm{sur}}$ all correspond to pixels whose difference between retrieved and in situ matched values exceeds 3 µm. However, given that over 80 % of pixels have both $\sigma_{\mathrm{sub}} < 0.1$ and $\sigma_{\mathrm{sur}} < 0.1$, we do not have enough heterogeneous pixels to draw strong conclusions. For pixels with heterogeneity indexes < 0.3 there was no dependence of the differences between retrieved and in situ values on either $\sigma_{\mathrm{sub}}$ or $\sigma_{\mathrm{sur}}$. This further suggests that cloud heterogeneity and 3-D radiative effects alone cannot explain the biases displayed in Fig. 13.

Summary and conclusions
In situ cloud probe measurements taken during ascending and descending profiles through marine stratocumulus during VOCALS-REx allowed us to assess the quality of collocated MODIS cloud retrievals. We used a total of eleven cases with varying amounts of larger precipitation-sized droplets to compare optical depth and the three MODIS effective radius retrievals to the in situ measurements. We found that all three $r_e$ retrievals tended to overestimate the effective radii measured near to the cloud top, with the largest mean overestimation in the retrieval of $r_{2.1}$. Retrievals of optical depth were found to vary more within the 5 × 5 km regions used to compare to the in situ measurements than the $r_e$ retrievals. This aligned with evidence from straight and level in situ measurements which suggested that liquid water content was less homogeneous than effective radius over a 5 km scale. Consequently the comparison of optical depth retrievals was considered less reliable but displayed little evidence of a systematic bias when compared to in situ measurements. Two cases which we denoted as drizzling by virtue of their relatively high liquid water paths from droplets larger than 50 µm diameter displayed particularly large overestimates of optical depth from MODIS retrievals.
Estimations of the total liquid water path calculated from the $r_e$ and $\tau_c$ retrievals also tended to overestimate the in situ measured values. This was in part because overestimations of $r_e$ and $\tau_c$ were propagated through to the calculation of LWP, but also because the assumption of a vertically uniform droplet size tends to lead to an overestimation of LWP in clouds whose droplet size increases from cloud base to cloud top. Using an adiabatic assumption to calculate LWP from $r_e$ and $\tau_c$ retrievals reduced the overestimation.
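The effect of the two assumptions on LWP can be sketched with the widely used closed-form relations, in which LWP is proportional to the product of $r_e$ and $\tau_c$ with a coefficient of 2/3 for a vertically uniform cloud and 5/9 for an adiabatic cloud. These coefficients are the commonly quoted ones and are assumed here; the exact expressions used in the study are not stated in this section.

```python
RHO_WATER_G_M3 = 1.0e6  # density of liquid water in g m^-3


def lwp_vertically_uniform(tau_c, re_um):
    """LWP in g m^-2 assuming droplet size is constant with height."""
    return (2.0 / 3.0) * RHO_WATER_G_M3 * (re_um * 1.0e-6) * tau_c


def lwp_adiabatic(tau_c, re_um):
    """LWP in g m^-2 assuming an adiabatic profile, with re_um taken at cloud top."""
    return (5.0 / 9.0) * RHO_WATER_G_M3 * (re_um * 1.0e-6) * tau_c
```

For the same retrieved $r_e$ and $\tau_c$, the adiabatic form gives five sixths of the vertically uniform value, which is the direction of the reduction described above.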
These results broadly agreed with those of Painemal and Zuidema (2011), who conducted a similar study using data from a different aircraft platform. However, Painemal and Zuidema (2011) reported a larger overestimation of $r_{2.1}$ than we found, and an intercomparison leg of the two aircraft suggested a discrepancy between cloud probe measured effective radius of around 1 µm. This highlighted the uncertainty in cloud probe calibration as a possible contributor to apparent biases in MODIS retrievals.
We also used the full in situ measured vertical profile to calculate the theoretical values of $r_{1.6}$, $r_{2.1}$ and $r_{3.7}$ which should be retrieved given the vertical variation of droplet size found in the cloud and the penetration depths of photons in each wavelength channel. This highlighted the fact that the small differences between vertically weighted synthetic retrievals and the values at cloud top in most cases served to increase the discrepancy between in situ and retrieved values. Agreement was on average worst in the 2.1 µm channel, with a mean high bias of 1.81 µm.
We also examined the relationship between $r_{1.6}$, $r_{2.1}$ and $r_{3.7}$ retrievals and sought to establish whether the differences between them could be explained by the vertical structure of the clouds observed through in situ measurement. For the majority of MODIS retrievals $r_{3.7}$ was < $r_{2.1}$, although differences were typically < 1 µm. Nakajima et al. (2010a) and Nakajima et al. (2010b) have previously explained this relation by the presence of a thin layer of small droplets near to cloud top and, in clouds with $r_{2.1} > 14$ µm, the presence of larger drizzle-sized droplets lower in the cloud. In none of the cases examined in this study was there any evidence of small droplets near to cloud top which would cause the reduction in $r_{3.7}$ hypothesised by Nakajima et al. (2010a). We examined retrievals corresponding to those clouds in which the 2DS suggested the presence of some larger droplets separately to those with no evidence of drizzle and found a small shift in the difference between $r_{3.7}$ and $r_{2.1}$ for the cases denoted as drizzling. This suggests that some element of the vertical structure in drizzling clouds is borne out in MODIS retrievals but cannot explain the tendency for $r_{3.7} < r_{2.1}$ in non-drizzling cases. This relationship between $r_{3.7}$ and $r_{2.1}$ and the lack of small droplets at cloud top is further evidence of a high bias in MODIS $r_{2.1}$ retrievals.
Comparisons between the three near-infrared retrievals also highlighted that, despite significant vertical variation of r e in the in situ measurements, synthetic retrievals based on these measurements displayed differences between r 1.6 , r 2.1 and r 3.7 of typically < 1 µm. This is of the same order as the expected uncertainty in each retrieval product and indicates that the signal in the measurements at different wavelengths due to vertical structure is comparable to or smaller than the noise due to measurement and modelling uncertainty. This is in agreement with the findings of King and Vaughan (2012), where we showed that modelling error in a retrieval algorithm must effectively be negligible for useful information on the vertical variation of r e to be gleaned from MODIS channels alone. The apparent biases in MODIS retrievals displayed in this study, with different magnitudes in different channels, further suggest that the accuracy of the measurement and modelling in the retrieval algorithm is not sufficient to resolve any useful information on the vertical structure of r e .
Armed with a more complete knowledge of the true profile of water vapour and temperature and the shape of the droplet size distribution from in situ measurements, we tested whether setting these parameters more accurately in a retrieval algorithm served to improve agreement between retrieved and in situ values. We performed our own retrievals on each of the MODIS pixels that were matched to in situ measurements and used a combination of the BAe-146 measured temperature and moisture profiles and mean longitude-binned profiles measured during the campaign. We found some reduction in the r e retrievals, with the reduction becoming larger with increasing r e and wavelength channel. However, even in the retrieval of r 3.7 this change was only 0.3 µm, and it did little to reduce the bias in r 1.6 and r 2.1 retrievals.
Finally, we attempted to diagnose the potential influence of inhomogeneities in the clouds and subsequent 3-D radiative effects on the retrievals. We examined the sub-pixel and surrounding-pixel inhomogeneity indices, which gave a measure of the variation in reflectance within each MODIS pixel and surrounding each MODIS pixel respectively. The few cases with the largest indicators of inhomogeneity correlated with some of the largest differences between in situ and retrieved values. However, the vast majority of pixels used in this study were classed as homogeneous, and neither index was correlated with the difference between retrievals and in situ values or with the difference between the three near-infrared retrievals.
The shape of the histograms of the differences between retrieved and in situ matched effective radii indicates that potential 3-D effects which serve to increase the retrieval do so by a larger amount than those that decrease it. This supports the analysis of Marshak et al. (2006) and shows that a high bias could be introduced into mean values in such comparisons. This is further shown by the fact that the median differences between retrieved and in situ values were smaller than the means, highlighting that simple averaging over a retrieval domain must be treated with care. However, any such effect is not large enough to explain the whole discrepancy, as the peak of each histogram is significantly biased from zero and very few pixels were found to retrieve an effective radius smaller than its corresponding in situ value.
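The effect of a positively skewed difference distribution on the mean and median can be seen in a small sketch. The values in `differences_um` are invented purely for illustration and are not data from this study. They mimic a histogram whose peak is offset from zero and which has a tail of large positive differences.

```python
import numpy as np

# Invented retrieved-minus-in-situ differences in micrometres (illustrative only).
differences_um = np.array([0.8, 1.0, 1.1, 1.2, 1.3, 1.5, 1.6, 2.0, 4.5, 6.0])

mean_diff = differences_um.mean()
median_diff = np.median(differences_um)
fraction_negative = np.mean(differences_um < 0.0)

print(f"mean difference:   {mean_diff:.2f} um")
print(f"median difference: {median_diff:.2f} um")
print(f"fraction of pixels retrieving below in situ: {fraction_negative:.2f}")
```

In this toy case the two largest values pull the mean well above the median, yet the median itself remains clearly positive, so the skew accounts for only part of the offset.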
All variables analysed were also tested for their dependence on solar and viewing geometries. However, no correlations were found which could help explain any of the observed discrepancies.
This study has added to the body of evidence (Bréon and Doutriaux-Boucher, 2005; Nakajima et al., 1991; Nakajima and Nakajima, 1995; Painemal and Zuidema, 2011) suggesting that the standard retrieval approach overestimates r e in marine stratocumulus beyond the extent that can be explained by known uncertainties of the cloud retrieval problem. The results suggest that r 3.7 offers the best agreement with corresponding in situ data, but since no explanation for this is readily available, further exploration is required before blindly accepting r 3.7 as the best retrieval. The 3.7 µm channel differs slightly in nature from the others in that it includes a thermal component and has no on-board measurement of downwelling solar irradiance, so it is plausible that these factors result in a difference in retrieval bias. Platnick and Fontenla (2008) highlighted the uncertainties involved in estimating the solar irradiance in the 3.7 µm channel given the lack of available observations and showed that using different estimations of this quantity can result in around a 5 % shift in r e retrievals. A shift of this order could, for instance, significantly change the relationship between r 2.1 and r 3.7 examined in this study.
Given the inherent difficulties of matching simultaneous airborne and satellite observations and the relatively few studies of this nature currently in the literature, many further campaigns, with carefully calibrated cloud probes, in different regions and cloud regimes would be needed to fully quantify any biases. The indications from studies so far, however, suggest a bias in satellite observations of r e similar in magnitude to the 15-20 % reduction in global low cloud mean droplet radius that Slingo (1990) suggested would offset the radiative effects of a doubling of carbon dioxide concentration, thus highlighting the importance of verifying satellite-based cloud retrievals.

Fig. 1. Geographical positions of the BAe-146 profiles used to compare to MODIS retrievals.

Fig. 2. Histogram of the fractional standard deviation of r e , LWC and number concentration measured during 280 straight and level flight segments, each 5 km long.

Fig. 3. Comparison of r e retrieved by MODIS using 1.6, 2.1, and 3.7 µm channels (a, b and c respectively) and mean in situ measured r e in the top layer of the cloud within an optical depth of 1 from cloud top. Vertical error bars represent the standard deviation of the MODIS r e retrievals within the 5 km × 5 km domain used for the comparison. Horizontal error bars represent the propagated bin location calibration error. Red points are clouds whose 2DS LWPs were > 10 g m −2 , green points have a 2DS LWP in the range 1-10 g m −2 and blue points have a 2DS LWP < 1 g m −2 . The dotted lines in (b) represent the 15-20 % bias reported by Painemal and Zuidema (2011).

Fig. 5. Comparison of LWP retrieved by MODIS and measured in situ. Dots represent LWP retrievals using the standard assumption of a vertically constant profile; crosses represent estimates of LWP using an adiabatic assumption. Red points are clouds whose 2DS LWPs were > 10 g m −2 , green points have a 2DS LWP in the range 1-10 g m −2 and blue points have a 2DS LWP < 1 g m −2 .
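The two LWP assumptions named in this caption can be sketched as follows. The sketch assumes the commonly used relations LWP = (2/3) ρ w τ c r e for a vertically constant profile and LWP = (5/9) ρ w τ c r e for an adiabatic profile. The source names the assumptions but does not state these coefficients here, so they are an assumption of this sketch, as are the function names.

```python
RHO_W = 1.0e6  # density of liquid water in g m^-3

def lwp_vertically_constant(tau_c, re_um):
    """LWP in g m^-2, assuming r_e is constant with height (2/3 coefficient)."""
    return (2.0 / 3.0) * RHO_W * tau_c * re_um * 1.0e-6  # 1e-6 converts um to m

def lwp_adiabatic(tau_c, re_um):
    """LWP in g m^-2, assuming an adiabatic profile (5/9 coefficient)."""
    return (5.0 / 9.0) * RHO_W * tau_c * re_um * 1.0e-6

# Example with arbitrary inputs: optical depth 10, effective radius 10 um.
print(lwp_vertically_constant(10.0, 10.0))
print(lwp_adiabatic(10.0, 10.0))
```

Under these relations the adiabatic estimate is always 5/6 of the vertically constant one for the same τ c and r e , so a high bias in either retrieved quantity propagates directly into both LWP estimates.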

Fig. 6. CDP-measured effective radius (a) and averaged size distribution (b) during a comparison leg between the BAe-146 and C-130 aircraft.

Fig. 7. Same as Fig. 3 except the in situ r e is calculated by synthesising a measurement from the full vertical profile and performing a theoretical retrieval.

Fig. 10. Comparison of MODIS and in situ cloud top temperatures and pressures.

Fig. 11. Mean effective radii retrieved using an optimal estimation retrieval with in situ knowledge of water vapour and temperature profiles, plotted against the equivalent MODIS retrievals.

Fig. 12. Comparison between retrievals using an optimal estimation retrieval and the in situ vertically weighted values (dots and solid error bars). The corresponding MODIS retrievals are also shown with crosses and dashed error bars.

Fig. 13. Differences between r e retrievals for all MODIS pixels used in the comparison analysis and their corresponding in situ matched values.

Fig. 14. Sub-pixel and surrounding-pixel homogeneity indices of all MODIS pixels used in the comparison analysis.

Table 1. Summary of profiles used. January 2008, effective radii are in µm, LWPs are in g m −2 , RR is in mm hr −1 , number density is in cm −3 , θ sol is the solar zenith angle, θ sen is the MODIS sensor zenith angle.