Global retrieval of ATSR cloud parameters and evaluation (GRAPE): dataset assessment

The Along-Track Scanning Radiometers (ATSRs) provide a long time-series of measurements suitable for the retrieval of cloud properties. This work evaluates the freely-available Global Retrieval of ATSR Cloud Parameters and Evaluation (GRAPE) dataset (version 3) created from the ATSR-2 (1995–2003) and Advanced ATSR (AATSR; 2002

in GRAPE due to poorly-resolved inversions in the modelled temperature profiles used. Global cloud fields are compared to satellite products derived from the Moderate Resolution Imaging Spectroradiometer (MODIS), Cloud-Aerosol Lidar with Orthogonal Polarization (CALIOP) measurements, and a climatology of liquid water content derived from satellite microwave radiometers. In all cases the main reasons for differences are linked to differing sensitivity to, and treatment of, multi-layer cloud systems. The correlation coefficient between GRAPE and the two MODIS products considered is generally high (greater than 0.7 for most cloud properties), except for liquid and ice cloud effective radius, which also show biases between the datasets. For liquid clouds, part of the difference is linked to the choice of wavelengths used in the retrieval. Total cloud cover is slightly lower in GRAPE (0.64) than in the CALIOP dataset (0.66). GRAPE underestimates liquid cloud water path relative to microwave radiometers by up to 100 g m−2 near the Equator and overestimates it by around 50 g m−2 in the storm tracks. Finally, potential future improvements to the algorithm are outlined.
A. M. Sayer et al.: Assessment of GRAPE cloud products
1991; Hansen et al., 1997; Stephens, 2005; Stevens and Feingold, 2009). An accurate knowledge of cloud coverage and properties is therefore important to understanding climate. For several decades, satellite remote sensing has provided a global view of clouds, complemented by ground-based observations over smaller areas. The different measurement techniques used by the assorted sensors each have their own advantages and disadvantages.
There is a long time series of cloud properties derived from passive visible and infrared (IR) imaging instruments, including the Advanced Very High Resolution Radiometers (AVHRR; Rossow and Schiffer, 1991; Heidinger and Pavolonis, 2009 and others), the High-resolution Infrared Radiation Sounders (HIRS; Wylie et al., 1994), the Along-Track Scanning Radiometers (ATSR; Muller et al., 2007; Poulsen et al., 2011), the Moderate Resolution Imaging Spectroradiometers (MODIS; Platnick et al., 2003; Minnis et al., 2008), the Multiangle Imaging Spectroradiometer (MISR; Moroney et al., 2002) and the Spinning Enhanced Visible and Infrared Imager (SEVIRI; Siddans et al., 2009). Visible and near-infrared (nIR) measurements from these instruments are used to derive cloud optical depth and particle size, while height information is provided by thermal IR brightness temperatures, parallax methods, or the CO2-slicing technique. New algorithms to retrieve cloud-top pressure from oxygen A-band measurements by the Medium Resolution Imaging Spectrometer (MERIS; Preusker and Lindstrot, 2009) and the Scanning Imaging Absorption Spectrometer for Atmospheric Chartography (SCIAMACHY; Kokhanovsky et al., 2007) have also been developed. These imaging instruments offer good spatial coverage, although they are generally forced to assume single-layer plane-parallel clouds, leading to difficulties in multi-layer cloud systems.
A similarly long time series of measurements from passive microwave radiometers, such as the Special Sensor Microwave/Imagers (SSM/I), the Advanced Microwave Scanning Radiometer (AMSR) and the Tropical Rainfall Measuring Mission (TRMM) Microwave Imager (TMI), provides information on cloud cover and liquid water content (O'Dell et al., 2008). Because microwaves penetrate ice clouds, these instruments avoid some of the difficulties faced by visible/IR radiometers. They are, however, limited to water clouds over oceanic regions.
Limb-viewing instruments such as the Michelson Interferometer for Passive Atmospheric Sounding (MIPAS; Hurley et al., 2009) have high sensitivity to optically thin clouds but offer limited horizontal resolution and limited information on the troposphere.
Active sensors such as the CloudSat Cloud Profiling Radar (CPR; Stephens et al., 2008) and the Cloud-Aerosol Lidar with Orthogonal Polarization (CALIOP; Winker et al., 2007; Chepfer et al., 2010) are able to provide high-resolution, height-resolved information on cloud (and aerosol) properties, although their coverage is limited to the sub-satellite track and the time series is, at present, short.
The Oxford-Rutherford Appleton Laboratory (RAL) Aerosol and Clouds (ORAC) retrieval algorithm, used to derive cloud properties from measurements made by ATSR-2 aboard the ERS-2 satellite, is described by Poulsen et al. (2011). The algorithm has been applied to the ATSR-2 record from 1995 to 2003 to create the Global Retrieval of ATSR Cloud Parameters and Evaluation (GRAPE) dataset. This is freely available from the British Atmospheric Data Centre (BADC), along with a product user guide, and can be accessed at http://badc.nerc.ac.uk/browse/badc/cwvc/data/grape/arc/v3. GRAPE data from ATSR-2 have been used in several studies examining the impacts of aerosols on cloud properties (Bulgin et al., 2008; Quaas et al., 2009; Sayer and Grainger, 2010). This is an area of strong current interest (Stevens and Feingold, 2009 provide a recent review) because of considerable uncertainty in the strengths and mechanisms of the effects (Forster et al., 2007).
The dataset has recently been extended using measurements from the Advanced ATSR (AATSR) from July 2002 onwards (currently available until the end of 2009). This provides over fourteen years of data processed with a consistent algorithm, including almost one year of overlap between ATSR-2 and AATSR. This record is longer than that available from the MODIS sensors, is free of the orbital drift issues affecting some of the AVHRRs, and will be continued into the future by the forthcoming ATSR-derived Sea and Land Surface Temperature Radiometer (SLSTR), scheduled for launch in 2012–2013.
Although others have made use of the dual-viewing capabilities of the ATSRs to derive cloud-top height through stereo matching (Muller et al., 2007), the version of ORAC applied in GRAPE uses only the nadir view of the instruments. This is due to the additional geometrical complexity of using a dual-view technique to retrieve cloud properties other than height, for which a single view is generally sufficient. However, methods for utilising both views of the ATSRs for cloud retrievals are being developed for future versions of ORAC.
Atmos. Chem. Phys., 11, 3913-3936, 2011 www.atmos-chem-phys.net/11/3913/2011/
The directly-retrieved parameters provided by the ORAC scheme are the cloud optical depth (COD, reported as a base-10 logarithm and referenced to 0.55 µm), the cloud effective radius (CER), cloud-top pressure (CTP), surface temperature (although the retrieval provides little improvement on the a priori value) and the fraction of the 3 km × 4 km retrieval pixel (composed of 12 instrument pixels) covered by cloud. The cloud phase (water or ice) is also retrieved. Two auxiliary datasets are ingested. The first consists of atmospheric profile data from the European Centre for Medium-Range Weather Forecasts (ECMWF): the ERA-40 dataset (Uppala et al., 2005) is used up to August 2002, and the operational reanalysis (ECMWF, 2008) afterwards. The second is surface white-sky albedo from MODIS (Wanner et al., 1997; Schaaf et al., 2002 and others). The retrieved CTP is also provided converted to cloud-top temperature (CTT) and cloud-top height (CTH). Additionally, the information on phase, COD and CER is used to calculate the cloud water path (CWP). For cloud-free scenes, the GRAPE dataset includes an aerosol retrieval; this is described by Thomas et al. (2009) and Thomas et al. (2010), and is not considered further here. A summary of available cloud properties and relevant notation is provided in Table 1.

This work concerns an evaluation of version 3 of the GRAPE cloud products through an intercomparison with ground-based and other satellite datasets, providing an independent verification of the data. Generally, the GRAPE data are compared on a parameter-by-parameter basis with the sensor(s) best suited to deriving the cloud parameters in question. The paper focuses on ATSR-2 data: this was the initial application of the ORAC algorithm, and the data have been available longer and are consequently more widely used. Further, ATSR-2 is thought to be the better-calibrated of the sensors (D. Smith, personal communication, 2010). ATSR-2 and AATSR are almost identical instruments, and both are processed with the same retrieval algorithm. The consistency between the ATSR-2 and AATSR sensors is examined using the year of overlapping data.
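The CWP is derived from phase, COD and CER. As an illustration of how a water path can follow from those quantities, the sketch below undoes the base-10 logarithm in which COD is reported and then applies the commonly used vertically homogeneous-cloud approximation for liquid clouds, w = (2/3) ρ_w τ_c r_eff. This formula and the function names are assumptions for illustration only; the exact relations used in GRAPE (and any treatment of ice clouds) may differ.

```python
RHO_WATER_KG_M3 = 1000.0  # density of liquid water


def cod_from_log10(log10_cod: float) -> float:
    """Undo the base-10 logarithm in which COD is reported."""
    return 10.0 ** log10_cod


def liquid_water_path_g_m2(cod: float, reff_um: float) -> float:
    """Approximate liquid water path (g m^-2) of a vertically homogeneous cloud.

    Assumed relation (illustrative, not necessarily GRAPE's):
    w = (2/3) * rho_w * tau * r_eff, evaluated in SI units (kg m^-2)
    and then converted to g m^-2.
    """
    reff_m = reff_um * 1e-6  # micrometres -> metres
    return (2.0 / 3.0) * RHO_WATER_KG_M3 * cod * reff_m * 1000.0


tau = cod_from_log10(1.0)  # log10 COD of 1.0 -> tau = 10
print(round(liquid_water_path_g_m2(tau, 10.0), 1))  # 66.7
```

Because the relation is linear in both τ_c and r_eff, a relative bias in either quantity propagates directly into the derived water path.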

Length of record
The ATSR-2 portion of the GRAPE dataset extends from June 1995 to June 2003. The period from January to June 1996 is absent from the record owing to a temporary failure of the ATSR-2 instrument in December 1995. The ERS-2 satellite suffered from data-downlinking restrictions, one effect of which was that the visible channels over the ocean operated in a narrow-swath mode. Another impact was that, for parts of the mission, there was a data gap where no measurements were available between approximately 80°–90° W and 30

The maximum number of level 2 (individual orbit) products available per month is approximately 440; the average number available in GRAPE is 392 for ATSR-2 and 393 for AATSR. Processing of an orbit may fail for reasons including missing level 1 files (satellite-measured radiances) or missing ancillary data needed by the retrieval. Additionally, intermittent outgassing has been performed throughout the missions to remove contaminants that condense onto the instrument. During these outgassing periods (each lasting several days), no data are collected from the visible channels, so the retrieval cannot be performed. Monthly aggregated (level 3) products are not currently available, but will be generated according to the recommendations outlined in this work.

Retrieval cost
The GRAPE algorithms use the optimal estimation (OE) methodology described by Rodgers (2000). This provides a robust statistical approach to retrievals, with several advantages:
1. Simultaneous use of information from all measurements, for all retrieved quantities that are sensitive to them, ensures that the retrieval is physically self-consistent. In the case of GRAPE, top-of-atmosphere (TOA) reflectances or brightness temperatures from ATSR bands centred near 660 nm, 870 nm, 1.6 µm, 10.8 µm, and 12.0 µm are used. Datasets in which different parameters are retrieved independently from different measurements have been shown not always to be consistent when properties derived from one part of the spectrum are used to predict radiances in another (Ham et al., 2009).
2. A priori information, where available, can be included in a statistically-robust manner.
3. Error propagation enables estimates of the products' uncertainty, and a measure of how consistent the retrieved state is with the measurements and any a priori data used (the "cost function"), for each retrieval.
The application of OE to the ORAC cloud retrieval is detailed in Poulsen et al. (2011). The basic principle is to maximise the probability of the retrieved state x given the measurements y_m and a priori information x_a, with associated Gaussian covariances S_y (the combination of measurement and forward-model uncertainty) and S_a (a priori uncertainty). Following Rodgers (2000), the maximum probability corresponds to the minimum of J, the retrieval cost:

$$J = \frac{1}{n_y}\left[\left(\mathbf{y}(\mathbf{x})-\mathbf{y}_m\right)^{T}\mathbf{S}_y^{-1}\left(\mathbf{y}(\mathbf{x})-\mathbf{y}_m\right) + \left(\mathbf{x}-\mathbf{x}_a\right)^{T}\mathbf{S}_a^{-1}\left(\mathbf{x}-\mathbf{x}_a\right)\right]$$

The two terms represent weighted deviations from the measurements and from the a priori state, respectively, and n_y denotes the number of measurements. Here y(x) refers to the measurements predicted by the forward model from the current value of the state vector. Measurement and a priori uncertainties in GRAPE are described by Poulsen et al. (2011). If, at the solution, none of the measurements deviates from its calculated value by significantly more than its expected noise, and the state has no significant a priori constraints, then J will be of order 1. In this case, however, two of the state elements (f and T_s) have significant a priori constraints, so a more realistic expectation of J is 1 + 2/n_y = 1.4 (for the n_y = 5 channels used), and over a large dataset the distribution of J should approximate a χ² distribution with 1.4 degrees of freedom.
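A minimal numpy sketch of the normalised cost, taken here as J = (1/n_y)[(y(x) − y_m)ᵀ S_y⁻¹ (y(x) − y_m) + (x − x_a)ᵀ S_a⁻¹ (x − x_a)], may help make the weighting concrete. The function name `retrieval_cost`, the assumed five-element state ordering (log10 COD, CER, CTP, f, T_s) and every number below are hypothetical; this is not the ORAC implementation.

```python
import numpy as np


def retrieval_cost(y_model, y_meas, S_y, x, x_a, S_a):
    """Normalised OE cost J = (1/n_y) [dy^T S_y^-1 dy + dx^T S_a^-1 dx]."""
    dy = np.asarray(y_model, dtype=float) - np.asarray(y_meas, dtype=float)
    dx = np.asarray(x, dtype=float) - np.asarray(x_a, dtype=float)
    # Solve S v = d rather than forming S^-1 explicitly (better conditioned).
    measurement_term = dy @ np.linalg.solve(S_y, dy)
    prior_term = dx @ np.linalg.solve(S_a, dx)
    return float(measurement_term + prior_term) / dy.size


# Hypothetical example: five channels, each modelled value exactly one
# standard deviation from the measurement, and the state equal to the prior.
n_y = 5
S_y = np.diag(np.full(n_y, 0.01))  # measurement + forward-model variances
y_meas = np.zeros(n_y)
y_model = np.full(n_y, 0.1)
x_a = np.array([1.0, 10.0, 800.0, 1.0, 290.0])  # assumed (log10 COD, CER, CTP, f, T_s)
S_a = np.diag([1e6, 1e6, 1e6, 0.01, 1.0])       # weak priors except on f and T_s
print(round(retrieval_cost(y_model, y_meas, S_y, x_a, x_a, S_a), 6))  # 1.0
```

Moving a tightly constrained element such as f away from its prior adds to J even when the measurements are fitted perfectly, which is why constrained state elements raise the expected cost above 1.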
Cumulative frequency distributions of J from all converging ATSR-2 cloud retrievals are shown in Fig. 1, along with that of the theoretical χ² distribution. The theoretical curve shows that over 99% of retrievals in which the retrieved state is consistent with the measurements should converge with J ≤ 10, making this a sensible threshold for ensuring that only the highest-quality retrievals are considered. Over land, approximately 80% of water clouds and 85% of ice clouds converge with this cost; over sea, the proportions are 70% of water clouds and 55% of ice clouds. This cost is also around the knee of all the distributions, showing that the forward model is appropriate for the bulk of attempted retrievals, although there are a significant number of outliers. Causes include multi-layer cloud scenes, mixed cloud and high aerosol loading, highly inhomogeneous scenes where 3-D cloud effects may be important, and regions where the a priori or auxiliary data are not appropriate. Such conditions will pose a problem for any retrieval of cloud properties from satellite radiometers. A strength of OE is that it provides the cost as a goodness-of-fit statistic with which to check the consistency of the retrieved state with the measurements and to identify poorly retrieved scenes.
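The J ≤ 10 threshold can be checked against the theoretical χ² distribution with 1.4 degrees of freedom stated in the text. The sketch below evaluates the χ² CDF through the regularised lower incomplete gamma function, P(k/2, x/2), using only the standard library; the helper names are hypothetical, and the power-series evaluation is a convenient choice for the modest arguments used here rather than a general-purpose implementation.

```python
import math


def regularized_lower_gamma(a: float, x: float, tol: float = 1e-15,
                            max_iter: int = 10_000) -> float:
    """Regularised lower incomplete gamma P(a, x) from its power series."""
    if x <= 0.0:
        return 0.0
    term = 1.0 / a
    total = term
    for n in range(1, max_iter):
        term *= x / (a + n)
        total += term
        if term < tol * total:
            break
    # Prefactor x^a e^-x / Gamma(a), computed in log space for stability.
    return total * math.exp(a * math.log(x) - x - math.lgamma(a))


def chi2_cdf(x: float, dof: float) -> float:
    """CDF of a chi-squared distribution; dof need not be an integer."""
    return regularized_lower_gamma(0.5 * dof, 0.5 * x)


print(f"P(J <= 10) = {chi2_cdf(10.0, 1.4):.4f}")
```

The result exceeds 0.99, consistent with the statement that over 99% of well-posed retrievals should converge with J ≤ 10.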
Retrieval cost is used in GRAPE to define retrieval quality flags (QF), provided in the level 2 files. These range from 0 (lowest quality) to 3 (highest quality). The cost thresholds for each are shown in Table 2; from this and Fig. 1 it follows that the majority of converging retrievals are assigned a flag of 3. From here on, unless stated otherwise, only retrievals with QF = 3 are considered; this is also recommended for general use of GRAPE data.

Known issues and recommendations for use
Aside from cases of a poor forward-model fit indicated by a high retrieval cost, there are other known performance issues with the retrieval.The quality flag will be updated to reflect these in future versions of the dataset, but for the moment users are advised to note the following caveats.

State limits
The permitted limits of log10 τ_c, r_eff and p_c have been chosen to encompass the range of expected cloud properties (Poulsen et al., 2011). Points lying exactly on state limits should be treated as suspicious, as they may indicate a problem with the retrieval pixel; it is therefore recommended that, for general use, these retrievals be discarded. The ranges for the state variables are −0.301 ≤ log10 τ_c ≤ 2.408, 1 µm ≤ r_eff ≤ 23 µm and 440 hPa ≤ p_c ≤ 1000 hPa for water clouds, and −0.301 ≤ log10 τ_c ≤ 2.408, 20 µm ≤ r_eff ≤ 50 µm and 100 hPa ≤ p_c ≤ 680 hPa for ice clouds. This recommendation has been adopted for the results discussed here. The limits on log10 τ_c correspond approximately to 0.5 ≤ τ_c ≤ 256. The conversion between pressure and height is more complicated, as it depends on the temperature profile and so varies in space and time, but the pressure limits correspond typically to 0 km ≤ h_c ≤ 6.5 km for water clouds and 3.5 km ≤ h_c ≤ 16 km for ice clouds.
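A minimal sketch of the recommended state-limit screening, assuming the limits quoted above; the dictionary layout, the function name `on_state_limit` and the tolerance `atol` used to decide that a value is "on" a limit are hypothetical choices, not part of the GRAPE product.

```python
# State-vector limits quoted in the text (hypothetical dictionary layout):
# log10 COD (dimensionless), effective radius (um), cloud-top pressure (hPa).
STATE_LIMITS = {
    "water": {"log10_cod": (-0.301, 2.408), "reff": (1.0, 23.0), "pc": (440.0, 1000.0)},
    "ice": {"log10_cod": (-0.301, 2.408), "reff": (20.0, 50.0), "pc": (100.0, 680.0)},
}


def on_state_limit(phase: str, log10_cod: float, reff: float, pc: float,
                   atol: float = 1e-6) -> bool:
    """True if any retrieved value lies on, within atol of, or beyond a limit."""
    values = {"log10_cod": log10_cod, "reff": reff, "pc": pc}
    for name, (lo, hi) in STATE_LIMITS[phase].items():
        if values[name] <= lo + atol or values[name] >= hi - atol:
            return True
    return False


# The log10 COD limits expressed as linear optical depth.
print(f"{10 ** -0.301:.2f} <= tau_c <= {10 ** 2.408:.0f}")  # 0.50 <= tau_c <= 256
```

Retrievals for which `on_state_limit` returns True would then be discarded, as recommended.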

Polar regions
Cloud identification and retrievals are difficult over snow and ice because of the bright surfaces. As a result, the algorithm is known to perform poorly over polar regions, and retrieval cost can be a poor indicator of quality. For clouds poleward of 55°, requiring w_c < 300 g m−2 as an additional quality test has been found to remove the majority of these poor retrievals. This threshold also removes some retrievals not suffering from snow contamination over land in the Northern Hemisphere, so a relaxed threshold of 700 g m−2 is advised for this region instead. Water paths of this magnitude are typical only of deep convective clouds, which are unlikely to form in polar regions. Users interested in polar regions are advised to apply this test or auxiliary cloud detection schemes; however, GRAPE data are not generally recommended for use in polar regions. This threshold test has been applied for the remainder of the results discussed here. Since the creation of the dataset, improvements have been obtained in some polar cases by altering the retrieval's initial guess at the solution.
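The high-latitude test can be written as a small predicate. This is a sketch under the thresholds stated above (55°, 300 g m−2, relaxed to 700 g m−2 over Northern Hemisphere land); the function name is hypothetical, and treating latitudes of exactly ±55° as outside the test is an assumed boundary convention.

```python
def passes_polar_cwp_test(lat_deg: float, is_land: bool, cwp_g_m2: float) -> bool:
    """Additional cloud water path test for retrievals poleward of 55 degrees.

    Keeps a retrieval only if CWP < 300 g m^-2, relaxed to 700 g m^-2 over
    Northern Hemisphere land. Retrievals at |lat| <= 55 are not affected.
    """
    if abs(lat_deg) <= 55.0:
        return True
    threshold = 700.0 if (lat_deg > 0.0 and is_land) else 300.0
    return cwp_g_m2 < threshold


print(passes_polar_cwp_test(65.0, True, 500.0))   # True: NH land, relaxed threshold
print(passes_polar_cwp_test(65.0, False, 500.0))  # False: NH ocean, 300 g m^-2 applies
```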

Broken cloud fields and aerosol misidentification
As detailed in Poulsen et al. (2011), the GRAPE cloud algorithm assumes a single-layer homogeneous cloud field, with a simple split between the cloudy and cloud-free portions of the retrieval scene. This neglects 3-D radiative transfer effects found in strongly inhomogeneous or broken cloud fields (f < 1), so retrieval of cloud properties is prone to error in these situations (Iwabuchi and Hayasaka, 2002; Marshak et al., 2006; Wolters et al., 2010). Additionally, heavy aerosol loadings have been found to be frequently misidentified as partially cloudy scenes by the ATSR cloud flag; in this case, the retrieved cloud pressure is typically highly uncertain. Both broken cloud fields and aerosol misflagging have been found to lead to extreme (very small and very large) retrieved effective radii. To avoid biases in the data, for general use of retrieved properties (aside from cloud fraction) it is recommended to consider only scenes that are mostly or fully overcast (f ≥ 0.8) and that have an uncertainty on p_c better than 50 hPa. Elevated uncertainty on p_c has also been found in multi-layer cloud systems. Although this threshold on f will not reliably identify cases where cloud fields are broken at sub-instrument-pixel resolution (1 km), it will remove scenes known to have extensive broken cloud at sub-retrieval (3 km × 4 km) resolution. This recommendation has been adopted for the results discussed here. A stricter threshold of f = 1 would minimise these problems but would substantially decrease the data volume.
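A vectorised sketch of the overcast and CTP-uncertainty screening, assuming arrays of cloud fraction and 1σ p_c uncertainty have already been read from the level 2 files; the function and argument names are hypothetical. Setting `min_fraction=1.0` gives the stricter f = 1 variant mentioned above.

```python
import numpy as np


def overcast_mask(cloud_fraction, pc_uncertainty_hpa, min_fraction=0.8, max_pc_unc_hpa=50.0):
    """Boolean mask of retrievals passing the overcast and CTP-uncertainty tests."""
    f = np.asarray(cloud_fraction, dtype=float)
    sigma_pc = np.asarray(pc_uncertainty_hpa, dtype=float)
    return (f >= min_fraction) & (sigma_pc < max_pc_unc_hpa)


# Hypothetical retrievals: cloud fraction and 1-sigma CTP uncertainty (hPa).
f = np.array([1.0, 0.85, 0.6, 1.0])
sigma_pc = np.array([20.0, 45.0, 10.0, 80.0])
mask = overcast_mask(f, sigma_pc)
print(mask.tolist(), mask.mean())  # [True, True, False, False] 0.5
```

The mean of the mask gives the fraction of data retained, which makes the trade-off between stricter thresholds and data volume easy to quantify.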

Proportion of successful retrievals
The surface and atmosphere of the Earth are not uniform, so the performance of the retrieval algorithm may be expected to vary between regions. Figure 2 shows, for all orbits processed, the proportion of retrievals with a high-quality fit (quality flag 3), the proportion converging with a cost higher than 10, and the proportion failing to converge. The following general points can be made:
- Over land, around 70% of attempted retrievals converge and are well fit.
- Over ocean, around 50% of attempted retrievals converge and are well fit.
- Of the remainder, most retrievals fail to converge.
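The three categories mapped in Fig. 2 can be computed per region from per-pixel convergence flags and costs. The sketch below assumes a converged retrieval with J ≤ 10 corresponds to a well-fit (QF = 3) retrieval, as the text implies; the function name and inputs are hypothetical.

```python
import numpy as np


def retrieval_outcome_fractions(converged, cost, cost_threshold=10.0):
    """Return fractions (well fit, converged but high cost, failed to converge)."""
    converged = np.asarray(converged, dtype=bool)
    cost = np.asarray(cost, dtype=float)  # NaN allowed where not converged
    well_fit = converged & (cost <= cost_threshold)
    high_cost = converged & (cost > cost_threshold)
    failed = ~converged
    n = converged.size
    return float(well_fit.sum()) / n, float(high_cost.sum()) / n, float(failed.sum()) / n


# Hypothetical per-pixel outcomes for one grid cell.
print(retrieval_outcome_fractions([True, True, True, False, False],
                                  [2.0, 15.0, 5.0, np.nan, np.nan]))  # (0.4, 0.2, 0.4)
```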
There is clear structure in Fig. 2 aside from the land–sea contrast. Over much of the ocean, high cost or a failure to converge may indicate complex multi-layered cloud systems, or temperature or water vapour profiles not well modelled by the ECMWF data ingested by the retrieval, particularly in the tropics around the intertropical convergence zone (ITCZ). It is also possible that the forward model error over oceans (Poulsen et al., 2011) is underestimated.
The retrieval failure rate is higher over continental aerosol outflow regions (such as those affected by desert dust and African biomass burning). This is likely due to a combination of aerosol being misflagged as cloud and the presence of absorbing aerosol in or above the cloud layers. The first of these two situations occurs because the instrumental cloud flags were designed to screen out cloud and strong aerosol that would disrupt retrievals of sea surface temperature, a primary goal of the ATSR instruments (Závody et al., 1995, 2000); they were therefore not designed to identify cloud alone. Recent work by Lean (2009) attempted to derive masks for different aerosol types based upon techniques similar to those used in cloud flagging; approximately 60% of pixels identified as Saharan dust over the ocean by these aerosol masks had previously been flagged as cloud. Similarly, Brennan et al. (2005) show that the MODIS cloud mask is not reliable for aerosol optical depths of 0.6 or more, and misclassification of strong aerosol events is likely an issue for all current satellite radiometers. It is therefore likely that a proportion of apparently failed cloud retrievals in these regions do not, in fact, represent cloud; in these situations the failure to converge with a good fit to the measurements is a strength of the retrieval algorithm.
The second situation, aerosol within or above cloud, may lead to failure because this possibility is not accounted for by the retrieval forward model. Finally, at near-polar latitudes a narrow band of 40–50% failure rate can be seen over the ocean. This arises from sea ice, which has a high albedo in contrast to the assumed dark sea surface. Regions of comparatively high failure rate over land are associated with deserts and mountainous regions. Over deserts, failure may occur owing to poor knowledge of surface reflectance and misflagging of aerosol. Over elevated terrain, its horizontal and vertical inhomogeneity is thought to be largely responsible, with the MODIS albedo data and coarse-resolution ECMWF profiles providing poorer representations of the surface and atmosphere in these situations. Almost all attempted retrievals converge with a low cost over Western Antarctica; as discussed previously, however, the retrieval is thought to be less reliable in polar regions. The low cost arises from the bright surface, because the forward model uncertainty for visible channels is set proportional to surface albedo and modulated by atmospheric transmittance (Poulsen et al., 2011).

Consistency between ATSR-2 and AATSR cloud retrievals
Approximately one year of data (between late July 2002 and late June 2003) is available in which ATSR-2 and AATSR are both in orbit along the same track, with a time difference of approximately 30 min between their overpasses of a point on the Earth. Given the similarity between the two ATSR sensors, this provides an ideal (and unique) opportunity to examine the consistency between them. The 30 min difference implies that, between sensor overpasses, a cloud will move approximately 1.8 km for every 1 m s−1 of local wind speed. Because of this, and because GRAPE data are gridded in terms of pixels along the retrieval swath (as opposed to being referenced to a fixed grid on the Earth's surface), some aggregation is necessary to ensure that the same clouds (or at least cloud fields) are compared for each sensor. For this comparison, data have been averaged to a 0.5° grid (given that typical wind speeds may be up to tens of m s−1, depending on altitude and location). Data from the 1st, 6th, 11th, 16th, 21st, 26th and 31st of each month during the overlap period have been considered, to provide a high data volume. Quality-controlled retrievals have been averaged to this grid, and the mean and standard deviation of retrieved parameters and input measurements recorded. Each grid box is classified as "land" or "sea" according to the type of the majority of the underlying surface.
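The displacement arithmetic and the 0.5° aggregation can be sketched as follows. The helper names, the grid anchored at (−90°, −180°) and the use of the population standard deviation are assumptions for illustration; GRAPE's own gridding conventions are not specified here.

```python
import numpy as np


def cloud_displacement_km(wind_speed_m_s: float, dt_minutes: float = 30.0) -> float:
    """Distance (km) a cloud is advected between the two overpasses."""
    return wind_speed_m_s * dt_minutes * 60.0 / 1000.0


def grid_mean_std(lat, lon, values, res_deg=0.5):
    """Map (row, col) grid-box index -> (mean, population std, count) of values."""
    lat = np.asarray(lat, dtype=float)
    lon = np.asarray(lon, dtype=float)
    values = np.asarray(values, dtype=float)
    # Integer box indices on a regular grid anchored at (-90, -180).
    rows = np.floor((lat + 90.0) / res_deg).astype(int)
    cols = np.floor((lon + 180.0) / res_deg).astype(int)
    boxes = {}
    for key in set(zip(rows.tolist(), cols.tolist())):
        in_box = (rows == key[0]) & (cols == key[1])
        v = values[in_box]
        boxes[key] = (float(v.mean()), float(v.std()), int(in_box.sum()))
    return boxes


print(cloud_displacement_km(1.0))  # 1.8
```

Applying `grid_mean_std` separately to each sensor's quality-controlled retrievals yields colocated box means and within-box standard deviations that can be compared directly.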
These colocated datasets have then been used to generate density histograms and difference histograms for each of COD, CER and CTP (Fig. 3). The comparison reveals that AATSR retrieves higher COD than ATSR-2; the relative bias is of the order of 10% in log10 space, and higher for very optically thick clouds (roughly log10 τ_c > 1.5, or τ_c > 30), particularly over land, although the majority of retrievals are for clouds optically thinner than this. AATSR has a similar relative high bias of 1–2 µm in water cloud effective radius (r_eff < 20 µm). AATSR and ATSR-2 retrieve very similar CTP; AATSR CTP is slightly higher (of the order of a few tens of hPa).
In all cases, however, the data are highly correlated, with Pearson's linear correlation coefficient r for land (sea) equal to 0.70 (0.83) for COD, 0.94 (0.90) for CER and 0.95 (0.94) for CTP. Additionally, the differences between colocated grid-box means are generally smaller than the variability within the grid boxes for all retrieved parameters. The average scene cloud fraction is less strongly correlated between the two instruments, with r = 0.75 for land and 0.70 for sea; when the requirement of f ≥ 0.8 is removed, these correlations increase to 0.90 for land and 0.79 for sea. Because COD and CER are higher for AATSR, the derived CWP is also higher, although again with high correlations between the instruments (0.79 over land and 0.84 over ocean). The sensors also generally agree on the retrieved cloud phase; this is partially indicated by the agreement in effective radius (as water and ice clouds occupy different ranges of r_eff). Additionally, the proportion of clouds retrieved as ice phase in the grid boxes is very highly correlated (0.91 over land and 0.89 over ocean).
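The two statistics used in this comparison, Pearson's r between colocated grid-box means and the fraction of boxes whose mean difference is smaller than the within-box variability, can be sketched as below. The function names are hypothetical, and representing within-box variability by the larger of the two sensors' standard deviations is an assumption; the text does not specify how the two spreads are combined.

```python
import numpy as np


def pearson_r(a, b):
    """Pearson correlation over grid boxes where both sensors have data."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    ok = np.isfinite(a) & np.isfinite(b)
    return float(np.corrcoef(a[ok], b[ok])[0, 1])


def fraction_diff_within_variability(mean_a, mean_b, std_a, std_b):
    """Fraction of boxes whose mean difference is below the within-box spread.

    Assumption: within-box variability is the larger of the two sensors'
    standard deviations.
    """
    mean_a = np.asarray(mean_a, dtype=float)
    mean_b = np.asarray(mean_b, dtype=float)
    spread = np.maximum(np.asarray(std_a, dtype=float), np.asarray(std_b, dtype=float))
    return float(np.mean(np.abs(mean_a - mean_b) < spread))


# Hypothetical colocated box means (AATSR vs ATSR-2) with one missing box.
print(pearson_r([1.0, 2.0, 3.0, np.nan], [2.0, 4.0, 6.0, 8.0]))
```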
Repetition of the analysis for different dates within each month, or splitting the bins dependent on latitude range, does not significantly affect the results (the relative biases are global in nature).
Differences in retrieved properties between the two sensors are likely to arise principally through a combination of the following mechanisms:
- Physical differences in cloud properties during the 30 min between overpasses.
- Sensor-related issues such as imperfect characterisation of calibration and noise.
- Algorithm-related issues such as degenerate retrieval solutions, or different retrieval success rates in different cloud regimes.
Within the bounds of the state vector uncertainty estimates provided by OE, the cost function generally has a single minimum, so degeneracy in the retrieval solution would contribute to the scatter in Fig. 3 but not to any offset or bias. Moreover, the ATSR-2 and AATSR sensors are essentially the same, so no difference in retrieval success rate is likely (and none has been observed), and such a source of difference would in any case be unlikely to manifest globally. The third mechanism can therefore be considered unlikely.
The first two mechanisms would lead to differences between the TOA reflectances and brightness temperatures measured by the sensors. Correlations between the measurements are, unsurprisingly, high; over land (sea) they are 0.96 (0.91) at 660 nm, 0.94 (0.90) at 870 nm, 0.94 (0.90) at 1.6 µm, 0.96 (0.93) at 10.8 µm and 0.96 (0.94) at 12.0 µm. AATSR has a high relative bias compared to ATSR-2 at 660 nm and 870 nm, while the other three channels used are much more similar. This explains the relative high bias in COD and CER, and the comparative similarity in CTP. The median ratio between the reflectances or brightness temperatures (defined as AATSR:ATSR-2) is shown in Fig. 4 for each month during the overlap period; the mean ratio (not shown) is similar. Reflectances are Sun-normalised to account for the change in solar zenith angle.
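A sketch of the monthly ratio statistic follows. It assumes that Sun normalisation means dividing each reflectance by the cosine of its solar zenith angle (a common convention, not confirmed by the text) and that NaNs mark missing colocations; the function names and sample values are hypothetical.

```python
import numpy as np


def sun_normalise(reflectance, solar_zenith_deg):
    """Divide by cos(SZA) (assumed normalisation) so overpasses 30 min apart compare."""
    return np.asarray(reflectance, dtype=float) / np.cos(np.radians(solar_zenith_deg))


def median_ratio(aatsr, atsr2):
    """Median AATSR:ATSR-2 ratio, ignoring NaNs (e.g. missing colocations)."""
    ratio = np.asarray(aatsr, dtype=float) / np.asarray(atsr2, dtype=float)
    return float(np.nanmedian(ratio))


# Hypothetical colocated 660 nm grid-box reflectances and solar zenith angles.
aatsr = sun_normalise([0.33, 0.44, 0.22], [50.0, 55.0, 60.0])
atsr2 = sun_normalise([0.30, 0.40, 0.20], [48.0, 53.0, 58.0])
print(round(median_ratio(aatsr, atsr2), 3))
```

The median is less sensitive than the mean to occasional mismatched colocations, which is one reason to prefer it for a monthly summary.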
The figure shows that AATSR has a seasonally varying relative high bias of order 3–10% at 660 nm and 5–12% at 870 nm. The 1.6 µm channel is comparatively unbiased (except for scenes over sea, which show a slight low bias in AATSR), although the median ratio varies by about 0.05 over the course of the year. These relative biases are larger than the reported random error on the measurements, which is of the order of 2–3% of the signal (Smith et al., 2001, 2002). The shape of the relative bias through the year is consistent between all three channels. The bias is also generally consistent for land and ocean scenes and across both hemispheres. If it were dominated by changes in cloud properties between overpasses, the seasonality would be expected to differ between hemispheres and possibly between land and ocean. This supports the conclusion that there is a difference in the relative calibration of the instruments. During the first year of the AATSR mission, the instrument suffered from stronger than expected build-up of ice, which degraded the quality of the calibration during this period (Smith, 2003). Additionally, Smith et al. (2009) found relative biases of similar magnitude to those reported in this work for stable, bright, cloud-free ground targets (deserts and snow surfaces). The difference between land and ocean at 1.6 µm may arise from known nonlinearities in the channel response, meaning the ratio between sensors differs for bright and dark scenes.
ATSR-2 is presently thought to be the better-calibrated sensor (D. Smith, personal communication, 2010), informing the decision in this work to focus on the ATSR-2 part of the data record. Additionally, the spectral response functions of the sensors are slightly different, which may lead to differences in the measurements; this is currently being quantified, but the effect is expected to be smaller than the relative biases noted here and should lack temporal variability (B. Latter, personal communication, 2010). The response-weighted central wavelengths for ATSR-2 (AATSR) are 658.2 nm (660.0 nm) for channel 2, 863.9 nm (862.5 nm) for channel 3, 1.609 µm (1.593 µm) for channel 4, 10.94 µm (10.86 µm) for channel 6 and 12.07 µm (12.05 µm) for channel 7.
In contrast, the thermal channels show a smaller degree of variability, with the median ratio generally between 0.999 and 1.001. For typical brightness temperatures this corresponds to biases of order 0.2 K or smaller. From October 2002 to March 2003, AATSR brightness temperatures are typically cooler than those of ATSR-2 over land, in both channels and both hemispheres, but not over sea. The reason for this is unknown, although the land–sea difference indicates that a change in cloud or surface is more likely than a calibration issue. The impact on cloud retrievals is minimal. The high level of agreement in the thermal channel measurements suggests that the small difference in retrieved CTP may result from different calculated infrared cloud opacities (due to the difference in retrieved COD and CER) rather than from calibration issues.
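As a rough check on the 0.2 K figure, assume a representative brightness temperature of about 250 K (an illustrative value, not one given in the text). A median ratio R departing from unity by 0.001 then implies

$$\Delta T_B \approx |R - 1|\,T_B = 0.001 \times 250\ \mathrm{K} = 0.25\ \mathrm{K},$$

which is of order 0.2 K; colder, higher cloud tops give proportionally smaller differences.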
In summary, colocated retrievals reveal a positive bias in COD and CER retrieved from AATSR as compared to ATSR-2. This is linked to a similar relative bias in visible and near-infrared measurements. The consistency over land and sea, between hemispheres, and with the biases observed over cloud-free scenes (Smith et al., 2009) implies that the difference is dominated by differences in calibration between the sensors rather than by changes in cloud or surface properties during the 30 min time difference. This highlights the need for accurate, consistent calibration in the derivation of long-term climate records. GRAPE users are advised to bear in mind the potential for misinterpreting calibration differences as trends in cloudiness when using data from both sensors.

Examination of retrieval uncertainty estimates
Direct validation of the uncertainty estimates provided by optimal estimation is difficult because of the paucity of data to validate against (both in terms of coverage and type of measurements) and the need for the uncertainties of the comparison dataset to be well characterised. However, for a homogeneous region containing multiple cloud retrievals, the variability of cloud properties within that scene can be taken as a proxy for the true random error on the retrieval. The advantage of this is that the satellite retrievals themselves may be used as a check; the difficulty then becomes one of identifying homogeneous scenes. This approach is taken here: a typical retrieval uncertainty smaller (larger) than the standard deviation for a homogeneous scene would suggest that the uncertainty estimates are too low (high).
For this comparison, retrievals passing the quality control tests (as described previously) are split into eight categories. The split depends on whether the clouds are optically thin (τ_c < 10) or thick (τ_c ≥ 10), whether they are of liquid or ice phase, and whether they lie over land or sea. Retrievals in each category are then aggregated onto a 0.25° grid. As the retrieval is performed on an approximately 3 km × 4 km grid, there are approximately 65 cos θ (where θ is the latitude) potential retrievals in a grid cell. If a grid cell contains at least 75% of this potential maximum number of retrievals from one category, the standard deviations of COD, CER and CTP are extracted, along with the mean retrieval uncertainty estimate for each quantity. The 75% threshold reduces the data volume but helps to ensure that the scene is comparatively homogeneous (and so that the standard deviation is likely to provide a good estimate of the random error).
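The grid-cell selection described above can be sketched as follows. This is a minimal illustration, not the GRAPE processing code. The function names are hypothetical, one degree of latitude is assumed to be 111.2 km, and every retrieval footprint is assumed to be exactly 3 km × 4 km.

```python
import math
import statistics

KM_PER_DEGREE = 111.2  # assumed length of one degree of latitude


def category(tau_c, phase, over_land):
    """One of the eight categories: thin/thick x liquid/ice x land/sea."""
    thickness = "thin" if tau_c < 10 else "thick"
    surface = "land" if over_land else "sea"
    return (thickness, phase, surface)


def potential_retrievals(lat_deg, cell_deg=0.25, pixel_km2=3.0 * 4.0):
    """Approximate number of retrievals fitting in a cell (roughly 65 cos(lat))."""
    side_km = cell_deg * KM_PER_DEGREE
    return side_km ** 2 * math.cos(math.radians(lat_deg)) / pixel_km2


def homogeneous_cell_stats(values, uncertainties, lat_deg, min_fraction=0.75):
    """(standard deviation, mean uncertainty) for a well-filled cell, else None."""
    if len(values) < min_fraction * potential_retrievals(lat_deg):
        return None  # too few retrievals: the scene may not be homogeneous
    return statistics.stdev(values), statistics.mean(uncertainties)
```

At the Equator `potential_retrievals(0.0)` gives about 64, so under these assumptions a cell needs at least 49 retrievals of one category to be used.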
Analysis of 1425 orbits evenly spaced throughout the dataset provides the joint histograms of standard deviation and mean uncertainty shown in Fig. 5 for clouds over ocean and Fig. 6 for clouds over land. If the mean uncertainty is larger than the standard deviation, the uncertainty estimates are likely reasonable (or possibly overestimated). If the standard deviation is larger, either the uncertainty estimates are too small or the cloud field is heterogeneous.
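One way to reduce such a joint histogram to two summary numbers is sketched below. It is an illustration only. The function name is hypothetical, and the choice of the fraction of cells above the 1:1 line plus the median ratio as summary statistics is an assumption, not part of the GRAPE analysis.

```python
import statistics


def compare_std_and_uncertainty(cells):
    """Summarise (std, mean_uncertainty) pairs from homogeneous grid cells.

    Returns the fraction of cells in which the scene standard deviation exceeds
    the mean retrieval uncertainty, and the median ratio std / uncertainty.
    A median ratio near 2 would correspond to uncertainties too small by ~2x.
    """
    exceed = sum(1 for std, unc in cells if std > unc)
    ratios = [std / unc for std, unc in cells if unc > 0]
    return exceed / len(cells), statistics.median(ratios)
```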
The retrieval uncertainties are generally in line with the results of simulations performed by Poulsen et al. (2011). They are typically slightly higher for COD and CER over land than over sea, due to the larger contribution of surface reflectance to the TOA radiance. Standard deviations and mean uncertainties are of similar sizes. For τ_c < 10 the standard deviation is typically higher than the mean uncertainty, suggesting that the retrieval uncertainty estimates may be too low in these cases. This is true for clouds of both phases, and over both land and sea. The underestimation is approximately a factor of 2 for COD and CER, but less clear-cut for CTP. For τ_c ≥ 10 the standard deviation is generally the same size or smaller, suggesting that the retrieval uncertainty estimates are appropriate or too large. The exception is liquid COD, where standard deviations remain slightly larger than the mean uncertainty estimate. For liquid clouds over land with τ_c ≥ 10 (Fig. 6) there are two distinct populations in the COD and CER histograms, with different uncertainty estimates but comparable ranges of standard deviation. The lower-uncertainty population follows the shape of the τ_c ≥ 10 liquid cloud histograms in Fig. 5; it corresponds to optically thick liquid cloud retrievals over dark land surfaces. As much of this population lies below the 1:1 line, retrieval uncertainty may also be underestimated in these conditions. Standard deviations higher than the uncertainty estimates could also arise from inhomogeneity in the scenes considered, although the strict criteria for inclusion should ameliorate this problem. Overall the results suggest that for τ_c < 10 the COD and CER uncertainty estimates are too small by a factor of approximately 2. The CTP uncertainty estimates are of a similar magnitude to typical standard deviations, but are only weakly linked to the apparent standard deviation at a given location. For τ_c ≥ 10, uncertainty estimates are similar to the random error estimated from the standard deviation. They may again be too small by approximately a factor of 2 for the COD and CER of liquid clouds over dark surfaces.
Although OE provides a full covariance matrix for the retrieved state, at present only the diagonal elements are retained in the output, due to the data storage overhead. This means that information on the estimated correlations between the uncertainties on the state variables is not presently available. Retrieval simulations and test cases indicate that these correlations are typically small for single-layer, homogeneous, overcast clouds. However, they can be significant in cases which diverge strongly from this model (e.g. multi-layer or broken cloud fields) or when clouds are optically thin and over bright surfaces.
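The sketch below shows what is lost by storing only the diagonal. The 1σ uncertainties come from the diagonal elements alone, but the error correlations also need the off-diagonal terms. The covariance values are hypothetical and chosen only for illustration.

```python
import math


def uncertainties_from_covariance(cov):
    """1-sigma uncertainties: square roots of the diagonal (what is stored)."""
    return [math.sqrt(cov[i][i]) for i in range(len(cov))]


def correlations_from_covariance(cov):
    """Error correlation matrix, which needs the off-diagonal terms as well."""
    sigma = uncertainties_from_covariance(cov)
    n = len(cov)
    return [[cov[i][j] / (sigma[i] * sigma[j]) for j in range(n)] for i in range(n)]


# Hypothetical 2x2 covariance for (COD, CER); values are illustrative only.
cov = [[4.0, 1.0],
       [1.0, 1.0]]
print(uncertainties_from_covariance(cov))       # [2.0, 1.0]
print(correlations_from_covariance(cov)[0][1])  # 0.5
```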

Dependence of retrieved state on satellite zenith angle
This section examines the GRAPE dataset for spurious trends in cloud properties related to the viewing geometry (in terms of the satellite zenith angle θ_v). There are physical reasons to expect some relationships between cloud properties and solar geometry. For example, a convective cloud forming in the morning would be expected to be low when the Sun is low in the sky, but a few hours later may have increased in altitude. Optical depth and effective radius may also change […]. Compared to the number of retrievals with 0° < θ_v < 7°, there are approximately 50% as many retrievals per degree with 7° < θ_v < 20°, and only 1-10% as many with θ_v > 20° (θ_v ≈ 22.5° at the edge of the full swath). This is a consequence of the nonlinear change in θ_v across the instrument's swath. It also reflects the fact that ATSR-2 sometimes operated in narrow-swath mode (where θ_v ≈ 8° at the edge of the swath). Narrow-swath mode is typically encountered over oceans. However, ATSR-2 also operated in narrow-swath mode in coastal regions and in some cases over continental land masses, and there are some coastal regions where ATSR-2 was in full-swath mode.
Figure 7 shows the mean COD, CER and CTP retrieved by ATSR-2 as a function of satellite zenith angle. The different lines correspond to different types of cloud, split into tropics and extratropics, Northern and Southern Hemispheres, land and sea, and liquid and ice clouds. There are some trends in cloud properties, although they are generally small. Most regions and cloud properties show a jump for θ_v > 20°. The reasons for this are unclear. Possible explanations include comparatively poor sampling, a problem with the geometry or radiance information stored in the level 1 files used in the retrieval, or a problem with the forward model. For this reason, retrievals with θ_v > 20° are excluded from the rest of this analysis, although the number of these retrievals, and so the impact of the exclusion on the results presented here, is small.
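Curves like those in Fig. 7 can be produced by binning retrievals by θ_v and averaging, with the swath-edge exclusion applied first. The sketch below assumes 1° bins and a hypothetical function name. It is not the code used to make Fig. 7.

```python
import statistics
from collections import defaultdict


def mean_by_zenith(theta_v, values, bin_width=1.0, max_theta=20.0):
    """Mean of a retrieved quantity in theta_v bins, dropping theta_v > max_theta."""
    bins = defaultdict(list)
    for t, v in zip(theta_v, values):
        if t > max_theta:
            continue  # swath-edge retrievals are excluded, as in the text
        bins[int(t // bin_width)].append(v)
    # Keys are the lower edge of each bin in degrees.
    return {b * bin_width: statistics.mean(vs) for b, vs in sorted(bins.items())}
```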
For COD the change across the range of θ_v is mostly 2 or smaller. For liquid clouds over sea, mean COD increases with θ_v up to about 11°, after which values plateau. The resulting difference between near-nadir and swath-edge retrievals is of order 3. This behaviour is observed over sea but not over land, which suggests it may be related to sampling, with near-coastal clouds having a higher COD than open-ocean clouds. This is consistent with full-swath mode being more common near coasts, and with liquid COD at high θ_v being closer to liquid COD for land regions. Liquid CER is similarly largely invariant with θ_v over land, but again shows changes with θ_v over ocean. Again, one possible interpretation is differences between coastal and open-ocean clouds. Ice CER shows different trends with θ_v in different regions, but the variability across the range is smaller than 1 µm. These changes could be due to errors in the ice crystal phase function used in the retrieval. CTP tends to decrease slightly with increasing θ_v. The changes are of order 10 hPa in most regions, although larger changes of up to 40 hPa are seen in some cases (e.g. the Northern Hemisphere tropics). One speculative explanation is that increasing θ_v lengthens the path through clouds. Particularly for multi-layer systems, this may bring the retrieved effective radiating height of the cloud closer to that of the upper layer, corresponding to a lower CTP. However, this effect may be weak, and it is also possible that a bias related to lookup table interpolation or another effect is responsible for the trend.
In summary, over the range of θ_v spanned, aside from the very edge of the swath, no significant trends in cloud properties are found.

Ground-based cloud datasets
In this work, two sources of ground-based cloud data are considered. The first is the Chilbolton Facility for Atmospheric and Radio Research (CFARR), where a 94 GHz Galileo cloud radar was used intermittently to derive CTH information during the ATSR-2 period. The second is the network operated by the Atmospheric Radiation Measurement (ARM) programme. The CTH data provided by the ARM sites are part of the Active Remotely-Sensed Cloud Locations (ARSCL) value-added product (Clothiaux et al., 2000, 2001).

Comparison methodology
As cloud fields are highly variable in space and time, the colocation criteria used for the comparison are strict. First, the closest good retrieval to the ground-based station is extracted. This retrieval must contain the station within its footprint (so the maximum distance from the centre of the retrieval area is of order 2 km).
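A minimal sketch of the colocation step follows. It makes three assumptions that are not part of GRAPE. Footprint containment is approximated by a 2 km distance limit from the retrieval centre, distances use a spherical Earth of radius 6371 km, and the retrieval records are hypothetical dictionaries with `lat`, `lon` and `good` keys.

```python
import math

EARTH_RADIUS_KM = 6371.0  # spherical-Earth assumption


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def closest_good_retrieval(station, retrievals, max_km=2.0):
    """Closest quality-controlled retrieval, or None if it is too far away."""
    good = [r for r in retrievals if r["good"]]
    if not good:
        return None
    def dist(r):
        return haversine_km(station[0], station[1], r["lat"], r["lon"])
    best = min(good, key=dist)
    return best if dist(best) <= max_km else None
```

With illustrative coordinates near Chilbolton, `closest_good_retrieval((51.14, -1.44), records)` returns the nearest record flagged as good, provided it lies within about 2 km.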
Comparisons of this type are hindered because different types of measurement are sensitive to different parts of the cloud, so the same true cloud may not produce the same apparent cloud for all sensors. The ground-based radar and/or lidar measurements provide a profile through the atmosphere, with a generally low minimum detectable optical depth that varies from instrument to instrument. In contrast, the IR measurements obtained by ATSR-2 and similar sensors penetrate some depth into the cloud. For an optically dense cloud (high optical thickness per unit geometric thickness), the IR-derived CTH will be close to the "true" CTH observed from the ground. If instead the cloud has a comparatively low COD and a large vertical extent, as can be the case with cirrus, the CTH observed by the IR method will be lower than that seen by lidar or radar. For example, Holz et al. (2008) noted a low bias in IR-based MODIS CTH retrievals for optically thin clouds as compared to the CALIOP lidar.
The effect of IR penetration can, to a first-order approximation, be accounted for. The ground-based radar reflectivity profile provides information about the geometric thickness of the cloud, and the GRAPE dataset includes the COD, CER and cloud phase. These three quantities can be used to convert the retrieved (550 nm) τ_c to the equivalent COD at IR wavelengths, τ_c,IR, using the cloud microphysical models defined by the ORAC algorithm. τ_c,IR is taken as the mean for the 10.8 µm and 12 µm channels. It is obtained by scaling τ_c by the ratio of the average of the extinction coefficients at 10.8 µm and 12 µm (which are typically within 5% of each other) to the extinction coefficient at 550 nm. Because OE uses all measurements simultaneously, this conversion is physically consistent with the retrieval. The resulting variation between visible and IR COD is shown in Fig. 8. For ice clouds τ_c,IR is very similar to τ_c, while for water clouds τ_c,IR is smaller for r_eff < 13 µm and up to 10% larger for larger droplets.
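The conversion amounts to a single scaling, sketched below. The extinction coefficients are hypothetical inputs supplied by the caller. In GRAPE they would come from the ORAC microphysical models for the retrieved phase and CER, and no such lookup is reproduced here.

```python
def tau_ir(tau_vis, beta_550, beta_108, beta_120):
    """Scale a 550 nm optical depth to the IR (mean of the 10.8 and 12 um channels).

    Optical depth along a fixed path is proportional to extinction, so
    tau_IR = tau_550 * mean(beta_10.8, beta_12) / beta_550.
    """
    beta_ir = 0.5 * (beta_108 + beta_120)
    return tau_vis * beta_ir / beta_550


# Hypothetical extinction values (arbitrary consistent units).
print(tau_ir(10.0, 2.0, 1.0, 1.2))  # approximately 5.5
```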
It is assumed that the vertical profile of optical depth within the cloud is proportional to the strength of the radar reflectivity between cloud top and cloud base. The radar reflectivity profile (provided in dBZ) is converted to an equivalent reflectivity profile using the standard formula Z = 10^(dBZ/10) mm^6 m^-3. This profile is normalised by the total equivalent reflectivity and scaled by the GRAPE τ_c,IR to approximate the vertical profile of COD. The distance below the cloud top at which the cumulative IR COD (counting from the top) reaches 1 is noted, and this depth δh_c is subtracted from the ground-based h_c. The fractional uncertainty of this correction term is taken as the fractional uncertainty on τ_c. For clouds with an IR optical depth under 1, the emission is assumed to arise from the base of the cloud. This provides an estimate of the effective IR-radiating height, assuming that the IR radiance seen by the satellite arises from an optical depth of 1 into the cloud. The quantity compared here is the median of these IR-radiating ground-based heights within 5 min of the ATSR-2 overpass. To ensure a relatively homogeneous scene, over 90% of the observations during the 5 min period must also be cloudy, and the standard deviation of the radar cloud-top height must be smaller than 1 km.
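The reflectivity-weighted estimate of the IR-radiating height can be sketched for a single radar profile as follows. The function name is hypothetical. The sketch assumes equal range-gate spacing and profiles ordered from cloud top downwards, and it works at gate resolution rather than interpolating within a gate.

```python
def ir_radiating_height(heights_km, dbz, tau_ir):
    """Estimate the IR-radiating height from a radar reflectivity profile.

    heights_km and dbz run from cloud top to cloud base, one value per range
    gate. tau_ir is the IR optical depth derived from the GRAPE retrieval.
    """
    if tau_ir < 1.0:
        return heights_km[-1]  # optically thin: emission assumed from cloud base
    z_linear = [10.0 ** (d / 10.0) for d in dbz]  # dBZ -> mm^6 m^-3
    total = sum(z_linear)
    cumulative = 0.0
    for h, z in zip(heights_km, z_linear):
        cumulative += tau_ir * z / total  # IR optical depth assigned to this gate
        if cumulative >= 1.0:
            return h  # equals h_c - delta_h_c at gate resolution
    return heights_km[-1]
```

For a uniform-reflectivity cloud with τ_c,IR = 2 spread over four gates, optical depth 1 is reached at the second gate below the top.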
Any vertical inhomogeneity in cloud particle size will lead to an error in the assumed vertical profile of cloud optical depth, because radar reflectivity is a strong function of particle size as well as number. This will be particularly evident for an ice cloud overlying a low water cloud. It is also likely to affect a typical non-precipitating liquid water cloud with larger particles nearer the top. In both situations, this method is likely to overestimate the IR-radiating height of the ground-based data. Despite this, it is still likely to provide a better estimate than simply using the highest altitude from which a radar return is obtained.
The ARSCL product provides quality control information, so for the ARM sites the data are restricted to those with a quality control flag of 0-2 (where 0 is best and 5 is worst). The uncertainty assigned to the ground-based data is the sum of three terms. These are the standard deviation, half the vertical bin size (to account for digitisation in the recorded CTH: 30 m at Chilbolton and 22.5 m at the ARM sites), and the uncertainty on δh_c mentioned previously. Precipitating clouds are removed using quality flags provided with the data. Remaining scenes are deemed "deep clouds" if the cloud top is higher than 3 km and the vertical extent of the cloud is greater than 50% of its height, and "shallow clouds" otherwise. This distinction is made because deep clouds are more likely to be vertically heterogeneous (e.g. multi-layer systems), so the quality of the comparison is expected to be poorer.
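The per-overpass screening, uncertainty and deep/shallow classification can be sketched as follows. The function names are hypothetical, and all heights are in km. Two points are assumptions. The "standard deviation" term is taken to be that of the IR-radiating heights in the window, and the digitisation term is passed in directly (0.03 km at Chilbolton, 0.0225 km at the ARM sites, as quoted above).

```python
import statistics


def ground_cth_for_overpass(cloudy, radar_top_km, ir_height_km,
                            digitisation_km, delta_h_unc_km):
    """Median IR-radiating height in the 5 min window and its uncertainty.

    cloudy holds one bool per radar profile in the window; the two height lists
    hold values for the cloudy profiles only. Returns None if the scene fails
    the homogeneity tests.
    """
    if sum(cloudy) <= 0.9 * len(cloudy):
        return None  # not more than 90% of profiles cloudy
    if statistics.stdev(radar_top_km) >= 1.0:
        return None  # radar cloud-top height too variable
    height = statistics.median(ir_height_km)
    # Sum of scene standard deviation, digitisation term and delta_h_c uncertainty.
    sigma = statistics.stdev(ir_height_km) + digitisation_km + delta_h_unc_km
    return height, sigma


def cloud_class(top_km, base_km):
    """'deep' if top > 3 km and extent > 50% of the top height, else 'shallow'."""
    deep = top_km > 3.0 and (top_km - base_km) > 0.5 * top_km
    return "deep" if deep else "shallow"
```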

Results of comparison
Figures 9 and 10 show the results of the comparison for shallow and deep cloud fields, respectively, and some summary statistics are presented in Tables 4 and 5. For low (ground-based h_c < 3 km) shallow clouds, GRAPE tends to place the clouds slightly higher at all sites, with a high bias of order 1 km. This is a consequence of the retrieval technique. Temperature inversions low in the atmosphere can result in the same temperature occurring at several different altitudes. The ECMWF temperature profiles used are fairly coarse (both horizontally and vertically) and do not fully resolve these low-level inversions. The clouds are therefore placed higher, at the next altitude with that temperature. Use of higher-resolution temperature profiles may help. This issue has been noted in other satellite radiometer datasets, particularly for marine stratocumulus (see, for example, Garay et al., 2008, or Holz et al., 2008). In these cases, although the CTH may be too high, the equivalent CTT is likely to remain accurate.
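The height ambiguity can be illustrated by finding every height at which a piecewise-linear temperature profile matches a given cloud-top temperature. The profiles below are invented for illustration, and linear interpolation between levels is an assumption. This is not the ORAC forward model or its interpolation scheme.

```python
def heights_at_temperature(z_km, t_k, target_k):
    """All heights at which a piecewise-linear temperature profile equals target_k.

    z_km must be increasing. More than one result means the temperature is
    ambiguous, as happens near a temperature inversion.
    """
    hits = []
    for i in range(len(z_km) - 1):
        z0, z1, t0, t1 = z_km[i], z_km[i + 1], t_k[i], t_k[i + 1]
        if t0 != t1 and min(t0, t1) <= target_k <= max(t0, t1):
            hits.append(z0 + (target_k - t0) * (z1 - z0) / (t1 - t0))
    return hits


# Invented profiles: the fine one has an inversion between 0.5 and 1.0 km,
# the coarse one samples the same atmosphere only at 0, 1.5 and 3 km.
fine_z, fine_t = [0.0, 0.5, 1.0, 1.5, 3.0], [288.0, 285.0, 289.0, 288.0, 278.0]
coarse_z, coarse_t = [0.0, 1.5, 3.0], [288.0, 288.0, 278.0]
print(heights_at_temperature(fine_z, fine_t, 286.0))      # three crossings, the lowest near 0.33 km
print(heights_at_temperature(coarse_z, coarse_t, 286.0))  # a single crossing at about 1.8 km
```

Consider a cloud whose top actually lies near the inversion (0.3-0.6 km in this invented case). With the coarse profile, it can only be assigned the single crossing at about 1.8 km, which mirrors the high bias described above.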
For higher shallow clouds (h_c > 5 km), GRAPE generally matches well but in some cases has a low bias. In most cases the GRAPE height lies within a layer of the cloud with strong radar reflectivity. This suggests that GRAPE has placed the cloud reasonably but that the estimated IR-effective radiating height is inaccurate. The few cases where GRAPE places the cloud below the cloud base generally correspond to clouds with weak radar reflectivity, which may be more transparent to ATSR-2 than calculated in the retrieval. Overall, Pearson's correlation coefficient r is high for shallow clouds (0.8 or better except at Chilbolton; Table 4).
Atmos. Chem. Phys., 11, 3913-3936, 2011, www.atmos-chem-phys.net/11/3913/2011/
At Chilbolton, all shallow clouds are also boundary-layer clouds, and the variability in their altitudes is of a similar size to the GRAPE high bias. This explains the lower correlation. The RMS difference is between 0.84 km and 2.17 km, depending on site.
For deep clouds, the low bias is more pronounced. This is likely because the estimated IR-effective radiating ground-based cloud height is inaccurate and biased high, for the reasons discussed previously. Z profiles do not accurately reflect the profile of COD for vertically inhomogeneous clouds, and there are uncertainties in converting τ_c to τ_c,IR. For this reason the correlations between the datasets are low, and RMS differences are between 1.53 km and 2.91 km, depending on site (Table 5). However, as in the shallow-cloud case, GRAPE tends to place clouds near regions of strong radar reflectivity, providing some confidence in the retrieval for these more complicated cases.
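Summary statistics of the kind listed in Tables 4 and 5 can be computed from paired heights as sketched below. The definitions are assumptions: the bias is taken as satellite minus ground, the RMS is that of the differences, and r is Pearson's linear correlation. Pearson's r is written out explicitly rather than relying on a particular library version.

```python
import math
import statistics


def pearson_r(x, y):
    """Pearson's linear correlation coefficient."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def comparison_stats(satellite_km, ground_km):
    """Mean bias (satellite minus ground), RMS difference and Pearson's r."""
    diffs = [s - g for s, g in zip(satellite_km, ground_km)]
    bias = statistics.mean(diffs)
    rms = math.sqrt(statistics.mean([d * d for d in diffs]))
    return bias, rms, pearson_r(satellite_km, ground_km)
```

A uniform offset, as in the low-shallow-cloud case, raises both bias and RMS but leaves r unchanged. This is one reason why r alone does not capture the GRAPE high bias.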

Comparison of cloud cover with CALIPSO
CALIOP, aboard the Cloud-Aerosol Lidar and Infrared Pathfinder Satellite Observations (CALIPSO) platform, was launched in 2006 and provides data on the vertical profiles of cloud and aerosol (Winker et al., 2007). Unlike the passive remote sensing techniques employed by ATSR, MODIS, AVHRR and others, a polarisation lidar can resolve multiple cloud layers (provided the signal is not fully attenuated). This makes it a useful tool for comparison of cloud vertical structure. Unlike the comparatively wide swath of a radiometer, CALIOP records data only along-track. Additionally, as data are available only from June 2006 onwards, there are no direct coincidences with the ATSR-2 record, and no instrument similar to CALIOP was flying during the ATSR-2 mission lifetime. To overcome these difficulties, the comparison is carried out on a statistical basis, using multiyear aggregated data to provide representative distributions of cloud properties. One further source of difference cannot be removed by this method. CALIPSO flies as part of the A-Train, with a local solar overpass time of approximately 1:30 p.m. at the Equator, three hours later than ATSR-2. Any systematic change in cloud properties due to the diurnal cycle will therefore appear in the comparison.
In this work, data from the General Circulation Model (GCM)-Oriented CALIPSO Cloud Product (GOCCP), described by Chepfer et al. (2010) […] range compared to all clouds. The quantities compared with GRAPE are not strictly equivalent (except for the overall cloud fraction). When CALIPSO detects multi-layer cloud systems, the sum of the low, mid-level and high cloud fractions can exceed the total "any-cloud" fraction.
Figure 12 shows the combined total of the GOCCP low, mid-level and high cloud fractional cover, and the difference between this and the overall fractional cloud cover (shown in Fig. 11). This difference provides information on the geographical distribution of multi-layer cloud systems (or deep cloud systems penetrating multiple height ranges). In some places the difference is close to zero, for example parts of the tropical oceans and polar land. There, low, mid or high-level clouds occur without clouds in the other height ranges, i.e. the single-layer forward model employed in the ORAC algorithm and others is appropriate. However, the difference is large poleward of approximately 45° in both hemispheres, in the ITCZ, and over regions of Africa, Asia and South America, indicating a significant proportion of multi-layer (or deep cloud) systems. These situations may consist of an optically thin cloud layer over an optically thick lower cloud (such as thin cirrus overlying a thick stratus deck). In that case GRAPE retrievals are likely to provide useful information about the lower cloud layer. If instead the upper layer is optically thicker (but not opaque), or the lower layer comparatively optically thin, the retrieval may fail or retrieve cloud properties intermediate between the two layers. If the upper layer is optically thick, then at least the CTP of the upper layer should be well retrieved.
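The quantity mapped in Fig. 12 is a simple per-cell difference, sketched below. The function names are hypothetical. The 0.1 threshold used to flag likely multi-layer cells is an arbitrary illustrative choice, not a value used in this work.

```python
def multilayer_excess(low, mid, high, total):
    """Per-cell (low + mid + high) - total cloud fraction."""
    return [l + m + h - t for l, m, h, t in zip(low, mid, high, total)]


def likely_multilayer(low, mid, high, total, threshold=0.1):
    """Flag cells whose excess exceeds a hypothetical threshold."""
    return [e > threshold for e in multilayer_excess(low, mid, high, total)]
```

A cell with an excess near zero, such as the first cell in the test case below, corresponds to the single-layer situations for which the ORAC forward model is appropriate.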
The top part of Fig. 11 shows that the total cloud fractions observed in the two datasets have similar patterns and distributions. The GOCCP dataset includes more cloud over northern Asia and desert regions of Africa and Australasia, as well as over the poles. The low, mid-level and high cloud amounts in these regions show that most of this difference is mid-level or high cloud. This suggests thin cirrus which may be subvisible to the ATSR-2 sensor. GOCCP uses a lidar scattering ratio (SR) of 5 to define cloudiness, corresponding to a visible COD of 0.03-0.05 (Chepfer et al., 2010). Over the poles, the cloud water path cut used to remove suspicious retrievals will also contribute to the lower cloud fraction. Over deserts it is also possible that the bright surface causes optically thin clouds to be missed by the instrumental cloud flag. ATSR also observes less cloud over biomass-burning regions of Africa and South America. This is likely to result from absorbing biomass-burning aerosol mixed in with or above cloud layers. The ORAC retrieval does not account for this possibility, and it is likely that the retrieval will fail in these situations. Overall, the total cloudiness is slightly higher in GOCCP (0.66) than in GRAPE (0.64).
The fractional cover of mid-level clouds is similar in the two datasets. The low and high cloud fractions, however, are lower in GRAPE than in GOCCP, particularly in those regions identified in Fig. 12 as having a high frequency of multi-layer cloud. Such cloud systems may lead to failed or high-cost retrievals, or to retrieval of an effective radiative height. In the ITCZ these are generally retrieved as high clouds, suggesting the ice layers are optically thick. Elsewhere the radiative height is typically that of a low or mid-level cloud. Figure 2 shows that in many regions with frequent multi-layer cloud the retrieval has a high rate of convergence with a low cost. This suggests that cost alone is not always sufficient to identify multi-layer cloud cases, and that the second of the above situations is more common. High cloud may also be missed if it is below the detection limit for ATSR-2 but not for CALIPSO. This leads to the lower high-cloud fraction in GRAPE in regions where Fig. 12 indicates that multi-layer systems are infrequent.
Overall, this comparison shows that the total cloudiness observed in GOCCP and GRAPE is similar. CALIPSO is able to detect multi-layer cloud decks, which imagers such as ATSR observe as having heights intermediate between the two layers. CALIPSO can also resolve optically thinner cloud. These factors result in a lower fractional cover of low-level and high-level clouds in GRAPE. The spatial distributions of clouds at different altitudes are similar. CALIPSO provides a useful tool to validate the vertical distribution of observed clouds. However, an imaging sensor has the advantages of much greater spatial coverage and the ability to derive other quantities such as COD, CER and CWP.

Data used
Both the MODIS sensors and the AVHRR series have a similar spectral range and spatial resolution to ATSR-2, making them good sources of data against which to compare ATSR-derived cloud properties. In this work, MODIS-Terra data are used, as the morning overpass time of Terra is close to that of ATSR-2; this removes the influence of diurnal variability in cloud from the analysis. Two main MODIS cloud retrieval datasets exist, making use of different MODIS bands and retrieval techniques. In both cases only daytime retrievals are considered here (as only daytime scenes are processed in GRAPE).
The first is produced by the MODIS Atmosphere Science Team, hereafter referred to as MODIS-ST. This derives cloud properties using a suite of algorithms, as summarised by Platnick et al. (2003). One uses visible reflectance at 0.65 µm over land (0.86 µm over ocean) and near-infrared reflectance at 2.1 µm to determine cloud optical depth and effective radius (and hence water content). The other uses CO2-slicing and the 10.8 µm band to determine cloud altitude. The second dataset (Minnis et al., 2008, 2010b) was developed as part of the Clouds and the Earth's Radiant Energy System (CERES) project. It was designed to be applied to the MODIS imagers on Terra and Aqua as well as the Visible and Infrared Scanner (VIRS) on TRMM. These platforms also carry broadband radiometers, so that cloud and radiation fields can be obtained simultaneously. This dataset is hereafter referred to as MODIS-CE. The daytime algorithm uses the 0.65 µm channel (1.6 µm over bright surfaces) to obtain cloud optical depth, the 3.7 µm channel to obtain cloud effective radius, and the 10.8 µm channel for cloud altitude. Measurements at 1.6 µm and 12 µm are used to aid cloud phase determination.
In this comparison, statistics for all three datasets are calculated on a daily basis on a 1° grid for the year 2001 (for which ATSR-2 and MODIS-Terra both provide full years of data). The cloud properties considered are the COD, CER and CWP (each separately for liquid and ice clouds), and the CTP and CTT (for all clouds). Each algorithm retrieves one of CTT and CTP and derives the other, but both are reported. Differences between the comparisons for the two quantities can illustrate the effect of the assumed model temperature profiles on the retrieval output. For ATSR-2, all retrievals passing quality control checks are aggregated to this grid. For MODIS-ST, the daily level 3 product (MOD08_D3) from the current Collection 5 dataset is used. Updates since the Collection 4 algorithms summarised by Platnick et al. (2003) are listed by NASA (2005) and King et al. (2006). For MODIS-CE, the single-scanner footprint (SSF) product from Edition 2B of the data is used (Minnis et al., 2008, 2010b). This provides cloud retrievals averaged to the CERES footprint for up to two distinct cloud layers. From this product, cloudy daytime footprints are averaged to a 1° grid for the comparison, with cloud properties weighted by the footprint cloud fraction. Differences between the two MODIS datasets are discussed by Minnis et al. (2010a).
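The cloud-fraction-weighted gridding of footprints can be sketched as follows. The tuple layout of the footprint records and the function name are hypothetical. Grid cells are indexed by the floor of latitude and longitude divided by the resolution.

```python
import math
from collections import defaultdict


def grid_weighted_mean(footprints, res_deg=1.0):
    """footprints: iterable of (lat, lon, value, cloud_fraction).

    Returns {(i_lat, i_lon): weighted mean}, weighting each footprint's cloud
    property by its cloud fraction; clear footprints are skipped.
    """
    num = defaultdict(float)
    den = defaultdict(float)
    for lat, lon, value, cf in footprints:
        if cf <= 0:
            continue  # no cloud, so no cloud property to average
        key = (math.floor(lat / res_deg), math.floor(lon / res_deg))
        num[key] += cf * value
        den[key] += cf
    return {k: num[k] / den[k] for k in num}
```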
All statistics presented in this section are calculated from these daily-averaged datasets, for grid cells where all three contain at least 20 cloud retrievals. Maddux et al. (2010) recently found trends in MODIS-ST retrieved cloud properties as a function of viewing zenith angle. The comparison is therefore restricted to grid cells with a mean MODIS-ST viewing zenith angle of 20° or smaller, for consistency with ATSR-2 sampling. Note that this means the same cut is indirectly applied to the MODIS-CE dataset. The comparison is further restricted to latitudes of 60° and equatorward. Cloud retrievals are less certain over polar regions, and at high latitudes retrievals cannot be performed year-round. Statistics of the comparison are provided in Table 6. Figures 13-15 show the coincident daily data averaged to an annual mean at 10° resolution, to illustrate typical regional values.
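The grid-cell selection just described can be written as a single predicate. The sketch below uses a hypothetical function name. It assumes the three retrieval counts are supplied in the order (ATSR-2, MODIS-ST, MODIS-CE) for one daily 1° cell.

```python
def keep_cell(lat_centre, n_retrievals, mean_vza_modis_st,
              min_count=20, max_vza_deg=20.0, max_abs_lat=60.0):
    """Apply the daily grid-cell selection for the ATSR-2/MODIS comparison.

    n_retrievals holds the cloud-retrieval counts for (ATSR-2, MODIS-ST,
    MODIS-CE) in one daily 1-degree cell.
    """
    return (all(n >= min_count for n in n_retrievals)
            and mean_vza_modis_st <= max_vza_deg
            and abs(lat_centre) <= max_abs_lat)
```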

Cloud optical properties
Liquid COD shows similar patterns in all datasets, although it differs in the oceanic extratropics of both hemispheres from about 30° polewards. There, ATSR-2 retrieves the highest and MODIS-CE the lowest optical depths. The same differences are found over parts of northern Asia and China. The converse is true in these regions for ice cloud COD, with differences of similar magnitude between the datasets. All these regions have frequent multi-layer cloud systems (Fig. 12), so it is likely that these differences arise from retrieval difficulties in multi-layer systems. Overall there is good agreement, with Pearson's linear correlation coefficient r ≥ 0.7 for all cloud types with both datasets.
The interpretation of CER is more difficult. Regional patterns are similar in all datasets, but differences in size are frequently 3 µm or greater over ocean. The CER (for both phases) shows the lowest correlation between datasets found in Table 6 […] than ocean, and largest in tropical oceans. The differences between datasets can to an extent be explained by the different wavelengths used in the retrieval algorithms. The channel most sensitive to CER used in ORAC is 1.6 µm. MODIS-ST uses the 2.1 µm channel, while MODIS-CE uses 3.7 µm. The absorption coefficient of water increases between 1.6 µm and 3.7 µm. The shortest wavelength therefore penetrates deeper into the cloud, while the longest is sensitive to particles near the cloud top. This has no effect on retrievals for clouds with a vertically constant particle size distribution. However, real non-precipitating clouds typically show an increase in particle size with height. Platnick (2000) quantified the impact of this for typical cloud profiles. Compared to retrievals using 2.2 µm, use of 1.6 µm (3.7 µm) would result in liquid CER smaller (larger) by of order 0.5-1 µm. For precipitating clouds the biases may become smaller and/or change sign, as in these cases larger droplets are typically found near the cloud base (Platnick, 2000; Chen et al., 2008).
Considering ATSR-2 and MODIS-CE, the majority of the difference over ocean (mean relative bias of −2.29 µm) can be explained by these wavelength-choice effects. Both sensors have 1.6 µm and 3.7 µm channels, although for ATSR-2 the 3.7 µm data are absent when the visible channels are in narrow-swath mode. Using both channels in future versions of the retrieval algorithm would therefore be expected to improve the description of particle size in vertically inhomogeneous clouds. Indeed, such methods have been developed (e.g. Chen et al., 2008, and references therein). However, as noted by Minnis et al. (2010a), this cannot explain the smaller liquid cloud effective radii in MODIS-CE than in MODIS-ST, as for non-precipitating clouds MODIS-CE radii would be expected to be larger.
Level 2 products from MODIS-ST include CER retrievals performed using the 1.6 µm and 3.7 µm bands instead of 2.1 µm. These provide a way to examine the effect of wavelength selection on CER within the MODIS-ST algorithm itself. Over 30 000 granules from the year 2000 were used to create Fig. 16, which shows the difference between CER retrieved using 1.6 µm and 2.1 µm for liquid and ice clouds. As before, sampling was restricted to retrievals with a sensor zenith angle smaller than 20°.
For liquid clouds over much of the ocean the 1.6 µm retrieval is 1-2 µm smaller, which is consistent with the simulations of Platnick (2000). However, in some regions of frequent multi-layer cloud (Fig. 12) the difference is positive. Ice absorbs more strongly than water at 1.6 µm, while at 2.1 µm the two phases are more comparable. The 1.6 µm algorithm is therefore probably more sensitive to the upper (ice-phase) layer, even though the retrievals are assigned liquid phase. This likely explains its higher liquid CER in these regions. Kobayashi (2007) used TRMM VIRS data (with 3.75 µm for effective radius) to examine differences in the size of precipitating and non-precipitating clouds in the tropics. This revealed a modal CER of 13 µm for non-precipitating clouds, with few cases of r_eff > 15 µm, and a modal CER of 17 µm for precipitating clouds. Multiannual mean liquid CER frequently exceeds 17 µm for MODIS-ST. One interpretation would therefore be that marine clouds are almost always precipitating. This would be expected to lead to higher CER from the deeper-penetrating 1.6 µm techniques. However, it is consistent neither with the ATSR-2 results nor with the MODIS-ST 1.6 µm retrievals in these regions (Fig. 16).
Over land, liquid CER values are generally closer in all three datasets, with ATSR-2 being smaller on average than either MODIS dataset by approximately 0.2 µm. The similarity between the three techniques indicates one of two things. Cloud vertical structure over land may generally be describable as adiabatic (similar occurrence of positive and negative CER gradients within liquid clouds). Alternatively, some or all of the algorithms may contain systematic biases over land which cancel out any true wavelength-dependent biases. Figure 16 shows that liquid CER over land is generally retrieved as 1-3 µm larger by the 1.6 µm MODIS-ST algorithm than by the 2.1 µm MODIS-ST algorithm. This is surprising, because the difference is larger in magnitude than wavelength-choice effects would produce, and of opposite sign. That expectation is based on the ATSR-2 and MODIS-CE data, which show little overall relative bias.
The reasons for these differences between the MODIS-ST products from different wavelengths are therefore uncertain.
The differences in liquid COD and CER cause corresponding differences in liquid CWP. Minnis et al. (2010a) note that the MODIS-ST product does not include some optically-thin clouds (of both phases) which are retrieved in the MODIS-CE dataset, explaining a positive bias in MODIS-ST COD and CWP relative to MODIS-CE. Similar effects may be important here, as the ATSR cloud flag is known to miss some thin clouds. Additionally, in many tropical ocean regions the proportion of attempted ATSR cloud retrievals that achieve a high-quality fit to the measurements is low (Fig. 2); failed retrievals may lead to different regional sampling biases in each dataset. Minnis et al. (2008) report that the MODIS-CE cloud mask detects fewer clouds than MODIS-ST, but that MODIS-CE performs a successful retrieval more frequently.
GRAPE reports generally larger liquid water paths than either MODIS dataset over parts of the Sahara. However, as overall cloudiness is low in that region (Fig. 11), the difference will be small in terms of the absolute amount of cloud water.
Ice cloud retrievals are known to be sensitive to the assumed ice crystal shape and size distribution, although it is difficult to know which assumptions are most appropriate (Cooper et al., 2006; Zhang et al., 2009; Baum et al., 2010, and references therein). Because of this, ice-phase COD and CER show greater differences than the respective liquid-cloud properties, although again the spatial distributions are similar. In particular, the spatial patterns of GRAPE and MODIS-CE ice CER match closely, although the range of sizes is smaller in GRAPE (most areas 22 µm < r_eff < 29 µm) than in MODIS-CE (most areas 15 µm < r_eff < 33 µm). Both GRAPE and MODIS-CE have smaller ice crystals over tropical oceans than at midlatitudes, while the reverse is true for MODIS-ST. The differences between GRAPE and MODIS-ST ice CER are not consistent with the differences between the 1.6 µm and 2.1 µm MODIS-ST retrievals in Fig. 16. Minnis et al. (2010a) attribute the bulk of the difference between MODIS-ST and MODIS-CE ice cloud retrievals to a combination of the wavelengths used, the ice crystal habits modelled, and the retrievals included in averages. The same effects will be important here, so it is difficult to say which dataset is more representative.
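How COD and CER differences carry through to liquid CWP can be sketched with the relation commonly used for a vertically homogeneous liquid cloud, CWP ≈ (2/3) ρ_w τ_c r_eff. This is an assumption for illustration: the exact CWP formulation of each dataset is not restated in this section, and the function names below are hypothetical. To first order, the fractional CWP difference is simply the sum of the fractional COD and CER differences.

```python
# Sketch only: assumes CWP = (2/3) * rho_w * COD * r_eff for a vertically
# homogeneous liquid cloud (extinction efficiency ~2). Names are hypothetical.
RHO_WATER_G_PER_M3 = 1.0e6  # density of liquid water, g m^-3


def liquid_cwp(cod: float, cer_um: float) -> float:
    """Liquid cloud water path in g m^-2.

    cod: cloud optical depth (unitless)
    cer_um: cloud effective radius in micrometres
    """
    cer_m = cer_um * 1.0e-6  # micrometres to metres
    return (2.0 / 3.0) * RHO_WATER_G_PER_M3 * cod * cer_m


def cwp_relative_difference(cod_a: float, cer_a: float,
                            cod_b: float, cer_b: float) -> float:
    """Fractional CWP difference (a - b) / b between two retrievals."""
    cwp_a = liquid_cwp(cod_a, cer_a)
    cwp_b = liquid_cwp(cod_b, cer_b)
    return (cwp_a - cwp_b) / cwp_b


if __name__ == "__main__":
    # Illustrative values: COD 10 and CER 10 um.
    print(f"{liquid_cwp(10.0, 10.0):.1f} g m^-2")
    # Same COD, but CER 2 um smaller in retrieval a.
    print(f"{cwp_relative_difference(10.0, 8.0, 10.0, 10.0):+.0%}")
```

With these illustrative inputs the script prints 66.7 g m^-2 and -20%: a CER 2 µm smaller, with COD unchanged, lowers CWP by a fifth.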

Cloud-top pressure and temperature
The CTP and CTT fields are the cloud parameters in closest agreement between the sensors (r ≥ 0.76 over land and r ≥ 0.8 over sea); however, ATSR-2 retrieves the lowest and MODIS-ST the highest pressures in marine stratocumulus regions (p_c > 800 hPa in MODIS-ST). Minnis et al. (2010a) consider these higher MODIS-ST pressures to be overestimates, placing the clouds too low. This is because the MODIS-ST algorithm reverts to using a 10.8 µm measurement to determine height for very low clouds (Platnick et al., 2003) and assumes an opaque cloud, whereas these clouds are often semitransparent (so that the observed brightness temperature includes a contribution from the warmer surface). Conversely, Holz et al. (2008) compared colocated MODIS-ST (Aqua) and CALIOP heights and found that MODIS-ST overestimated the height of low marine clouds, owing to the same boundary-layer inversion difficulty as found in the ATSR-2 data. It is likely, then, that biases of either sign are possible.
Comparisons of MODIS-ST and MODIS-CE cloud heights with other data sources have been carried out (Holz et al., 2008; Minnis et al., 2010a); in general, both were found to underestimate CTH, particularly in the tropics and for multi-layer cloud systems, largely because IR photons penetrate a significant depth into optically-thin clouds. The same was noted for GRAPE (Sect. 6). In this sense, all the algorithms retrieve an effective radiative height that depends on the wavelengths used. The low bias from this effect is likely to be smaller for GRAPE than for MODIS-CE, due to the combined use of the 10.8 µm and 12 µm bands: cloud water absorbs more strongly near 12 µm than at 10.8 µm (the band used by MODIS-CE), so the radiative height should be nearer the cloud top. Comparisons of the MODIS algorithms with CALIOP can use the Aqua sensor (also in the A-Train); for Terra and the ATSRs, diurnal variability of clouds means that even a correctly-retrieved height would likely be lower than the afternoon height observed by CALIPSO, due to convection. As also discussed in Sect. 6, poorly-resolved boundary-layer inversions in the ECMWF temperature profiles used in GRAPE will lead to low-level clouds forming at such inversions being placed too high, although their temperatures should be correctly retrieved. The MODIS-CE algorithm allows for these inversions and so should not encounter this problem. Figure 15 shows that, except in the tropical Pacific, the CTT fields are more similar than the CTP fields. Over ocean, the ATSR-2 CTT is 1.24 K warmer than MODIS-ST and 1.69 K cooler than MODIS-CE, corresponding to differences of order 0.5 km in height; this implies that MODIS-ST is approximately 3 K cooler than MODIS-CE on average over oceans.
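The effect of assuming an opaque cloud can be illustrated with a toy single-channel model, offered as a sketch rather than as the MODIS-ST algorithm. The top-of-atmosphere 10.8 µm radiance from a semitransparent cloud of emissivity ε over a warmer surface is taken as ε B(T_c) + (1 − ε) B(T_s), ignoring the atmosphere entirely (an assumption), and is then inverted as if it came from an opaque cloud. The temperatures and emissivities used are hypothetical.

```python
import math

# Physical constants (SI units).
H = 6.62607015e-34   # Planck constant, J s
C = 2.99792458e8     # speed of light, m s^-1
K_B = 1.380649e-23   # Boltzmann constant, J K^-1


def planck_radiance(wavelength_m: float, temp_k: float) -> float:
    """Blackbody spectral radiance B(lambda, T) in W m^-2 sr^-1 m^-1."""
    a = 2.0 * H * C**2 / wavelength_m**5
    return a / math.expm1(H * C / (wavelength_m * K_B * temp_k))


def brightness_temperature(wavelength_m: float, radiance: float) -> float:
    """Invert the Planck function: blackbody temperature giving this radiance."""
    a = 2.0 * H * C**2 / wavelength_m**5
    return H * C / (wavelength_m * K_B * math.log1p(a / radiance))


def opaque_assumption_temperature(t_cloud: float, t_surface: float,
                                  emissivity: float,
                                  wavelength_m: float = 10.8e-6) -> float:
    """Cloud temperature inferred if a semitransparent cloud is treated as opaque.

    Toy model with no atmospheric emission or absorption: the observed
    radiance is emissivity * B(T_cloud) + (1 - emissivity) * B(T_surface).
    """
    radiance = (emissivity * planck_radiance(wavelength_m, t_cloud)
                + (1.0 - emissivity) * planck_radiance(wavelength_m, t_surface))
    return brightness_temperature(wavelength_m, radiance)


if __name__ == "__main__":
    # Hypothetical low cloud at 280 K over a 295 K sea surface.
    for eps in (1.0, 0.9, 0.7, 0.5):
        t = opaque_assumption_temperature(280.0, 295.0, eps)
        print(f"emissivity {eps:.1f}: inferred {t:.1f} K")
```

For ε = 1 the inferred temperature equals the true 280 K; as ε decreases it moves towards the surface temperature, so a boundary-layer cloud would be assigned a higher pressure and placed too low.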
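The final step of this argument is simple arithmetic on the quoted mean ocean CTT differences, shown here as a short script. The conversion to height uses an assumed constant lapse rate of 6.5 K km⁻¹, an illustrative choice that is not taken from this study; boundary-layer lapse rates can differ substantially.

```python
# Mean ocean CTT differences quoted in the text, in kelvin.
ATSR_MINUS_ST = 1.24    # ATSR-2 warmer than MODIS-ST
ATSR_MINUS_CE = -1.69   # ATSR-2 cooler than MODIS-CE

# Assumed constant lapse rate (K per km); illustrative only.
LAPSE_RATE_K_PER_KM = 6.5


def implied_difference(a_minus_b: float, a_minus_c: float) -> float:
    """Return (b - c) given (a - b) and (a - c)."""
    return a_minus_c - a_minus_b


def temperature_to_height_km(delta_t_k: float,
                             lapse_rate: float = LAPSE_RATE_K_PER_KM) -> float:
    """Approximate height difference for a temperature difference.

    A colder cloud top sits higher, hence the sign flip.
    """
    return -delta_t_k / lapse_rate


st_minus_ce = implied_difference(ATSR_MINUS_ST, ATSR_MINUS_CE)
print(f"MODIS-ST minus MODIS-CE CTT: {st_minus_ce:+.2f} K")
print(f"approx. height difference: {temperature_to_height_km(st_minus_ce):+.2f} km")
```

This prints a MODIS-ST minus MODIS-CE difference of -2.93 K, i.e. MODIS-ST cloud tops roughly 0.45 km higher under the assumed lapse rate, matching the "approximately 3 K" figure above.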

Comparison of liquid water path with microwave radiometry
As well as by visible/infrared imaging instruments, the liquid water content of clouds may be obtained by microwave radiometers, including the SSM/I, TRMM and AMSR-E instruments. The original algorithm of Wentz (1997) applied to SSM/I data has undergone several iterations, most recently as described by Wentz and Spencer (2008), with the same basic algorithm (albeit with sensor-specific coefficients) applied to all instruments. Coverage is limited to ice-free oceanic regions and to liquid clouds only, although multiple cloud layers can be penetrated. The microwave sensors have been combined by O'Dell et al. (2008) to create the UWisc climatology, providing monthly mean LWP on a 1° grid. The different overpass times (including drifts over mission lifetimes) have been exploited to additionally provide parameters to fit the diurnal cycle of LWP for each 1° grid cell, for an average year.
In this work, the diurnal cycle from Version 2 of the climatology has been used to create maps of the seasonal mean LWP for each of December–January–February (DJF), March–April–May (MAM), June–July–August (JJA), and September–October–November (SON), calculated for the ATSR-2 overpass time of approximately 10:30 a.m. local time. The GRAPE dataset has then been used to create analogous seasonal averages over ocean.
The seasonal means are shown in Fig. 17. The seasonality is similar and, globally averaged, the differences are small, with the GRAPE dataset higher by 5 g m⁻². However, GRAPE underestimates liquid CWP in the ITCZ by up to 100 g m⁻² and overestimates it in the storm tracks by around 50 g m⁻². These differences, again, can be linked to multi-layer cloud systems. The microwave sensors are insensitive to ice crystals and so provide a good estimate of the liquid water path, and Fig. 12 shows frequent multi-layer systems in these regions. In the ITCZ these are frequently retrieved as ice phase (visible in the low pressures in Fig. 15 and in the difference between low and high cloud fraction in Fig. 11, particularly in the Indian Ocean and tropical Pacific), so such clouds are removed from the liquid water path fields but stand out as having a high water content in the ice cloud fields (Fig. 14). Ho et al. (2003) used TMI and VIRS data from TRMM to estimate cloud water content over tropical oceans and noted that, for ice overlying water clouds, the microwave method measured only 25–30% of the total water content, the rest being ice phase. This is consistent with these clouds being retrieved as ice phase by ATSR-2. Conversely, in the storm tracks the multi-layer clouds are more often retrieved as liquid phase, causing an overestimate of CWP as the ice water is mistakenly attributed to liquid water. The different behaviour at different latitudes is likely explained by the larger ice cloud optical depth in the ITCZ than in the storm tracks. The figures above also show the same regional patterns in both the MODIS-ST and MODIS-CE products, likely for the same reasons. Microwave-derived datasets are therefore likely of greater utility than visible-derived products for climatological examination of oceanic LWP.
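Evaluating a fitted diurnal cycle at a fixed overpass time and then forming seasonal means can be sketched as follows. The functional form (a mean plus 24 h and 12 h harmonics with absolute amplitudes) and the parameter layout are assumptions for illustration only; the actual fit parameters and file structure of the UWisc climatology should be taken from O'Dell et al. (2008). In this sketch, months are weighted equally within each season.

```python
import math
from statistics import mean

# Month numbers making up each season.
SEASONS = {
    "DJF": (12, 1, 2),
    "MAM": (3, 4, 5),
    "JJA": (6, 7, 8),
    "SON": (9, 10, 11),
}


def lwp_at_local_time(lwp_mean, amp1, phase1_h, amp2, phase2_h, local_hour):
    """Evaluate a hypothetical two-harmonic diurnal cycle of LWP (g m^-2).

    phase1_h and phase2_h are the local hours at which the diurnal (24 h)
    and semidiurnal (12 h) harmonics peak.
    """
    w = 2.0 * math.pi / 24.0  # angular frequency of the 24 h cycle, rad per hour
    return (lwp_mean
            + amp1 * math.cos(w * (local_hour - phase1_h))
            + amp2 * math.cos(2.0 * w * (local_hour - phase2_h)))


def seasonal_means_at_overpass(monthly_fits, local_hour=10.5):
    """Seasonal mean LWP at a fixed local overpass time for one grid cell.

    monthly_fits maps month (1-12) to (mean, amp1, phase1_h, amp2, phase2_h).
    """
    return {
        season: mean(lwp_at_local_time(*monthly_fits[m], local_hour)
                     for m in months)
        for season, months in SEASONS.items()
    }


if __name__ == "__main__":
    # Made-up fit parameters: same diurnal cycle every month.
    fits = {m: (80.0, 10.0, 6.0, 3.0, 3.0) for m in range(1, 13)}
    print(seasonal_means_at_overpass(fits))
```

The default `local_hour=10.5` corresponds to the approximately 10:30 a.m. ATSR-2 overpass mentioned above.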
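A globally averaged difference such as the 5 g m⁻² quoted above depends on how the maps are averaged. The minimal sketch below assumes regular latitude-longitude grids with NaN marking missing data, weights each cell by cos(latitude), and uses only ocean cells that are valid in both datasets. This is one reasonable choice, not necessarily the averaging used to obtain that figure.

```python
import numpy as np


def area_weighted_mean_difference(field_a, field_b, lats_deg, ocean_mask):
    """cos(latitude)-weighted mean of (field_a - field_b) over ocean cells.

    field_a, field_b: 2-D arrays (lat, lon) of seasonal-mean LWP in g m^-2,
        with NaN marking missing data
    lats_deg: 1-D array of grid-cell centre latitudes, one per row
    ocean_mask: 2-D boolean array, True for ocean cells
    """
    a = np.asarray(field_a, dtype=float)
    b = np.asarray(field_b, dtype=float)
    # On a regular lat-lon grid, cell area is proportional to cos(latitude).
    w_row = np.cos(np.deg2rad(np.asarray(lats_deg, dtype=float)))
    weights = np.broadcast_to(w_row[:, np.newaxis], a.shape)
    # Compare only cells that are ocean and valid in both datasets.
    valid = np.asarray(ocean_mask, dtype=bool) & np.isfinite(a) & np.isfinite(b)
    if not valid.any():
        raise ValueError("no valid ocean cells to compare")
    return float(np.average((a - b)[valid], weights=weights[valid]))
```

Restricting the average to cells valid in both maps avoids introducing a sampling difference of the kind discussed earlier for failed retrievals.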

Conclusions
The GRAPE dataset of cloud properties derived from the ATSRs has been evaluated, focussing on the ATSR-2 portion of the record (1995–2003). This analysis has been in two parts: first, an examination of retrieval statistics and of the consistency of GRAPE data between the two ATSR sensors used; second, a comparison with other widely-used ground-based and satellite cloud data products. A summary of recommendations and caveats for data use is given in Table 7.
The identification and treatment of multi-layer cloud systems in particular contributes to the differences between satellite cloud data products, and to the reliability of these products in multi-layer cases. While applying the recommendations made in this work can identify GRAPE retrievals which are likely reliable, it is recommended that users performing case studies of individual events make use of correlative data (such as simultaneous radar/lidar profiles, or microwave LWP data) where available.
This analysis has suggested avenues for improvement in future versions of the ORAC algorithm used in GRAPE, some of which are likely applicable to other, similar algorithms. Chiefly, these include improved identification and treatment of multi-layer and mixed-phase cloud systems; improved cloud identification and description of surface reflectance in snow- or ice-covered regions; use of higher-resolution temperature profiles to improve the modelling of boundary-layer inversions; use of multiple channels with high sensitivity to cloud effective radius (i.e. inclusion of the 3.7 µm channel) to retrieve a profile of particle size through the cloud rather than assuming vertical homogeneity; and implementation of a variety of ice crystal phase functions in the retrieval, either to pick the one most appropriate for each case or to gauge the sensitivity of retrievals to the assumed ice crystal model. Additionally, refinement of the retrieval error budget may improve the extent to which the retrieval cost statistic can be used as a reliable measure of quality. The relative calibration of the ATSR sensors needs to be addressed to improve the utility of the dataset for climate and trend studies. Additional changes planned for the next version of the ORAC cloud algorithm, not covered by the above, include implementing an anisotropic rather than Lambertian surface reflectance treatment, making use of the forward view of the ATSR sensors, and the potential for synergistic cloud retrievals (such as including oxygen A-band measurements from MERIS to improve AATSR estimates of cloud altitude).
radiometer data used.The MODIS-CE data were obtained from the NASA Langley Research Center Atmospheric Science Data Center.The MODIS-ST data were obtained from the NASA level 1 and Atmosphere Archive and Distribution System.ARM and CFARR are thanked for the ground-based cloud data used.Dave Smith, Tim Nightingale, Barry Latter, Chris Mutlow, and Jack Abolins of STFC-RAL are thanked for numerous useful discussions about the construction, calibration, and in-orbit performance of the ATSRs.Brent Maddux, Steve Platnick, Patrick Minnis, Benjamin Grandey and Johannes Quaas are thanked for advice and assistance regarding the MODIS cloud products.Finally, the authors would like to thank the reviewers (Alexander Kokhanovsky and Andi Walther) for their helpful suggestions on the manuscript.

Fig. 1. Cumulative frequency distributions of retrieval costs. Solid lines indicate water cloud and dashed ice; green lines indicate cloud over land, and blue over sea. The solid black line corresponds to a theoretical χ² distribution with 1.4 degrees of freedom.

Fig. 2. Regional performance of cloud retrieval.From left to right are shown the fraction of attempted retrievals converging with cost of 10 or less (QF=3); the fraction converging with a cost over 10; and the fraction failing to converge.

Fig. 3. Joint histograms of quality controlled and colocated ATSR-2 and AATSR cloud properties over land (top row) and sea (second row).From left to right, the columns show the base-10 logarithm of cloud optical depth; the cloud effective radius; and the cloud-top pressure.The colour scale indicates the number of retrievals in each bin.Bottom row: Histograms of the difference (AATSR−ATSR-2) between bin mean retrieved cloud properties, divided by the root sum of the bin variances for that cloud property.This gives a normalised (unitless) quantity.The plots are ordered as in the top row.Black lines indicate retrievals over land, and red over sea.Dashed lines indicate 0, ±1 and ±3.


Fig. 4. Variability of the median ratio between AATSR and ATSR-2 measurements (Sun-normalised radiance or brightness temperature, as appropriate) for quality-controlled and colocated cloud retrievals during the 2002–2003 overlap year. Results are shown for (from top to bottom) 660 nm, 870 nm, 1.6 µm, 10.8 µm, and 12 µm. Solid lines indicate land, and dashed lines ocean. Red lines indicate the Northern Hemisphere, and blue the Southern. Error bars show the standard error.


Fig. 5. Joint histograms of retrieval standard deviation and mean uncertainty estimate for clouds at sea gridded to 0.5° resolution. From top to bottom, rows show the cloud optical depth, cloud effective radius, and cloud-top pressure. From left to right, columns show optically-thin liquid water clouds (τ_c < 10), optically-thick liquid water clouds (τ_c ≥ 10), optically-thin ice clouds (τ_c < 10), and optically-thick ice clouds (τ_c ≥ 10). The total number of retrievals for a given cloud type is indicated at the top of the column. White indicates bins containing no data.


Fig. 7. Mean retrieved cloud properties as a function of satellite zenith angle. The top row shows results for liquid clouds, and the bottom ice clouds. From left to right, the columns show the COD, CER (µm) and CTP (hPa). In all figures, solid lines indicate retrievals over land and dashed over sea. Black lines indicate the region 23° N–46° N, red 0° N–23° N, green 23° S–0° N, and blue 46° S–23° S.


Fig. 8. Ratio of cloud IR optical depth to visible optical depth, τ_c,IR : τ_c, as a function of cloud effective radius. The black line indicates water clouds, and the red line ice clouds.

Fig. 9. Comparison between ground-based and ATSR-2 CTH for shallow cloud fields, using the definitions presented in the text.Clockwise from top-left, the sites are Chilbolton, SGP, NSA and TWP.The comparisons are ordered in ascending ground-based effective IR-radiating height, with the date of each coincidence noted below the plot.For each comparison, the coloured area indicates the radar reflectivity profile for the coincidence, and the black star the estimated effective IR-radiating ground-based CTH and its uncertainty.Diamonds with error bars show the ATSR-2 CTH.


Fig. 10. As Fig. 9, except for deep cloud fields. From top to bottom, the sites are Chilbolton, SGP, NSA and TWP.


Fig. 13. Comparison between annual mean (colocated data for the year 2001) liquid water cloud properties derived from ATSR-2 and MODIS-Terra. The left column shows data from ATSR-2, the centre column MODIS-ST, and the right column MODIS-CE. From top to bottom, plots show the cloud optical depth, cloud effective radius, and cloud water path.


Fig. 15. Comparison between annual mean (colocated data for the year 2001) cloud-top pressure (top) and cloud-top temperature (bottom) derived from ATSR-2 and MODIS-Terra. The left column shows data from ATSR-2, the centre column MODIS-ST, and the right column MODIS-CE.


Fig. 16. Comparison between MODIS-ST cloud effective radius retrievals using 1.6 µm and 2.1 µm. The top panel shows the difference (1.6 µm algorithm − 2.1 µm algorithm) in retrievals for liquid water clouds, averaged to a 2.5° grid. The bottom panel shows the same, but for ice clouds. Grid cells without data are indicated in grey.


Fig. 17. Multiyear seasonal composites of cloud liquid water path for (top to bottom) DJF, MAM, JJA, and SON. The maps on the left are from the UWisc microwave climatology of O'Dell et al. (2008) and those on the right are from the ATSR-2 GRAPE dataset.
°–60° N. AATSR provides continuous coverage from late July 2002 (with the version 3 archive including data up to the end of 2009).

Table 1. Retrieved state vector (and derived) quantities, units, acronyms and symbols used in this work. The first five quantities are retrieved, along with the cloud phase (ice/water), and the final three are derived. Note that the COD is retrieved in log10 space.

Table 2. Classification of quality flags in GRAPE level 2 products.
could increase in size with age, or else the cloud could precipitate or evaporate away. It is therefore difficult to disentangle changes in cloud properties with solar angle due to retrieval error from those due to physical changes in cloudiness. Conversely, there is little physical reason to expect cloud properties to change with satellite zenith angle, so this is examined here.

Table 3. Locations and elevation above mean sea level of stations providing ground-based CTH data used in this work.

Table 4. Statistics of CTH comparison for shallow clouds. RMS indicates the root-mean-square difference, and r Pearson's correlation coefficient.

Table 5. As Table 4, except for deep clouds.

Table 6. Statistics of comparison between daily (1° averaged) coincident ATSR-2 and MODIS cloud products. Columns labelled ST indicate the MODIS-ST dataset, and columns labelled CE the MODIS-CE dataset. Pearson's linear correlation coefficient is denoted by r, the root-mean-square difference by RMS, the mean (ATSR-2 − MODIS) difference by MD, and the mean absolute difference by MAD.
For liquid water clouds, CER is smaller over land
Atmos. Chem. Phys., 11, 3913–3936, 2011, www.atmos-chem-phys.net/11/3913/2011/
A direct comparison of ATSR-2 heights with CALIPSO is not presented here, aside from that of Sect. 7, due to the difference in local solar times and years of operation.
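The four statistics in Table 6 follow standard definitions. The sketch below computes them from paired daily 1° values, assuming the colocation and gridding have already been done (that step is not shown). MD and MAD use the ATSR-2 minus MODIS sign convention stated in the caption; the function name is hypothetical.

```python
import numpy as np


def comparison_stats(atsr, modis):
    """r, RMS, MD and MAD between paired ATSR-2 and MODIS values.

    Pairs where either value is NaN are dropped before computing statistics.
    """
    x = np.asarray(atsr, dtype=float)
    y = np.asarray(modis, dtype=float)
    valid = np.isfinite(x) & np.isfinite(y)
    x, y = x[valid], y[valid]
    d = x - y  # ATSR-2 minus MODIS
    return {
        "r": float(np.corrcoef(x, y)[0, 1]),    # Pearson's linear correlation
        "RMS": float(np.sqrt(np.mean(d**2))),   # root-mean-square difference
        "MD": float(np.mean(d)),                # mean difference
        "MAD": float(np.mean(np.abs(d))),       # mean absolute difference
    }
```

Note that r is insensitive to a constant offset between datasets, which is why Table 6 reports MD and MAD alongside it.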

Table 7. Summary of recommendations for use of GRAPE cloud data.
Use as provided:
- Single-layer, well-fit (quality flag 3), overcast (f > 0.8) retrievals
- Uncertainty estimates for the above where COD > 10
- Total cloud fraction
Use with caveats:
- Uncertainty estimates for COD < 10 may be too low
- Cloud properties less reliable for broken cloud fields (f < 0.8)
- CTP/CTH overestimate height for boundary-layer clouds (CTT likely accurate)
- CTP/CTT/CTH for low-COD/multi-layer cases represent a radiative average
- COD for multi-layer clouds (reasonable estimate of total COD)
- Ice CER is not well known
- Liquid CER is offset between MODIS and ATSR datasets (wavelengths used)
Not recommended:
- Retrievals failing to converge or with high cost, for general use
- Cloud water path for multi-layer clouds (identify using cost, CTP uncertainty, or information such as simultaneous radar/lidar profiles)
- Retrievals in polar regions
- Combining ATSR-2/AATSR records (calibration uncertainty)
- Retrievals near the edges of the swath (θv > 20°)
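The "use as provided" category of Table 7 can be expressed as a simple filter. In this sketch the field names are hypothetical rather than the GRAPE product's variable names; the multi-layer flag must be supplied by the user (for example from the retrieval cost, CTP uncertainty, or radar/lidar data); and the polar latitude limit is left as a user choice because the table does not give one.

```python
from dataclasses import dataclass


@dataclass
class Retrieval:
    """Hypothetical container for one GRAPE level-2 retrieval.

    Field names are illustrative, not the product's actual variable names.
    """
    quality_flag: int            # 3 = converged with cost of 10 or less
    cloud_fraction: float        # f
    latitude: float              # degrees
    view_zenith: float           # satellite zenith angle theta_v, degrees
    multilayer_suspected: bool   # user-supplied, e.g. from cost or lidar


def use_as_provided(r: Retrieval, polar_lat_limit: float) -> bool:
    """True if the retrieval meets the 'use as provided' criteria of Table 7.

    polar_lat_limit is a user-chosen threshold (degrees); Table 7 only
    advises against polar retrievals without giving a latitude.
    """
    return (r.quality_flag == 3             # well-fit, converged
            and r.cloud_fraction > 0.8      # overcast
            and not r.multilayer_suspected  # single-layer
            and r.view_zenith <= 20.0       # not near the swath edge
            and abs(r.latitude) < polar_lat_limit)
```

Retrievals rejected by this filter are not necessarily unusable; many fall under "use with caveats" rather than "not recommended".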