Cloud type comparisons of AIRS, CloudSat, and CALIPSO cloud height and amount

The precision of the two-layer cloud height fields derived from the Atmospheric Infrared Sounder (AIRS) is explored and quantified for a five-day set of observations. Coincident profiles of vertical cloud structure by CloudSat, a 94 GHz profiling radar, and the Cloud-Aerosol Lidar and Infrared Pathfinder Satellite Observation (CALIPSO), are compared to AIRS for a wide range of cloud types. Bias and variability in cloud height differences are shown to have dependence on cloud type, height, and amount, as well as whether CloudSat or CALIPSO is used as the comparison standard. The CloudSat-AIRS biases and variability range from −4.3 to 0.5±1.2–3.6 km for all cloud types. Likewise, the CALIPSO-AIRS biases range from 0.6–3.0±1.2–3.6 km (−5.8 to −0.2±0.5–2.7 km) for clouds ≥7 km (<7 km). The upper layer of AIRS has the greatest sensitivity to Altocumulus, Altostratus, Cirrus, Cumulonimbus, and Nimbostratus, whereas the lower layer has the greatest sensitivity to Cumulus and Stratocumulus. Although the bias and variability generally decrease with increasing cloud amount, the ability of AIRS to constrain cloud occurrence, height, and amount is demonstrated across all cloud types for many geophysical conditions. In particular, skill is demonstrated for thin Cirrus, as well as some Cumulus and Stratocumulus, cloud types infrared sounders typically struggle to quantify. Furthermore, some improvements in the AIRS Version 5 operational retrieval algorithm are demonstrated. However, limitations in AIRS cloud retrievals are also revealed, including the existence of spurious Cirrus near the tropopause and low cloud layers within Cumulonimbus and Nimbostratus clouds. Likely causes of spurious clouds are identified and the potential for further improvement is discussed.


Introduction
Improving the realism of cloud fields within general circulation models (GCMs) is necessary to increase certainty in prognoses of future climate (Houghton et al., 2001). However, cloud responses to anthropogenic forcing in climate GCMs vary widely from model to model, and the spread is largely attributed to differences in the representation of cloud feedback processes (Stephens, 2005). The use of relatively long-term satellite data records such as the Earth Radiation Budget Experiment (ERBE) (Ramanathan et al., 1989) and the International Satellite Cloud Climatology Project (ISCCP) (Rossow and Schiffer, 1999) has clarified cloud radiative impacts, inspired approaches to climate GCM evaluation, and contributed to further theoretical understanding of cloud feedbacks (e.g. Hartmann et al., 2001). Wielicki et al. (1995) note that the historical satellite record is unable to measure all cloud properties relevant to Earth's cloudy radiation budget, which include liquid and ice water path (LWP/IWP), visible optical depth (τ), effective particle size (D e), particle phase and shape, fractional coverage, height, and IR emittance. Illustrating the need for improved cloud observations, Webb et al. (2001) showed that some climate GCMs generate erroneous vertical cloud distributions that compensate in a manner producing favorable mean radiative budget comparisons with observations. Thus, reliable observations of cloud vertical structure will help to reduce the ambiguity in climate GCM-satellite comparisons.
Published by Copernicus Publications on behalf of the European Geosciences Union.
Several active and passive satellite sensors with unprecedented observing capabilities are flying in a formation called the "A-train" (Stephens et al., 2002). The constellation is anchored by NASA's Earth Observing System (EOS) Aqua and Aura satellites, the Cloud-Aerosol Lidar and Infrared Pathfinder Satellite Observation (CALIPSO) (Winker et al., 2003), CloudSat (Stephens et al., 2002), along with the Polarization and Anisotropy of Reflectances for Atmospheric Sciences coupled with Observations from a Lidar (PARASOL), and in the near future Glory (solar irradiance and aerosols), and the Orbiting Carbon Observatory (OCO) (atmospheric CO2). Several instruments on Aqua and Aura are designed to measure temperature, humidity, clouds, aerosols, trace gases, and surface properties (Parkinson et al., 2003; Schoeberl et al., 2006). The present focus is on comparisons of cloud retrievals from the Atmospheric Infrared Sounder (AIRS) located on Aqua (Aumann et al., 2003) to CloudSat, a 94 GHz cloud profiling radar, and CALIOP (Cloud-Aerosol Lidar with Orthogonal Polarization), a cloud and aerosol profiling lidar on CALIPSO. Aqua leads CloudSat and CALIPSO by ∼55 and ∼70 s, respectively, providing nearly simultaneous and collocated cloud observations.
From the perspective of a satellite-based cloud observation, inter-satellite comparisons have several advantages over surface-satellite comparisons: they (1) eliminate the ambiguity introduced by integrating a time series of surface-based measurements to replicate a spatial scale comparable to the satellite field of view (FOV), an ambiguity that is further complicated by cloud temporal evolution (e.g. Kahn et al., 2005), (2) reduce the effects of certain types of sampling biases, including those introduced by the attenuation of surface-based lidar and cloud radar in thick and precipitating clouds (Comstock et al., 2002; McGill et al., 2004), (3) provide a larger and statistically robust set of observations for comparison, and (4) facilitate near-global sampling for most types of clouds.
Many schemes have been developed to classify clouds into fixed types. For instance, the ISCCP data set provides a 3×3 classification scheme based on cloud top pressure and τ VIS (Rossow and Schiffer, 1999), while Wang and Sassen (2001) developed a scheme using multiple ground-based sensors. These (and numerous other) classification schemes are loosely based on the naming system originating from Luke Howard (Gedzelman, 1989). Although cloud classification schemes are limited by measurement sensitivity and subject to misinterpretation, they help to organize clouds into categories with unique characteristics of composition, radiative forcing, and heating/cooling effects (Hartmann et al., 1992;Klein and Hartmann, 1993;Chen et al., 2000;Inoue and Ackerman, 2002;Xu et al., 2005;L'Ecuyer et al., 2006).
No single passive or active measurement from space is able to infer all relevant cloud physical properties (e.g. Wielicki et al., 1995) spanning all geophysical conditions; hence, a multi-instrument constellation is needed to observe Earth's clouds (Miller et al., 2000;Stephens et al., 2002). Now that this type of satellite constellation is operational, the strengths and weaknesses of various instruments can be evaluated in the presence of different cloud types and ultimately observations of multiple instruments can be combined to yield retrievals superior to retrievals from any single instrument. This is motivated in part because of discrepancies in existing climatologies of cloud height, frequency and amount derived from combinations of passive (visible, IR, and microwave) wavelengths (e.g. Rossow et al., 1993;Jin et al., 1996;Thomas et al., 2004). Discrepancies exist not only from different measurement characteristics and sampling strategies, but perhaps as significantly, from retrieval algorithm differences and a priori assumptions (Rossow et al., 1985;Wielicki and Parker, 1992;Kahn et al., 2007b). CloudSat and CALIOP generally provide more direct and easily interpreted observations of cloud detection and vertical cloud structure than passive methods. A combination of radiative transfer modeling and a priori assumptions of surface and atmospheric quantities are necessary to infer cloud properties from passive measurements (e.g. Rossow and Schiffer, 1999).
The scientific literature is replete with cross-comparisons of in situ, surface-based, and satellite-derived cloud properties. However, few of them consider the impacts of cloud type on the distribution of statistical properties. Cloud type affects not only the precision of passive satellite-derived cloud quantities; retrievals of temperature (Susskind et al., 2006), water vapor variability, trace gases (Kulawik et al., 2006), aerosols (Remer et al., 2005), and surface quantities also have varying degrees of precision within different cloud types. In this article, the accuracy of AIRS cloud height and amount for different cloud type configurations is quantified using CloudSat and CALIPSO. In Sect. 2 the observations and data products of the three observing platforms are introduced. Section 3 describes the comparison methodology and presents illustrative cloud climatologies of AIRS, CloudSat, and CALIOP. Similarities and differences are placed in the context of measurement sensitivity. Section 4 presents coincident CloudSat-AIRS cloud top differences spanning the breadth of cloud types. CALIPSO-AIRS cloud top differences are shown and compared to the CloudSat-AIRS differences. Furthermore, strengths and weaknesses of AIRS cloud retrievals are revealed and probable causes of discrepancies are discussed. In Sect. 5 the results are discussed and summarized.
Atmos. Chem. Phys., 8, 1231-1248, 2008 www.atmos-chem-phys.net/8/1231/2008/

Data
The sensitivity of radar, lidar and passive IR sounders to clouds differs greatly. Active sensors provide relatively direct observations of cloud vertical structure compared to passive IR sounders, which derive cloud vertical structure using combinations of radiative transfer modeling and a priori assumptions about the surface and atmospheric state. AIRS has sensitivity to clouds with τ VIS ≤10 (Huang et al., 2004). CALIOP can be used to obtain very accurate cloud top boundaries, especially when the cloud scatters visible light well above that of the molecular atmosphere and aerosols, but has an upper bound of τ VIS ∼3 (Winker et al., 1998; You et al., 2006). CloudSat penetrates through clouds well beyond the sensitivity limit of IR sounders, but is insensitive to small hydrometeors and will often miss tenuous cloud condensate at the tops of some clouds or clouds composed only of small liquid water droplets. In this comparison, a subset of publicly released products is used: cloud top height (Z A) and effective cloud fraction (f A) from AIRS, the radar-only cloud confidence and cloud classification masks from CloudSat, and the 5 km cloud feature mask from CALIPSO.

AIRS
AIRS is a thermal IR grating spectrometer operating in tandem with the Advanced Microwave Sounding Unit (AMSU) (Aumann et al., 2003). A substantial portion of Earth's thermal emission spectrum is observed with 2378 spectral channels from 3.7-15.4 µm at a nominal spectral resolution of ν/Δν≈1200. The AIRS footprint size is 13.5 km at nadir, whereas AMSU is approximately 40 km at nadir and coaligned to a 3×3 array of AIRS FOVs. The AIRS/AMSU suite scans ±48.95° off nadir, recording over 2.9 million AIRS spectra and 300 000 Level 2 (L2) retrievals for daily, near-global coverage. The Version 5 (V5) AIRS L2 operational retrieval system (and all previous versions) is based on the cloud-clearing approach of Chahine (1974). Unless otherwise noted the AIRS retrievals used are V5. Profiles of T(z), q(z), O3(z), additional minor gases such as CH4, CO, CO2 and SO2, and other atmospheric and surface properties are derived from the cloud-cleared radiances. Up to two cloud layers are inferred from fitting observed AIRS radiances to calculated ones (Kahn et al., 2007a). Cloud top pressure (P A) and cloud top temperature (T A) are reported at the AMSU resolution (∼40 km at nadir), whereas f A, the product of spatial cloud fraction and cloud emissivity, is reported at the AIRS resolution. (Henceforth, "AIRS FOV" refers to the spatial scale of geophysical parameters reported at the AMSU FOV resolution unless otherwise noted.) Z A is derived from P A and geopotential height using a log-linear interpolation of P A in between adjacent standard geopotential levels. An illustrative (and partial) AIRS granule (defined to be 135 scan lines or 6 min of data) is presented in Fig. 1. Shown are the brightness temperature (BT) at 960 cm−1 (BT 960), a BT difference between 1231 cm−1 and 960 cm−1 (BTD) that reveals a sensitivity to cloud phase (Nasiri et al., 2007), and P A and f A for two cloud layers.
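The conversion from cloud top pressure to cloud top height described above can be illustrated with a minimal sketch. It assumes that "log-linear" means height varies linearly with the logarithm of pressure between two adjacent levels; the function name, the level values, and the units are hypothetical and are not taken from the AIRS operational code.

```python
import math

def cloud_top_height_km(p_cloud_hpa, p_levels_hpa, z_levels_km):
    """Interpolate height linearly in ln(pressure) between adjacent levels.

    p_levels_hpa must be ordered from high pressure (near the surface) to
    low pressure; z_levels_km holds the geopotential height of each level.
    Assumption: "log-linear" is read as linear in ln(p).
    """
    pairs = list(zip(p_levels_hpa, z_levels_km))
    for (p_lo, z_lo), (p_hi, z_hi) in zip(pairs, pairs[1:]):
        # p_lo is the lower-altitude (higher-pressure) level of the pair.
        if p_hi <= p_cloud_hpa <= p_lo:
            weight = (math.log(p_lo) - math.log(p_cloud_hpa)) / (
                math.log(p_lo) - math.log(p_hi)
            )
            return z_lo + weight * (z_hi - z_lo)
    raise ValueError("cloud top pressure lies outside the supplied levels")

# Illustrative (made-up) levels, not an actual retrieved profile.
levels_hpa = [1000.0, 500.0, 250.0]
heights_km = [0.1, 5.5, 10.4]
z_cloud = cloud_top_height_km(350.0, levels_hpa, heights_km)
```

A pressure that coincides with a level returns that level's height, and a pressure at the geometric mean of two levels returns the midpoint of their heights.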
A wide variety of structure, including extensive multi-layer clouds, is observed in the P A and f A fields. Figure 1b indicates negative BTDs from 6-8° S that coincide with Altocumulus (Ac) and Altostratus (As) and higher values of P A and f A, whereas scattered positive BTDs are present to the north and south within thinner Cirrus (Ci) layers having lower values of P A and f A. The negative and positive BTDs coincide with cloud types consistent with liquid water droplets (Ac and As) and ice crystals (Ci), respectively (see Sect. 2.2). For further detail about AIRS cloud retrievals, cloud validation efforts, and cross-comparisons with the Moderate Resolution Imaging Spectroradiometer (MODIS) and Microwave Limb Sounder (MLS), please refer to Susskind et al. (2006), Kahn et al. (2007a, b), Weisz et al. (2007), and references therein.

CloudSat
CloudSat is a 94 GHz cloud profiling radar providing vertically-resolved information on cloud location, cloud ice and liquid water content (IWC/LWC), precipitation, cloud classification, radiative fluxes and heating rates (Stephens et al., 2002). The vertical resolution is 480 m with 240 m sampling, and the horizontal resolution is approximately 1.4 km (cross-track) × 2.5 km (along-track) with sampling roughly every 1 km. Surface reflection/clutter over most surfaces greatly reduces radar sensitivity in the lowest 3-4 range bins (roughly the lowest km) such that these data are marginally useful in release 3 (R03) (Marchand et al., 2008). An example cross-section of height-resolved reflectivity is shown in Fig. 2a for the same granule introduced in Fig. 1. CloudSat reveals details in vertical cloud structure that IR sounders are unable to either resolve or sample because the IR signal is emitted by the upper 8-10 or so optical depths of a given cloud profile (Huang et al., 2004).
Range bins with detectable hydrometeors are reported in the 2B-GEOPROF product (Mace et al., 2007). A cloudy range bin is associated with a confidence mask value that ranges from 0-40. Values ≥30 are confidently associated with clouds although values as low as 6 suggest clouds approximately 50% of the time (Marchand et al., 2008). Figure 2b shows the cloud mask for confidence values ≥20. When compared to AIRS cloud fields (Figs. 1 and 2b), P A agrees better with CloudSat when f A is relatively large. In more tenuous scenes (small f A ) CloudSat infrequently observes clouds. It is unclear if this is a result of clouds with low radar reflectivities (due perhaps to small hydrometeor size), or spurious AIRS cloud retrievals, or just simple mismatches in the sensor time and space sampling. This subject is discussed in Sects. 3 and 4. About 51% of all R03 CloudSat profiles confidently contain at least one range bin with hydrometeors based on three months of data from the Summer of 2006 (Mace et al., 2007). In Release 4 (R04), a combined radar-lidar 2B-GEOPROF product will be produced (Marchand et al., 2008).
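As an illustration of how a confidence threshold might be applied to a single radar profile, the following sketch flags cloudy range bins and returns the height of the highest one. The function names, the list-based profile layout, and the sample values are hypothetical; only the 0-40 mask range and the thresholds of 20 and 30 come from the text.

```python
def cloudy_bins(mask_values, threshold=20):
    """Return True for each range bin whose confidence mask meets the threshold."""
    return [value >= threshold for value in mask_values]

def highest_cloudy_height_km(mask_values, heights_km, threshold=20):
    """Height of the highest range bin flagged as cloudy, or None if the profile is clear.

    mask_values and heights_km are parallel sequences, one entry per range bin.
    """
    cloudy_heights = [
        height
        for value, height in zip(mask_values, heights_km)
        if value >= threshold
    ]
    return max(cloudy_heights) if cloudy_heights else None

# Made-up profile with 240 m bin spacing, ordered from the top down.
mask = [0, 6, 20, 40, 0]
heights = [12.00, 11.76, 11.52, 11.28, 11.04]
top_default = highest_cloudy_height_km(mask, heights)       # threshold of 20
top_strict = highest_cloudy_height_km(mask, heights, 30)    # confident bins only
```

Raising the threshold lowers the apparent cloud top in this example, which is one way the choice of mask value can influence a height comparison.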
The detected clouds in 2B-GEOPROF are assigned cloud types and are reported in the 2B-CLDCLASS product (Wang and Sassen, 2007). Clouds with a confidence mask ≥20 are classified into Ac, As, Cumulonimbus (Cb), Ci, Cumulus (Cu), Nimbostratus (Ns), Stratocumulus (Sc), and Stratus (St). The two-dimensional structure and maximum value of cloud reflectivity as well as cloud temperature (based on ECMWF profiles) are combined to identify cloud types. Cloud type frequency and spatial statistics are presented in Wang and Sassen (2007) for the initial 6 months of CloudSat observations. In a future version a radar-lidar cloud classification mask will be released. The radar-only cloud classification scheme has some differences when compared to a combined radar-lidar scheme. The cloud types As, Ns, Cb, and Cu (congestus) are well detected and classified with a radar-only algorithm. Ci is well classified but under-detected because of the existence of small ice particles in thin Ci that a lidar is able to detect. Ac, St, Sc, and fair weather Cu (in the absence of virga or drizzle) are under-detected using a radar-only algorithm, and their detection will be greatly improved with a combined radar-lidar algorithm. The classification of these cloud types is sufficient except that a combined radar-lidar approach is needed to partition St from Sc clouds. The relative merits of radar-only and combined radar-lidar classification algorithms will be summarized and published elsewhere. The R03 cloud classification mask is shown in Fig. 2c. Comparison to Fig. 2b strongly suggests that bias and variability statistics of AIRS and CloudSat cloud top height differences depend on cloud type. As discussed in the introduction, most cloud comparison studies present statistics averaged over multiple cloud types. Thus, cloud type classification is able to provide more relevant and useful satellite-based cloud retrieval comparisons.

CALIPSO
The CALIPSO payload consists of three nadir-viewing instruments: CALIOP, the imaging infrared radiometer (IIR), and the wide field camera (WFC) (Winker et al., 2003). This instrument synergy enables the retrieval of a wide range of aerosol and cloud products including (but not limited to): vertically resolved aerosol and cloud layers, extinction, optical depth, aerosol and cloud type, cloud water phase, cirrus emissivity, and particle size and shape (Winker et al., 2003; You et al., 2006). We use the Level 1B total attenuated backscatter profiles to illustrate cloud vertical structure, and the 5 km Level 2 cloud feature mask to quantify cloud altitude. The bit-based feature mask indicates the presence of cloud and aerosol features (layers) and an associated top and base for each feature detected; up to 10 features are reported for cloud (8 for aerosol). Presently, the publicly released feature mask does not discriminate between cloud and aerosol types, although type discrimination is planned for a future release. Cloud identification is considerably accurate in Version 1.10, although some thick aerosol can be misidentified as cloud (see the data quality statement at http://eosweb.larc.nasa.gov/PRODOCS/calipso/table_calipso.html). Relatively weak backscatter for tenuous aerosol and cloud approaches the limits of feature detection with CALIOP; thus varying degrees of horizontal averaging are performed to reduce noise and reveal tenuous features, which are reported at 333 m, 1, 5, 20, or 80 km depending on the feature. The vertical resolution is 30 m from the surface to 8.2 km; higher than 8.2 km it is 60 m (Vaughan et al., 2005).

An illustrative cloudy snapshot
The CALIOP 532 nm total attenuated backscatter and 5 km cloud feature mask are shown in Fig. 2d. Commonly observed differences between lidar- and radar-derived cloudiness that have been previously reported are seen in Fig. 2 (Comstock et al., 2002; McGill et al., 2004). When CloudSat (the radar) and CALIOP (the lidar) both detect clouds (6-15° S), the lidar observes higher cloud tops than the radar. This difference is expected because lidar is more sensitive to small hydrometeors than radar; small ice crystals and water droplets are ubiquitous near cloud tops. The radar penetrates to the surface through nearly all clouds except for those with significant precipitation (e.g. Cb), unlike most lidars, which generally saturate at optical depth values not much greater than 3 (Comstock et al., 2002). Similarly, the lidar detects extensive thin cirrus from 4-6° S and 15-25° S that the radar misses. Figure 2b shows that AIRS-derived cloud tops follow the radar more closely than the lidar when thick clouds occur below tenuous clouds (Baum and Wielicki, 1994; Weisz et al., 2007). Effective cloud fraction (f A) tends to be much higher in the presence of geometrically thick cloud (observed by the radar), or large backscatter (observed by the lidar), and vice-versa, implying qualitative agreement of f A with radar and lidar observations. AIRS detects much of the thin Ci observed by the lidar only and generally places the upper layer (Z AU) in the middle or lower portions of the Ci layers (Holz et al., 2006). The radar occasionally misses clouds with f A below 0.2-0.3 that the lidar easily observes. In some two-layered cloud systems (e.g. Ci, Cu, and Ns from 14-17° S) AIRS retrieves realistic Z A values for both layers. In more complicated multi-layer cloud structures (e.g. Ac, As, Ns, and Ci detected by the lidar only from 6-10° S) locating the two dominant cloud tops is problematic. Furthermore, in areas of thick and/or precipitating cloud (e.g. Cb from 11-14° S), AIRS "retrieves" a lower layer (Z AL) within the cloud at a depth beyond the expected range of sensitivity for IR sounders. In summary, the cloudy snapshot in Fig. 2 illustrates CloudSat's ability to profile thick and multilayered cloud structure and CALIPSO's ability to accurately determine cloud top boundaries and profile thin clouds, and it reveals strengths and weaknesses of IR-based cloud top height retrievals.

Methodology
In this section the comparison approach between AIRS, CloudSat and CALIPSO is outlined for a five-day set of coincident observations ( clouds when compared to the sensitivity from different spatial averaging approaches (Kahn et al., 2007a). Clear sky and cloud frequency statistics for the three instrument platforms are shown in Table 2. Most notable is the large difference in cloud frequency between CloudSat and CALIPSO. Although the CloudSat and CALIPSO data products have 1 and 5 km ground resolution, respectively, the majority of the difference is due to the relative sensitivity of each instrument to hydrometeors that was discussed in Sect. 2. CloudSat reports the smallest frequency of clouds whereas AIRS demonstrates the greatest. That AIRS detects more clouds than CALIPSO is an indication of (1) some false cloud detections by AIRS, (2) missed clouds by CALIPSO, or (3) an increase in perceived cloud frequency with increasing FOV size within some spatially heterogeneous cloud fields. Furthermore, the AIRS cloud frequency changes by a few percent depending on whether the smallest values of f A are included. CALIPSO cloud frequency statistics may depend on the resolution of the feature mask (333 m, 1 km, and 5 km), but this dependence is not explored here. To address the relative frequency of false and positive cloud detections, six general scenarios of coincidence are defined in Fig. 3. The frequency of occurrence for each scenario is shown; the scenarios account for heterogeneous and homogeneous cloud fields within an AIRS FOV at any altitude in the vertical column. "False" (Scenario C) or "failed" (Scenarios D and E) cloud detections occur approximately 22.0% (12.1%) of the time for CloudSat (CALIPSO) comparisons.
Some cases are explained by the insensitivity of CloudSat to thin Ci (Scenario C) and the inability of AIRS to detect some low clouds such as Sc and Cu (Scenarios D and E), while others are explained by partial cloud adjacent to the CloudSat/CALIPSO ground track within the AIRS FOV (Scenario C; e.g. Kahn et al., 2005), co-registration/collocation uncertainties (e.g. Kahn et al., 2007b), and other factors. For the five days in Table 1, averages of 19.3 and 10.6 CloudSat profiles containing cloud (6.0 and 4.3 CALIPSO 5 km profiles) are located within a typical AIRS FOV for Scenarios (B) and (E), respectively. With regard to thin Ci, the CALIPSO comparison in Scenario C demonstrates a significant portion of either false AIRS detection (see Sect. 4.2) or clouds located outside of the CALIPSO ground track. In Scenarios D and E, many of the cases are thin Ci detected by CALIPSO that are below the detection limit of AIRS. Further analysis using (for instance) MODIS radiances is required to quantify the relative contributions to false and failed AIRS detection frequency.
Table 2. Percentage of clear and cloudy occurrences for CloudSat, CALIPSO, and AIRS. CloudSat cloud frequency is based on whether one or more range bins have a cloud confidence mask ≥20. AIRS cloud frequency is based on whether either the upper or lower layer contains f A ≥0.01 or f A ≥0.0. CALIPSO cloud frequency is based on the 5 km feature mask and whether at least one feature is detected in a given profile. These values do not represent the true global climatology because of the small sample (5 days) and the fact that the days chosen are on the 16-day orbit repeat cycle, leading to potential spatial sampling biases.
For Scenarios D and E (instances when the radar senses clouds and AIRS does not), the cloud types that dominate the missed cloud detections are assessed. For Scenario D (E), the percentage of missed St is 55% (70.1%) of all cloud types. This is not a surprise given that St dominates the overall frequency statistics (Wang and Sassen, 2007). Furthermore, the AIRS channel list was modified for V5 in
Table 3. Shown are the percentage of AIRS FOVs that contain at least one CloudSat profile with these particular cloud types (middle column), and the percentage of homogeneous FOVs for the same cloud types (right column). A total of 52 320 AMSU FOVs and 2.37×10^6 CloudSat profiles (about 45 CloudSat profiles per AMSU FOV) are used in this comparison for the 5-day period listed in Ta
Table 3). Missed detections of Ns are consistent with limitations of the AIRS algorithm in the presence of precipitating clouds (Kahn et al., 2007a). All other cloud types explain about 1.5% or less of the missed cloud detections by AIRS. For instance, it is very rare that CloudSat detects Ci cloud when AIRS does not. According to Scenarios B and E, the AIRS FOV is heterogeneous 49.8% of the time using coincident radar-derived cloud profiles, but this is reduced to 26.5% using lidar profiles. The higher sensitivity of lidar in detecting small hydrometeors suggests a lower frequency of clear sky/cloud heterogeneity on the scale of the AIRS FOV than implied by the radar. Regardless of the instrument sensitivity, a significant percentage of AIRS observations contain heterogeneous mixtures of clear and cloudy sky. The frequency of each cloud type detected within an AIRS FOV and the percentage of homogeneous AIRS FOVs (where only one type occurs) are shown in Table 3. For AIRS FOVs that contain As, Cb, Ci and Ns a majority is homogeneous; in contrast Ac, Cu, and Sc are substantially more heterogeneous. Cloud profiles with vertically heterogeneous cloud types will be explored upon release of the combined CloudSat/CALIPSO cloud type mask and are not presented here.
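The two per-type statistics described for Table 3 could be tallied along the following lines. This is a sketch under stated assumptions: each FOV is represented by a hypothetical list of cloud type labels (one per cloudy CloudSat profile inside it), a FOV is called homogeneous when exactly one type occurs, and the homogeneous percentage is taken relative to the FOVs that contain that type. The source does not state the denominator, so that choice is an assumption.

```python
from collections import Counter

def fov_type_statistics(fov_cloud_types):
    """Per cloud type: (% of FOVs containing it, % of those FOVs that are homogeneous).

    fov_cloud_types is a list with one entry per FOV; each entry is a list
    of cloud type labels for the cloudy profiles falling inside that FOV.
    """
    containing = Counter()
    homogeneous = Counter()
    for labels in fov_cloud_types:
        types_present = set(labels)
        for cloud_type in types_present:
            containing[cloud_type] += 1
        if len(types_present) == 1:
            # Exactly one type occurs anywhere in this FOV.
            homogeneous[next(iter(types_present))] += 1
    n_fovs = len(fov_cloud_types)
    return {
        cloud_type: (
            100.0 * containing[cloud_type] / n_fovs,
            100.0 * homogeneous[cloud_type] / containing[cloud_type],
        )
        for cloud_type in containing
    }

# Made-up example with four FOVs.
example = [["Ci", "Ci"], ["Ci", "Cu"], ["Cu"], ["Sc", "Cu", "Cu"]]
stats = fov_type_statistics(example)
```

In this toy input Ci appears in half of the FOVs and is the only type in one of those two, so both of its percentages are 50.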

A global five-day climatology
Figure 4 shows AIRS zonally averaged cloud frequency and f A (defined in Sect. 2.1) from 70° S-70° N, illustrating the realism of AIRS cloud height (Z A), amount (f A), and frequency. Cloud "frequency" is defined as the percentage of AIRS FOVs with non-zero f A. In the case of Fig. 4, cloud frequency is partitioned into vertical bins, which sum to the values shown in Fig. 5. Although the biases of Z A relative to the radar are not appreciably different in the polar latitudes, the rate of "false" or "failed" cloud detections is greatly increased (31%) compared to all latitudes (22%). The reasons for poorer cloud retrievals in high latitudes are being explored and will be presented elsewhere. Figure 4a and b (4c and d) illustrate cloud frequency and f A for the upper (lower) layer, respectively. Familiar global- and regional-scale cloud distributions are revealed. High cloudiness is most frequent in the tropical upper troposphere and mid-latitude storm tracks, whereas low cloud occurs within the subtropics extending to the high latitudes. Furthermore, minima in cloud frequency and amount are observed in the subtropical middle and upper troposphere. These patterns are qualitatively consistent with other climatologies (Rossow and Schiffer, 1999; Wylie et al., 1999; Thomas et al., 2004).
Zonally averaged cloud frequency and f A are shown in Fig. 5a and b, respectively. Two minimum values of f A (0.0 and 0.01) used to define cloud in a frequency-based climatology (Fig. 4) illustrate the sensitivity to potentially spurious cloud. Cloud frequency is 5-15% smaller (depending on latitude) using f AU <0.01 for the upper layer, however, the corresponding change for f AL is only 1-2%. Zonally averaged f A is lower with a global mean of ∼0.4 for the sum of both layers, consistent with observations from the High Resolution Infrared Radiation Sounder (HIRS) (Wylie et al., 1999). We note that fractional global cloud cover is substantially larger than 0.4, and f A includes the effect of cloud emissivity. Since many clouds do not radiate as black bodies, the average of f A is expected to be less than the true cloud fraction (or frequency).
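The sensitivity of a frequency-based climatology to the minimum f A can be sketched as follows. The function name and the sample values are hypothetical; the two thresholds (0.0, meaning any non-zero f A, and 0.01) are those named in the text.

```python
def cloud_frequency_percent(f_a_values, min_f_a=0.0):
    """Percentage of FOVs counted as cloudy for a given minimum f_A.

    A FOV is cloudy when f_A is non-zero and at least min_f_a, so the
    default reproduces the "non-zero f_A" definition of cloud frequency.
    """
    if not f_a_values:
        raise ValueError("at least one FOV is required")
    n_cloudy = sum(1 for f in f_a_values if f > 0.0 and f >= min_f_a)
    return 100.0 * n_cloudy / len(f_a_values)

# Made-up upper-layer f_A values for four FOVs.
f_au = [0.0, 0.005, 0.02, 0.5]
freq_any = cloud_frequency_percent(f_au)           # counts every non-zero f_A
freq_001 = cloud_frequency_percent(f_au, 0.01)     # drops the most tenuous retrievals
mean_f_au = sum(f_au) / len(f_au)                  # average effective cloud fraction
```

Because f A folds emissivity into the cloud fraction, the averaged f A in such a calculation stays well below the frequency, in line with the distinction drawn above.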
Zonally averaged cloud climatologies for collocated AIRS, CloudSat, and CALIPSO observations are illustrated in Fig. 6. The cloud distribution in Fig. 6 is not representative of any particular season or month (Table 1). CloudSat cloud frequency for mask values ≥40 is shown in Fig. 6a. The radar penetrates through nearly all clouds and high frequencies are present throughout the tropical column with the peak from 10-13 km. However, a climatology like that shown in Fig. 6a is not directly comparable to one derived from AIRS. A climatology of CloudSat-observed cloud tops using the highest cloudy range bin within a given vertical profile is presented in Fig. 6c. As expected, the cloud top climatology compares much more favorably with AIRS (Fig. 6e) in terms of zonally-averaged spatial patterns and the magnitude of cloud frequency, since AIRS does not sample the full vertical structure of a given cloudy column. Likewise, CALIPSO cloud frequency derived from the 5 km feature mask is shown in Fig. 6b, and the cloud top climatology is shown in Fig. 6d. As with CloudSat, the CALIPSO cloud top climatology qualitatively agrees more favorably with AIRS, although height and sampling biases are apparent from inspection of the frequency patterns with respect to height and latitude; these will be explored in more detail in Sect. 4. There are several additional notable features between AIRS and CloudSat/CALIPSO shown in Fig. 6. First, the peak frequency in the tropical upper troposphere is zonally offset between AIRS and CloudSat by ∼5°. At least two explanations are possible: (1) the cloud types AIRS and the radar are most sensitive to are not uniformly distributed (i.e. Ci versus Cb), introducing a zonally-dependent sampling bias, and (2) precipitating clouds occasionally produce Z A retrievals too low in the troposphere with erroneously low values of f A (Kahn et al., 2007a).
Second, AIRS retrieves tenuous clouds at higher altitudes than the radar in the subtropical latitudes, suggestive of either sensitivity to thin Ci with small ice particles and/or spurious AIRS retrievals. Third, the radar observes high frequencies of low clouds 1-2 km in height in most latitude bands implying a positive height bias for low clouds sensed by AIRS. Fourth, a second layer within Ns from 2-3 km is frequently observed and is inconsistent with IR sensitivity, to be discussed further in Sect. 4.
Several of the radar-lidar differences that are pointed out in Fig. 2 are also observed in Fig. 6. Cloud tops in the upper troposphere observed by the lidar are higher than the radar by 1-4 km depending on the latitude, and are more vertically extensive than observed by AIRS and the radar. This feature is more expansive from 15° S-15° N, whereas the peak frequency is shifted 5° N (10° N) relative to AIRS in Fig. 6a and b. However, a more appropriate cloud top boundary-based cloud climatology in Fig. 6c-e shows that the tropical cloud features compare much better. The broader zonal extent in the lidar climatology is expected because of high sensitivity to thin Ci. The northward shift is consistent with vertically thick and tenuous Ci layers persisting along the edge of the ITCZ, allowing the lidar to detect higher cloud frequencies at lower altitude bins. The lower frequency of lidar-detected clouds from 5° S-5° N is a result of sampling biases. At this latitude, clouds are more frequently opaque and precipitating and the lidar observations are restricted to a narrow vertical range, resulting in fewer detected clouds. Furthermore, the lidar and radar (Fig. 6a and b) observe low clouds across most latitudes; however, the radar observes more in the ITCZ and less in the Northern Hemisphere (NH) subtropics than the lidar. The low cloud frequency differences are likely a result of a combination of sampling biases (e.g. upper cloud layers obscuring the lidar's view of low cloud, the insensitivity of radar to smaller droplets, etc.), and CloudSat's limitations in the lowest 1.0-1.25 km in R03. Lastly, the frequency minima within subtropical gyres in Fig. 6a extend more poleward into the midlatitudes in Fig. 6b, consistent with the high opacity of clouds in the storm tracks.

Height differences partitioned by cloud type
While AIRS estimates up to two cloud layers, the vertical structure cannot be profiled in the manner of a radar or lidar, making comparisons less straightforward than some other studies (Mace et al., 1998; Miller et al., 1999). In this section, coincident cloud top height observations between AIRS, CloudSat, and CALIPSO are differenced to quantify the precision of Z A as a function of f A and cloud type. The resolution of CloudSat and CALIPSO is not degraded to that of AIRS; instead, each CloudSat and CALIPSO profile is compared to the nearest AIRS retrieval. Random sampling of one CloudSat profile per AIRS FOV demonstrates that the bias and variability are within ±0.1-0.3 km for the approach taken in this section. More importantly, we will show that the biases and variability in cloud top differences among the different cloud types are several factors larger than those introduced by the choice of averaging methodology or sampling strategy (Kahn et al., 2007a). Approximately 45-50 CloudSat profiles (9-10 CALIPSO) coincide with the AIRS FOV. A "nearest neighbor" collocation approach is applied using latitude/longitude pairs. The gap between AIRS nadir view and CloudSat and CALIPSO depends on latitude. As a result, an AIRS FOV occasionally contains fewer than 45-50 CloudSat and 9-10 CALIPSO match-ups since the index of the collocated footprint is not constant with successive scan lines. Fields of f A are averaged to the resolution of Z A . Additional challenges of collocating multiple satellite measurements are addressed further in Kahn et al. (2007b).
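The "nearest neighbor" collocation described above can be illustrated with a minimal sketch. It assumes that each AIRS retrieval and each CloudSat or CALIPSO profile is reduced to a single latitude/longitude pair and that great-circle distance is the matching criterion; the function names and coordinates are hypothetical and are not taken from the operational collocation code.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))

def nearest_airs_index(profile_latlon, airs_latlons):
    """Index of the AIRS retrieval centre closest to one radar/lidar profile."""
    plat, plon = profile_latlon
    return min(range(len(airs_latlons)),
               key=lambda i: haversine_km(plat, plon, *airs_latlons[i]))

def collocate(profiles, airs_latlons):
    """Map each AIRS index to the list of profile indices assigned to it."""
    matches = {}
    for j, profile in enumerate(profiles):
        matches.setdefault(nearest_airs_index(profile, airs_latlons), []).append(j)
    return matches

# Hypothetical coordinates (lat, lon in degrees), for illustration only.
airs = [(10.0, 20.0), (10.0, 20.4)]
profiles = [(10.01, 20.05), (10.02, 20.35), (9.99, 20.38)]
matches = collocate(profiles, airs)
```

Because the number of profiles assigned to a given AIRS index is simply the length of each list in `matches`, a sketch like this also makes it easy to see why some FOVs receive fewer match-ups than others.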

CloudSat-AIRS
Globally averaged differences of AIRS upper (Z AU ) and lower (Z AL ) cloud layers with radar-derived cloud top height (Z CS ) are shown in Fig. 7. About 72.1% of AIRS FOVs are comparable to CloudSat, following Scenarios A and B presented in Fig. 3; the remaining FOVs are clear or represent false or failed detections, which encompass several possibilities (see Sect. 3). The Z CS is the highest altitude range bin with a confidence mask ≥20; no other cloud layer detected by the radar is used in the comparison, even in the presence of additional layers. The cloud type associated with the highest range bin classifies the comparisons by cloud type. As discussed in Sect. 3, a histogram approach like that taken by Kahn et al. (2007a) to account for multiple radar-derived cloud layers changes the biases and variability by a smaller amount than those found between different cloud types. Figure 7a and b show differences of Z CS -Z AU ≡ ΔZ U and Z CS -Z AL ≡ ΔZ L , respectively, as a function of f A averaged over all cloud types. The variability is greater (especially for ΔZ U >0) if the confidence mask is relaxed to values less than 20 (not shown). Figure 7a shows that ΔZ U is a strong (weak) function of f AU <0.2 (f AU >0.2). The mean bias (solid red line) is −1.0 to −4.0 km for f AU <0.2, increasing to 0.5 km as f AU approaches 1.0. Likewise, the variability (dashed red lines) ranges from ±3.5 km for f AU ∼0.01 to ±1.25 km for f AU ∼1.0. There are two contributing factors to the negative bias for f A <0.2: (1) the radar is insensitive to thin and tenuous Ci layers that AIRS detects above lower cloud layers that the radar detects, and (2) some of the small f AU retrievals are spurious. In Fig. 7b, two broad clusters are suggested for ΔZ L . As f AL increases, ΔZ L decreases for the cluster with smaller f AL because the lower layer becomes the dominant cloud layer. The cluster with higher f AL is centered near ΔZ L ∼0 km and is independent of f AL . 
This second cluster suggests that AIRS retrieves a quantitatively meaningful lower cloud layer. We will show that the second cluster is associated with particular cloud types.
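The definition of the radar cloud top used above (the highest range bin with a confidence mask ≥20, with the cloud type of that bin labelling the comparison) can be sketched as follows. The profile layout, function name, and numerical values are illustrative assumptions and do not reproduce the actual CloudSat product structure.

```python
def radar_cloud_top(heights_km, mask, cloud_types, threshold=20):
    """Return (height, cloud type) of the highest bin with mask >= threshold.

    Returns None when no bin passes the threshold (no radar-detected cloud).
    """
    candidates = [(h, t) for h, m, t in zip(heights_km, mask, cloud_types)
                  if m >= threshold]
    if not candidates:
        return None
    return max(candidates, key=lambda pair: pair[0])

# Hypothetical single profile: bin heights (km), confidence mask, cloud type.
heights = [12.0, 11.76, 11.52, 2.4, 2.16]
mask = [0, 10, 30, 40, 40]
types = ["none", "Ci", "Ci", "Sc", "Sc"]

top = radar_cloud_top(heights, mask, types)

# Height difference for the upper AIRS layer: radar cloud top minus Z_AU.
z_au = 10.9  # hypothetical AIRS upper-layer height in km
delta_z_u = top[0] - z_au
```

Lower layers in the same profile (here the Sc bins) are deliberately ignored, mirroring the statement that no other radar-detected layer enters the comparison.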
The results in Fig. 7a are partitioned into individual cloud types using the 2B-CLDCLASS product and are shown in Fig. 8. Several differences of ΔZ U among the assorted cloud types are observed. First, the negative bias for low f AU in Fig. 7a is primarily due to Sc (the count in Fig. 8h exceeds Fig. 8b-g), with additional contributions from Ac, Cu, and Ns. For these cases the radar detects low or middle clouds while Z AU is located at a higher altitude. Some Z AU are physically plausible (e.g. thin Ci residing over Sc or Cu in the subtropics or tropics) and some are spurious (to be discussed in Sect. 4.2). Second, the magnitude of f AU for individual cloud types is qualitatively consistent with expectations. For instance, Cb is dominated by f AU >0.8 (low values occur for partial coverage in the AIRS FOV), Ci is 0.05<f AU <0.4, and Ns is in between Cb and Ci with 0.5<f AU <0.9. Few cases of Ns with f AU >0.9 are observed because non-zero f AL several km below the Ns cloud top is frequently retrieved (f AL +f AU typically sum to 1.0); a similar tendency is also observed within some Cb (see Fig. 2). Ac has a lower range of f AU compared to As, consistent with the classification used in Rossow and Schiffer (1999) and the increased heterogeneity of Ac (Table 3).

The relative frequencies of each cloud type are given by the magnitudes of each PDF; further statistics on cloud-type frequencies are given in Wang and Sassen (2007). The solid and dashed red lines are the mean and ±1σ variability, respectively, as in Fig. 7.
Third, both bias and variability strongly depend on cloud type. Sc and Cu have negative ΔZ U , consistent with the high height biases shown for low clouds in Fig. 6. Cb and Ci (and As and Ns for higher values of f AU ) have positive biases of ΔZ U . Holz et al. (2006) showed that Ci cloud top retrievals derived from IR measurements are frequently placed 1-2 km or more below the physical cloud top. Likewise, Sherwood et al. (2004) showed that height differences derived from geostationary imagery and coincident lidar are 1-2 km even within highly opaque cloud tops. The variability in bias decreases as f AU increases for all cloud types except Ci, which remains somewhat constant with f AU . The variability is smallest for As, Ci, and Ns (for f AU >0.5) and largest for Cb (f AU <0.6), Cu (f AU <0.4), and Sc (f AU <0.4). Furthermore, As shows less variability than Ac. Therefore, more heterogeneous clouds (see Table 3) tend to have larger variability in ΔZ U .

Figure 9 shows the results for ΔZ L . The cluster at small f AL is dominated by As, Cb, Ci, and Ns. Whether Z AL is a physically reasonable second cloud layer, or a consequence of retrieval algorithm limitations, it is expected that vertical profiles of IWP derived from the radar will provide further insight on Z AL . In R03, CloudSat IWP retrievals in thick and/or precipitating clouds are not reported, which hinders the exploration of Z AL within Cb and Ns; however, an improved retrieval is anticipated for the R04 release (2B-CWC-RO R03 data quality statement at http://www.cloudsat.cira.colostate.edu). Sc clouds dominate the cluster with high f AL (see the high count in Fig. 9h) with contributions from Ac and Cu. Z AL agrees best with the radar in low and middle altitude liquid water clouds. For Ns clouds, the bias in Z AL is lower as f AL increases, resulting in two cloud layers in close vertical proximity when f AL is large. Despite the complexity in the interpretation of the observed two-layer cloud fields, AIRS is shown to possess skill in detecting and assigning an altitude to low cloud layers.

Atmos. Chem. Phys., 8, 1231-1248, 2008 www.atmos-chem-phys.net/8/1231/2008/

Figure 10 shows mean bias and variability statistics for V4 and V5 AIRS retrievals, and the results for V5 are summarized in Table 4. In Fig. 10a, the bias is substantially smaller for f AU <0.1 and f AU >0.6 in V5. This demonstrates that improvements to cloud retrievals were made for V5. The larger negative bias for f AU <0.1 in V4 was primarily a result of poorer retrievals in Ac and Ci (not shown). The larger positive bias in V4 for f AU >0.6 was a result of poorer retrievals in As and Ns, and to a lesser extent, Ci and Cu (not shown). However, in the case of Sc, the V5 bias is larger by 0.25-0.5 km depending on the magnitude of f AU . Differences in day-night and land-ocean biases and variability were explored. Between day and night, as well as between land and ocean, these differences are not qualitatively significant and are several factors smaller than the differences between V4 and V5 (not shown).
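The mean bias and ±1σ variability curves discussed above are statistics of the height differences within bins of effective cloud fraction. A minimal sketch of such a calculation is given below; the bin edges, the use of the sample standard deviation, and the input values are assumptions made for illustration and are not the settings used to produce Figs. 7-10.

```python
import statistics

def binned_bias(f_values, dz_values, edges):
    """Mean and standard deviation of dz within each bin of f.

    Bins are [edges[k], edges[k+1]); the last bin also includes its upper edge
    so that f = 1.0 is counted. Bins with fewer than two samples return None.
    """
    results = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        sample = [dz for f, dz in zip(f_values, dz_values)
                  if lo <= f < hi or (hi == edges[-1] and f == hi)]
        if len(sample) >= 2:
            results.append((lo, hi, statistics.mean(sample), statistics.stdev(sample)))
        else:
            results.append((lo, hi, None, None))
    return results

# Hypothetical effective cloud fractions and height differences (km).
f_au = [0.05, 0.1, 0.15, 0.5, 0.9, 1.0]
dz_u = [-4.0, -2.0, -3.0, 0.2, 0.4, 0.6]

stats = binned_bias(f_au, dz_u, edges=[0.0, 0.2, 1.0])
```

Running the same function separately on V4 and V5 differences, or on subsets selected by cloud type, day/night, or land/ocean, would give the kind of side-by-side comparison described in the text.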

CALIPSO-AIRS
Given the known differences in lidar and radar sensitivity, Z A and lidar-derived cloud top height (Z CAL ) differences (ΔZ CAL ) have the potential to be significantly different than demonstrated in Sect. 4.1 with the radar. However, Fig. 11 reveals qualitatively similar distributions compared to Fig. 7. The sum of Fig. 11a and b (11c and d) is analogous to Fig. 7a (7b). Clouds are partitioned into two categories with Z CAL <7 km and Z CAL ≥7 km. About 85.8% of AIRS FOVs are comparable to CALIPSO, following Scenarios A and B presented in Fig. 3; as discussed in Sect. 3, the remaining FOVs are clear or represent false or failed detections. In Fig. 11a, the bias of ΔZ CAL is 1-3 km with high values for small f AU . The variability is relatively large for small f AU with most of the scatter skewed towards ΔZ CAL >0. This reaffirms the sensitivity of lidar to tenuous clouds and the tendency for IR-derived cloud tops to be located within the middle or lower portions of Ci layers (Holz et al., 2006).
Differences between Figs. 7a and 11 reveal the following about the lidar-AIRS comparisons in Fig. 11b: (1) the negative bias for small f AU is greater by 2 km, (2) the variability is smaller by 0.5-1.0 km, and (3) the largest negative biases are limited to a smaller range of f AU . The radar's insensitivity to small hydrometeors is consistent with (3). Another implication of (3) is that Z AU is "reasonable" (although biased in altitude) for many tenuous Ci. This is also suggested by (2) since slightly lower variability is observed with the lidar comparisons, which are more accurate observations of the "true" cloud top boundary than the radar. Both (1) and (3) suggest many spurious cloud retrievals in the upper troposphere for f AU <0.02. However, the percentage of spurious retrievals is variable and generally decreases as f AU increases, and spurious retrievals are not necessarily restricted to f A <0.02. The likelihood is small that heterogeneous AIRS FOVs explain a significant portion of the large negative bias for f AL <0.02 since sub-pixel heterogeneity tends to increase variability, not necessarily bias (Kahn et al., 2007b). In Fig. 11b, the bias in ΔZ CAL ranges from −2 to −0.5 km as f AU increases from 0.2 to 1.0, whereas the variability is somewhat smaller than ΔZ U in Fig. 7a. Overall, Z A shows positive height biases for low clouds and negative height biases for high clouds relative to the radar and lidar (although the negative bias for high clouds is larger in the lidar comparisons and smaller for low clouds). Figure 11c and d reveal a tendency for two height clusters as with Fig. 7b. In Fig. 11c (Z CAL ≥7 km), Z AL is consistently several km below cloud top, consistent with As, Cb, Ci, and Ns shown in Fig. 9. In Fig. 11d (Z CAL <7 km), Z AL is roughly equal to Z CAL over the range of f AL , which resembles the second cluster in Fig. 7b. Since cloud classification is not applied in the lidar comparisons, certain cloud types cannot be shown to explain particular height biases. 
However, Fig. 11d is consistent with Cu and Sc shown in Fig. 9, which implies (like the radar) that Z AL is skillful in retrieving a lower layer. The ranges of bias and variability for V5 are summarized in Table 5. As with the CloudSat comparisons, a reduction in negative bias is seen in V5 for tenuous clouds, and day/night and land/ocean differences in bias and variability are much smaller than V4 and V5 differences (not shown).
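Since the lidar comparisons are separated by cloud top height rather than by cloud type, the partitioning step reduces to a threshold test on Z CAL at 7 km followed by differencing against the AIRS height. A minimal sketch follows, with hypothetical input pairs and an assumed function name.

```python
def split_by_lidar_height(pairs, threshold_km=7.0):
    """Split (z_cal, z_airs) pairs at the lidar cloud top threshold.

    Returns two lists of differences z_cal - z_airs: one for lidar cloud tops
    at or above the threshold and one for lidar cloud tops below it.
    """
    high = [z_cal - z_airs for z_cal, z_airs in pairs if z_cal >= threshold_km]
    low = [z_cal - z_airs for z_cal, z_airs in pairs if z_cal < threshold_km]
    return high, low

# Hypothetical (lidar cloud top, AIRS cloud height) pairs in km.
pairs = [(12.0, 10.0), (7.0, 6.5), (1.5, 2.5), (3.0, 3.0)]
high_diffs, low_diffs = split_by_lidar_height(pairs)
```

In this toy input the high group has positive differences (AIRS below the lidar cloud top) and the low group has zero or negative differences, which is the sign convention behind the high-cloud and low-cloud biases quoted in the text.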

Changes in V5 AIRS retrievals and impacts on clouds
Some of the algorithm changes to V5 have the potential to impact cloud retrievals, which include: limiting channel selection for cloud clearing and cloud retrieval to 665-811 cm⁻¹, treating CO 2 as a global and time-dependent constant, updating spectroscopic parameters like O 3 and HNO 3 that affect transmittance in the cloud clearing channels, changing the approach to the downwelling IR radiance term, reducing the number of cloud height retrieval iterations during cloud clearing from 4 to 3, removing the ad hoc error term that impacts the damping parameters for cloud height retrievals, and changing the basis of the empirical bias adjustment. The empirical bias correction in V4 used ECMWF analysis fields and in V5 the correction was derived from radiosondes launched during AIRS overpasses that coincided with intensive field campaigns (Tobin et al., 2006).
The adjustments in the channel list were motivated in large part to eliminate window channels that have large contributions of radiance from the surface. Retrieval yield and precision over surfaces with large spectral emissivity features were improved, but the sensitivity to low clouds was reduced, including oceanic stratus. Thus, the sample size of AIRS and CloudSat comparisons for Sc clouds was smaller from V4 to V5. For instance, the frequency of occurrence of Sc within the dominant subtropical subsidence regions has decreased by as much as 10-20%. The comparisons presented here only consider cases when AIRS and CloudSat/CALIPSO simultaneously observe cloud; it should be emphasized that the V4/V5 differences in Fig. 10 do not include observations when one of the instruments and/or data versions does not sense clouds.
CO 2 was assumed to be globally constant at 370 ppm in V4. However, in V5 the treatment of CO 2 was changed to a globally constant linear trend that increases as a function of time, but is without seasonal or latitudinal variation. In the case of high clouds, sensitivity tests have shown that thin cloud frequency is impacted for changes of 5-10 ppm, typical for regional and seasonal variability, while very little change in f A is observed (consistent with a 5 ppm change equivalent to 0.4 K in BT). The appearance (disappearance) of spurious (physically reasonable) Ci is observed when CO 2 levels are assumed to be too low (high) in the forward model (Hearty et al., 2006). In practice, many thin Ci are placed near the tropopause in otherwise clear sky in retrieved cloud fields. Erroneous values of CO 2 are likely to have some impact on the misplaced cloud height for very low values of f A seen in Fig. 11. Regarding middle and low clouds, significant changes are observed in f A , not only the frequency, in the CO 2 sensitivity tests (Hearty et al., 2006). This demonstrates the need for a more realistic estimate of CO 2 in the forward model and suggests the potential utility of a simultaneous CO 2 retrieval  to more accurately retrieve cloud amount and height.
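The difference between the V4 and V5 treatments of CO 2 can be made concrete with a short sketch. Only the V4 value of 370 ppm and the equivalence of a 5 ppm change to 0.4 K in BT come from the text; the reference year, the growth rate of the V5 linear trend, and the assumption that the BT response scales linearly with the CO 2 error are hypothetical placeholders chosen for illustration.

```python
V4_CO2_PPM = 370.0  # globally constant value assumed in V4 (from the text)

def co2_v5_ppm(decimal_year, ref_year=2002.0, ref_ppm=370.0, rate_ppm_per_year=2.0):
    """Globally uniform CO2 with a linear trend in time, as in V5.

    ref_year, ref_ppm and rate_ppm_per_year are hypothetical placeholders,
    not the coefficients used in the V5 algorithm.
    """
    return ref_ppm + rate_ppm_per_year * (decimal_year - ref_year)

def bt_change_k(delta_ppm, k_per_ppm=0.4 / 5.0):
    """Approximate BT change for a CO2 error, assuming linear scaling of the
    quoted 0.4 K per 5 ppm sensitivity (the linearity is an assumption)."""
    return delta_ppm * k_per_ppm

# Difference between the two treatments for a hypothetical date.
delta = co2_v5_ppm(2007.0) - V4_CO2_PPM
bt_error = bt_change_k(delta)
```

Neither treatment carries seasonal or latitudinal structure, so a regional CO 2 departure of 5-10 ppm from the assumed value would remain as a forward-model error under both.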
Since the AIRS cloud retrieval steps are initialized with two cloud layers (350 and 850 hPa with f A of 0.167 and 0.333, respectively), the cloud-clearing algorithm must iteratively "remove" cloud to produce cloud-cleared radiances for downstream retrievals of atmospheric and surface quantities. In a regularized algorithm like that discussed in Susskind et al. (2003), residual f A may be present in clear scenes because the effectiveness of cloud clearing is limited (in part) by the magnitude of noise in the observed radiances. Thus, small amounts of residual cloud may remain for some clear FOVs. Lastly, global-scale trends of cloud frequency in V5 are greatly reduced over V4 (T. Hearty, personal communication), although the lack of seasonal and latitudinal variability in CO 2 likely creates regionally dependent biases in cloud frequency since cloud type and frequency are not distributed uniformly around the globe (e.g. Rossow and Schiffer, 1999;Wylie et al., 1999). Both CALIPSO and CloudSat will continue to play important roles in ongoing assessments of AIRS reprocessing efforts.

Conclusions
The precision of cloud height derived from the Atmospheric Infrared Sounder (AIRS), located on EOS Aqua, is explored and quantified for a five-day set of observations. Coincident profiles of vertical cloud structure by CloudSat, a 94 GHz profiling radar, and the Cloud-Aerosol Lidar and Infrared Pathfinder Satellite Observation (CALIPSO) determine the precision of AIRS-derived clouds in a wide variety of geophysical conditions. By fitting simulated and observed spectral radiances, the AIRS retrieval algorithm derives up to two layers of cloud height (Z A ) and effective cloud fraction (f A ). Comparisons are shown for both cloud layers and the entire range of f A . The cloud confidence and classification masks reported by CloudSat determine cloud occurrence and height and allow the comparisons to be partitioned by cloud type. The 5 km cloud feature mask from CALIPSO is used for the same five-day set of collocated observations. The CloudSat-AIRS biases and variability strongly depend on cloud type, Z A and f A . Using Version 5 (V5) AIRS retrievals, the cloud top biases range from −4.3 to 0.5 km ±1.2 to 3.6 km, depending on f A and cloud type. Large negative biases occur for the smallest values of f A and small positive biases for large f A . Likewise, the largest variability occurs for the smallest f A and the smallest variability occurs for the largest values of f A . Therefore, AIRS cloud top height is shown to excel for most cloud types with relatively high values of f A . Given that the cloud classification scheme used in this work is developed from radar observations, the sensitivity of AIRS to particular cloud types is based strictly on clouds observed by CloudSat. The upper cloud layer has the highest sensitivity to Altocumulus, Altostratus, Cirrus, Cumulonimbus, and Nimbostratus cloud types and the lower layer to Cumulus and Stratocumulus. 
The bias and variability for individual cloud types vary widely, but almost all cloud types show reductions in biases and variability with increasing f A . Furthermore, a tendency for high (low) clouds to be biased low (high) in height is shown. Frequently, two layers of Z A are retrieved within Nimbostratus, and to a lesser degree, Cumulonimbus. The lower layer is not necessarily consistent with a physically plausible lower cloud layer. Some cloud types like thin Cirrus, Cumulus, and Stratocumulus are very challenging to characterize with IR measurements. The results presented herein suggest that AIRS has skill in detecting and assigning cloud top heights to these difficult cloud types. For instance, the bias and variability of Cirrus, Cumulus, and Stratocumulus are 0.2 to 1.5±1.1-2.8 km, −0.3 to 1.5±0.3-2.2 km, and −1.3 to −0.3±0.4-1.7 km, respectively. However, AIRS V5 detects a smaller percentage of Sc fields in and around the major oceanic Stratus regions in the subtropics compared to V4.
CALIPSO-AIRS differences qualitatively agree with those from the CloudSat-AIRS comparisons. For CALIPSO cloud tops ≥7 km and <7 km, the biases and variability are 0.6-3.0±1.2-3.6 km, and −5.8 to −0.2±0.5-2.7 km, respectively, with the largest biases and variability for the smallest values of f A . The tendency for high clouds to have low Z A biases is increased using CALIPSO (rather than CloudSat), consistent with the lidar's increased sensitivity over the radar to small particles in tenuous cloud top boundaries. Likewise, the high Z A biases for low clouds are reduced in magnitude. This demonstrates that both CloudSat and AIRS are not as sensitive to thin Ci and boundary layer clouds compared to CALIPSO. The large negative Z A biases in the CloudSat comparisons for low values of f A are increased (decreased) in the CALIPSO comparisons for clouds <7 km (>7 km) in height. This demonstrates that Z A is more precise for thin Cirrus than implied by the CloudSat comparisons alone. However, there are instances when CALIPSO does not agree with AIRS thin Ci retrievals, demonstrating the existence of spurious Z A in the upper troposphere. Significant improvements in the AIRS V5 operational retrieval algorithm are demonstrated. Some of the algorithm changes made to V5 are highlighted, and those that could have impacted cloud retrievals are discussed.
In summary, we have demonstrated the utility of CloudSat and CALIPSO to evaluate the precision of AIRS cloud retrievals and identified particular cloud types for improvement. Given the relatively favourable agreement between the active- and passive-derived cloud heights, the AIRS swath will be useful to supplement the near-nadir cloud climatology from CloudSat and CALIPSO. Furthermore, since the biases and variability of AIRS cloud height have been quantified as a function of cloud type, they will help to determine biases in cloud type-dependent microphysical and optical retrievals derived from AIRS radiances and similar IR imagers and sounders because cloud vertical structure is required for these retrievals. The inter-comparison of these (and other) data sets is a necessary step towards a unified and global view of cloud properties and their validated error estimates.