Attribution of Chemistry-Climate Model Initiative (CCMI) ozone radiative flux bias from satellites

The top-of-atmosphere (TOA) outgoing longwave flux over the 9.6 μm ozone band is a fundamental quantity for understanding chemistry–climate coupling. However, observed TOA fluxes are hard to estimate as they exhibit considerable variability in space and time that depend on the distributions of clouds, ozone (O3), water vapor (H2O), air temperature (Ta), and surface temperature (Ts). Benchmarking present-day fluxes and quantifying the relative influence of their drivers is the first step for estimating climate feedbacks from ozone radiative forcing and predicting radiative forcing evolution. To that end, we constructed observational instantaneous radiative kernels (IRKs) under clear-sky conditions, representing the sensitivities of the TOA flux in the 9.6 μm ozone band to the vertical distribution of geophysical variables, including O3, H2O, Ta, and Ts based upon the Aura Tropospheric Emission Spectrometer (TES) measurements. Applying these kernels to present-day simulations from the Chemistry-Climate Model Initiative (CCMI) project as compared to a 2006 reanalysis assimilating satellite observations, we show that the models have large differences in TOA flux, attributable to different geophysical variables. In particular, model simulations continue to diverge from observations in the tropics, as reported in previous studies of the Atmospheric Chemistry Climate Model Intercomparison Project (ACCMIP) simulations. The principal culprits are tropical middle and upper tropospheric ozone followed by tropical lower tropospheric H2O. Five models out of the eight studied here have TOA flux biases exceeding 100 mW m−2 attributable to tropospheric ozone bias. Another set of five models have flux biases over 50 mW m−2 due to H2O. On the other hand, Ta radiative bias is negligible in all models (no more than 30 mW m−2). We found that the atmospheric component (AM3) of the Geophysical Fluid Dynamics LaboPublished by Copernicus Publications on behalf of the European Geosciences Union. 282 L. Kuai et al.: Attribution of CCMI ozone radiative bias ratory (GFDL) general circulation model and Canadian Middle Atmosphere Model (CMAM) have the lowest TOA flux biases globally but are a result of cancellation of opposite biases due to different processes. Overall, the multi-model ensemble mean bias is−133±98 mW m−2, indicating that they are too atmospherically opaque due to trapping too much radiation in the atmosphere by overestimated tropical tropospheric O3 and H2O. Having too much O3 and H2O in the troposphere would have different impacts on the sensitivity of TOA flux to O3 and these competing effects add more uncertainties on the ozone radiative forcing. We find that the inter-model TOA outgoing longwave radiation (OLR) difference is well anti-correlated with their ozone band flux bias. This suggests that there is significant radiative compensation in the calculation of model outgoing longwave radiation.


Introduction
Tropospheric ozone (O 3 ) is the third important anthropogenic greenhouse gas (GHG) in terms of radiative forcing (RF) as a consequence of O 3 precursor and methane (CH 4 ) emission increases from pre-industrial times to the present day. Tropospheric O 3 adjusted RF ranges widely from +0.2 to +0.6 W m −2 computed from chemistry-climate model ensembles (IPCC AR5, 2013) Stevenson et al., 2013). The large uncertainty of the tropospheric O 3 RF is driven in part by the model responses to climate change. Without a good long-term record of the historical O 3 levels (Young et al., 2017;Gaudel et al., 2018), such estimates are highly dependent on the model assumptions of past O 3 levels. Differences between models in physical climate, chemical, and radiative processes conspire to complicate the assessment of the accuracy of these RF calculations. Consequently, a method to disentangle the key players caused the model differences to observations as well as the difference between the models is critical for robust estimates of chemistry-climate coupling.
About 80 % of tropospheric O 3 RF is due to O 3 longwave absorption, with the remaining 20 % from the shortwave absorption (IPCC AR5, 2013). In the longwave, 97 % of the total longwave absorption is in the 9.6 µm O 3 band (Rothman et al., 1987). The global outgoing longwave radiation (OLR) spectra were first observed from space for a few months in 1970. Radiance observations were taken during April 1970 and January 1971 by the NASA Infrared Interferometric Spectrometer (IRIS) and then from October 1997 for 9 months by the Interferometric Monitor of Greenhouse Gases (IMG) instrument, on board the Japanese Advanced Earth Observing Satellite "Midori" (ADEOS) satellite. Harries et al. (2001) showed that the changes in the greenhouse gas features between the observed spectra taken 30 years apart by these two instruments suggest increases in greenhouse gas forcing. Over the last two decades, a new generation of thermal infrared satellite instruments has provided a unique opportunity to continuously monitor the outgoing radiances covering the 9.6 µm O 3 band globally, such as NASA's Tropospheric Emission Spectrometer (TES) and Atmospheric Infrared Sounder (AIRS), ESA's Infrared Atmospheric Sounding Interferometer (IASI), and NOAA's Cross-track Infrared Sounder (CrIS). These valuable longterm global measurements can be used to derive the top-ofatmosphere (TOA) O 3 band flux and the sensitivity of the flux to the vertical distributions of O 3 , defined as instantaneous radiative kernels (IRKs) Doniki et al., 2015).
The TES-observed global TOA outgoing fluxes at the 9.6 µm O 3 band in clear skies ( Fig. 1) show strong geographic variations as a result of the short lifetime of O 3 Bowman et al., 2013). Consequently, the global O 3 GHG effect is more unevenly distributed than long-lived GHGs, such as CO 2 . In addition, the variations of the TOA fluxes are not only highly dependent on the distributions of O 3 but are also dependent on water vapor (H 2 O), air temperature (T a ), and surface temperature (T s ) (Kuai et al., 2017).
There is an additional factor where the large-scale atmospheric structure sets the overall atmospheric opacity, which describes the fraction of the light that fails to pass through the atmosphere due to the absorption or scattering. For example, O 3 changes in more opaque regions, e.g., the western Pacific, a wet region due to convection, result in a much smaller change in TOA flux than in more transparent regions, e.g., the Middle East, a dry region due to downwelling (Kuai et al., 2017). This opacity has a direct impact on radiative forcing calculations.
Chemistry-climate models diverge significantly in the simulation of these processes, which are difficult to disentangle because it is hard to quantify the response of the TOA flux due to the change in atmospheric opacity. In this study, we introduce a method to use observational-based IRKs to quantitatively estimate the contributions of the model biases in O 3 , H 2 O, T a , and T s to the TOA flux biases.
The presence of clouds is the primary control on atmospheric opacity. Under the cloudy sky conditions, the roles of these variables other than clouds on TOA flux are much weaker. In addition, the variation in clouds could affect model estimates not only of the ozone but also of the flux sensitivity to ozone and other variables. Both ozone and sensitivity will impact the ozone radiative flux but in opposite directions. With cloud cover, the O 3 loss will be reduced. That means too many clouds would lead to more ozone production. The presence of the cloud would also cause weaker flux sensitivity to O 3 and other variables (IRKs). Therefore, the cloud effect is a battle between the impact on ozone estimation and the radiative sensitivity to ozone (IRK). The differences in cloud variations between the models will complicate the radiative effect. Furthermore, the study of the cloud effect is also currently limited by the global observations of total cloud cover and IRK product under realistic cloud conditions. Without knowing which models have better cloud cover, we benefit from using IRK based on the observed cloud-free data by TES. Therefore, here, we first try to access the role of O 3 , H 2 O, T a , and T s in the variation of the TOA flux without cloud effect. Worden et al. (2008) first attempted to disentangle these effects from satellites. They subsequently developed the IRK in Worden et al. (2011) for O 3 , which is used in this study as a powerful tool to attribute model variability. IRKs for O 3 represent the sensitivity of TOA fluxes to the vertical distributions of the observed O 3 . Aghedo et al. (2011) applied the TES IRKs to evaluate the O 3 radiative effect of chemistryclimate models' O 3 biases in the Atmospheric Chemistry Climate Model Intercomparison project (ACCMIP) . Bowman et al. (2013) found model OLR bias due to O 3 is correlated with RF in the ACCMIP models. This correlation helped to reduce the intermodel divergence in RF by about 30 % . Doniki et al. (2015) updated the IRKs' calculation with a more accurate but computationally more complicated method, a five-angle Gaussian integration (GI) method, to replace the anisotropic approximation. They computed the O 3 IRKs with IASI observations and also showed that between the two methods there are about 20 % differences in IRKs and about 20 %-25 % differences in the longwave radiative effect (LWRE). They also found that the day and night difference of LWRE is mainly controlled by the T s change instead of O 3 amount change. Kuai et al. (2017) updated the computational method for the TES O 3 IRK product with the five GI method and revealed the hydrological controls on the global distribution of the O 3 GHG effect. The study showed that H 2 O, T a , and T s affect the O 3 IRK strength through relative humidity.
Therefore, the TOA flux in the 9.6 µm band depends on more than O 3 . Consequently, in this study, we expand the TES observation-based IRKs to other quantities, including H 2 O profiles, T a profiles, and T s . We apply these IRKs to help understand the reasons for the model divergence in the TOA flux.
The questions that have never been answered before include the following. (1) How do the model-based flux and the flux sensitivity compare to the observational-based flux and sensitivity? (2) How do they compare between the models? (3) How do the flux biases in models relate to the RF variation? Thus, benchmarking present-day O 3 band flux is the first step in answering all these questions and would help to further understand the correlations between the bias in TOA flux and the bias in O 3 RF, and eventually improve the estimation of the climate feedbacks from O 3 forcing.
To benchmark the model-simulated geophysical quantities, a recently developed multi-species multi-satellite Tropospheric Chemistry Reanalysis (TCR) product (Miyazaki et al., 2015) is used in this study to compare to the model results. This chemical reanalysis assimilates data from multiple satellites with sensitivity over complementary parts of the atmosphere, which provides better information than single-species chemical data assimilation. Satellite observations have the occasional issue of temporal discontinuity due to instrument performance and irregular spatial coverage, which can be circumvented by chemical data assimilation. Miyazaki et al. (2015) showed that statistically the model error against independent aircraft and ozonesonde observations in the assimilated species, e.g., O 3 , NO 2 , and CO, is significantly reduced. The multi-species assimilation improves the Northern/Southern Hemisphere OH ratio and provides the emission estimates with interannual variation. The comparison of O 3 reanalysis to the ACCMIP ensemble O 3 simulation in  quantified the model discrepancies in terms of seasonal amplitude, spatial variability, and interhemispheric gradient. For example, the ensemble mean is 6-11 ppb too high in the northern extratropics, while up to 18 ppb too low in the southern tropics over the Atlantic in the lower troposphere. In this study, we use the same O 3 reanalysis data  to understand the model bias in the CCMI project (Morgenstern et al., 2017), a follow-up model intercomparison study for ACCMIP. The multi-species assimilation also provides the opportunity to optimize the chemical-related species of O 3 and the emission sources of the precursors simultaneously. Further work by  showed that the surface emission of nitrogen oxides (NO x ) over a 10-year period (2005-2014) has a positive trend in regions including India, China, and the Middle East, but a negative trend over the US, southern Africa, and western Europe. The global total emission stays almost constant between 2005 (47.9 Tg N yr −1 ) and 2014 (47.5 Tg N yr −1 ). Therefore, the O 3 reanalysis data from TCR represent the state of the art for the current knowledge of the global distribution of tropospheric O 3 by combining the complementary information from model and satellite observations for O 3 and its precursors.
In this paper, we demonstrate a method to use the IRK products and the model biases relative to the reanalyzed tropospheric composition (O 3 and H 2 O) and atmospheric state (T s and T a ) to quantitatively attribute the radiative biases of the flux in a suite of CCMI models to these dominant components. The method and IRKs are described in Sect. 2. The models and reanalysis data are introduced in the next section. Section 4 discusses the intercomparison between models' flux biases, the bias attribution to the dominant components, and the geospatial and vertical distribution of the biases. Lastly, the conclusion and future directions are summarized in Sect. 5.
2 Instantaneous radiative kernels (IRKs) for the climate variables The TOA flux in the 9.6 µm O 3 band ( Fig. 1) is defined as where v is the frequency, integrated over the O 3 band from 980 to 1080 cm −1 . L TOA (v, θ, φ, q) is the upwelling TOA radiance at frequency v, zenith angle θ , and azimuth angle φ. We assume here that the radiance is symmetric in the azimuthal direction. The outgoing TOA radiances, L TOA , are also a function of the atmospheric state, which is represented by variable "q", e.g., H 2 O, O 3 , and T a , that is in turn a function of altitude, z. The IRKs (Eq. 2) represent the sensitivities of the TOA radiative flux in the 9.6 µm O 3 band to the changes in the vertical distribution of an atmospheric variable.
where z l is altitude in discretized level l. When q represents the T s , z l becomes a single surface value at l = 0. The partial derivative term on the right side of the equation is the spectral radiance Jacobians calculated analytically by the TES radiative transfer model. In this study, we expanded the TES global O 3 IRKs to IRKs with respect to H 2 O, T a , and T s . These TOA flux sensitivities still refer to the spectral window region in the 9.6 µm O 3 band for the flux. All the kernels are computed with the five-angle GI method (Doniki et al., 2015;Kuai et al., 2017). . The H 2 O IRK peaks near 700 hPa, a little higher than the T a IRK. The T a IRK is maximal closest to the surface, suggesting that the O 3 band flux is most sensitive to boundary layer T a near 900 hPa. The strength of the peaks decreases with increasing latitude for all the three variables but the peak altitude does not change significantly except for the H 2 O IRKs in the polar region, which peaks at a slightly higher level than in lower latitudes.
In addition, the T s IRK is greater than zero, which means increases in T s would increase the outgoing TOA flux. However, the IRKs for the GHGs, i.e., H 2 O and O 3 , are negative, because the increase in gas concentrations reduces the upwelling flux at TOA due to radiative absorption by the gas.
The global vertical distributions of the zonal averaged kernels for O 3 , H 2 O, and T a are also shown below their profile plots in Fig. 2b, d, and f. The sensitivities of the TOA flux to these three variables are strongest in the tropics and decrease with latitude. Furthermore, the IRK for T s is also shown in Fig. 2g and h. Unlike the other IRKs, the T s IRK is not a function of altitude, so we show the winter (December-February) and summer (June-August) seasonal average of its global distribution. The flux sensitivities are found to be largest over the major deserts, like the Sahara, the Middle East, and Australia, corresponding to the regions with the highest values of T s . We also notice that the values of the T s IRKs in the Intertropical Convergence Zone (ITCZ) are much lower than those in the subtropics, which suggests that the atmosphere opacity has an impact on the strength of the T s IRKs.

A method to attribute the flux biases
The flux biases between observations and models under the clear-sky conditions can be described as where ∂F i TOA , is the total TOA flux bias in the O 3 band at the ith location. The four terms on the right-hand side of the equation are the products of the IRKs and the biases in the geophysical quantities (i.e., O 3 , H 2 O, T a , and T s ). These biases are then vertically integrated on index l, over domain L, which in our case is the troposphere. The summation is the vertical integral from the surface to the tropopause.
Here, we assume that the biases due to other physical processes, e.g., surface emissivity or other atmospheric species, have much less influence on the TOA flux variation. For example, the model bias in global emissivity is not accessible but is believed to be quite small compared to O 3 , H 2 O, T a , and T s . We also assume that the nonlinearity terms are much smaller than these four first-order terms. Following Bowman et al. (2013), the delta terms in Eq. (3) are the model biases with respect to the reanalysis data, defined as below: where q i,l mod and q i,l assim represent the model and reanalysis O 3 , H 2 O, T a , or T s at the ith location and the lth altitude level, respectively.
The mean flux bias or the mean bias components from tropospheric uncertainties are calculated from Eqs. (3) and (4) as where w i is area weighted for the latitude bands, D j is a set of observed locations, N j is the number of locations in the domain of D j and tropospheric levels of L up to the tropopause. We use the chemical tropopause O 3 = 150 ppb (Naik et al., 2005;Hansen et al., 2007;Bowman et al., 2013;Kuai et al., 2017). The domain of D j can be zonal bands for the zonal mean or global area for the global mean, respectively. The global mean of the flux bias and its components will be denoted as δF and δF q , respectively.
4 Chemistry-climate models and the reanalysis data

Models and simulations
We analyze six models from the CCMI study (  (Table S30 in Morgenstern et al., 2017). REF-C1 requires using historic forcing and observed sea surface conditions. The models are free-running and simulate the recent past . We did not choose to use REF-C1SD (specified dynamics) because specified dynamics nudged the wind and temperature of the model to be constrained to the reanalysis data. The long-term climatological biases relative to the reanalysis between the models are minimized. Our study aims to find a correlation between the present-day radiative bias and the RF from present day to future by the model predictions. Therefore, we prefer to keep the model differences in simulating longer-term climatology between their free runs. We note that SOCOL3 and EMAC are both based on different versions of the ECHAM5 climate model. We also added two additional model simulations with AM3 from NOAA and CESM from NCAR. These two simulations are not the specific CCMI experiment run; however, these two models have been used in many studies, and including them in this study provides more useful information on the TOA flux diversity among the most recent models.

Tropospheric Chemistry Reanalysis (TCR-1) data
We computed the biases in the geophysical variables between the model and the reanalysis data. To compute the O 3 bias in models, we used the satellite-based O 3 reanalysis from multiconstituent multi-satellite data assimilation: Tropospheric Chemistry Reanalysis version 1 (TCR-1) (Miyazaki et al., 2015; as the best synthesis of the observations. The reanalysis provides comprehensive spatiotemporal and multi-variable evaluation of model performance that complements direct comparisons against individual measurements, which may suffer from significant sampling bias . TCR-1 assimilated multiple species data from multiple satellite products for the period from 2005 to 2017, e.g., combined TES and Microwave Limb Sounder (MLS) observations for O 3 , integrated Ozone Monitoring Instrument (OMI), Scanning Imaging Absorption Spectrometer for Atmospheric Chartography (SCIAMACHY), and Global Ozone Monitoring Experiment-2 (GOME-2) for tropospheric NO 2 column, Measurements Of Pollution In The Troposphere (MOPITT) for CO, and MLS for HNO 3 . TCR-1 used a global chemistry-transport (CTM) Model for Interdisciplinary Research on Climate with chemistry (MIROC-Chem; Watanabe et al., 2011) as a forecast, which includes 92 species and 262 reactions. The model has 2.8 • horizontal resolution with 32 vertical layers up to 4 hPa. The data assimilation was based on an ensemble Kalman filter with 32 ensemble members, which was used to simultaneously optimize concentrations and emissions of various species.
As summarized by , the mean bias in the reanalysis dataset against the World Ozone and Ultraviolet Data Centre (WOUDC) ozonesonde obser-vations is from −3.9 to −2.9 ppb at the NH high latitudes (55-90 • N); −0.9 to −0.1 ppb at the NH midlatitudes (15-55 • N); and −1.0 to −0.1 ppb at the SH midlatitudes (55-15 • S), between 850 and 500 hPa. On average, the bias is about 0.9 ppb at the tropics and midlatitudes between 500 and 200 hPa. These biases are much smaller than biases in the model simulation without data assimilation, demonstrating that the multi-satellite data assimilation provides comprehensive constraints on the entire tropospheric profile of O 3 .
For the purpose of consistency, we also use outputs of H 2 O, T a , and T s from the reanalysis to estimate the model biases. In the reanalysis calculation, meteorological fields simulated by the atmospheric general circulation model MIROC-AGCM (Watanabe et al., 2011) were nudged toward the 6hourly ERA-Interim meteorological reanalysis (Dee et al., 2011) for zonal wind (τ = 1 d) and temperature (τ = 3 d) to reproduce past meteorological fields while simulating shortterm (< 6 h) meteorological variations, which were used to drive the CTM, as similarly employed in CCMI C1SD simulations. Thus, the reanalysis dataset provides realistic and comprehensive estimates for both chemical and meteorological fields required for the TOA flux evaluations. Figure 3 shows the latitudinal distribution of the zonal and annual mean of the TOA flux bias from each model relative to the reanalysis. The largest divergence between the models is located at the tropics where most models underestimate the flux, with the exception of CMAM. The low bias in the model ensemble implies the model atmosphere is more opaque than the chemical reanalysis, leading to a 133 mW m −2 outgoing flux reduction on average. The TOA flux in an opaque atmosphere is less sensitive to the changes in tropospheric composition than a more transparent one. Under those conditions, the models would underestimate the radiative feedback from composition since the IRKs estimated under an opaque atmosphere will be weaker than those under a realistic (more transparent) atmosphere.

The latitudinal distribution of the TOA flux bias
Two models that have larger low biases at the equatorial region than other models are SOCOL3 and MRI-ESM1r1. Their global means of the flux bias are more than −200 mW m −2 ( Table 2). The following analysis will help to clarify the source of the bias in the models.

Flux bias attribution
The total TOA flux bias is caused by biases from atmospheric composition and temperature. In order to determine the primary drivers of these biases, we apply the IRKs to the differences between model and the chemical reanalysis as described in Eq. (3). Figure 4 shows the contribution of O 3   (blue), H 2 O (green), T s (red), and T a (yellow) for each model to the total TOA flux bias (black). The global mean bias is summarized in Table 2. In general, O 3 and H 2 O are the two dominant drivers for most models where the large biases are concentrated in the tropics and subtropics. There are only three models (GEOSCCM, CMAM, and CESM) whose O 3 radiative biases (δF O 3 in Table 2) are less than 50 mW m −2 and are almost negligible zonally. While the flux bias is better represented in these models, it does not follow that they represent tropospheric O 3 more accurately, as will be shown in the following section. The other five models (AM3, SOCOL3, EMAC-L47MA, EMAC-L90MA, and MRI-ESM1r1) have significant negative peaks at low latitudes (Figs. 3 and 4), actually resulting from their strong O 3 contributed biases (from 80 to 180 mW m −2 ; numbers are footnoted as b in Table 2).
The TOA flux bias from H 2 O is the second largest component for most models. Similar to O 3 , most models show the fluxes are biased low in the tropics due to the H 2 O uncertainties with the exception of CMAM, which has the strongest global mean bias (127.9 mW m −2 ). Note that, in the reanalysis, no data assimilation (or nudging) was applied for specific humidity. Watanabe et al. (2011) demonstrated a dry bias in the lower troposphere and a wet bias in the middle and upper troposphere in MIROC-AGCM, primarily attributable to temperature biases. Nevertheless, the reported H 2 O biases can be greatly reduced in the reanalysis because of the nudging applied for temperature.
The flux bias due to T a is found to be negligible in all models, which indicates that the model T a estimates provide rea- sonable radiative fluxes. T s radiative bias is also meridionally weak relative to the flux bias in O 3 and H 2 O (Fig. 4). With the exception of CMAM, the T s ensemble global mean bias is less than 35 mW m −2 (see Table 2). Figure 4 suggests the strong bias from T s in CMAM (−100.2 mW m −2 ) comes from the two subtropical regions.
Interestingly, the positive flux bias due to H 2 O (127.9 mW m −2 ) is compensated by the negative flux bias due to T s (−100.2 mW m −2 ) in CMAM, leading to the lowest global mean in δF (42.9 mW m −2 , calculated with Eq. 5). This compensation is also true for AM3 but between a positive H 2 O radiative bias (87.7 mW m −2 ) and negative O 3 radiative bias (−140 mW m −2 ). This analysis reveals that these two models are both right but for wrong -and opposite -reasons.
However, all the other models have a strong negative global mean bias and are mostly driven by the two major components (O 3 and H 2 O). SOCOL3 and MRI-ESM1r1 are the two models that have the strongest low bias up to −200 mW m −2 , which is mainly due to their strong O 3 radiative bias (−180 mW m −2 ). Their O 3 estimates are both biased high in the tropics and subtropics. We will show later that such bias is particularly strong in the upper troposphere.

Vertically resolved radiative bias of the O 3 , H 2 O, and T
The zonal flux biases among the models are both significant and mainly in the tropics. However, those biases are the vertically integrated product of the model profile bias and the IRKs both with their own vertical structures. The vertically resolved radiative bias can provide more insight into the processes leading to the biases. To further investigate, we examined the vertically resolved flux bias for O 3 , H 2 O, and T a (Figs. 5-7) and the global distribution for T s (Fig. 8). These are computed from Eq. (3) before the vertical summation. These figures show that the maximum contribution to the flux bias is a balance between the peak of the IRKs (Fig. 1) and the peak of the geophysical quantities' bias (Figs. 9-12). The positive tropical O 3 radiative bias for GEOSCCM, CMAM, and CESM is commonly centered in the midtroposphere, corresponding to the peak of the IRKs (Fig. 5). On the other hand, the primary O 3 flux bias contribution in the tropics for SOCOL3, EMAC-L47MA, EMAC-L90MA, and MRI-ESM1r1 is in the upper troposphere around 200 hPa even though the IRKs are roughly half the peak sensitivity. These strong negative biases exceed 15 mW m −2 .
The strong tropical H 2 O radiative bias collapses to shallower tropical regions below 400 hPa and is maximized near 800 hPa for most models exceeding 50 mW m −2 (Fig. 6). CMAM has the unique and strongest net positive bias of above 50 mW m −2 centered lower and close to 900 hPa. While most models' flux bias is centered near 800 hPa, particularly GEOSCCM, AM3, and CESM show a more vertically uniform -and opposing -flux bias. Figure 7 indicates that tropical T a radiative bias is largely negligible for vertical layers above 600 hPa as a consequence of the rapid decrease in sensitivity of the T a IRKs. The maximum bias is in the lower troposphere between 900 hPa and the surface. CMAM and CESM both show the strongest positive bias exceeding 10 mW m −2 over most of the tropics. However, CESM has a compensating negative bias from 700 to 800 hPa that leads to a mean global bias of only 6.4 mW m −2 (Table 2), whereas CMAM has a positive bias throughout, leading to an atmospheric T a radiative bias of 22.5 mW m −2 , the largest of the models studied here. Surprisingly, the model ensemble T s turns out to be the second largest contributor to the total bias ( Table 2), instead of H 2 O, driven primarily by three models: CMAM, CESM, and GEOSCCM, as shown in Fig. 8. CMAM shows a negative bias that covers all of Africa, exceeding 500 mW m −2 , and Asia centered over India. Consequently, CMAM has the largest total bias (−100.2 ± 93.3 mW m −2 ). CESM and GEOSCCM T s radiative biases, on the other hand, are centered at high latitudes in the western hemisphere over the eastern US and Canada, exceeding 300 mW m −2 .
The vertically and spatially concentrated radiative biases provide clues as to what processes are the most important for the total flux bias. These processes drive the distribution of the constituents, which we will discuss in detail in the next sections.

The spatial source of TOA flux bias
The source of the attributed flux biases can be traced back to their spatial origins, which can provide more insight into the underlying processes and the differences between the models. Figure 9 shows a vertically resolved zonal averaged distribution of O 3 biases between the model and the chemical reanalysis similar to that in Fig. 5. Three models (GEOSCCM, CMAM, and CESM) have the weakest globally averaged O 3 radiative bias reported in Table 2 (−33.5, −7.3, and 3.3 mW m −2 ). These three models also have the lowest O 3 bias in tropical troposphere on average (−1.1 ppb, −1.3 ppb, and 3.0 ppb, reported in Table 3) and as a consequence have weaker radiative bias in the region with the strongest O 3 IRK globally (1.0, 1.4, and 2.1 mW m −2 ; reported in Table 4). On the other hand, the global O 3 bias is greater than 7 ppb for all  Tables 3 and 4) and MRI-ESM1r1 (13.7 ppb and −10 mW m −2 ). GEOSCCM, CMAM, and CESM commonly have a vertically compensated pattern in the tropics that is biased high in the upper troposphere while biased low in the middle and lower troposphere (Fig. 9). Their O 3 low biases in the middle troposphere are approximately 5 to 10 ppb, where the peak of the IRK centered, but the high biases in the upper troposphere are about 5 ppb. Such a high-low pattern leads to compensation during the vertical integration through the troposphere into the radiative effect at the top of the atmosphere. The corresponding vertical resolved O 3 radiative bias for these three models in Fig. 5 shows the consistent tropical vertical distribution but in an opposite sign since the O 3 IRK is negative. In contrast, the other five models have vertically systematic biases high in the tropical O 3 , and the biases increase from the middle troposphere to the upper troposphere. Especially SO-COL3 and MRI-ESM1r1 strongly overestimate O 3 by more than 20 ppb in a wide region of tropical upper troposphere. The O 3 radiative biases in this region remain significantly high, stronger than −15 mW m −2 , causing these two models to have the highest O 3 radiative biases in the global and annual mean (both about −183 mW m −2 ).    The systematic bias in the entire tropical tropospheric O 3 and strong overestimation of upper troposphere in SOCOL3 and MRI-ESM1r1 could be caused by several factors. For example, the transport from the lower stratosphere could be too high. Alternatively, precursor emissions of tropospheric O 3 could also be too high. The analysis with the spatially explicit biases provides important clues to implicate the specific processes that individual modeling groups can investigate.

O 3 bias
The GEOSCCM has been used to study the tropospheric O 3 response to variations in the El Niño-Southern Oscillation (ENSO), where Oman et al. (2011Oman et al. ( , 2013 compared the model to satellite observations. These regular comparisons may have led to the improved simulation of tropospheric O 3 profiles and consequently lower vertical O 3 bias. The GEOSCCM model in the CCMI study uses the troposphericstratospheric chemical package developed within the Global Modeling Initiative (GMI) program (Duncan et al., 2007), which has more realistic ozone chemistry, an internally generated quasi-biennial oscillation, an improved air-sea roughness parameterization and other improvements (Oman and Douglass, 2014). Nielsen et al. (2017) showed that GEOSCCM has successfully reproduced the changes in the quasi-global (60 • S-60 • N) annual mean trend in total O 3 column since 1960s to the present day. For the present-day atmosphere, simulated tropospheric partial column O 3 from GESCCM Ref-C1 for CCMI was compared to satellite observations of OMI and MLS . The differences are mostly a few Brewer-Dobson units (DU) except in the Northern Hemisphere subtropics and middle latitudes in autumn and winter with the 4-6 DU biases which are under investigation.
The finding that SOCOL3 and MRI-EMS1r1 both have strong overestimates in the tropical upper troposphere is also understandable. SOCOL3 is the third generation of the coupled chemistry-climate model (CCM) SOCOL (modeling tools for studies of SOlar Climate Ozone Links). Several steps have been taken to improve the SOCOL model simulation of O 3 . Stenke et al. (2013) first attempted to reduce the O 3 bias in their middle atmosphere by updating their middleatmosphere general circulation with an advanced advection scheme. Revell et al. (2015) revealed that ozone precursor emissions are the biggest players that control the global mean change in tropospheric ozone. In a parallel study, Revell et al. (2018) developed an updated version of "SOCOL3.0", "SOCOL3.1", to reduce the tropospheric ozone bias. By improving the treatment of ozone sink processes, the tropo- spheric column ozone bias in "SOCOLv3.1" is reduced up to 8 DU, mostly due to the inclusion of N 2 O 5 hydrolysis on tropospheric aerosols. We expect that the future similar analysis with the SOCOL3.1 could show a reduced flux bias for this model.
Meanwhile, the strong tropical upper tropospheric O 3 biases in MRI-EMS1r1 are believed to be related to the weak tropical convective updraft and the large lightning NO x emissions in the model. The model with weak updraft fails to bring enough low O 3 air from the surface to the upper troposphere in the tropics or overestimates the upper tropospheric mixing of stratospheric ozone-rich air. In addition, the global lightning NO x (LNO x ) emission used in MRI-EMS1r1 is 10 TgN yr −1 . The best estimate of annual mean LNO x based on satellite data assimilation is 6.3 TgN yr −1 (Miyazaki et al., 2014). The LNO x in GEOSCCM is approximately 5 TgN yr −1 (Martini et al., 2011), which shows less tropical upper tropospheric O 3 bias compared to MRI-EMS1r1. Thus, the overestimation of the O 3 precursor in the upper troposphere is another reason for too much O 3 . Figure A1 shows the improvement in the radiative biases due to less O 3 bias in the experiment by half the LNO x emissions in MRI-EMS1r1 (see the Appendix).
In summary, the potential reasons for the prevalence of O 3 radiative bias in the tropical middle and upper troposphere in the models could be due to following facts: (1) the tropical O 3 IRK is strongest in this region (Fig. 2); (2) the largest O 3 bias in the models also centered in the same place (e.g., SOCOL3 and MRI-EMS1r1; Fig. 9); (3) the simulations with the systematic bias throughout the tropical troposphere, when vertically integrated, accumulated into a larger column bias when compared to the models with vertically random biases.

H 2 O bias
H 2 O turns out to be the primary contributor for three models (GEOSCCM, CMAM, and CESM) since their O 3 radiative bias is small. It is also the second dominant driver after O 3 in the other five models. Different from O 3 , H 2 O IRKs in Fig. 2 show the strongest sensitivity to the tropical lower troposphere centered at 800 hPa, where H 2 O is most concentrated globally. We found the model biases in H 2 O are strongest in the tropical lower troposphere. It explains why the strongest radiative bias from H 2 O is also located in the tropical region near 800 hPa in all models as shown in Fig. 6. Figure 10 and Figure 11. The zonal averaged vertical-latitudinal distribution of T a biases from models to the reanalysis data. The black curves are the zero lines. Table 3 further help to indicate that H 2 O is biased low only in two models, AM3 (−586.5 ppm) and CMAM (−506.6 ppm). We note that H 2 O IRKs are also negative as O 3 . Therefore, these two models have the unique overestimates in H 2 O radiative bias at low latitudes (see Fig. 4), while all the other models are predominantly biased high in tropical H 2 O concentrations, which result in the negative radiative biases.

T a bias
We found the T a radiative biases in these model ensembles are all negligible. There are two reasons. One is that the T a biases are small overall (less than 2 K) even at the tropical lower troposphere (below 1 K on average in Table 3). The other reason is that the compensation in the vertical integration helps to reduce the radiative bias at the top of atmosphere. Figure 11 shows that the model biases in T a range within ±2 K for all the models because the current chemistryclimate models have been well developed to simulate the global atmospheric T a fields relative to reanalysis. The region with strongest sensitivity, identified by the T a IRKs (Fig. 2), is the tropical lower troposphere (the region within ±30 • and below 800 hPa). The T a biases in the tropics shift between positive and negative vertically in most models except CMAM, which is systematically biased (Fig. 11). The oscillated T a biases suggest that simulated air temperatures stay around the reanalyzed profiles. These models better represent the air temperature than the trace gases like H 2 O and O 3 . The oscillation around the reanalyzed profile leads to vertical compensation in the air T a radiative bias. Therefore, the flux bias from T a is a small component compared to the radiative bias from O 3 and H 2 O. Figure 4 suggests the only model that has a small tropical peak in the T a radiative component is CMAM, which has the strongest T a radiative bias (22.5 ± 40.5 mW m −2 in Table 2) among all the models. Figure 11 shows that this model has a deep region with a strong bias of about 2 K in tropical areas and also persistently overestimated T a vertically. Figure 7 further suggests the strong radiative bias in CMAM mainly comes from the tropical lower troposphere (> 10 mW m −2 below 800 hPa). The other models have vertical compensation in the tropics (less than 0.5 K bias on average; see Table 3), and therefore they are less biased in TOA flux (less than 1 mW m −2 in Table 4). The two EMAC models both have strong biases at the southern high latitudes but still have a weak radiative effect in this region (see Fig. 7) due to much weaker T a IRK at high latitudes.

T s bias
The global distribution of the T s biases indicates that the biases in sea surface temperature (SST) are smaller than the biases in land T s for all the models (Fig. 12) because the CCMI experiment (REF-C1) selected in this study used the observed SST. CMAM is the model that has the strongest T s radiative bias (−100.2 ± 93.3 mW m −2 in Table 2), which peaks in both subtropical regions (Fig. 4). These large negative biases are due to the large underestimates of the T s over the major deserts, e.g., the Sahara, the Middle East, and Australia (Fig. 12). In other words, the real deserts' surface is hotter than the model's prediction. At the same time, the T s IRKs at the subtropical desert region are also strongest globally, since the TOA flux is more sensitive to T a when the atmosphere is transparent, which is due to the downdraft of the Hadley cell controlling the region (Kuai et al., 2017). The downwelling airflow results in less precipitation and less clouds, as well as higher T s during summer over the desert surface. These factors cause the CMAM to have the largest T s radiative bias compared to all the other models.
In contrast, two EMAC models and MRI-ESM1r1 also have strong high bias in T s in Siberia (> 4 K in Fig. 12), but the radiative bias is much less significant compared with the Middle East in CMAM in Fig. 8. The T s IRKs are weaker during the winter season in high latitudes than the low latitudes if the T s is low (Fig. 2). However, the IRKs at the subtropical desert region stay strong during winter. Therefore, the annual mean of the T s radiative bias is much weaker in Siberia in two EMAC models and MRI-ESM1r1 than the Middle East region in CMAM. Consequently, the global annual means of the T a radiative biases for two EMAC models and MRI-ESM1r1 are small, although the large biases in T s are found in their Siberian region.

Correlation to the broadband OLR
The analysis up to this point has been limited to the 9.6 µm band. We posed the question as to whether biases in this band could provide any insight into biases in the entire OLR band. To that end, we found an anti-correlation (R = −0.6) be- tween the global mean of the O 3 band flux biases and the clear-sky broadband OLR calculated internally by the models, as shown in Fig. 13a. The CMAM OLR is inconsistent with the ensemble (more than 2 W m −2 higher than all the other models), and therefore it is excluded in the correlation. Interestingly, a very similar regression line and anticorrelation coefficient (R = −0.6) are found between the O 3 radiative bias and the broadband OLR (Fig. 13b). The similar regression line indicates that the O 3 radiative bias dominates the 9.6 µm TOA flux distribution, which is confirmed by the attribution analysis that O 3 radiative bias is the largest term in five of eight models. The anti-correlation suggests a radiative compensation between the 9.6 µm band and the other parts of the OLR, assuming a constant globally integrated OLR at TOA. More interestingly, a strong correlation (R = 0.9) is found between T s radiative bias and broadband OLR (Fig. 13c) because the T s affects the entire baseline of the outgoing radiance and its radiative effect plays the same role in the O 3 band as in the entire OLR. However, there is no significant correlation found between T a radiative bias and OLR, likely because there is no coherent bias in T a radiative effect between the O 3 band and in the entire OLR. There is neither a correlation between the H 2 O radiative bias and broadband OLR. H 2 O absorption is ubiquitous in the OLR. Consequently, biases in the 9.6 µm band do not drive the magnitude of the overall H 2 O absorption in spite of the H 2 O biases.
The anti-correlation between the biases in the 9.6 µm band and in the entire OLR band would suggest some bias drivers in the 9.6 µm band must play different roles at the other part of the OLR band. The further investigation of these processes would help to explain the radiative effect of different biases on the OLR estimations from models (Huang et al., 2008(Huang et al., , 2014.

Conclusions
We have demonstrated a new method to quantitatively attribute the biases in O 3 band TOA flux from chemistryclimate model ensembles to O 3 , H 2 O, T a , and T s radiative components without cloud effect using observationally constrained IRKs in the clear sky. The study also provides the first vertically and globally resolved view of the radiative bias for each component. An IRK depicts the sensitivity of TOA fluxes to the vertical distribution of the geophysical quantities, such as O 3 , H 2 O, T a , and T s . While the products of 9.6 µm O 3 band IRK for O 3 , H 2 O, T a , and T s have been developed with the satellite observations by Aura TES, the record could be extended by MetOp-IASI and SNPP-CrIS Fourier Transform Spectrometer (FTS) measurements. We compute the model biases against reanalysis data for four key variables: O 3 , H 2 O, T a , and T s . Especially for O 3 biases, the newly developed TCR-1 O 3 assimilation data (Miyazaki et al., 2015; are, for the first time, used as the state-of-the-art benchmark for tropospheric O 3 in models. These specific bias comparisons for the CCMI study cause the modelers to investigate the reasons for these biases and motivate them to improve their simulations. For example, MRI-ESM1r1 shows the reduced LNO x emission help to improve their tropical upper tropospheric O 3 and its radiative bias. O 3 abundance is found to be the dominant driver for the ensemble flux bias. Tropical tropospheric O 3 is too high for most models and accounts for about 70 % of the flux bias ( Table 2). The second driver in the model ensemble becomes the T s instead of H 2 O because the T s radiative components are commonly biased low in the model ensemble, while the H 2 O radiative biases between models are biased randomly in both directions with large diversity. For individual models, however, H 2 O is the second most important driver, a larger component than T s , for many cases such as AM3, SOCOL3, and MRI-ESM1r1.
In addition to determining that the tropospheric O 3 and H 2 O are overestimated, and the surface is too cold, the study also tells us the geolocations, in latitudes and altitudes, of the deviations in these geophysical quantities that propagate into the flux bias.
The largest spread of the flux bias between the models is found in the tropics. The principal contributors governing each model are different and controlled by different processes over different regions. The flux biases in five of the eight models (AM3, SOCOL3, EMAC-L47MA, EMAC-L90MA, and MRI-ESM1r1) are primarily driven by too much O 3 in the tropical middle and upper troposphere. H 2 O is a big driver in five models (AM3, SOCOL3, GEOSCCM, CMAM, and CESM). T s is an important contributor in CMAM in addition to its H 2 O.
Although AM3 and CMAM overall have relative lower TOA flux biases globally, we found they are actually right for the wrong reasons. In AM3, the dominant positive H 2 O radiative bias (87.7 mW m −2 in Table 2) happens to be canceled by the dominant negative O 3 component (−140 mW m −2 ), while in CMAM, the large positive H 2 O component (127.9 mW m −2 ) is mostly being compensated by T s radiative bias (−100.2 mW m −2 ). The two relatively young models among the model ensembles, SOCOL3 and MRI-ESM1r1, have a large potential to be improved for their fluxes by reducing their strong negative radiative biases from both tropical upper tropospheric O 3 and tropical lower tropospheric H 2 O.
On average, the model ensemble underestimates the flux by about 133 mW m −2 due to overestimated tropical tropospheric O 3 and H 2 O. The underestimate of the TOA flux implies the model atmosphere is too opaque. In a more opaque atmosphere, the change in flux will be weaker for the same change in tropospheric O 3 because the sensitivity (i.e., IRKs) is weaker. With such feedback, the O 3 RF, the changes in O 3 GHG effect from pre-industrial times to the present day would likely be underestimated. The opacity of the atmosphere is controlled by climate processes, such as the hydrological cycle, that is shown can indirectly affect the O 3 GHG effect and RF, as discussed in Kuai et al. (2017).
The spatially explicit and process-focused differences could be used as a basis for emergent constraints . New techniques such as hierarchical emergent constraints (HECs) can harness this spatial information so that specific processes affecting O 3 RF can be identified . Moreover, if this correlation exists between the TOA flux bias and the O 3 RF, then a similar issue could be found in the RF of other GHGs, such as CO 2 and CH 4 . That is a subject for future research.
Finally, although the chemical reanalysis dataset provides comprehensive information on model radiative biases, we need to understand its performance. For instance, further improvements are still needed for lower tropospheric O 3 . Ingesting more datasets and applying a bias correction procedure would be useful to improve reanalysis accuracy. The lower tropospheric O 3 analysis would benefit from the recently developed satellite retrievals with high sensitivity to the lower troposphere (Fu et al., 2018) and the optimization of additional precursor emissions.

Appendix A
Here, we compare the MRI-EMS1r1 experiment run RefC1_50 % LNO x with its RefC1 run. The new run's emission decreases by about 50 % compared with the original run. The global lightning NO x emission annual mean in 2006 simulated in the experiment run is reduced from ∼ 10.79 TgN yr −1 in RefC1 to ∼ 5.21 TgN yr −1 . The 10-year average changes from ∼ 10.44 to ∼ 5.18 TgN yr −1 .
We found the total flux bias is much reduced due to the improved O 3 radiative bias (Fig. A1a, b). As we expected, the vertical resolved O 3 radiative bias shows that the overestimation of the tropical upper tropospheric O 3 radiative bias is much weaker in the new run (Fig. A1c, d). This improvement is due to the lower O 3 biases in this region caused by reduced LNO x emission (Fig. A1e, f). We also see some changes in the latitudinal distributions of the H 2 O radiative bias. This is because the reduction of the upper tropospheric O 3 will cause the model responses in the O 3 heating rate, which would have radiative effect on the temperature, atmospheric stabilities, and convective activity (Nowack et al., 2015). All these factors would impact water vapor and cloud formation.
Author contributions. LK and KWB designed the analysis and organized the paper. LK developed the IRK product, performed all the analysis, and drafted the paper. KWB helped interpret the results. HW and SK provided the code used to further develop the new IRK products. KM contributed the TCR-1 data. The other authors ran the individual model, contributed the model output, and helped revise the paper.
Competing interests. The authors declare that they have no conflict of interest.
Special issue statement. This article is part of the special issue "Chemistry-Climate Modelling Initiative (CCMI) (ACP/AMT/ESSD/GMD inter-journal SI)". It is not associated with a conference.
Acknowledgements. This work was conducted at Jet Propulsion Laboratory. Le Kuai and Kevin W. Bowman's research was carried out at the Jet Propulsion Laboratory, California Institute of Technology, under a contract with the National Aeronautics and Space Administration. Le Kuai and Kevin W. Bowman were supported under NASA ROSES NNH13ZDA001N-AURAST. The TCR-1 work was supported through JSPS KAKENHI grant numbers 26287117 and 18H01285 and by the Environment Research and Technology Development Fund (2-1803) of the Ministry of the Environment, Japan. The Earth Simulator was used to conduct chemical reanalysis calculations under the JAMSTEC Proposed Project and Strategic Project with Special Support. The EMAC model simulations were performed at the German Climate Computing Centre (DKRZ) through support from the Bundesministerium für Bildung und Forschung (BMBF). DKRZ and its scientific steering committee are gratefully acknowledged for providing the high-performance computing (HPC) and data archiving resources for the consortial project ESCiMo (Earth System Chemistry integrated Modelling). The GEOSCCM is supported by the NASA MAP program and the high-performance computing resources were provided by the NASA Center for Climate Simulation (NCCS). Eugene Rozanov is partially supported by the Swiss National Science Foundation under grant 200020_182239 (POLE) and the information gained will be used to improve next versions of the CCM SOCOL.
Financial support. This research has been supported by the NASA ROSES (grant no. NNH13ZDA001N-AURAST), the JSPS KAK-ENHI (grant nos. 26287117 and 18H01285), and the Swiss National