Evaluation of black carbon estimations in global aerosol models

We evaluate black carbon (BC) model predictions from the AeroCom model intercomparison project by considering the diversity among year 2000 model simulations and comparing model predictions with available measurements. These model-measurement intercomparisons include BC surface and aircraft concentrations, aerosol absorption optical depth (AAOD) retrievals from AERONET and the Ozone Monitoring Instrument (OMI), and BC column estimations based on AERONET. In regions other than Asia, most models are biased high compared to surface concentration measurements. However, compared with (column) AAOD or BC burden retrievals, the models are generally biased low. The average ratio of model to retrieved AAOD is less than 0.7 in South American and 0.6 in African biomass burning regions; both of these regions lack surface concentration measurements. In Asia the average model to observed ratio is 0.7 for AAOD and 0.5 for BC surface concentrations. Compared with aircraft measurements over the Americas at latitudes between 0 and 50° N, the average model is a factor of 8 larger than observed, and most models exceed the measured BC standard deviation in the mid to upper troposphere. At higher latitudes the average model to aircraft BC ratio is 0.4, and models underestimate the observed BC loading in the lower and middle troposphere associated with springtime Arctic haze. Low model bias for AAOD but overestimation of surface and upper atmospheric BC concentrations at lower latitudes suggests that most models are underestimating BC absorption and should improve estimates for refractive index, particle size, and optical effects of BC coating. Retrieval uncertainties and/or differences with model diagnostic treatment may also contribute to the model-measurement disparity. The largest AeroCom model diversity occurred in northern Eurasia and the remote Arctic, regions influenced by anthropogenic sources.
Changing emissions, aging, removal, or optical properties within a single model generated a smaller change in model predictions than the range represented by the full set of AeroCom models. We suggest that upper tropospheric BC mass concentrations from the aircraft measurements provide a unique new benchmark to test scavenging and vertical dispersion of BC in global models.


Introduction
Correspondence to: D. Koch (dkoch@giss.nasa.gov)
Black carbon, the strongly light-absorbing portion of carbonaceous aerosols, is thought to have contributed to global warming since pre-industrial times. It is a product of incomplete combustion of fossil fuels and biofuels, such as coal, wood and diesel. Black carbon (BC) has several effects on climate, primarily warming, but potentially also some cooling. The "direct effect" is the scattering and absorption of incoming solar radiation by BC suspended in the atmosphere. The absorption warms the air where the BC aerosol is suspended, but the extinction of radiation results in negative forcing at the earth's surface (e.g. Ramanathan and Carmichael, 2008). The "BC-albedo effect" occurs because black carbon deposited on snow lowers the snow albedo and may therefore promote snow and ice melting (e.g. Warren and Wiscombe, 1980; Hansen and Nazarenko, 2004). BC may also affect clouds by changing atmospheric stability and/or relative humidity, and thus cloud formation; this has been termed the "semi-direct effect" (e.g. Ackerman et al., 2000; Johnson et al., 2004). Finally, BC is a primary aerosol particle and influences the number of particles available for cloud condensation (e.g. Oshima et al., 2009); it may thus play an important role in the aerosol-cloud "indirect effect". BC may also contribute to the indirect effect by acting as ice nuclei (e.g. Cozic et al., 2007; Liu et al., 2009).
Quantifying the effects of black carbon on climate change is hindered by several uncertainties. Emissions are uncertain because of difficulties in quantifying sources and emission factors (e.g. Bond et al., 2004). Measurements of BC concentrations are uncertain because of instrumental limitations of present measurement techniques (Andreae and Gelencser, 2006). Optical properties are uncertain since they vary with source, morphology, particle age and chemical processing. In many polluted and biomass burning regions, atmospheric column aerosol absorption comes mostly from black carbon. This aerosol absorption optical depth (AAOD) has been retrieved from satellite and from a network of sunphotometer measurements, and these retrievals also help to constrain column BC. However, the constraint is limited by uncertainties and assumptions in the retrievals, as well as by the presence of absorbing species other than BC, such as dust and organic carbon. Model simulation of BC is complicated by uncertainties in the treatment of initial particle size and shape appropriate for release into a model gridbox, particle uptake in liquid or frozen clouds and precipitation, mixing state, and optical properties. Assumptions influencing the degree of internal vs. external mixing with water-soluble particles in the accumulation mode strongly influence aerosol absorption (Seland et al., 2008) and CCN activation. Internal mixing of BC also shortens BC lifetime relative to hydrophobic BC (Ogren and Charlson, 1983; Stier et al., 2006, 2007). Furthermore, BC model predictions are subject to uncertainties that apply to any chemical model simulation, such as the accuracy of the model's meteorology, including transport, clouds, and precipitation (e.g. Liu et al., 2007).
The aim of this study is to evaluate model-calculated BC in recent state-of-the-art global models with aerosol chemistry and physics, to consider their diversity, and to compare them with available observations. There has been concern that some models may greatly underestimate BC absorption and therefore the BC contribution to climate warming (e.g. Sato et al., 2003; Ramanathan and Carmichael, 2008; Seland et al., 2008). However, it is unclear whether this problem is common to all models, whether it is regional or global, and to what extent the bias is due to underestimated BC mass, possibly linked to underestimated emissions, or to model treatment of optical properties that leads to underestimated BC absorption. We examine these issues by comparing a large number of current models with a variety of measurements. We also investigate whether biases are more problematic in some regions than in others. Finally, we use one of the models, the GISS model (available to the first author of this paper), to consider the effects of changing BC emissions, aging, removal assumptions and optical properties. We also use the GISS model to examine the seasonality of model bias and the spectral dependence of AAOD bias.
We compare the models with several types of observations. Model surface concentrations are compared with long-term surface concentration measurements. Model BC concentration profiles are compared with aircraft measurements from several recent campaigns spanning the North American region from the tropics to the Arctic. Column BC is assessed by comparing model AAOD with AAOD retrieved from AERONET sunphotometer measurements (Holben et al., 1998) using the inversion algorithm of Dubovik and King (2000), as was done in Sato et al. (2003), and with OMI satellite retrievals of AAOD. We also compare the column burden of BC with the AERONET-based estimate derived as in Schuster et al. (2005). While the measurements provide constraints for the models, in the final section we discuss measurement uncertainties and the discrepancies among them that become apparent as we apply them to the models.

AeroCom models
We evaluate seventeen models from the AeroCom aerosol model intercomparison, an exercise that has been ongoing for the past 5 years. Model results, as well as observation datasets for validation purposes, are available at the AeroCom website (http://nansen.ipsl.jussieu.fr/AEROCOM/). The AeroCom intercomparison included an exercise "A", with each model using its own emissions, and an exercise "B", where all models used identical emissions; these were described in detail in Textor et al. (2006, 2007), Kinne et al. (2006) and Schulz et al. (2006). Here we work with exercise A unless only B is available for a particular model in the database. The models used year 2000 emissions and in some cases year 2000 meteorological fields. Not all diagnostics were available for all models, so for each quantity considered we used all models for which it was available. Many aspects of the models have been evaluated in previous publications, and we refer to those for general background information. Textor et al. (2006) provided a first comparison of the models in experiment A and included basic information on the models such as model resolution, chemistry, and removal assumptions. Textor et al. (2007) described the experiment B intercomparison and showed that model diversity was not greatly reduced by unifying emissions, indicating that the greatest model differences result from features such as meteorology and aerosol treatments rather than from emissions. Kinne et al. (2006) discussed the aerosol optical properties of the models and Schulz et al. (2006) presented the radiative forcing estimates for the models.
Some of the model features most relevant for the BC simulations are provided in Table 1. As shown there, nine different BC energy emission inventories and eleven different biomass burning emission inventories were used. The models had a variety of schemes to determine black carbon aging from a fresh to an aged particle, where aged particles may be activated into cloud water. Ten models assumed that black carbon aged from hydrophobic to hydrophilic after a fixed lifetime; five models had microphysical mixing schemes to make the particles hydrophilic; in one model black and organic carbon were assumed to be mixed when emitted; and one model had fixed solubility. In three cases the particle mixing affected optical/radiative properties. A variety of assumptions were made about how frozen clouds remove aerosols compared to liquid clouds, ranging from identical treatment of frozen and liquid clouds to zero removal by ice clouds. Black carbon lifetime ranges from 4.9 to 11.4 days across the models.
We note that the model versions evaluated here were submitted to the database in 2005, and many models have evolved significantly since then (e.g. Bauer et al., 2008; Chin et al., 2009; Ghan and Zaveri, 2007; Liu et al., 2005, 2007; Myhre et al., 2009; Stier et al., 2006, 2007; Takemura et al., 2009). Thus this study provides a benchmark for the models as of the 2005 submission.

GISS model sensitivity studies
We use the GISS aerosol model to study the sensitivity of the BC simulation to several factors. The GISS aerosol scheme used here includes the mass of sulfate, sea-salt (Koch et al., 2006), carbonaceous aerosols (Koch et al., 2007) and dust (Miller et al., 2006; Cakmur et al., 2006). The sensitivity studies are described below and listed in Table 2. All simulations are run for 3 years after a 2-year model spin-up, and results are averaged over those 3 years. The standard GISS model version for these sensitivity studies differs slightly from the version in the AeroCom database: it does not include dust-nitrate interaction or the enhanced removal of BC by precipitating convective clouds that was included in the AeroCom-database GISS version, and it therefore has a somewhat larger BC load.
Table 1 notes: (2) Aging as it affects particle solubility. A = aging with time; I = aging by coagulation and condensation, particles are internally mixed; BCOC = BC assumed mixed with OC; N = none; # indicates that mixing/aging also affects particle optical properties. (3) T = temperature dependence; L = as liquid; LI = as liquid for in-cloud removal only; S = Stier et al. (2005); % is relative to water. (6) X: models did not simulate optical properties.

Emissions
The standard GISS model uses carbonaceous aerosol energy production emissions from Bond et al. (2004). Biomass burning emissions are based on the Global Fire Emissions Database (GFED) v2 model carbon estimates for the years 1997-2006 (van der Werf et al., 2003, 2004), with carbonaceous aerosol emission factors from Andreae and Merlet (2001). One sensitivity case used fossil fuel and biofuel emissions from the Emission Database for Global Atmospheric Research (EDGAR V32FT2000, called "EDGAR32" below; Olivier et al., 2005) combined with emission factors from Bond et al. (2004), and a second used the emissions of the International Institute for Applied Systems Analysis (IIASA) (Cofala et al., 2007).
In a third case we used the year with the largest biomass burning in the GFED dataset, 1998.

Aging and removal
In the standard GISS model, energy-related BC is assumed to be hydrophobic initially and then to age to hydrophilic with an e-folding time of 1 day. Biomass burning BC is assumed to have 60% solubility, so that if a cloud is present, 60% of these aerosols are taken into the cloud water in each half-hour cloud timestep. One sensitivity test assumed a shorter lifetime, halving the e-folding time for energy BC and using 80% solubility for biomass burning BC. A second test assumed a longer lifetime, doubling the e-folding time for energy BC and using 40% solubility for biomass burning BC.
Treatment of BC solubility is particularly uncertain for frozen clouds.In our standard model, BC-cloud uptake for frozen clouds is 12% of that for liquid clouds.A sensitivity run allowed 24% ice-cloud BC uptake, and another case 5%.
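The aging and in-cloud uptake rules above can be sketched as two small functions. This is a minimal illustration, not GISS model code: it assumes first-order (exponential) aging of energy BC with the stated e-folding times, and a fixed fractional uptake per half-hour cloudy step, with frozen-cloud uptake scaled by a factor (0.12 in the standard case). The function names are hypothetical, and uptake into cloud water is not the same as removal, which also requires precipitation.

```python
import math


def hydrophobic_fraction(t_days: float, efold_days: float) -> float:
    """Fraction of freshly emitted energy BC still hydrophobic after t_days,
    assuming first-order aging with e-folding time efold_days."""
    if efold_days <= 0:
        raise ValueError("e-folding time must be positive")
    return math.exp(-t_days / efold_days)


def fraction_outside_cloud_water(solubility: float, n_steps: int,
                                 ice_factor: float = 1.0) -> float:
    """Fraction of BC not yet taken into cloud water after n_steps cloudy
    half-hour steps, if each step takes up solubility * ice_factor of the
    remaining BC (ice_factor = 1 for liquid clouds)."""
    uptake = solubility * ice_factor
    if not 0.0 <= uptake <= 1.0:
        raise ValueError("per-step uptake must lie between 0 and 1")
    return (1.0 - uptake) ** n_steps


if __name__ == "__main__":
    # Energy BC: halved, standard (1 day) and doubled e-folding times.
    for efold in (0.5, 1.0, 2.0):
        print(f"e-fold {efold} d: hydrophobic after 1 d = "
              f"{hydrophobic_fraction(1.0, efold):.2f}")
    # Biomass burning BC (60% solubility) after 4 cloudy steps (2 hours),
    # in liquid clouds and in frozen clouds with 5%, 12% and 24% uptake.
    print(f"liquid: {fraction_outside_cloud_water(0.6, 4):.3f}")
    for ice in (0.05, 0.12, 0.24):
        print(f"ice x{ice}: {fraction_outside_cloud_water(0.6, 4, ice):.3f}")
```

Doubling the e-folding time keeps more energy BC hydrophobic for longer, and lowering the ice-cloud factor leaves more BC outside cloud water in frozen clouds; both push toward longer BC residence times.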

Aerosol size
The standard GISS model assumes a BC effective radius (the cross-section-weighted radius over the size distribution; Hansen and Travis, 1974) of 0.08 µm. One sensitivity case increased this to 0.1 µm, and another decreased it to 0.06 µm. The size primarily affects the BC optical properties. For BC effective radii of 0.1, 0.08 and 0.06 µm, the model global mean BC mass absorption efficiencies are 6.2, 8.4 and 12.4 m² g⁻¹ and the BC single scattering albedos are 0.31, 0.27 and 0.21, respectively; the smaller particles thus absorb more per unit mass.
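For a fixed BC burden, BC-only absorption optical depth scales with the mass absorption efficiency (MAE), so the three values above translate directly into an AAOD sensitivity. The sketch below assumes MAE in m² g⁻¹ and a hypothetical illustrative burden in mg m⁻²; it ignores dust and other absorbers, so it is not expected to reproduce the model's total-AAOD response to particle size.

```python
# Global-mean BC mass absorption efficiency (assumed units: m^2 g^-1) for each
# BC effective radius (um) used in the GISS sensitivity cases.
MAE_BY_RADIUS = {0.06: 12.4, 0.08: 8.4, 0.10: 6.2}


def bc_aaod(burden_mg_m2: float, mae_m2_g: float) -> float:
    """BC absorption optical depth (dimensionless) = MAE * column mass."""
    return mae_m2_g * burden_mg_m2 * 1e-3  # convert mg m^-2 to g m^-2


def relative_aaod_change(radius_um: float, reference_um: float = 0.08) -> float:
    """Fractional change in BC-only AAOD relative to the reference radius for
    an unchanged burden (the burden cancels in the ratio)."""
    return MAE_BY_RADIUS[radius_um] / MAE_BY_RADIUS[reference_um] - 1.0


if __name__ == "__main__":
    burden = 2.0  # hypothetical column burden, mg m^-2
    for r in sorted(MAE_BY_RADIUS):
        print(f"r_eff={r:.2f} um: BC AAOD={bc_aaod(burden, MAE_BY_RADIUS[r]):.4f}, "
              f"change vs 0.08 um = {relative_aaod_change(r):+.0%}")
```

With these MAE values the BC-only change is about -26% for the larger radius and +48% for the smaller one.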

Surface concentrations
Annual average BC surface concentration measurements are shown in the first panel of Fig. 1. The data for the United States are from the IMPROVE network (1995-2001) and those from Europe are from the EMEP network (2002-2003); some Asian data from 2006 are from Zhang et al. (2009); additional data, mostly from the late 1990s, are referenced in Koch et al. (2007). These data are primarily elemental carbon, or refractory carbon, which can differ somewhat from BC (Andreae and Gelencser, 2006). The data were not screened according to urban, rural or remote environment; all available data were used, although the IMPROVE sites are generally in rural or remote locations. There are broad regional tendencies, with the largest concentrations in Asia (1000-14 000 ng m⁻³), followed by Europe (500-5000 ng m⁻³), the United States (100-500 ng m⁻³), high northern latitudes (10-100 ng m⁻³), and remote locations (<10 ng m⁻³).
Figure 1 also shows BC surface concentrations from the GISS model sensitivity studies. The biggest impact in remote regions comes from increasing BC lifetime, either by doubling the aging e-folding time or by reducing removal by ice clouds. Decreasing the BC lifetime has a smaller effect. The larger 1998 biomass burning emissions mostly increase BC in the boreal Northern Hemisphere and Mexico. EDGAR32 emissions increase BC in Europe, Arabia and northeastern Africa; IIASA emissions increase south Asian BC.
Figure 2 shows the AeroCom model simulations of BC surface concentration, using the lowest model layer from each model. Figure 2 also shows the average and standard deviation of the models. The spatial distribution of the standard deviation resembles that of the average. Regions of especially large model uncertainty occur where the standard deviation equals or exceeds the average, such as the Arctic. Overall, the models capture the observed distribution of BC "hot spots". SPRINTARS is the only model that captures the large BC concentrations in Southeast Asia (Table 3); however, it overestimates BC in other regions. Unfortunately, there are no long-term measurements of BC in the Southern Hemisphere biomass burning regions.
Table 3 shows the ratio of modeled to observed BC in regions where surface concentration observations are available. For each site we take the ratio of the annual mean model value to the annual mean observed value, and then average these ratios over each region. Thirteen of the seventeen AeroCom models over-predict BC in Europe. Sixteen of the models underestimate Southeast Asian BC surface concentrations; however, most of these measurements are from 2004-2006, and emissions have probably increased significantly since the 1990s (Zhang et al., 2009). Nine of the models overestimate remote BC; in the United States about half the models overestimate and half underestimate the observations. Overall, outside Southeast Asia the models do not underestimate BC relative to surface measurements. None of the GISS model sensitivity studies shows significant improvement over the standard case: the longer lifetime cases improve model-measurement agreement in polluted regions but worsen it in remote regions.
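The regional averaging used for the surface comparison (a ratio at each site, then a mean of those ratios over the region) can be sketched as follows. The site values are hypothetical and the function name is ours; the point is only the order of operations, since averaging site ratios weights every site equally regardless of its concentration.

```python
from collections import defaultdict
from statistics import mean


def regional_mean_of_site_ratios(sites):
    """sites: iterable of (region, model_annual_mean, observed_annual_mean).

    Computes model/observed at each site, then averages those ratios within
    each region. Returns {region: mean ratio}.
    """
    ratios = defaultdict(list)
    for region, model, observed in sites:
        if observed <= 0:
            raise ValueError(f"non-positive observation in {region}")
        ratios[region].append(model / observed)
    return {region: mean(values) for region, values in ratios.items()}


if __name__ == "__main__":
    # Hypothetical annual means in ng m^-3: (region, model, observed).
    example_sites = [
        ("Europe", 1200.0, 800.0),   # site ratio 1.5
        ("Europe", 600.0, 1200.0),   # site ratio 0.5
        ("Asia", 2000.0, 5000.0),    # site ratio 0.4
    ]
    print(regional_mean_of_site_ratios(example_sites))
```

For the hypothetical European pair the regional ratio is 1.0, even though the model is high at one site and low at the other.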

Aerosol absorption optical depth
The aerosol absorption optical depth (AAOD), the non-scattering part of the aerosol optical depth, provides another test of model BC. AAOD is an atmospheric column measure of particle absorption and so provides a different perspective from the surface concentration measurements. Both BC and dust absorb radiation, so AAOD is most useful for testing BC in regions where BC absorption dominates over dust absorption. We therefore focus on regions where the dust load is relatively small, for example Africa south of the Sahara Desert. However, since some sites within these regions are still affected by dust, we use model total AAOD, including all species.
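For reference, the standard definition behind this description is written out below, with τ the aerosol optical depth and ω₀ the single scattering albedo at wavelength λ. The decomposition into species s, with mass absorption efficiency MAE_s and column burden B_s, is our illustrative addition and holds exactly only for an externally mixed aerosol; it shows why the total AAOD at a site includes any dust contribution alongside BC.

```latex
\mathrm{AAOD}(\lambda) = \bigl[1 - \omega_0(\lambda)\bigr]\,\tau(\lambda)
  = \sum_s \mathrm{AAOD}_s(\lambda)
  = \sum_s \mathrm{MAE}_s(\lambda)\, B_s
```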
Figure 3 shows AERONET (1996-2006) sunphotometer (e.g. Dubovik et al., 2000; Dubovik and King, 2000) and OMI satellite (2005-2007, from the OMAERUV product; Torres et al., 2007) retrievals of clear-sky AAOD. A scatter plot compares the AERONET and OMI retrievals at the AERONET sites. Table 4 (last 5 rows) provides regional average AAOD for these retrievals. The two retrievals broadly agree with one another; however, the OMI estimate is larger than the AERONET value for South America and smaller for Europe and Southeast Asia.
The AeroCom model AAOD simulations are shown in Fig. 4. As for surface concentration, the standard deviation is less than or equal to the average except in parts of the Arctic. Table 4 gives the average ratio of model to retrieved AAOD within regions. For the ratio of model to AERONET, we average the model AAOD over all AERONET sites within the region and divide by the average of the corresponding AERONET values. For OMI, we average the model over each region and divide by the OMI regional average. The average model agrees with the retrievals in eastern North America and with AERONET in Europe (ratios of modeled to AERONET AAOD in these regions are 0.86 and 0.81); it underestimates Asian AAOD (ratio 0.67) and biomass burning AAOD (about 0.5-0.7 relative to AERONET and 0.4-0.5 relative to OMI). AAOD depends not just on aerosol load but also on properties such as refractive index, particle size, density and mixing state. In Fig. 3 we show how the GISS model AAOD changes with the assumed effective radius. The global mean AAOD decreases by 15% for a 0.02 µm increase in effective radius and increases by 27% for a 0.02 µm decrease. Note that the initial particle diameters of the AeroCom models (Table 1) span a much wider range (0.01 to 0.9 µm) and in some cases grow as the particles age. Increasing particle density from 1.6 to 1.8 g cm⁻³ in the GISS model decreases AAOD about as much as increasing the effective radius from 0.08 to 0.1 µm (calculated but not shown). Thus AAOD is highly sensitive to small changes in these properties.
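Unlike the site-by-site surface ratios, the AERONET comparison divides the regional mean of model AAOD at the sites by the regional mean of retrieved AAOD at the same sites. A minimal sketch with hypothetical values (function name ours) shows that the two averaging orders can give different answers for the same data, which matters when comparing Table 3 and Table 4 ratios.

```python
from statistics import mean


def regional_ratio_of_means(model_at_sites, retrieved_at_sites):
    """Mean model AAOD over the sites in a region divided by the mean
    retrieved AAOD over the same sites."""
    if len(model_at_sites) != len(retrieved_at_sites) or not retrieved_at_sites:
        raise ValueError("need matching, non-empty site lists")
    return mean(model_at_sites) / mean(retrieved_at_sites)


if __name__ == "__main__":
    # Hypothetical AAOD values at two sites in one region.
    model = [0.02, 0.01]
    retrieved = [0.03, 0.02]
    ratio_of_means = regional_ratio_of_means(model, retrieved)
    mean_of_ratios = mean(m / r for m, r in zip(model, retrieved))
    print(f"ratio of means = {ratio_of_means:.3f}, "
          f"mean of ratios = {mean_of_ratios:.3f}")
```

Here the ratio of means is 0.600 and the mean of site ratios is 0.583; the ratio of means gives more weight to the sites with larger AAOD.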
Note that models generally underestimate AAOD but not surface concentration. As we discuss below, this could result from inconsistencies in the measurements, from model under-prediction of BC aloft, or from under-prediction of absorption. In this regard, most models in the 2005 version of AeroCom did not properly describe internal mixing with scattering particles in the accumulation mode. Such mixing increases the absorption cross section of the aerosols compared to external mixtures of nucleation- and Aitken-mode BC particles.

Wavelength-dependence
Black carbon absorption efficiency decreases less with increasing wavelength than that of dust or organic carbon (Bergstrom et al., 2007). Comparison of AAOD with retrievals at longer wavelength therefore indicates the extent to which BC is responsible for the biases. In Fig. 5 we compare AERONET AAOD at 550 and 1000 nm with GISS model AAOD for the wavelength intervals 300-770 nm and 860-1250 nm, respectively. Table 5 shows the ratio of the GISS model to AERONET within source regions at 1000 nm and 550 nm for three different BC effective radii. In all regions except Europe and Asia, the ratio is even lower at the longer wavelength. This confirms the need for increased simulated BC absorption: other absorbing aerosols absorb relatively less at longer wavelengths and so cannot account for the larger bias there.
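The spectral argument can be made concrete with the absorption Ångström exponent (AAE), assuming AAOD follows a power law in wavelength, AAOD ∝ λ^(−AAE). BC is commonly characterized by an AAE near 1, while dust and organic carbon typically have larger values, so their AAOD falls off faster toward 1000 nm. The hypothetical functions below treat 550 and 1000 nm as single wavelengths, whereas the GISS comparison uses broad model wavelength bands.

```python
import math


def absorption_angstrom_exponent(aaod_1: float, aaod_2: float,
                                 wl_1_nm: float, wl_2_nm: float) -> float:
    """AAE from AAOD at two wavelengths, assuming AAOD ~ wavelength**(-AAE)."""
    return -math.log(aaod_1 / aaod_2) / math.log(wl_1_nm / wl_2_nm)


def long_to_short_ratio(aae: float, short_nm: float = 550.0,
                        long_nm: float = 1000.0) -> float:
    """AAOD(long_nm) / AAOD(short_nm) implied by the power law."""
    return (long_nm / short_nm) ** (-aae)


if __name__ == "__main__":
    for aae in (1.0, 2.0, 3.0):  # BC-like, then steeper (dust/OC-like) spectra
        print(f"AAE={aae}: AAOD(1000)/AAOD(550) = {long_to_short_ratio(aae):.2f}")
```

If a model were low at 550 nm only because it lacked a steep-spectrum absorber, its model-to-retrieval ratio would improve at 1000 nm; a ratio that falls further at 1000 nm instead points to missing absorption with a flat, BC-like spectrum.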

Seasonality
So far our analysis has considered only annual mean observed and modeled BC. Here we compare the seasonality of observed AAOD with the GISS model to explore how the bias varies with season. Seasonal AAOD from AERONET, OMI and the GISS model is shown in Fig. 6. As in most of the models, the GISS model BC energy emissions do not vary seasonally. Biomass burning emissions do, and dust seasonality is also very pronounced. In addition, seasonal changes in transport and removal cause fluctuations in the model industrial source regions. Note that more AERONET data satisfy our inclusion criteria for the 3-month means than for the annual means (see figure captions), so coverage is better in some regions and seasons than in the annual dataset in Fig. 3. Table 4 (bottom 4 rows) gives regional seasonal mean retrieved AAOD, and the seasonal model-to-measurement ratios are provided in the middle portion of Table 4. Biomass burning seasonality, with peaks in JJA for central Africa (OMI) and in SON for South America, is simulated in the model without a clear change in bias with season. In Asia both retrievals have maximum AAOD in MAM, which the model underestimates (i.e. the ratio of model to observed is lowest in MAM). The MAM peak may be due to agricultural or biomass burning that is underestimated in the model emissions. The other industrial regions do not show apparent seasonality. However, the model BC is underestimated most in Europe during fall and winter, suggesting excessive loss of BC or missing emissions during those seasons.
Summertime pollution outflow from North America appears in both OMI and the model. The large OMI AAOD in the southern South Atlantic during MAM-JJA may be a retrieval artifact due to low sun elevation angle and/or sparse sampling; however, if it is real, then the model seasonality in this region is out of phase.
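The seasonal comparison groups monthly values into the standard three-month seasons and forms a model-to-retrieval ratio for each season. A minimal sketch, assuming a monthly climatology keyed by month number and sampling the model only in months that have retrievals; the AERONET inclusion criteria are not reproduced, and the names are hypothetical.

```python
SEASONS = {
    "DJF": (12, 1, 2),
    "MAM": (3, 4, 5),
    "JJA": (6, 7, 8),
    "SON": (9, 10, 11),
}


def seasonal_ratios(model_monthly, observed_monthly):
    """model_monthly, observed_monthly: dicts month (1-12) -> AAOD.

    For each season, uses only months present in both inputs and returns
    mean(model) / mean(observed) over those months, or None if the season
    has no usable months.
    """
    ratios = {}
    for season, months in SEASONS.items():
        common = [m for m in months if m in model_monthly and m in observed_monthly]
        obs_total = sum(observed_monthly[m] for m in common)
        if not common or obs_total <= 0:
            ratios[season] = None
            continue
        model_total = sum(model_monthly[m] for m in common)
        # Same months in numerator and denominator, so the ratio of sums
        # equals the ratio of seasonal means.
        ratios[season] = model_total / obs_total
    return ratios


if __name__ == "__main__":
    # Hypothetical case: aseasonal model, retrievals peaking in MAM,
    # no December retrieval.
    model = {m: 0.02 for m in range(1, 13)}
    observed = {m: (0.04 if m in (3, 4, 5) else 0.025) for m in range(1, 12)}
    print(seasonal_ratios(model, observed))
```

In this hypothetical case the ratio is lowest in MAM (0.5 versus 0.8 in the other seasons), the same pattern described for Asia.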

Column BC load
Model simulation of column BC mass (Fig. 7) should be less diverse than that of AAOD, since it involves no assumptions about optical properties. However, there is no direct measurement of BC load. Schuster et al. (2005) developed an algorithm to derive column BC mass from AERONET data, working with the non-dust AERONET climatologies defined by Dubovik et al. (2002). The Schuster algorithm uses the Maxwell Garnett effective medium approximation to infer BC concentration and specific absorption from the AERONET refractive index. The Maxwell Garnett approximation assumes homogeneous mixtures of small insoluble particles (BC) suspended in a solution of scattering material. Such mixing enhances the absorptivity of the BC. Schuster et al. (2005) [...] of the models (see Table 1); however, a lower value would increase the retrieved burden and worsen the comparison with the models. An updated version of the AERONET-derived BC column mass is shown in Figs. 7 and 8. For this retrieval, a BC refractive index of 1.95-0.79i was assumed, within the range recommended by Bond and Bergstrom (2006), together with a BC density of 1.8 g cm⁻³. In the retrievals, most continental regions have BC loadings between 1 and 5 mg m⁻², with mean values for North America (1.8 mg m⁻²) and Europe (2.1 mg m⁻²) being somewhat smaller than Asia [...] industrial region retrievals are larger than the previous estimates of Schuster et al. (2005), which were 0.96 mg m⁻² for North America, 1.4 mg m⁻² for Europe and 1.6 mg m⁻² for Asia. The biomass burning estimates are similar to the previous retrievals. The differences may be due to the larger span of years and sites in the current dataset.
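The Maxwell Garnett step can be sketched as a mixing rule for BC inclusions in a non-absorbing host, plus a numerical inversion from an effective imaginary index back to a BC volume fraction. This is only an illustration of the effective medium approximation, not the Schuster et al. (2005) algorithm: the water-like host index (1.33), the bisection inversion and all names are our assumptions, and only the BC refractive index 1.95-0.79i comes from the text. Turning the volume fraction into a column mass would further require the retrieved particle volume and an assumed BC density.

```python
import cmath

M_BC = complex(1.95, -0.79)   # BC refractive index from the text (n - ik form)
M_HOST = complex(1.33, 0.0)   # assumed non-absorbing, water-like host


def maxwell_garnett(f: float, m_incl: complex = M_BC,
                    m_host: complex = M_HOST) -> complex:
    """Effective refractive index for inclusions occupying volume fraction f."""
    if not 0.0 <= f <= 1.0:
        raise ValueError("volume fraction must lie in [0, 1]")
    e_i, e_m = m_incl ** 2, m_host ** 2  # permittivities of inclusion and host
    diff = e_i - e_m
    e_eff = e_m * (e_i + 2 * e_m + 2 * f * diff) / (e_i + 2 * e_m - f * diff)
    return cmath.sqrt(e_eff)


def bc_volume_fraction(k_eff: float, m_incl: complex = M_BC,
                       m_host: complex = M_HOST, tol: float = 1e-10) -> float:
    """Invert k_eff = -Im(m_eff) for f by bisection, assuming k_eff rises
    monotonically with f between the host (f=0) and pure BC (f=1)."""
    k_max = -maxwell_garnett(1.0, m_incl, m_host).imag
    if not 0.0 <= k_eff <= k_max:
        raise ValueError("imaginary index outside the attainable range")
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if -maxwell_garnett(mid, m_incl, m_host).imag < k_eff:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    for f in (0.01, 0.05, 0.10):
        m = maxwell_garnett(f)
        print(f"f={f:.2f}: m_eff = {m.real:.3f} - {-m.imag:.4f}i, "
              f"recovered f = {bc_volume_fraction(-m.imag):.4f}")
```

The mixing rule reduces to the host index at f = 0 and to the BC index at f = 1, and the round trip recovers the input volume fraction.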
Figure 7 shows the AeroCom model BC column loads. The model standard deviation relative to the average is similar to that for surface concentration (Fig. 2) and AAOD (Fig. 4). The model column loads are generally smaller than the Schuster estimate, although some models agree quite well in Europe, Southeast Asia or Africa (e.g. GOCART, SPRINTARS, MOZGN, LSCE, UMI). Model to retrieved ratios within regions are presented in Table 6. In North America and Europe this ratio is generally smaller than the model to retrieved AAOD ratio. The inconsistencies among the retrievals could be clarified by detailed comparison with a model that includes particle mixing and whose diagnostics are harmonized with the retrievals.
Figure 8 shows the GISS BC column sensitivity study results. The column load responds differently than the surface concentrations (Fig. 1). The Asian IIASA emissions are larger than those of Bond et al. (2004) or EDGAR32, so the outflow across the Pacific is greater. The large biomass burning case (1998) also results in greater column BC transport to the northwestern US. Increasing BC lifetime increases both BC surface and column mass more than the other cases do; however, it has a larger impact on the Southern Hemisphere column load than on surface concentrations. The reduced ice-out case has a somewhat smaller impact on the column than at the surface, especially in parts of the Arctic; reduced ice-out thus has an enhanced effect at low levels in the Arctic, below ice clouds, while having a relatively small impact on the column. Modest model improvements relative to the retrieval occur for the increased-lifetime case and for the IIASA emissions (Table 6).

Aircraft campaigns
We consider the BC model profiles in the vicinity of recent aircraft measurements in order to get a qualitative sense of how models perform in the mid-upper troposphere and to see how model diversity changes aloft. The measurements were made with three independent Single Particle Soot Photometers (SP2s) (Schwarz et al., 2006; Slowik et al., 2007) [...] (Fig. 10) over North America. Details of the campaigns are provided in Table 7. The SP2 instrument uses an intense laser to heat the refractory component of individual fine (accumulation) mode aerosols to vaporization. The detected thermal radiation (incandescence) is used to determine the black carbon mass of each particle (Schwarz et al., 2006). The U. Tokyo and the NOAA data have been adjusted upward by 5-10% (70% during AVE-Houston) to account for the "tail" of the BC mass distribution that extends to sizes below the SP2 lower limit of detection. This procedure has not been applied to the U. Hawaii data; however, this instrument was configured to detect smaller particle sizes, so that the unmeasured mass is estimated to be less than about 13% (3% at smaller and 10% at larger sizes). The aircraft data in each panel of Figs. 9 and 10 are averaged into altitude bins, along with standard deviations of the data. When available, both the data mean and median are shown. For cases in which significant biomass smoke was encountered (e.g. Figs. 9d, 10d and e), the median is more indicative of background conditions than the mean. However, for the spring ARCPAC campaign (Fig. 10c), four of the five flights encountered heavy smoke, so in this case profiles are provided for the mean of the smoky flights and for the remaining flight, which is more representative of background conditions. The ARCPAC NOAA WP-3D aircraft thus encountered heavier burning conditions (Fig. 10c; Warneke et al., 2009) than the other two aircraft during the Arctic spring (Fig. 10a and b). Model profiles shown in each panel are constructed by averaging monthly mean model results at several locations along the flight tracks (map symbols in Figs. 9 and 10). We tested the accuracy of this profile-construction approach using the U. Tokyo data and the GISS model, by comparing a profile constructed by following the flight tracks within the model fields with the simpler profile construction shown in Fig. 10a. The two approaches agreed very well except in the boundary layer (the lowest 1-2 model levels). Potentially more problematic is the comparison of instantaneous observational snapshots with model monthly means. Nevertheless, the comparison does suggest some broad tendencies.
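The altitude binning of the aircraft data, with mean, median and standard deviation per bin, can be sketched as below. The bin edges, the optional undetected-mass scaling factor and the function name are assumptions for illustration (the text states corrections of 5-10% for some instruments but not the exact procedure), and the population standard deviation is used so that single-sample bins are defined. The example shows how a single smoke spike pulls the mean well above the median.

```python
from statistics import mean, median, pstdev


def bin_profile(altitudes_km, bc_ng_kg, edges_km, tail_correction=0.0):
    """Average BC samples into altitude bins [edges[i], edges[i+1]).

    tail_correction: fractional upward adjustment for BC mass below the
    instrument detection limit (e.g. 0.05 for 5%); 0 leaves data unchanged.
    Returns one dict per non-empty bin.
    """
    if len(altitudes_km) != len(bc_ng_kg):
        raise ValueError("altitude and BC arrays differ in length")
    scale = 1.0 + tail_correction
    rows = []
    for lo, hi in zip(edges_km[:-1], edges_km[1:]):
        values = [c * scale for z, c in zip(altitudes_km, bc_ng_kg) if lo <= z < hi]
        if not values:
            continue  # skip bins with no samples
        rows.append({
            "center_km": 0.5 * (lo + hi),
            "n": len(values),
            "mean": mean(values),
            "median": median(values),
            "std": pstdev(values),
        })
    return rows


if __name__ == "__main__":
    # Hypothetical samples: a smoky spike at 0.5 km, background aloft.
    z = [0.2, 0.4, 0.5, 0.8, 5.1, 5.6, 5.9]
    bc = [60.0, 55.0, 900.0, 50.0, 3.0, 2.0, 4.0]
    for row in bin_profile(z, bc, edges_km=[0.0, 1.0, 5.0, 6.0]):
        print(row)
```

In the lowest hypothetical bin the mean is 266.25 ng kg⁻¹ while the median is 57.5 ng kg⁻¹, which is why the median is used to represent background conditions in smoke-affected campaigns.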
The lower-latitude campaign observations (Fig. 9) indicate polluted boundary layers, with BC concentrations decreasing by 1-2 orders of magnitude between the surface and the mid-upper troposphere. Some of the large data values can be explained by sampling of especially polluted conditions. For example, the CARB campaign (Fig. 9d) encountered unusually heavy biomass burning. The models used climatological biomass burning and would not have included these particular fire conditions. Nevertheless, overall the datasets show remarkably consistent mid-tropospheric mean BC levels of about 0.5-5 ng kg⁻¹ in the tropics and midlatitudes. With the exception of the CARB campaign, the models generally exceed the upper limit of the standard deviation of the data. For CARB, most models are within the data standard deviations up to about 500 mb (Fig. 9d), while about half exceed the upper limit of the observed standard deviation above 500 mb.
Fig. 9. Model profiles in approximate SP2 BC campaign locations in the tropics and midlatitudes, averaged over the points in the map (bottom). Observations (black curves) are averages for the respective campaigns, with standard deviations where available. The Houston campaign has two profiles measured on two different days. Mean (solid) and median (dashed) observed profiles are provided for (d). The markers in the map inset denote the locations of model profiles in these comparisons with the aircraft measurements, which are detailed in Table 7.
The springtime Arctic campaigns observed maximum BC above the surface (Fig. 10a-c), which may result from two mechanisms. First, background "Arctic haze" pollution is thought to originate at lower latitudes and to be transported poleward into the Arctic by lofting along isentropic surfaces (Iversen, 1984; Stohl et al., 2006). Most of the observed profiles and the model results would reflect these conditions. Alternatively, BC could be injected into the mid-troposphere near its source by agricultural or forest fires and then advected into the Arctic. This is apparently the case for the ARCPAC measurements (Fig. 10c) that probed Russian fire smoke (Warneke et al., 2009). In both cases, the pollution levels aloft during springtime are substantial and comparable to those observed in the polluted boundary layer at midlatitudes. Thus at lower latitudes BC decreases with altitude, whereas at these higher latitudes it increases toward the middle troposphere during springtime. Model profile diversity is especially great in the Arctic, as discussed in previous sections. Many of the models do have a profile maximum of BC above the surface, but most of the springtime peak values are smaller in magnitude than the aircraft measurements. [...] less than 20 ng kg⁻¹, yet most of them are within the lower limit of the observed standard deviation. Overall, most models underestimate poleward transport, remove BC too efficiently, or do not confine pollution sufficiently to the lowest model levels because of excessive vertical diffusion.
Fig. 10. As Fig. 9 but for high latitude profiles. Mean (solid) and median (dashed) observed profiles are provided, except for (c), where the ARCPAC campaign has distinct profiles for the mean of the 4 flights that probed long-range biomass burning plumes (dashed) and the mean of the 1 flight that sampled aged Arctic air (solid).
The high-latitude summer ARCTAS campaigns encountered heavy smoke plumes during part of their sampling, so the mean values (Fig. 10d-e, solid black) are less characteristic of typical conditions than the median (dashed). Most models are within the observed standard deviation for the summertime data but overestimate BC relative to the median above 500 mb. Many of the models show little change between spring and summer (e.g. compare Fig. 10b and d), whereas the observed background conditions are less polluted in summer. As at lower latitudes, the models generally overestimate BC in the upper troposphere in the Arctic (Fig. 10a and d). On the other hand, the UTLS measurements in the Arctic region are sparse and may not be statistically significant.
Table 8 gives the ratio of model to observed BC over the profiles for Fig. 9 (south) and Fig. 10 (north), excluding the bottom 2 layers of each model; we use median observed values for campaigns that encountered significant biomass burning (Figs. 9d, 10d and e), and for ARCPAC spring (Fig. 10c) we use the background profile. The average model ratio is 7.9 in the south and 0.41 in the north. In general, the ordering of model concentrations in the mid-troposphere is the same across latitudes, so the models with small upper tropospheric concentrations in the tropics also have smaller concentrations in the Arctic. Typically, the models that are most successful compared to the observations at low latitudes do not have large enough concentrations in the lower and middle troposphere in the Arctic. This could result from failure to distinguish between removal of BC by convective and by stratiform clouds, with convective clouds providing deep-column cleaning of particles primarily at lower latitudes. The models may also fail to resolve the transport events needed to bring pollution to the Arctic. However, some models are fairly versatile; for example, the MIRAGE, UMI and GISS models attain large lower tropospheric concentrations in the Arctic yet relatively low concentrations aloft at low latitudes; these are within a factor of 4 of observed in the south and a factor of 3 in the north (Table 8). Some of the models have a strong minimum at around 300-400 hPa, probably due to effective scavenging in a region where condensable water tends to be removed by rain. This seems to work well at lower latitudes, but it apparently should not apply at high latitudes in springtime, where colder clouds dominate.
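The Table 8 statistic can be sketched as follows. This is a minimal illustration, assuming that the observed profile (median or background, as chosen above) has already been interpolated onto each model's levels, that levels are ordered from the surface upward, and that the ratio is the arithmetic mean of level-by-level ratios; the text does not say whether a mean of ratios or a ratio of means was used, and the function names and profile values below are hypothetical.

```python
import numpy as np


def profile_ratio(model_bc, observed_bc, skip_bottom=2):
    """Mean model/observed BC ratio over one vertical profile.

    Assumes both profiles are on the same model levels, surface first.
    The lowest `skip_bottom` levels are dropped (Table 8 convention).
    """
    model = np.asarray(model_bc, dtype=float)[skip_bottom:]
    obs = np.asarray(observed_bc, dtype=float)[skip_bottom:]
    # Ignore levels with missing or non-positive observations.
    valid = np.isfinite(model) & np.isfinite(obs) & (obs > 0)
    return float(np.mean(model[valid] / obs[valid]))


def average_model_ratio(profiles_by_model, observed_bc, skip_bottom=2):
    """Average of the per-model profile ratios (the 'average model ratio')."""
    ratios = [profile_ratio(p, observed_bc, skip_bottom)
              for p in profiles_by_model.values()]
    return float(np.mean(ratios))


# Hypothetical BC profiles in ng/kg, surface first (not campaign data).
obs = [30.0, 30.0, 2.0, 1.0, 2.0]
models = {
    "model_a": [50.0, 40.0, 20.0, 10.0, 8.0],
    "model_b": [20.0, 20.0, 2.0, 1.0, 2.0],
}
print(profile_ratio(models["model_a"], obs))
print(average_model_ratio(models, obs))
```

Dropping the lowest levels keeps the statistic focused on the free troposphere, where the models and the SP2 profiles diverge most.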
We also made profiles for the GISS sensitivity simulations. However, the variability among these cases is much smaller than among the AeroCom models in Figs. 9 and 10. Doubling or halving the GISS BC aging rate generally produced the lowest and highest concentrations, respectively, throughout the column; however, the difference from the standard case was less than a factor of two. In the Arctic near the surface, the case with increased ice-out had the lowest concentration, but again the change was not large.
(Schwarz et al., 2006); University of Tokyo: Yutaka Kondo, Nobuhiro Moteki (Moteki and Kondo, 2007; Moteki et al., 2007); University of Hawaii: Antony Clarke, Cameron McNaughton, Steffen Freitag (Clarke et al., 2007; Howell et al., 2006; McNaughton et al., 2009; Shinozuka et al., 2007).

Summary of model-observation comparison
Table 9 gives the average AeroCom model performance compared with each measurement type for each region. The average model bias tends to be high compared with surface concentration measurements in all regions except Asia, where most measurements are relatively recent. The average model bias tends to be low compared with all column retrievals except the OMI estimate for Europe. The model bias is especially low in the biomass burning regions of Africa and South America. It is also likely that anthropogenic emissions are underestimated, especially in South America (e.g. Evangelista et al., 2007). The rest-of-world bias is quite low for the column quantities; however, the retrievals have greater difficulty under small aerosol optical depth conditions (e.g. Dubovik et al., 2002) and tend to be biased high there. A detailed analysis in which the model diagnostics are screened with the same criteria as AERONET would help to resolve this. Since the remote BC load is sensitive to the BC aging or mixing rate, resolving the discrepancy is important. It is possible that the aging rate is overestimated in the models, resulting in excessive removal and a low model bias away from source regions.
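The regional statistics summarized in Table 9 rest on averaging site-by-site ratios within longitude/latitude boxes. The sketch below is one way to do that, assuming that a site belongs to a region when it falls inside the box (edges inclusive), that the regional value is the arithmetic mean of the site ratios, and that sites outside every box are pooled into a "Rest of world" group. The box edges are transcribed from the Table 4 caption (Table 3 uses 20 N rather than 30 N for Asia); the function name and site values are hypothetical.

```python
import numpy as np

# (lon_min, lon_max, lat_min, lat_max) in degrees; west and south are negative.
# Edges transcribed from the Table 4 caption.
REGIONS = {
    "N Am": (-130.0, -70.0, 20.0, 55.0),
    "Europe": (-15.0, 45.0, 30.0, 70.0),
    "Asia": (100.0, 160.0, 30.0, 70.0),
    "S Am": (-85.0, -40.0, -34.0, -2.0),
    "Afr": (-20.0, 45.0, -34.0, -2.0),
}


def regional_mean_ratios(lons, lats, model, observed, regions=REGIONS):
    """Mean model/observed ratio and site count for each region box."""
    lons = np.asarray(lons, dtype=float)
    lats = np.asarray(lats, dtype=float)
    ratio = np.asarray(model, dtype=float) / np.asarray(observed, dtype=float)
    assigned = np.zeros(ratio.shape, dtype=bool)
    result = {}
    for name, (lon0, lon1, lat0, lat1) in regions.items():
        inside = (lons >= lon0) & (lons <= lon1) & (lats >= lat0) & (lats <= lat1)
        assigned |= inside
        if inside.any():
            result[name] = (float(ratio[inside].mean()), int(inside.sum()))
    rest = ~assigned  # sites outside every box
    if rest.any():
        result["Rest of world"] = (float(ratio[rest].mean()), int(rest.sum()))
    return result


# Hypothetical sites: two in N Am, one in Europe, one remote.
res = regional_mean_ratios(
    lons=[-100.0, -90.0, 10.0, 150.0],
    lats=[40.0, 35.0, 50.0, -40.0],
    model=[2.0, 3.0, 1.0, 0.3],
    observed=[1.0, 1.0, 2.0, 1.0],
)
print(res)
```

A mean of ratios weights every site equally, so regions with many sites close together (such as the IMPROVE network) are not down-weighted; this is a design assumption rather than something the text specifies.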
We have only considered SP2 aircraft measurements over North America, and generally the models are larger than observed both at the surface and in the free troposphere. The models underpredict AAOD and the Schuster BC in North America, but the comparison with aircraft data suggests that the models are actually overestimating middle-upper tropospheric BC. It therefore seems that the optical properties in the models provide less absorption than they should, that the retrievals overestimate AAOD, or that the treatment of the model diagnostics is not sufficiently harmonized with the retrievals.

Discussion and conclusions
Our comparison of AeroCom models and observations reveals some large BC discrepancies and diversities. To some extent, the comparison of AeroCom and GISS sensitivity models can be used to infer which parameters might improve performance.
The AeroCom models use a variety of BC emission inventories (Table 1). In the GISS sensitivity studies we used three recent inventories and did not see dramatic differences in the model results. However, the developers of these inventories shared similar energy and emission factor information, so it is not surprising that the inventories do not differ dramatically overall, although there are some large differences for specific regions. This is also consistent with the Textor et al. (2007) comparison of model experiments with and without harmonized emissions, in which model diversity was not greatly reduced when the emissions were harmonized. It therefore seems that correcting the lowest-order model biases requires either changes to BC in most inventories or changes to other model characteristics.
The BC inventories continue to improve as information on technologies and activities becomes available, especially in developing countries. In addition, it seems that model results could benefit as much from information on optical properties specific to individual emission sources, such as particle size, density and mixing state appropriate for grid-box-scale sources. Biomass burning emission estimates are also improving. For example, the latest GFED estimates rely on satellite observations of burned area and fire counts to derive the burning history (Giglio et al., 2006; van der Werf et al., 2006). Here we only considered seasonal variability in the GISS model, which agreed reasonably well with the retrieved AAOD seasonality in the biomass burning regions. On the other hand, nearly all models underestimate column BC in these regions, especially in South America, suggesting that the emission factors (currently based on Andreae and Merlet, 2001) or the optical properties assumed for the smoke do not generate enough BC and/or particle absorption. Spackman et al. (2008) reported BC emission factors from fresh biomass burning plumes that were 25 to 75% higher than those reported by Andreae and Merlet (2001), consistent with the model underestimates noted here. Long-term in situ measurements co-located with AERONET sites could help resolve which of these is in error. Many models are developing sophisticated aerosol microphysical schemes, including nucleation, evolving particle size distributions, particle coagulation, and mixing by condensation of gases onto particles. The added physical treatment also allows a more physical representation of particle hygroscopicity, optical properties, uptake into clouds, etc.
However, it is challenging to increase the physical sophistication of the schemes while validating them against field information on how such particles behave in the real world. The assessment here provides some constraint on final BC properties. While microphysical schemes are essential for simulating particle number and size distribution, it is not apparent that they improve the BC simulations as examined here. Yet the schemes might benefit from further sophistication, such as including the evolution of particle morphology, the effect of internal mixing on particle absorption, and density (Stier et al., 2007). Bond and Bergstrom (2006) have provided some straightforward recommended improvements for BC models, but many of the models presented here had not yet included them. Bond and Bergstrom (2006) suggested a typical fresh-particle mass absorption cross section (MABS, essentially the column BC absorption divided by the load) of about 7.5 m² g⁻¹, and that this should probably increase as particles age. Nine of the models have MABS larger than 6.7 m² g⁻¹. An enhancement of absorption by BC coating of about a factor of 1.5 was recommended, and this had not been included in the models. A recent study with the UIOCTM did include a 1.5 enhancement of MABS for aged BC and found a 28% increase in radiative forcing (Myhre et al., 2009). Bond and Bergstrom (2006) recommended refractive index values larger than those used in older models, i.e. about 1.9-0.7i at 550 nm; only three of the models have values larger than 1.9-0.6i. Bond and Bergstrom also pointed out that many models have underestimated particle density, and recommended a value of about 1.7-1.9 g cm⁻³. Eleven of the models have densities lower than this range and would have weaker absorption per unit mass if the density were increased to the recommended level. In summary, including particle mixing or a core-shell configuration, and increasing the refractive index, should increase model particle absorption, while increasing density will decrease AAOD.
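The directions of the refractive-index and density effects summarized above can be illustrated with the small-particle (Rayleigh) limit for the mass absorption cross section, MAC = 6π/(ρλ) · |Im[(m² − 1)/(m² + 2)]|. This is a swapped-in simplification, not the optics scheme of any AeroCom model (those typically use Mie or similar calculations over size distributions), so only the trends, not the absolute values, should be read from it. The coating factor of 1.5 is the recommended enhancement quoted above; the function names and example settings are hypothetical.

```python
import math


def rayleigh_mac(n, k, density_g_cm3, wavelength_nm=550.0):
    """Mass absorption cross section (m^2 g^-1) of small BC spheres.

    Rayleigh limit: MAC = 6*pi / (rho * lambda) * |Im[(m^2 - 1) / (m^2 + 2)]|,
    with m = n + k*i. The absolute value makes the result independent of
    whether the index is written n + ki or n - ki.
    """
    m = complex(n, k)
    e_m = abs(((m * m - 1) / (m * m + 2)).imag)
    rho = density_g_cm3 * 1e6      # g cm^-3 -> g m^-3
    lam = wavelength_nm * 1e-9     # nm -> m
    return 6.0 * math.pi * e_m / (rho * lam)


COATING_ENHANCEMENT = 1.5  # recommended absorption enhancement for aged BC


def aged_mac(mac_fresh, enhancement=COATING_ENHANCEMENT):
    """Apply a fixed coating enhancement to a fresh-particle MAC."""
    return mac_fresh * enhancement


base = rayleigh_mac(1.9, 0.6, 1.0)
# Larger imaginary index: more absorption per unit mass.
print(rayleigh_mac(1.9, 0.7, 1.0) / base)
# Higher density: less absorption per unit mass (scales as 1/rho here).
print(rayleigh_mac(1.9, 0.6, 1.8) / base)
# Coating enhancement applied to the fresh value.
print(aged_mac(base) / base)
```

In this limit the 1/ρ scaling is exact, which is why raising density to the recommended range lowers AAOD for a fixed BC mass unless the refractive index or coating enhancement is raised at the same time.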
Model treatment of BC uptake by clouds is determined by assuming a fixed uptake rate, by assuming that BC becomes hydrophilic after some aging time, or by a microphysical scheme that includes mixing with soluble species. Relatively little effort has been devoted to the treatment of BC uptake or removal by frozen clouds and precipitation. Some field information is available (e.g. Cozic et al., 2007), and although more observations are needed, this is a process models need to consider more carefully. The comparison of models with aircraft data (Figs. 9 and 10) suggests that some upper-level removal processes may be missing. Alternatively, the model vertical mixing may be excessive. It would be useful for the models to compare other species with available aircraft observations to learn whether the bias is primarily for BC or also occurs for other species. The GISS model is fairly successful at capturing the decrease with altitude for SO₂, sulfate, DMS and H₂O₂ (Koch et al., 2006). We have had some success decreasing BC aloft in the GISS model by enhancing removal by convective clouds. The ECHAM5 model has obtained improved vertical transport with increased vertical resolution. Note, however, that decreasing the BC load diminishes the AAOD and worsens that bias.
An obvious difficulty in applying the various datasets for model constraint is the uncertainty in the data. A thorough discussion of this topic is beyond the scope of this study, but we briefly summarize some issues here. These include uncertainties in surface measurements and AAOD retrievals, failure to account accurately for additional absorbing species, failure to diagnose the models in the same way as the retrievals, and mismatches between the observation and model periods.
Surface concentration measurements are made with a variety of techniques, including various thermal and optical approaches, summarized in e.g. Bond and Bergstrom (2006). This variety contributes to scatter in the model biases. In particular, the reflectance method used for IMPROVE is known to measure higher elemental carbon than the transmittance method used by EMEP (Chow et al., 2001), which may explain some of the difference in model-measurement comparisons between these regions.
AERONET and OMI each use uniform techniques for their respective AAOD retrievals; however, they have their own uncertainties. The AERONET retrieval algorithm (Dubovik and King, 2000) derives detailed size distributions and spectrally dependent complex refractive indices by fitting the direct and diffuse transmitted radiation measured by ground-based sun-photometers (Holben et al., 1998). No microphysical model is assumed for the size distribution or for the complex refractive index. The AAOD values are then calculated using the combination of size distribution and refractive index that provides the best fit to the measurements. The major limitation on the retrieval of aerosol absorption is the limited accuracy of the direct Sun radiation measurements (Dubovik et al., 2000). As shown by Dubovik et al. (2000), retrievals of aerosol absorption and single scattering albedo (SSA = scattering/(scattering + absorption)) are unreliable under low aerosol loading conditions, with AAOD tending to be biased high, but with an accuracy of 0.01.
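Two small calculations connect the retrieved quantities to the AAOD values compared here: by the SSA definition above, AAOD = (1 − SSA) × AOD, and a value at 550 nm can be obtained from two bracketing wavelengths with a power law, as the Fig. 3 caption describes for the 0.44 and 0.87 µm Angstrom parameters. The caption does not say whether the extinction or absorption Angstrom parameter was used; this sketch applies the power law directly to AAOD, which is an assumption. Function names and input values are hypothetical.

```python
import math


def aaod_from_ssa(aod, ssa):
    """AAOD from extinction AOD and SSA = scattering / (scattering + absorption)."""
    return (1.0 - ssa) * aod


def angstrom_exponent(tau1, lam1, tau2, lam2):
    """Angstrom exponent alpha for tau proportional to lambda**(-alpha)."""
    return -math.log(tau1 / tau2) / math.log(lam1 / lam2)


def to_wavelength(tau1, lam1, tau2, lam2, lam):
    """Interpolate an optical depth to `lam` with the two-wavelength power law."""
    alpha = angstrom_exponent(tau1, lam1, tau2, lam2)
    return tau1 * (lam / lam1) ** (-alpha)


# Hypothetical AOD/SSA pairs at 440 and 870 nm, interpolated to 550 nm.
aaod_440 = aaod_from_ssa(0.50, 0.90)
aaod_870 = aaod_from_ssa(0.25, 0.88)
print(to_wavelength(aaod_440, 440.0, aaod_870, 870.0, 550.0))
```

Because AAOD is a small difference (1 − SSA) times AOD, a modest SSA error at low AOD translates into a large relative AAOD error, which is consistent with the low-loading limitation noted above.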
Although no similar limitation has been documented for the OMI retrieval, the accuracy of OMI retrievals (as for retrievals by any passive satellite sensor) is generally also lower for smaller aerosol loading, since the ratio of aerosol signal to radiometric noise decreases. The OMI retrieval also relies on a predetermined, limited set of aerosol models, and the OMI algorithm chooses the model as part of the retrieval. The AAOD and other aerosol parameters (including the Angstrom parameter) are then estimated from the retrieved aerosol optical depth (AOD) and the chosen model. An incorrect choice of aerosol model would therefore affect the retrievals of both AAOD and the Angstrom parameter. In contrast, the AERONET retrieval uses transmitted radiation (not reflected radiation, as registered by OMI), and the Angstrom parameter is derived from direct AOD measurements (not from an aerosol model).
Both AERONET and OMI data are for daytime and clear-sky conditions only, whereas the model results used here are for all times of day and all-sky conditions. Ideally, model diagnostics should be screened using similar criteria. Within the GISS model we have found that all-sky and clear-sky AAOD do not differ greatly, since the absorbing aerosols are assumed to be unaffected by relative humidity. Models that include aerosol mixing would probably have larger differences in AAOD between all-sky and clear-sky conditions.
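Screening model output in a retrieval-like way amounts to sampling only the time steps a sun-photometer or satellite could have observed. The sketch below keeps daytime, cloud-free model time steps with sufficiently large AOD before averaging AAOD at a site. Both thresholds are illustrative assumptions (the AOD cut-off is only loosely modelled on the large-AOD requirement for AERONET absorption products), and the function name and input values are hypothetical.

```python
import numpy as np


def screened_mean(aaod, aod, cloud_fraction, is_daytime, aod_min=0.4, max_cloud=0.0):
    """Mean model AAOD over time steps that pass retrieval-like screening.

    Keeps daytime samples with cloud fraction <= `max_cloud` and AOD >= `aod_min`.
    Thresholds are illustrative assumptions, not AERONET or OMI settings.
    Returns NaN if no time step passes.
    """
    aaod = np.asarray(aaod, dtype=float)
    keep = (
        np.asarray(is_daytime, dtype=bool)
        & (np.asarray(cloud_fraction, dtype=float) <= max_cloud)
        & (np.asarray(aod, dtype=float) >= aod_min)
    )
    return float(aaod[keep].mean()) if keep.any() else float("nan")


# Hypothetical model time steps at one site.
aaod = [0.01, 0.02, 0.03, 0.04]
aod = [0.5, 0.5, 0.2, 0.6]
cloud = [0.0, 0.5, 0.0, 0.0]
day = [True, True, True, False]
print(screened_mean(aaod, aod, cloud, day), float(np.mean(aaod)))
```

Comparing the screened mean with the unscreened all-sky mean, as in the last line, indicates how much of a model-retrieval difference could come from sampling alone.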
The AAOD measurements include absorption by dust and by "brown" or absorbing organic carbon. We have included all species in the model AAOD estimates; however, we have not attempted to address shortcomings in dust simulations, and the models generally do not yet include significant absorption by organic carbon. We have, however, focused on regions where carbonaceous aerosols dominate over dust absorption. Furthermore, dust and absorbing organics absorb relatively less at longer wavelengths than BC does. When we used the GISS model to consider the spectral dependence of the AAOD bias, we found that the bias is generally independent of wavelength, suggesting that BC is the primary source of bias. A final difficulty is the mismatch between the dates of the measurements and of the model emissions. Other than the aircraft measurements, we used long-term measurements (one year or more), but the various measurements were taken in a variety of years. In regions where BC has been changing significantly, we may expect differing biases depending on the measurement and its date. The models generally used emissions for the 1990s. AERONET measurements are from 1996-2006, OMI from 2005-2007, IMPROVE from the 1990s to 2002, EMEP for 2003, many Asian surface concentration data are from 2006, and the SP2 measurements are for 2004-2008. Over the USA, there do not appear to be significant trends in the IMPROVE data for sites that have long-term surface concentration measurements (not shown). The other datasets are too short to reveal significant trends. Some of the model bias in regions such as southeast Asia, where BC may have been increasing over the past 2 decades (Bond et al., 2007), may be due to a mismatch of emission and measurement dates.
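The wavelength argument for dust and absorbing organics can be made concrete with power-law AAOD spectra. In the sketch below, the model contains only BC absorption and the "observation" adds a missing absorber; if that absorber has a steeper spectral dependence than BC, the model/observed ratio changes with wavelength, whereas a missing absorber that behaves like BC leaves the ratio flat. The absorption Angstrom exponent of about 1 for BC and the value of 3 for the steeper absorber are illustrative assumptions, as are the function names and AAOD values.

```python
def aaod_at(aaod_550, lam_nm, aae):
    """Power-law AAOD spectrum anchored at 550 nm (absorption Angstrom exponent `aae`)."""
    return aaod_550 * (lam_nm / 550.0) ** (-aae)


def model_to_obs(lam_nm, bc_550, missing_550, aae_missing, aae_bc=1.0):
    """Model/observed AAOD ratio when the model has only BC absorption and
    the observation also contains a missing absorber."""
    model = aaod_at(bc_550, lam_nm, aae_bc)
    observed = model + aaod_at(missing_550, lam_nm, aae_missing)
    return model / observed


for aae_missing in (1.0, 3.0):
    r550 = model_to_obs(550.0, 0.02, 0.02, aae_missing)
    r1000 = model_to_obs(1000.0, 0.02, 0.02, aae_missing)
    print(aae_missing, round(r550, 3), round(r1000, 3))
```

Under these assumptions, a bias that is the same at 550 nm and 1000 nm (compare Table 5) points toward missing absorption with a BC-like spectrum rather than toward dust or organics.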
We may infer model underestimation of BC radiative forcing from the underestimation of AERONET AAOD. According to Table 9, the average model underestimates AAOD compared with AERONET by less than a factor of 2. The average AeroCom model BC radiative forcing is +0.25 W m⁻² (Schulz et al., 2006). If we assume that the radiative forcing is underestimated by the same factor as AAOD, then the average of the AeroCom models would give a BC radiative forcing closer to +0.5 W m⁻². This enhancement would put the average model estimate close to the +0.55 W m⁻² estimate of Jacobson (2001), who used a BC core-shell configuration. However, the enhanced estimate is smaller than some other recent high estimates, such as +0.8 W m⁻² from Chung and Seinfeld (2002) for internally mixed BC, or the retrieval-based estimates of +1.0 W m⁻² from Sato et al. (2003) and +0.9 W m⁻² from Ramanathan and Carmichael (2008).
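The forcing estimate above is a one-line rescaling: dividing the model forcing by the model/observed AAOD ratio, under the stated assumption that forcing is proportional to AAOD. The sketch below applies it to the +0.25 W m⁻² AeroCom average; the ratios are illustrative, with 0.5 corresponding to the "factor of 2" bound, and the function name is hypothetical.

```python
def scaled_forcing(model_forcing_wm2, model_to_obs_aaod):
    """Rescale a BC forcing by the inverse model/observed AAOD ratio.

    Assumes forcing is proportional to AAOD, as stated in the text.
    """
    if model_to_obs_aaod <= 0:
        raise ValueError("model/observed AAOD ratio must be positive")
    return model_forcing_wm2 / model_to_obs_aaod


# Average AeroCom BC forcing with a few illustrative AAOD ratios.
for ratio in (0.5, 0.6, 0.7):
    print(ratio, round(scaled_forcing(0.25, ratio), 2))
```

Since the underestimate is less than a factor of 2 (ratio above 0.5), +0.5 W m⁻² is an upper bound for this simple scaling.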
In spite of the uncertainties in models and measurements, our study has revealed some broad tendencies and biases in model BC simulations. Compared with column estimates of load and AAOD, the models generally underestimate BC. This bias is worst in biomass burning regions, where the ratio of average model to retrieved values is 0.4 to 0.7, in remote regions (0.2 to 0.5) and in southeast Asia (0.6 to 0.7). To some extent the bias can be attributed to differing times for emissions and measurements in southeast Asia, and to AERONET AAOD overestimation in remote regions. On the other hand, the models do not generally underestimate BC surface concentrations. Compared with aircraft measurements at low to mid latitudes over North America, the models generally agree near the surface but overestimate BC aloft, especially in the mid to upper troposphere. At high latitudes many models underestimate BC in the lower and middle troposphere. The model-aircraft comparison suggests that models allow excessive vertical transport of BC and may lack sufficient removal by precipitating clouds; they also probably lack sufficient low-level poleward transport. Unfortunately, enhancing BC rainout, especially at middle latitudes, is likely to diminish the BC available to travel poleward. Furthermore, it will worsen the underestimate relative to AAOD.
This study suggests several future research directions to help close the gap between models and observations. To improve the treatment of BC optical properties, models should include the effect of mixing with other species and increase the refractive index as recommended by Bond and Bergstrom (2006), or approximate this effect by enhancing MABS for aged BC by a factor of 1.5 (Bond et al., 2006). The development of emission inventories with size information, and of emission estimates of absorbing organic aerosols for model simulations, should also be a priority. Models should include diagnostic simulators that screen in a manner like AERONET and OMI, e.g. only using sufficiently large AOD and clear-sky daytime conditions. Although the GISS model AAOD did not differ much between all-sky and clear-sky conditions, models with aerosol mixing may be more strongly affected by changes in relative humidity. Aircraft measurements over Eurasia, the oceans and the biomass burning regions would provide important additional constraints. Furthermore, it would be useful to compare models and aircraft measurements using model emissions for the specific field seasons and sampling along the flight track, particularly for campaigns that sample biomass burning. Finally, long-term surface measurements co-located with AERONET stations, especially in remote and biomass burning regions, could help the interpretation of model biases in these regions.

Fig. 3. Top: aerosol absorption optical depth (AAOD, ×100) from AERONET (at 550 nm; upper left) and OMI (at 500 nm; upper right); middle: scatter plot comparing OMI and AERONET at AERONET sites; bottom: GISS sensitivity studies for effective radius 0.08, 0.1, and 0.06 µm for 300-770 nm. The AERONET data are version 2, level 2, for 1996-2006; annual averages for each year were used if more than 8 months were present, and monthly averages if there were more than 10 days of measurements. The values at 550 nm were determined using the 0.44 and 0.87 µm Angstrom parameters. The OMI retrieval is based on OMAERUVd.003 daily products from 2005-2007 that were obtained and averaged through GIOVANNI (Acker and Leptoukh, 2007).

Fig. 7. Annual mean column BC load (mg m⁻²) for 9 AeroCom models. The Schuster BC load is based on AERONET v2 level 1.5; annual averages require 12 months of data, and the data include all AERONET measurements up to 2008.

Fig. 8. Annual mean column BC load for GISS sensitivity simulations and the Schuster BC retrieval (see Fig. 7).
Fig. 9. Model profiles at approximate SP2 BC campaign locations in the tropics and midlatitudes, averaged over the points in the map (bottom). Observations (black curves) are averages for the respective campaigns, with standard deviations where available. The Houston campaign has two profiles measured on two different days. Mean (solid) and median (dashed) observed profiles are provided for (d). The markers in the map inset denote the locations of the model profiles used in these comparisons with the aircraft measurements, which are detailed in Table 7.

Fig. 10. As Fig. 9 but for high-latitude profiles. Mean (solid) and median (dashed) observed profiles are provided, except for (c), where the ARCPAC campaign has distinct profiles for the mean of the 4 flights that probed long-range biomass burning plumes (dashed) and the mean for the 1 flight that sampled aged Arctic air (solid).

Table 3. Average ratio between model and observed BC surface concentrations within regions for AeroCom models and GISS sensitivity studies. The number of measurements is given for each region. The bottom row is the observed average concentration in ng m⁻³. Regions are defined as N Am (130 W to 70 W; 20 N to 55 N), Europe (15 W to 45 E; 30 N to 70 N) and Asia (100 E to 160 E; 20 N to 70 N).

Table 4. Average ratio of model to retrieved AERONET and OMI aerosol absorption optical depth at 550 nm within regions for AeroCom models and GISS sensitivity studies. The number of measurements is given for AERONET. Annual and seasonal measurement values are given in the last 5 rows. Regions are defined as N Am (130 W to 70 W; 20 N to 55 N), Europe (15 W to 45 E; 30 N to 70 N), Asia (100 E to 160 E; 30 N to 70 N), S Am (85 W to 40 W; 34 S to 2 S) and Afr (20 W to 45 E; 34 S to 2 S).

Table 5. The average ratio of GISS model to AERONET within regions for 1000 nm and 550 nm.

Table 6. Ratio of model to retrieved AERONET BC column load using the Schuster et al. (2005) algorithm, within regions for AeroCom models and GISS sensitivity studies. The last row gives the average retrieved value in mg m⁻². The number of measurements is given for each region.

Table 7. Single Particle Soot Photometer (SP2) measurements of black carbon mass from aircraft.
1 AVE Houston: NASA Houston Aura Validation Experiment; CR-AVE: NASA Costa Rica Aura Validation Experiment; TC4: NASA Tropical Composition, Cloud, and Climate Coupling; ARCTAS: NASA Arctic Research of the Composition of the Troposphere from Aircraft and Satellites; CARB: NASA initiative in collaboration with California Air Resources Board; ARCPAC: NOAA Aerosol, Radiation, and Cloud Processes affecting Arctic Climate.
2 NOAA: David Fahey, Ru-shan Gao, Joshua Schwarz, Ryan Spackman, Laurel Watts

Table 8. Ratio of model to observed BC for the aircraft campaigns in the south (Fig. 9) and north (Fig. 10), using the median values for Figs. 9d and 10d-e. The lowest 2 model layers are not used.

Table 9. Summary table: ratio of average model to observed/retrieved values within regions, from Tables 3, 4 and 6.