Global and regional modeling of clouds and aerosols in the marine boundary layer during VOCALS: the VOCA intercomparison

A diverse collection of models is used to simulate the marine boundary layer in the southeast Pacific region during the period of the October–November 2008 VOCALS REx (VAMOS Ocean Cloud Atmosphere Land Study Regional Experiment) field campaign. Regional models simulate the period continuously in boundary-forced free-running mode, while global forecast models and GCMs (general circulation models) are run in forecast mode. The models are compared to extensive observations along a line at 20° S extending westward from the South American coast. Most of the models simulate cloud and aerosol characteristics and gradients across the region that are recognizably similar to observations, despite the complex interaction of processes involved in the problem, many of which are parameterized or poorly resolved. Some models simulate the regional low cloud cover well, though many models underestimate MBL (marine boundary layer) depth near the coast. Most models qualitatively simulate the observed offshore gradients of SO2, sulfate aerosol, and CCN (cloud condensation nuclei) concentration in the MBL, as well as differences in concentration between the MBL and the free troposphere. Most models also qualitatively capture the decrease in cloud droplet number away from the coast. However, there are large quantitative intermodel differences in both means and gradients of these quantities. Many models are able to represent episodic offshore increases in cloud droplet number and aerosol concentrations associated with periods of offshore flow. Most models underestimate CCN (at 0.1 % supersaturation) in the MBL and free troposphere. The GCMs also have difficulty simulating coastal gradients in CCN and cloud droplet number concentration.
The overall performance of the models demonstrates their potential utility in simulating aerosol–cloud interactions in the MBL, though quantitative estimation of aerosol–cloud interactions and aerosol indirect effects of MBL clouds with these models remains uncertain.


Published by Copernicus Publications on behalf of the European Geosciences Union.

Introduction
The southeast Pacific (SEP) region has an unusually extensive and persistent low-cloud cover supported by relatively low sea-surface temperatures (SSTs) due to coastal upwelling, strong subsidence, and high static stability in the lower troposphere. There are typically strong east-west aerosol gradients in this marine boundary layer (MBL) between relatively pristine conditions in air masses advecting from the South Pacific Ocean and more polluted air near the west coast of South America (e.g., Bretherton et al., 2010; Allen et al., 2011). Anthropogenic aerosol and aerosol precursor emissions from industrial, agricultural, and transportation sources are incorporated into the MBL directly or through intermittent free-tropospheric flow over the ocean and subsequent entrainment into the MBL (e.g., Clarke et al., 2010; George et al., 2013).
The persistent clouds and aerosol gradients make the SEP an attractive test bed for evaluating how well modern forecasting and climate models can simulate aerosol-cloud interactions, a key uncertainty in understanding the 20th century climate record and an important issue for climate projection (Solomon et al., 2007). This was a central motivation for the Variability of the American Monsoon Systems (VAMOS) Ocean Cloud Atmosphere Land Study Regional Experiment (VOCALS-REx) field campaign, which took place in the SEP region during October and November 2008 (Wood et al., 2011a).
In addition to the features given above, many factors coincide to make the SEP unique in terms of its persistent cloud deck. The subsiding air above the MBL is also exceptionally dry, enhancing radiative cooling of the MBL clouds. The temperature inversion at the top of the MBL in the region is extremely strong, commonly exceeding 12 K during the austral spring. Another prominent feature influencing regional meteorology and climate is the Andes mountain range, which forms a long, mostly north-south barrier to east-west flow in the MBL (Richter and Mechoso, 2006). This feature together with the strong inversion controls the circulations that affect aerosol and chemical transport pathways. The meteorology of the region in the austral spring season is dominated by a subtropical anticyclone. The flow in the MBL (Fig. 1) is typically southerly near the coast, turning southeasterly away from the coast. There is a climatological advection of coastal air to the northwest, away from the coast and towards higher SSTs. The MBL deepens as it is advected offshore over higher SSTs. This flow pattern also carries aerosols from coastal anthropogenic and natural sources offshore. Aerosols generated farther inland and/or lofted upwards may also enter the SEP MBL through advection offshore at higher levels and entrainment into the MBL top (Saide et al., 2012; George et al., 2013).
Skillful simulation of aerosol-cloud interaction in the MBL requires a realistic representation of other boundary layer cloud processes in models. However, the accurate simulation of boundary layer clouds such as stratocumulus and trade cumulus is a long-standing challenge in climate and weather forecast modeling. The Pre-VOCALS Assessment (PreVOCA, Wyant et al., 2010) was designed to document and evaluate a wide range of models in the SEP region and to provide a benchmark for future model comparisons to VOCALS-REx observations. PreVOCA examined simulations of the VOCALS-REx study region for October 2006 using a collection of 15 regional and global models and compared them with satellite data and ship-based climatologies available before VOCALS-REx. Most of these models had no explicit representation of aerosols. Many of the models produced serious biases in the time-mean geographic variability of low cloud in this region. In most models, the simulated MBL was too shallow near the coast. Nevertheless, a subset of models simulated the space-time distribution of cloud cover and thickness quite well.
The extensive in situ sampling during VOCALS-REx, especially from aircraft, provides more detailed and direct comparisons for models than were available for PreVOCA. These include comparisons of aerosol and chemical constituents (Bretherton et al., 2010; Allen et al., 2011) as well as MBL vertical structure and precipitation. This data set is uniquely suited to testing simulations of MBL clouds, aerosols, and their interactions. The VOCALS Assessment (VOCA) was organized to capitalize on this opportunity.
Participating models simulated the SEP during the month of VOCALS-REx when aircraft observations were being made. Sixteen modeling groups submitted simulations from global climate models, global operational forecast models, and regional models. In this study we focus on the subset of nine VOCA models that have some representation of aerosols and their effects on clouds.
There are a number of prior modeling studies of the SEP during VOCALS REx. Abel et al. (2010) evaluate the simulations of cloud cover, MBL depth, and precipitation over the entire REx period as well as over the diurnal cycle using a limited area model (LAM) configuration of the UK Met Office Unified Model. Q. Yang et al. (2011) compare their WRF-Chem (Weather Research and Forecasting model coupled with Chemistry) simulations for VOCA with observations and find that their simulations with interactive aerosols perform better than those with a passive treatment of aerosols. Their follow-up modeling study (Yang et al., 2012) quantified the relative impacts of regional anthropogenic and oceanic emissions on aerosol properties, cloud macro- and microphysics, and cloud radiative forcing over the SEP during VOCALS, and reported a large feedback of aerosol concentration on precipitation and aerosol lifetime over the clean ocean environment. Saide et al. (2012), using a different configuration of WRF-Chem, compare their VOCA simulations with observations over the entire study period as well as over shorter episodes. They also find that aerosol indirect effects play an important role in their simulations, and that their treatment of aerosol wet deposition has a strong impact on their results. George et al. (2013) used WRF-Chem in a similar configuration to their runs presented here to study multiday "hook" events, where polluted continental air is carried offshore and influences stratocumulus clouds via aerosol indirect effects.
This paper addresses several questions: Can the models represent the geographical contrasts in cloud microphysical properties in the SEP? How well do the geographical and vertical concentrations of aerosols agree? How well do the models represent the impacts of these aerosols in the clouds? What problems are common to many models? Do these observations provide a good benchmark for aerosol-cloud interaction?
We will describe the setup of VOCA in Sect. 2. Section 3 compares the model results with each other and with observations. The results of the comparison will be discussed in Sect. 4 and conclusions presented in Sect. 5. Detailed descriptions of the models used are given in the Appendix.

Case setup
VOCA covers the time interval from 00:00 UTC (coordinated universal time) 15 October 2008 through 00:00 UTC 16 November 2008, the period of VOCALS REx intensive airborne observations. The outer study region for VOCA is shown in Fig. 1. The inner domain outlined in black extends from 12 to 35° S and 68.5 to 88° W, which includes the region of most of the REx research flights including the large set of flights along 20° S from the coast to 85° W. Simulation output data in the outer and inner region were horizontally averaged to a 1° × 1° grid and 0.25° × 0.25° grid, respectively, by the modeling groups. The models were not required to match their simulation domains to the outer and inner domains, or to necessarily include the outer study domain; the regional models in this comparison did not cover this outer study domain due to computational demands. Each model submitted data on its native vertical levels to preserve vertical structure for analysis. The data were submitted with 3 h time resolution, with some data fields averaged over 3 h intervals, and other fields provided at 3 h snapshots. The experiment specification can be found at http://www.atmos.washington.edu/~mwyant/vocals/model/VOCA_Model_Spec.htm.
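The horizontal averaging step can be illustrated with a minimal sketch. It assumes samples on a latitude-longitude grid finer than the target grid and uses simple unweighted box averaging; the function name `box_average` and the sample values are hypothetical, and the modeling groups may well have used area-weighted or conservative regridding instead.

```python
import math
from collections import defaultdict

def box_average(points, cell_deg):
    """Average (lat, lon, value) samples into cells of size cell_deg.

    Returns a dict mapping the south-west corner (lat, lon) of each
    occupied cell to the unweighted mean of the samples inside it.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for lat, lon, value in points:
        # floor() assigns each sample to the cell whose south-west corner
        # is the largest multiple of cell_deg at or below the coordinate
        key = (math.floor(lat / cell_deg) * cell_deg,
               math.floor(lon / cell_deg) * cell_deg)
        sums[key] += value
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}

# Hypothetical samples: (latitude, longitude, cloud fraction)
samples = [
    (-20.4, -75.3, 0.80),
    (-20.1, -75.9, 0.60),
    (-19.7, -75.2, 0.90),
]
coarse = box_average(samples, 1.0)  # 1 degree x 1 degree target cells
```

Calling the same function with `cell_deg=0.25` would correspond to the finer inner-domain grid.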
A diverse group of models is represented in this study. They include global general circulation models (GCMs): the National Center for Atmospheric Research (NCAR) Community Atmosphere Model versions 4 and 5 (CAM4 and CAM5, respectively), and the NOAA Geophysical Fluid Dynamics Laboratory Atmospheric Model 3 (GFDL AM3). Simulations using global weather forecast models were provided by the European Centre for Medium-Range Weather Forecasts (ECMWF) and the UK Met Office (UKMO). Regional simulations using WRF-Chem were submitted independently by research groups from the University of Iowa, the Pacific Northwest National Laboratory, and the University of Washington (hereafter labeled IOWA, PNNL, and UW, respectively). Another regional simulation included in this study was produced by the International Pacific Research Center (IPRC) with their Regional Atmospheric Model (iRAM). Detailed descriptions of these models are given in the Appendix.
Table 1 shows a list of the VOCA simulations analyzed in this study and many of their important parameters and characteristics. All of the listed global models were run in forecast mode, i.e., as a series of short simulations initialized at subsequent times from externally specified conditions. This initialization constrains the large-scale environment while still allowing the model to develop internally consistent representations of cloud and boundary layer structure. Forecast mode has proven to be a good framework for identifying climate model biases (e.g., Phillips et al., 2004; Boyle et al., 2008; Hannay et al., 2009). Daily forecasts were provided by the modeling groups (twice-daily for the UKMO model), and for each model, data from these were stitched together to cover the REx period. The global weather forecast models used a data assimilation/forecast cycle that did not have a large initialization shock for boundary layer cloud, so the first forecast period (which presumably has the most accurate meteorological fields) was used in our study (e.g., 0-12 h for UKMO). The global climate models were initialized from ECMWF high-resolution global analyses produced for the Year of Tropical Convection (YOTC), so there was a spinup period for each model to adjust to this analysis. For such models, a later forecast period was chosen for analysis. The global models each utilize different land emission schemes.
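As an illustration of how forecast segments could be stitched into a continuous series, the sketch below keeps, from each forecast, only the samples whose lead time falls in a chosen window and orders them by valid time. The function name, the data layout, and the 24-48 h window are assumptions made for illustration; the lead-time windows actually used differed between models.

```python
def stitch_forecasts(forecasts, lead_start_h, lead_end_h):
    """Build one time series from overlapping forecasts.

    forecasts: list of (init_hour, [(lead_hour, value), ...]), where
        init_hour is the initialization time in hours since a common epoch.
    Keeps samples with lead_start_h <= lead_hour < lead_end_h and returns
    a list of (valid_hour, value) sorted by valid time.
    """
    series = []
    for init_hour, samples in forecasts:
        for lead_hour, value in samples:
            if lead_start_h <= lead_hour < lead_end_h:
                series.append((init_hour + lead_hour, value))
    series.sort(key=lambda pair: pair[0])
    return series

# Hypothetical daily forecasts with 12 h output, each run out to 36 h
forecasts = [
    (0, [(0, 1.0), (12, 1.1), (24, 1.2), (36, 1.3)]),
    (24, [(0, 2.0), (12, 2.1), (24, 2.2), (36, 2.3)]),
]
# Skip the first 24 h of each forecast, as one might for a model
# that needs a spin-up period after initialization
stitched = stitch_forecasts(forecasts, 24, 48)
```

For a model without a large initialization shock, the same call with a window starting at lead time zero would select the first forecast period instead.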
All of the regional models were run continuously in free-running mode, with forcing at the lateral boundaries. The lateral boundary conditions for IOWA, UW, and iRAM came from the NCEP global FNL (Final) analysis, and for PNNL they came from NCEP's Global Forecast System (GFS) analyses. A regional emissions inventory of natural and manmade emissions over land during the VOCALS REx period was developed at the University of Iowa. This inventory is described by Mena-Carrasco et al. (2012) and includes emissions from anthropogenic sources and large nearby volcanoes, but not biogenic or biomass burning emissions. All of the WRF-Chem regional models incorporated these emissions in their simulations, but none of the other participating models use these emissions. Parameterizations for fluxes of sea salt and dimethyl sulfide (DMS) from the sea surface were provided in the VOCA specification but not required for participants. The specified coarse and fine-mode sea-salt emissions are based on Gong et al. (1997) and Monahan et al. (1986), while ultrafine emissions follow Clarke et al. (2006). The specification uses a simplified version of Nightingale et al. (2000) with a geographically uniform ocean surface DMS concentration of 2.8 nmol L⁻¹. Choice of emission parameterizations for any other aerosol types, such as dust, was left up to the participants. For regional models, the Model for Ozone and Related chemical Tracers version 4 (MOZART-4, Emmons et al., 2010) global model provided initial and lateral boundary conditions of aerosol and chemical species concentrations.
The models represent aerosol size and mass to varying degrees of precision and complexity. The IPRC model uses climatologically prescribed aerosol mass and size distributions and permits aerosols to affect clouds, so surface aerosol emissions are not represented. The rest of the models use prognostic aerosol schemes: they either specify a small number of size modes (CAM5, GFDL, UW) or use sectional schemes with explicit aerosol size bins (PNNL, ECMWF, UKMO, IOWA). For models with aerosol-cloud feedbacks, a fraction of the aerosols can be activated to form cloud droplet nuclei. In this way, aerosol number concentration can affect cloud droplet number concentration (N_d). N_d in turn affects drizzle formation and cloud reflectivity. Cloud and precipitation scavenging reduces concentrations of both activated and unactivated aerosols in the MBL.
In this study, we rely heavily on in situ aircraft observations along 20° S between 70° W, at the Chilean coast, and 85° W, at the Improved Meteorology (IMET) moored research buoy situated about 1500 km offshore. Throughout VOCALS REx, several aircraft, primarily the NSF C-130 and UK BAe146, regularly performed research flights in and above the MBL along this line (Bretherton et al., 2010; Allen et al., 2011). A common flight pattern included a sequence of 60 km level legs, one 150-300 m above the inversion, one in the middle of the cloud layer or, in the absence of clouds, just below the inversion base, and one in the lower MBL at 150 m height. This pattern was repeated multiple times along the 20° S segment. Data from 23 flights are distributed fairly evenly throughout the 15 October-16 November period and fairly evenly over the diurnal cycle. Almost all C-130 and BAe146 flights sampled out to 80° W, while four C-130 flights sampled the entire segment out to 85° W. Bretherton et al. (2010) and Allen et al. (2011) provided a thorough description of the flights and findings from this collection of flight data and other supporting observational data. Following those studies, we frequently sort aircraft leg-mean values into 5° or 2.5° longitude bins before further averaging in order to reduce sampling noise and facilitate comparisons with the models. The 25th- and 75th-percentile values of these leg-mean values are plotted in the figures as error bars and provide an estimate of the temporal and geographic variability in sampling. The actual measurement errors of the means should be much smaller than these ranges.
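The longitude binning of leg means can be sketched as follows. This is a minimal illustration rather than the processing code used for the figures: the function name, the sample values, and the choice of the "inclusive" percentile method in Python's `statistics.quantiles` are all assumptions.

```python
import math
from statistics import mean, quantiles

def bin_leg_means(legs, bin_width_deg):
    """Group (longitude, value) leg means into longitude bins.

    Returns {bin_west_edge: (mean, p25, p75)}; a bin holding a single
    leg reports that value for all three statistics.
    """
    bins = {}
    for lon, value in legs:
        edge = math.floor(lon / bin_width_deg) * bin_width_deg
        bins.setdefault(edge, []).append(value)
    stats = {}
    for edge, values in sorted(bins.items()):
        if len(values) > 1:
            q1, _, q3 = quantiles(values, n=4, method="inclusive")
        else:
            q1 = q3 = values[0]
        stats[edge] = (mean(values), q1, q3)
    return stats

# Hypothetical leg means: (longitude in degrees east, N_d in cm^-3)
legs = [(-71.0, 250.0), (-72.5, 200.0), (-74.0, 150.0),
        (-76.0, 120.0), (-78.5, 100.0)]
binned = bin_leg_means(legs, 5.0)  # 5 degree bins; 2.5 works the same way
```

Each entry gives the bin mean together with the 25th and 75th percentiles that would be drawn as error bars.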

Time-mean cloud macrophysics and precipitation
We begin by comparing simulated low-cloud fraction near 15:30 UTC (approximately 10:30 local time, LT) averaged over the 1-month REx period (Fig. 2) with satellite cloud fraction from the Moderate Resolution Imaging Spectroradiometer (MODIS) Terra daytime overpass (also approximately 10:30 LT). Note that the MODIS cloud fraction includes all clouds, not just low clouds, though low clouds strongly dominate the cloud fraction climatology. As in PreVOCA, many models have difficulty in simulating the geographic distribution of the low-cloud fraction as compared with MODIS. The models' patterns of low-cloud cover are quite diverse. The PNNL, UW WRF, IOWA, and ECMWF models agree well with MODIS in the northeast part of the inner study region. In the southwest part of the region, PNNL and UW WRF do not have enough low cloud, while the IOWA and ECMWF models have too much. In the southern half of the inner study region the CAM4, CAM5, GFDL, UKMO, and IPRC models have too few low clouds. While CAM5, with better vertical resolution, appears to be an improvement on CAM4 in the study region, its low-cloud fraction agrees no better with MODIS than that of CAM4 in the outer region. The GFDL model also has too few low clouds near the coast. Along 20° S in the inner study region, the GFDL and UKMO models both significantly underestimate cloud fraction compared with MODIS.
Figure 3 [...] 2001 to 2008 (de Szoeke et al., 2012)). Both satellite and aircraft measure a mean increase in LWP moving westward (offshore) from the near-coastal MBL and then a more constant LWP further offshore, while in the R/V Ron Brown climatology the LWP increases further offshore. The LWP along 20° S varies considerably between models. Most of the models underpredict mean LWP over most of the 20° S pro- [...] these measurements (25 October-2 November 2008 and 10 November-2 December 2008) only partly overlap with the VOCA study period. The modeled and observed vertical extent of the cloud fraction is broader to the west, consistent with a more decoupled vertical structure associated with cumuliform convection in the MBL and/or stronger time variations in inversion height. The overall distribution of modeled cloud heights is consistent with the cloud-top-height comparison in Fig. 4. Models with fine vertical resolution in the MBL and lower troposphere (PNNL, IOWA) are able to represent the Gaussian shape of the measurements, whereas models with coarser resolution show less smooth profiles. The height of peak cloud fraction in Fig. 5 is lower in almost all models than the corresponding observed peak, but in this case the comparison could be influenced by the mismatch of observation times and locations with those used for model averaging.
Mean surface precipitation rates in the region are generally very small, much less than 1 mm day⁻¹ (Bretherton et al., 2010; Wood et al., 2012; Rapp et al., 2013), but precipitation processes still play an important role in the MBL. Drizzle redistributes moisture downward and stabilizes the MBL through evaporation. In this environment cloud and precipitation scavenging is the dominant removal process of submicron aerosols. Precipitation feedbacks also may play a central role in the formation and maintenance of pockets of open cells (POCs), which are common features of the regional marine stratocumulus (Bretherton et al., 2004; Wood et al., 2008, 2011b; Ovchinnikov et al., 2013).
Figure 6 compares time-mean modeled surface precipitation, time-mean aircraft observations, and a 2006-2010 satellite precipitation climatology (Rapp et al., 2013) from the NASA CloudSat 2C-RAIN-PROFILE product that includes both daytime and nighttime passes. The aircraft measurements were made at about 150 m above the surface using the Particle Measuring Systems 2D-C instrument. Both observational data sets are subject to considerable uncertainty that is associated with both the measurement technique and the representativeness of the sampling. The models tend to produce more surface precipitation than suggested by CloudSat retrievals. Near the coast, limited CloudSat observations suggest minuscule precipitation rates. Some models agree well with this (CAM5, UKMO, PNNL, and IOWA), while the other models predict more significant precipitation rates. Offshore, all models are within an order of magnitude of observed values.

Time-mean aerosol and chemical properties
We next compare the simulated aerosol and chemical properties along 20° S with the REx observations. We focus on aerosols that directly impact MBL clouds in this region through their capacity to act as cloud condensation nuclei (CCN). We compare modeled and C-130 measured CCN number concentration at 0.1 % supersaturation in the free troposphere above the inversion (FT, Fig. 7, top-left panel) and at 150 m height (Fig. 7, bottom-left panel). The specification of 0.1 % supersaturation was in retrospect suboptimal for the intercomparison, since it is somewhat lower than the 0.2-0.4 % maximum supersaturation expected during the nucleation of cloud droplets given typical MBL updraft strengths and aerosol size spectra (Martin et al., 1994; Snider et al., 2003; Hudson et al., 2010). This may lead to an underestimate of the actual number concentration of aerosols that nucleate cloud droplets. However, given other large parameterization uncertainties, this statistic is still a useful comparison between models and observations. In all figures, FT aircraft observations are sampled above clouds and between 1700 and 3200 m, while model FT means are computed from the inversion height to 3200 m, following Allen et al. (2011). At 150 m, with the exception of the UKMO model, all of the models have mean CCN concentrations in the MBL and FT that are about half as large as observed or even less, both near shore and offshore. WRF-Chem models using the MOSAIC sectional aerosol scheme and the Abdul-Razzak and Ghan (2002) activation scheme (PNNL and IOWA) have significant concentrations of accumulation-mode aerosol that do not activate at this low supersaturation, and aerosol concentrations show much better agreement with VOCALS observations in the MBL when these accumulation-mode aerosols are considered (Q. Yang et al., 2011; Saide et al., 2012). At all longitudes east of 80° W, the UKMO model has excessive CCN concentrations, reaching a peak of 1700 cm⁻³ at 74° W. In the FT, the concentrations in the other models are also lower than observed. Most of the models have some semblance of the offshore CCN gradient seen in the observations.
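The model FT mean between the inversion height and 3200 m can be sketched as a thickness-weighted layer average. The weighting by layer overlap is an assumption, since the text does not state how model levels were combined, and the function name, profile values, and inversion height below are hypothetical.

```python
def layer_mean(levels, z_bottom, z_top):
    """Thickness-weighted mean of a profile between two heights.

    levels: list of (z_low, z_high, value) for each model layer, in metres.
    Layers partly inside [z_bottom, z_top] are weighted by their overlap.
    Returns None if no layer overlaps the requested range.
    """
    weighted_sum = 0.0
    total_depth = 0.0
    for z_low, z_high, value in levels:
        overlap = min(z_high, z_top) - max(z_low, z_bottom)
        if overlap > 0:
            weighted_sum += value * overlap
            total_depth += overlap
    return weighted_sum / total_depth if total_depth > 0 else None

# Hypothetical CCN profile (cm^-3) on four model layers
profile = [
    (0.0, 1000.0, 150.0),
    (1000.0, 2000.0, 100.0),
    (2000.0, 3000.0, 80.0),
    (3000.0, 4000.0, 60.0),
]
inversion_height = 1500.0  # assumed value, metres
ft_mean = layer_mean(profile, inversion_height, 3200.0)
```

The same helper applied from the surface to the inversion height would give an MBL vertical mean.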
Observational studies in the VOCALS region confirm that sulfate aerosol is the most important aerosol for nucleating cloud droplets (e.g., Twohy et al., 2013). While number concentration of accumulation-mode sulfate aerosol may be more directly relevant to cloud-aerosol interaction than sulfate mass, only the latter quantity was archived by most models and will be compared with observations. In the right panels of Fig. 7, modeled total mean sulfate aerosol mass is compared with C-130 and BAe-146 aerosol mass spectrometer (AMS) sulfate aerosol mass from 0.05 to 0.5 µm. Here the model MBL values are vertical means with the MBL thickness determined as for Fig. 4. In both the MBL and the FT, the models all have significant offshore gradients of sulfate aerosol comparable to the observations, consistent with a continental source. The models differ considerably in sulfate mass, especially in the MBL, but the majority of models tend to have less FT and more MBL sulfate aerosol mass than the AMS values. It should be noted that the AMS values represent a lower bound on actual sulfate mass, as there can be significant mass contained in aerosols larger than 0.5 µm in diameter (e.g., Q. Yang et al., 2011). In the MBL, the models are more skillful representing sulfate mass than CCN number concentration, with most models within a factor of 2 of the observed means.
Two important atmospheric precursors to sulfate aerosol are DMS and SO2. DMS is the only local source of (non-sea salt) sulfate aerosol in remote ocean regions. Figure 8 shows a comparison of mean MBL DMS concentration of most of the models with aircraft observations. Also shown are mean near-surface atmospheric DMS observations from R/V Ron Brown during VOCALS-REx (M. Yang et al., 2011). The timing of these observations only partly overlaps the VOCA simulation period, as was the case with the R/V Ron Brown cloud-fraction profiles shown above. The DMS concentrations vary widely across models, and in some models they are generally higher than the aircraft-observed values. The near-surface values observed by R/V Ron Brown are notably higher than aircraft values, which can be partially explained by the general decrease of DMS concentration with height in the MBL (e.g., M. Yang et al., 2011). The specified ocean surface DMS concentration is a spatially uniform 2.8 nM for the WRF models (as given in the VOCA specification). While it may differ somewhat in the other models, the differences are very unlikely to account for the wide variation between models. Differences in mean surface wind speed and advection patterns also cannot account for DMS differences. Over most of the inner study region, the interquartile range across models of mean model surface wind speeds is less than 2 m s⁻¹ and the interquartile range of both meridional and zonal 10 m winds is less than 1.5 m s⁻¹. Furthermore, the intermodel differences in upstream mean model wind speed appear to be uncorrelated with model mean DMS concentrations. The large differences in MBL DMS concentration are most likely due to differences in surface flux parameterizations or differences in model chemistry. Both models and observations agree that MBL DMS concentrations are larger offshore than near the coast, possibly due to the much higher wind speed offshore. PNNL WRF-Chem significantly overestimates the DMS concentration in the atmosphere, and a detailed investigation by Q. Yang et al. (2011) partially attributes this to overestimation of the DMS ocean-to-atmosphere transfer velocity. However, the PNNL WRF mean wind speeds along 20° S are very similar to those from UW WRF and GFDL, whose mean 20° S MBL DMS concentrations are much lower.
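The two intermodel diagnostics used in this argument, the interquartile range across models and the correlation between model-mean wind speed and DMS, can be sketched as below. The numbers are hypothetical placeholders rather than values from the VOCA models, and the percentile method and function names are assumptions.

```python
import math
from statistics import quantiles

def interquartile_range(values):
    """Difference between the 75th and 25th percentiles."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return q3 - q1

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical model-mean values, one entry per model
wind_speed = [6.5, 7.0, 7.2, 7.5, 8.0]       # m s^-1
dms = [40.0, 120.0, 35.0, 90.0, 60.0]        # pptv

wind_iqr = interquartile_range(wind_speed)   # spread of wind speed across models
wind_dms_r = pearson_r(wind_speed, dms)      # near zero would suggest no link
```

A small interquartile range in wind speed combined with a correlation near zero is the kind of evidence that points away from wind speed as the cause of the DMS spread.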
Both modeled and observed profiles of gas phase SO2 along 20° S (Fig. 9) in the MBL and the FT show even sharper gradients near the coast than for SO4 aerosol mass. In the models there is abundant SO2 near shore due to continental anthropogenic and natural sources, but modeled SO2 is low offshore compared with aircraft values in both the MBL and the FT. The abundance of modeled SO2 near shore and the strong modeled offshore sulfate gradient in the MBL suggest the models are producing most of their MBL sulfate aerosol east of 80° W via oxidation of SO2. This mechanism is generally consistent with findings of M. Yang et al. (2011) based on observed offshore SO2 and SO4 budgets in VOCALS-REx. The offshore model differences in the FT SO2 are likely due to differences in background SO2 in the models. The only model that matches the observed values (IOWA) has specified minimum thresholds for its SO2 boundary conditions (Saide et al., 2012). For the offshore MBL, most models, including the three WRF-Chem simulations, underestimate SO2, which has been hypothesized to be due to SO2 to SO4 aqueous reaction rates that are too fast (Saide et al., 2012). However, the aircraft concentrations in the remote MBL are suspiciously high, as there were almost no measured SO2 concentrations below 10 pptv (parts per trillion by volume) during VOCALS flights, even during nighttime.
Another significant potential source of aerosol mass and number in the MBL, especially in the remote regions, is sea-spray aerosol (SSA) generated by bubble bursting. The SSA mass in the MBL is thought to be dominated by the largest 10 % of the total number concentration, with dry diameters exceeding 1 µm, while number concentrations and contributions to CCN are dominated by the smaller sizes (Clarke et al., 2006). Here we compare the modeled SSA (dry) mass mixing ratio with C-130 aircraft-observed estimates (Fig. 10). These estimates from Blot et al. (2013) are based on data from particle counters and a Giant Nuclei Impactor and consider SSA particle sizes from about 0.04 µm to tens of micrometers. The observed trend to lower values west of 80° W has been attributed to more effective removal by drizzle in spite of higher winds and SSA production (Blot et al., 2013). There is a substantial range in simulated SSA mass, with most models exceeding the observed mean values. However, the WRF-Chem models and the GFDL model are generally close to the aircraft interquartile ranges. The intermodel range of mean surface wind speeds in the study region is small (as noted above) and uncorrelated with SSA mass. Some models have upper size limits due to the sectional approach used (e.g., the MOSAIC model used in the PNNL WRF and IOWA WRF has a 10 µm cutoff), somewhat limiting their total SSA mass. The expected mass contribution of aerosols smaller than 0.04 µm is negligible.
We next compare in Fig. 11 modeled cloud droplet number concentration (N_d) with aircraft-observed N_d and MODIS N_d retrieved using the method of George and Wood (2010). Five of the seven plotted models underestimate droplet concentration compared with aircraft and MODIS observations, especially near the coast. (Note that model N_d is computed only in grid cells where the 3 h mean cloud liquid water ex- [...]
Black carbon (BC) aerosol is a key tracer for the presence of submicrometer combustion-derived aerosol. Although it is usually only a few percent of combustion aerosol mass, when BC is elevated above "clean" conditions it indicates combustion aerosol is contributing directly to aerosol mass, number and CCN. Unlike CO, BC in aged combustion aerosol is readily scavenged by precipitation such that ambient concentrations reflect the impact of both source and removal processes. Figure 15 compares BC aerosol mass for several models with binned C-130 aircraft measurements made with a single particle soot photometer, which measures BC aerosol of 0.087-0.4 µm diameter (Shank et al., 2012). The models' spread in MBL concentrations is large, especially near the coast, but with all models generally within 1 order of magnitude of observed means. Despite the large biases in many models, most do show an increase in black carbon concentration towards the coast in the MBL, as observed. One exception to this trend is the UW model. This model does not include biomass burning, which explains the large difference between it and the other models near land. The models generally underestimate BC in the FT. The FT observations are suggestive of an offshore maximum in BC that is not captured in any of the models. The spatial and temporal variability in aircraft-measured BC in the FT makes evaluation of the model means difficult.
Two other trace gases measured during VOCALS flights are ozone and CO. Although they do not interact strongly with clouds, they provide an interesting comparison with models because this region is data-sparse and distant from other locations with extensive in situ measurements through the lower troposphere. These gases (especially CO) are long-lived; hence they are strongly determined by boundary conditions in the regional models. Thus, these model comparisons, especially for CO, are a stronger test for global than regional models.
Ozone concentrations are compared in Fig. 16. As noted in Allen et al. (2011), mean O3 concentrations measured in this region are higher in the free troposphere than in the MBL, generally consistent with subsidence of higher-ozone upper-tropospheric air, and the models reproduce this pattern. The PNNL WRF and IOWA WRF models match the observed means fairly well. Ozone can also be produced around anthropogenic pollution plumes. However, observed longitudinal gradients of O3 are small in the boundary layer, and in the FT there is actually a 25 % drop in concentration near the coast; Allen et al. (2011) attributed this to enhanced mixing with ozone-poor boundary layer air, which overwhelms any coastal anthropogenic source. The IOWA WRF and GFDL runs have a lesser but noticeable coastal decrease in O3; the CAM models have a slight ozone increase in the MBL and no decrease in the FT, suggestive of an overly strong coastal ozone source.
CO concentrations (not shown) were available only from the WRF-Chem regional models and the GFDL global runs. Aircraft mean values from 75 to 85° W were 66 ppbv (parts per billion by volume) in the MBL and 75 ppbv in the FT with weak longitudinal variation, and the model means were generally within ±10 ppbv of observed means along 20° S in both the MBL and FT. Because of the relatively long lifetime of CO, differences between model means are more closely tied to model boundary conditions or remote sources than to differences in model physics and chemistry.
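The CO comparison described above amounts to a per-layer mean bias checked against a tolerance. The following is a minimal sketch of that arithmetic; the observed means are those quoted in the text, while the values in `model_co` and the function names are hypothetical placeholders and not results from any VOCA model.

```python
# Observed C-130 mean CO (ppbv) for 75-85 deg W, as quoted in the text.
OBSERVED_CO = {"MBL": 66.0, "FT": 75.0}


def co_bias(model_means, observed=OBSERVED_CO):
    """Return the model-minus-observed CO bias (ppbv) for each layer."""
    return {layer: model_means[layer] - observed[layer] for layer in observed}


def within_tolerance(bias, tol_ppbv=10.0):
    """Flag each layer whose absolute bias is at most tol_ppbv."""
    return {layer: abs(value) <= tol_ppbv for layer, value in bias.items()}


# Placeholder model means, for illustration only (not VOCA output).
model_co = {"MBL": 72.5, "FT": 68.0}
bias = co_bias(model_co)
print(bias)
print(within_tolerance(bias))
```

Because CO is long-lived, a bias found this way would say more about a model's boundary conditions or remote sources than about its local physics and chemistry.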

Discussion
In evaluating the performance of the models with respect to aerosols and clouds, it is useful to group a few subsets of the models with similar characteristics. We begin with two contemporary GCMs in the study, GFDL and CAM5, which have comparable horizontal and vertical resolution in the MBL. Both models significantly underpredict LWP and inversion height along 20° S, and the GFDL model is significantly deficient in cloud fraction all along 20° S, especially near the coast. Both are also deficient in CCN at 0.1 % supersaturation and have an apparent surplus of sulfate aerosol and SSA mass, suggesting that their aerosol size distributions may be skewed towards larger sizes. Neither model displays a mean offshore gradient in CCN despite having significant offshore gradients in sulfate aerosol. As a result, both models underestimate observed cloud droplet concentrations, especially near the coast.
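The inference that a mass surplus combined with a CCN deficit points to overly large particles follows from the cubic dependence of particle mass on diameter. A minimal sketch, assuming monodisperse spherical particles with the bulk density of ammonium sulfate (taken here as about 1770 kg m⁻³); real size distributions are broad, the function name is hypothetical, and the input values are illustrative rather than VOCA results:

```python
import math


def number_concentration_cm3(mass_ug_m3, diameter_um, density_kg_m3=1770.0):
    """Number of monodisperse spheres per cm^3 that carry a given dry mass.

    The monodisperse assumption and the default density are illustrative.
    """
    diameter_m = diameter_um * 1e-6
    particle_mass_kg = density_kg_m3 * math.pi / 6.0 * diameter_m ** 3
    mass_kg_m3 = mass_ug_m3 * 1e-9          # ug m^-3 -> kg m^-3
    per_m3 = mass_kg_m3 / particle_mass_kg  # particles per m^3
    return per_m3 * 1e-6                    # m^-3 -> cm^-3


# Same dry mass (1 ug m^-3) placed in particles of increasing diameter.
for d_um in (0.1, 0.2, 0.4):
    print(d_um, round(number_concentration_cm3(1.0, d_um), 1))
```

Under these assumptions, doubling the particle diameter at fixed mass cuts the particle number by a factor of eight, so a model can carry too much sulfate mass and still supply too few particles to act as CCN.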
The three participating WRF-Chem models (PNNL, IOWA, and UW) show somewhat differing cloud characteristics but are similar in some other respects. Since they use different PBL (planetary boundary layer), microphysics, chemistry, and aerosol schemes, and use different horizontal and vertical grid resolutions, these models are expected to give a range of results. The three models produce similar geographic patterns of low clouds but the IOWA model predicts more low clouds in the southwest part of the study region than the other two models, while MODIS cloud fractions have intermediate values. Along 20° S, the PNNL model has the highest LWP while the IOWA and especially the UW model underpredict LWP away from the coast.
Furthermore, all three models only slightly underestimate the observed MBL depth. All three display prominent offshore gradients in CCN, N_d, and sulfate aerosol. All three significantly underpredict CCN concentrations at 0.1 % supersaturation at 20° S. However, the PNNL and IOWA models activate significantly more CCN at higher supersaturations (not shown). The UW and PNNL simulations only slightly underpredict N_d and the IOWA simulation is close to observations in the western part of the study region but overpredicts N_d in the eastern part.
The simulations from the two global operational forecast models, ECMWF and UKMO, contrast sharply. These models are intermediate in vertical resolution between the WRF models and the global climate models. The ECMWF LWP and cloud fraction agree reasonably well with observations though the MBL depth is shallower than observed. The UKMO model maintains realistic MBL depth, but its low cloud fraction drops to 50-60 % away from the coast, somewhat less than observed, and the LWP is lower by a factor of 2 or more than observed. Because CCN concentration and N_d are unavailable from the ECMWF simulations, it is difficult to evaluate the ECMWF aerosol distribution. In contrast to other models in the study, UKMO has very high concentrations of aerosol and CCN, leading to very large cloud droplet concentrations compared with those observed. The overestimation of sulfate aerosol was subsequently found to be due to a positive bias in the emission source strength used in these simulations, introduced through an error in the interpolation of the emissions onto the model grid.

Conclusions
The VOCALS-REx experiment in the SEP region provides a unique data set of aerosols, chemical constituents and marine boundary layer clouds sampled extensively by aircraft and ship over a 4-week period. This has provided the opportunity to compare and evaluate a large group of diverse models with extended in situ data over the longitudinal transect at 20° S. Compared to the previous Pre-VOCA model assessment (Wyant et al., 2010) in the same region, which relied mostly on satellite measurements, the new emphasis of VOCA is on aerosol-determining processes and aerosol-cloud interactions in a marine stratocumulus regime. Hence our analysis in this paper has been limited to the subset of nine models participating in VOCA that have some representation of aerosol processes, which in some cases interacts with cloud microphysics.
Returning to the first question raised in the introduction, for many of the models, accurately predicting cloud fraction, liquid water path, and precipitation remains a major challenge and is critical for accurately simulating aerosol-cloud interactions. Despite good simulations of the SEP pressure and wind patterns, the mean distribution of low clouds in the region is still problematic and not substantively improved for many global models since Pre-VOCA, while regional models participating in both studies (IPRC and especially PNNL-WRF) exhibit better performance. Most models still tend to underestimate LWP and boundary layer depth in the study region, especially GCMs with low vertical resolution, and the intermodel spread in LWP is still large. For many models in VOCA, the representation of aerosol processes is a relatively new feature and, at this stage of model development, we do not expect, nor generally find, that their inclusion necessarily improves model simulation of cloud and boundary layer properties relative to Pre-VOCA.
Turning to our second question about how well models represent the spatial distribution of aerosols, we find that along 20° S most models were able to qualitatively represent offshore and vertical gradients in aerosols and aerosol-related constituents, in particular the offshore reduction of aerosols in the MBL and an associated reduction in cloud droplet concentration. The models also show some skill in simulating the time variation of aerosol and cloud droplet number concentrations associated with episodic offshore flow in the VOCALS study region.
Our third question asked about the fidelity of modeled aerosol-cloud interaction. Most of the models in this study appear to be deficient in CCN at 0.1 % supersaturation both in the MBL and free troposphere. However, droplet number concentrations are unbiased in a model ensemble-mean sense, indicating that, for some models, significantly more accumulation-mode aerosol is being activated than just the CCN at 0.1 % supersaturation. The GCMs in this study have difficulty with properly representing offshore gradients in CCN and cloud droplet number concentration near the coast. Low horizontal resolution may be to blame. There is also substantial scatter in model-predicted local sources of aerosol mass over the remote ocean due to DMS and SSA, even though the simulated wind speeds were realistic. While the global models tended to have better DMS representation than the regional models, the opposite occurred for SSA, where regional models showed lower biases.
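The point that more aerosol can activate than the CCN count at 0.1 % supersaturation suggests can be illustrated with the empirical power-law activation spectrum often attributed to Twomey, N_CCN(s) = C s^k, with s the supersaturation in percent. This is a minimal sketch only: the parameters `C_CM3` and `K` are hypothetical, and the power law is a generic idealization, not the activation scheme of any model in this study.

```python
def ccn_power_law(supersaturation_pct, c_cm3, k):
    """CCN concentration (cm^-3) from the power law N = C * s**k.

    c_cm3 is the CCN concentration at 1 % supersaturation; k is the
    dimensionless slope. Both are treated here as free parameters.
    """
    return c_cm3 * supersaturation_pct ** k


# Hypothetical parameters, chosen only to illustrate the shape.
C_CM3, K = 300.0, 0.5

for s_pct in (0.1, 0.2, 0.4):
    print(s_pct, round(ccn_power_law(s_pct, C_CM3, K), 1))
```

With a positive slope, the CCN count grows with supersaturation, so droplet numbers can exceed the CCN concentration reported at 0.1 % whenever cloud-base supersaturations are higher than that value.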
Although simulation of aerosol-cloud interactions and aerosol indirect effects in the marine boundary layer clouds is a challenge, and further improvements are needed, the models do capture many of the essential cloud and aerosol controlling processes in the SEP. Indeed, regional models are already being successfully used to investigate aerosol processes in the SEP (e.g., Q. Yang et al., 2011; Saide et al., 2012; Yang et al., 2012; George et al., 2013). However, for those models with large mean biases in cloud and aerosol properties, accurately simulating impacts of aerosols on clouds and vice versa is problematic. Thorough integration of interactive aerosols into operational weather prediction models, a relatively new development, may help stimulate progress in this area.
In answer to the last question raised in the introduction, the VOCA comparison presented here demonstrates that VOCALS-REx observations provide a good benchmark for aerosols and for cloud properties, providing a comprehensive observational basis for a first-order look at aerosol-cloud interactions in a broad range of models. Future comparisons using VOCALS-REx data or other field data could aim at better quantitative constraints on individual aerosol and cloud processes by enforcing more uniform land and ocean surface emission conditions and possibly specifying lateral advective conditions. Because of the large numbers of model fields and high-resolution outputs of some models, the overall utility of the intercomparison could be improved by adding a quality-assurance phase to the submission process, in which model setup and output over a relatively short simulation period could be evaluated and corrected prior to conducting experiments over long durations. Collection of additional model outputs, such as a broader selection of CCN activation supersaturations, more detailed aerosol size information, and rates of aerosol-related processes, could help to better unravel individual model biases. An alternative but promising approach for some categories of models would be a variation of a kinematic driver framework (KiD, Shipway and Hill, 2012) in order to analyze and compare microphysical and aerosol processes in various models.

Appendix A: Model descriptions
NCAR CAM4 and CAM5 are both part of the CESM 1.0 release. The global NCAR CAM4 and CAM5 simulations were performed with similar setups with the finite volume dynamical core. Both use daily forecast runs initialized with ECMWF YOTC analyses interpolated onto the model grid, and are analyzed at hours 48-72. They use identical horizontal resolution, but with fewer vertical levels in CAM4, especially in the boundary layer. CAM4 uses a prognostic (liquid and ice) single-moment microphysics scheme (Rasch and Kristjansson, 1998). CAM5 uses the two-moment prognostic bulk scheme including prognostic number concentration (Gettelman et al., 2008; Morrison and Gettelman, 2008). The PBL schemes also differ: CAM4 uses the nonlocal diffusivity scheme (Holtslag and Boville, 1993) while CAM5 uses the TKE-based turbulence scheme of Bretherton and Park (2009) and the shallow convection scheme of Park and Bretherton (2009).
The IPRC model iRAM 1.2 is very similar to the version described in Lauer et al. (2009) but run at higher horizontal resolution (0.25° × 0.25°). The simulations here used NCEP Final (FNL) analysis for initial and boundary conditions. Monthly mean aerosol concentrations are prescribed for these simulations based on global model simulations of aerosol mass (see Lauer et al., 2007) and observed aerosol size distributions (see McNaughton, 2008). Cloud microphysics are calculated with a two-moment bulk scheme (Phillips et al., 2007, 2008, 2009). Aerosol activation is tracked and affects cloud microphysics, but cloud evolution and precipitation do not affect aerosol mass concentrations or sizes outside of clouds. The PBL scheme uses a turbulence closure with prognostic turbulent kinetic energy (TKE) and dissipation rate (Detering and Etling, 1985; Langland and Liou, 1996). The radiation scheme is based on Edwards and Slingo (1996).
The three WRF-Chem simulations were run continuously over the study period and have similarly sized domains. UW and IOWA use NCEP FNL analyses and PNNL uses NCEP GFS analyses for initial and boundary conditions together with MOZART model output for initializing concentrations of chemical species and aerosols. All use the VOCA standard anthropogenic and volcanic land emissions. All use the RRTM scheme (Mlawer et al., 1997) for LW radiation and the Goddard scheme (see Chou et al., 1998) for SW radiation. However, the three simulations' horizontal and vertical resolutions differ, as do many of their other aerosol, cloud, and boundary layer physics parameterizations.
The IOWA run uses WRF-Chem v3.3, and its configuration and physics are described in detail in Saide et al. (2012). The MOSAIC (Zaveri et al., 2008) eight-bin sectional aerosol scheme is used, with the CBM-Z (carbon-bond mechanism version Z) gas-phase chemical mechanism (Zaveri and Peters, 1999) and modified DMS reactions. Biogenic land emissions are based on the MEGAN (Model of Emissions of Gases and Aerosols from Nature) algorithm (Guenther et al., 2006) and biomass burning emissions are estimated from FIRMS (Fire Information for Resource Management System) MODIS fire detections (Davies et al., 2009). A bulk two-moment Lin microphysics scheme (see Chapman et al., 2009) and a level-2.5 Mellor-Yamada type PBL scheme (MYNN 2.5, Nakanishi and Niino, 2004) are used.
The PNNL simulation uses modified WRF-Chem v3.2.1 code, which was later released to the public in v3.3. The model is configured to use the MOSAIC eight-bin sectional aerosol module and the CBM-Z mechanism with DMS chemistry. The PNNL runs also use biogenic and biomass burning emissions from MEGAN and MODIS, respectively. The PNNL simulations differ in the use of the bulk two-moment microphysics scheme of Morrison (Morrison and Gettelman, 2008) and the YSU (Yonsei University) nonlocal PBL scheme (Hong et al., 2006). Additional details regarding the model's physical parameterizations and configuration for the PNNL simulations can be found in Q. Yang et al. (2011, 2012).
The UW contribution also uses WRF-Chem v3.2.1, though on a coarser horizontal and vertical grid than the IOWA and PNNL runs. Aerosols are represented with three modes using the Modal Aerosol Dynamics Model for Europe (MADE, Ackerman et al., 1998) together with a Secondary Organic Aerosol Model (SORGAM, Schell et al., 2001). The Regional Acid Deposition Model version 2 (RADM2, Chang et al., 1989) chemical mechanism is used with modified DMS reactions. The UW run neglects biogenic and biomass burning emissions. For DMS flux, the UW run follows the VOCA specification. The same Lin microphysics scheme is used as in the IOWA runs. As in CAM5, the TKE scheme of Bretherton and Park (2009) is used in the PBL, but no shallow convection scheme is used.
The UKMO simulations use a deterministic global numerical weather prediction (NWP) configuration of the Met Office Unified Model (MetUM, Davies et al., 2005) based on that in the Met Office's operational NWP suite between 9 March and 14 July 2010; this is designated global NWP cycle G52. Two main forecasts were run per day, each 5 days in length, initialized at 00:00 and 12:00 UTC, for which the first 12 h are analyzed in this study. The Coupled Large-scale Aerosol Simulator for Studies in Climate (CLASSIC) prognostic aerosol scheme from the Met Office Hadley Centre was used (Bellouin et al., 2011). Aerosol concentrations are initialized from HadGEM2 climatologies from a 20-year HadGEM2 climate run with the CLASSIC scheme. Aerosol emissions are based on the AeroCom-2 hindcast emissions (Diehl et al., 2012) for the year 2006. DMS emissions come from HadGEM2-based climatology. Local SSA over the ocean is diagnosed based on surface wind speed and is not transported or deposited. Biogenic land aerosol is not modeled explicitly but instead comes from a climatology based on earlier simulations. A single-moment bulk microphysics scheme (Wilson and Ballard, 1999), the Lock et al. (2000) PBL scheme, and the two-stream radiation scheme of Edwards and Slingo (1996) were used.
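Analyzing only a fixed lead-time window from each forecast, as described for the UKMO runs, yields a continuous record when consecutive windows abut. A minimal sketch of that bookkeeping follows; the function names are hypothetical, the two-day date range is chosen only for illustration within the REx period, and nothing here reflects the actual processing scripts used in VOCA.

```python
from datetime import datetime, timedelta


def analysis_windows(start, end, init_interval_h, lead_start_h, lead_end_h):
    """List (init, valid_start, valid_end) for forecasts initialized in [start, end)."""
    windows = []
    init = start
    while init < end:
        windows.append((
            init,
            init + timedelta(hours=lead_start_h),
            init + timedelta(hours=lead_end_h),
        ))
        init += timedelta(hours=init_interval_h)
    return windows


def is_contiguous(windows):
    """True if each window ends exactly where the next one begins."""
    return all(prev[2] == nxt[1] for prev, nxt in zip(windows, windows[1:]))


# Forecasts every 12 h, first 12 h analyzed (the UKMO setup in the text).
ukmo_like = analysis_windows(datetime(2008, 10, 15), datetime(2008, 10, 17), 12, 0, 12)
print(len(ukmo_like), is_contiguous(ukmo_like))
```

The same helper covers other forecast-mode setups: daily initializations analyzed at hours 48-72, as described for CAM4 and CAM5, correspond to `analysis_windows(start, end, 24, 48, 72)` and also tile the period without gaps or overlap.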
The ECMWF runs use the Monitoring Atmospheric Composition and Climate (MACC) cycle model 36R1. Daily 24 h forecast runs are used with aerosols in the model as passive tracers. The model uses the aerosol scheme of Morcrette et al. (2009), which has three bins each for sea salt and dust, single prognostic variables for SO2 and SO4, and 12 prognostic variables in all. The ECMWF model uses a bulk single-moment microphysics scheme. The RRTM radiation scheme is used with a McICA approach (Morcrette et al., 2008). The PBL in the model uses an eddy-diffusivity mass-flux framework (Köhler et al., 2011).
The GFDL AM3 (Donner et al., 2011) was run in forecast mode on a cubed-sphere 48 × 48 × 6 grid with model output originally interpolated to a 2.0° latitude × 2.5° longitude grid. The runs were initialized with ECMWF reanalysis data. The GFDL modal aerosol scheme uses two modes for sulfate and organic aerosol, and three modes for sea salt (see Donner et al., 2011). Anthropogenic emissions are estimated from historical values of Lamarque et al. (2010). Biogenic emissions and DMS emissions from the ocean surface are also included. The microphysics scheme follows Rotstayn (1997) and Rotstayn (2000) including prognostic cloud number concentration (Ming et al., 2006). The Lock et al. (2000) PBL scheme is used. The radiation scheme follows Freidenreich and Ramaswamy (1999) and Schwarzkopf and Ramaswamy (1999). See Donner et al. (2011) for more details.

Figure 1. Observed SST (K) from AMSR-E and surface winds from QuikSCAT in the outer VOCA study region during the REx period, 15 October-16 November 2008. The inner study region is shown as a black rectangle.
compares the simulated liquid water path (LWP) along 20° S with mean C-130 airborne microwave radiometer observations (Zuidema et al., 2012) during VOCALS and with mean satellite observations from the Advanced Microwave Scanning Radiometer-EOS (AMSR-E) on NASA's Aqua satellite. The AMSR-E values include both daytime and nighttime passes. Also plotted is a 2001-2008 October-November climatology of LWP along 20° S from the ship-based radiometer measurements of the R/V Ron Brown from

Figure 2. Models' mean low cloud fractions at 10:30 LT (15:30 UTC) compared with MODIS Terra daytime mean total cloud fraction. The extent of the inner VOCA study region is shown with a black rectangle.

Figure 3. Grid-box mean LWP along 20° S compared with AMSR-E satellite mean of day and night passes and mean LWP from microwave radiometer on the C-130 (Zuidema et al., 2012). Error bars represent interquartile ranges of aircraft leg means. Also plotted as triangles are mean values measured by the R/V Ron Brown from 2001 to 2008 (de Szoeke et al., 2012).

Figure 5. Mean model cloud fraction at 85° W, 20° S (left panel) and at 75° W, 20° S (right panel). Also plotted is cloud fraction inferred from R/V Ron Brown ship-based measurements over nearby longitudes from Burleyson et al. (2013). See text for more details.

Figure 6. Mean surface precipitation (in mm day⁻¹) along 20° S compared with leg-mean precipitation rate from C-130 estimates at 150 m using a 2D-C probe, and with CloudSat climatology for October-November 2007-2010. The 2D-C precipitation mean for 70-75° W is less than 0.001 mm day⁻¹ and not shown.

Figure 7. CCN concentrations at 0.1 % supersaturation (in cm⁻³) along 20° S are shown in the left-hand side panels: FT mean (top left) and concentration at 150 m (lower left). C-130 nephelometer means are plotted with "x" symbols. Sulfate aerosol (SO4) dry mass concentrations (in µg m⁻³) of diameter range 0.05-0.5 µm measured with AMS (C-130 and BAe-146) are compared with model dry mass concentration along 20° S (see Allen et al., 2011) in the right-hand side panels for the FT (top right) and MBL mean (bottom right). The lower-left panel is linearly rescaled at the top of the plot. The lower-right panel is modified from a figure in Mechoso et al. (2014) to add aircraft sampling variability. Note that ECMWF CCN concentrations are unavailable.

Figure 9. Mean modeled SO2 (gas) concentration along 20° S (in pptv) and C-130 aircraft means. The top sections of both panels are rescaled.

Figure 11. Mean cloud droplet number concentration, N_d (in cm⁻³), along 20° S compared with mean C-130 measurements using a PMS cloud droplet probe and FSSP (forward scattering spectrometer probe) as well as with MODIS estimates. This figure is modified from Mechoso et al. (2014) to add aircraft sampling variability and MODIS data. The top section of the plot is rescaled.

Figure 12. Hovmöller diagrams of CCN at 0.1 % supersaturation at 150 m height along 20° S. CCN concentrations are given in cm⁻³.

Figure 13. Hovmöller diagrams of the models' mean cloud droplet concentrations, N_d (in cm⁻³), along 20° S. Daily mean MODIS estimates from Bretherton et al. (2010) are shown in the lower-left panel.

Table 1. Model parameters and physics.