Observation-based constraints on modeled aerosol surface area: implications for heterogeneous chemistry

Heterogeneous reactions occurring at the surface of atmospheric aerosol particles regulate the production and lifetime of a wide array of atmospheric gases. Aerosol surface area plays a critical role in setting the rate of heterogeneous reactions in the atmosphere. Despite the central role of aerosol surface area, there are few assessments of the accuracy of aerosol surface area concentrations in regional and global models. In this study, we compare aerosol surface area concentrations in the EPA’s Community Multiscale Air Quality (CMAQ) model with commensurate observations from the 2011 NASA flight-based DISCOVER-AQ (Deriving Information on Surface Conditions from COlumn and VERtically Resolved


The role of aerosol surface area in heterogeneous reaction kinetics
Reactions occurring at atmospheric interfaces, such as the surfaces of suspended aerosol particles, catalyze the production and loss of key gas-phase compounds in Earth's atmosphere, with important implications for regional air quality (Chang et al., 2011). The rate of heterogeneous reactions occurring at the surface of aerosol particles is a function of the gas-aerosol collision frequency and the reaction probability per collision. Variability in gas-aerosol collision frequency is determined by the aerosol surface area concentration. The probability of reaction, or the net reactive uptake coefficient (γ), is reaction specific and dependent on chemical kinetics, gas accommodation at the surface, and near-surface diffusion (Abbatt et al., 2012). Collectively, the first-order removal rate of a gas-phase species (A) from the atmosphere can be written as

$$\frac{d[A]}{dt} = -k_{\mathrm{het}}\,[A], \tag{E1}$$

where the heterogeneous reaction rate constant (k het ), in the absence of gas-phase diffusion limitations, can be written as

$$k_{\mathrm{het}} = \frac{\gamma\,\omega\,S_a}{4}, \tag{E2}$$

where ω is the mean molecular speed of the gas-phase molecule (m s −1 ), and S a is the surface area concentration of aerosol particles (m 2 m −3 ). To date, most evaluations of the role of heterogeneous chemistry on gas-phase composition have focused on uncertainty in parameterizations of reactive uptake coefficients, such as the reactive uptake of dinitrogen pentoxide (N 2 O 5 ) due to its role as a NO x sink (Brown et al., 2009; Evans and Jacob, 2005; MacIntyre and Evans, 2010; McDuffie et al., 2018). In comparison, there has been less focus on model representation of aerosol surface area concentrations, despite the fact that k het is linearly dependent on S a . An accurate representation of aerosol surface area in regional and global chemical transport models is challenging, as S a is a complex function of size-dependent aerosol particle emissions, chemical transformations, and removal processes.
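As an illustration of the linear dependence of k het on S a , the following minimal Python sketch evaluates the free-molecular form k het = γ ω S a / 4, with ω computed from kinetic theory as sqrt(8RT / (πM)). The function names and all input values (an uptake coefficient of 0.02, a surface area of 200 µm 2 cm −3 , a temperature of 298 K) are assumptions chosen for illustration and are not taken from this study.

```python
import math

R_GAS = 8.314462618  # molar gas constant, J mol^-1 K^-1


def mean_molecular_speed(molar_mass_kg_mol, temperature_k):
    """Mean thermal speed of a gas molecule, sqrt(8RT / (pi M)), in m s^-1."""
    return math.sqrt(8.0 * R_GAS * temperature_k / (math.pi * molar_mass_kg_mol))


def k_het(gamma, surface_area_m2_m3, molar_mass_kg_mol, temperature_k):
    """First-order heterogeneous loss rate constant (s^-1): gamma * omega * S_a / 4."""
    omega = mean_molecular_speed(molar_mass_kg_mol, temperature_k)
    return gamma * omega * surface_area_m2_m3 / 4.0


# Illustrative (assumed) inputs, not values from this study.
GAMMA_N2O5 = 0.02            # assumed reactive uptake coefficient
M_N2O5 = 0.108               # molar mass of N2O5, kg mol^-1
S_A = 200.0 * 1e-12 * 1e6    # 200 um^2 cm^-3 converted to m^2 m^-3

rate = k_het(GAMMA_N2O5, S_A, M_N2O5, 298.0)
print(f"k_het = {rate:.2e} s^-1, lifetime = {1.0 / rate / 60.0:.0f} min")
```

Because the expression is linear in S a , a factor of 2 bias in modeled surface area translates directly into a factor of 2 bias in the computed loss rate for a fixed γ.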
Here, we directly compare aerosol surface area concentrations in a regional chemical transport model with commensurate aircraft measurements to assess the representation of S a in regional air quality models.

Calculation of aerosol surface area in regional air quality models
The total aerosol particle surface area concentration has been calculated in air quality models using a variety of approaches, including the discrete representation of the particle size distribution in defined size ranges, known as the sectional method (Adams and Seinfeld, 2002; Gelbard et al., 1980; Jacobson, 2001; Lee et al., 2009; Lee and Adams, 2012; Luo and Yu, 2011; Spracklen et al., 2006; Trivitayanurak et al., 2008; Yu and Luo, 2009), and a continuous modal representation of the particle size distribution (Kleeman et al., 1997; Mann et al., 2010; Meng, 1998; Pringle et al., 2010; Sartelet et al., 2006; Stier et al., 2005; Vignati et al., 2004; Zhang et al., 2010a). Here, we review the modal representation of particle size distributions implemented in the Community Multiscale Air Quality (CMAQ) model and the calculation of both wet and dry total aerosol surface area (S a ). Aerosol particle size distributions in CMAQ follow the method developed for the Regional Particulate Model, an extension of the Regional Acid Deposition Model (Binkowski, 1999; Binkowski and Roselle, 2003), where the total particle size distribution is treated as the superposition of three separate lognormal distributions (or modes): the Aitken, accumulation, and coarse modes (Binkowski, 1999; Whitby, 1978). The lognormal particle size distribution for each mode is defined as

$$n(\ln D) = \frac{N}{\sqrt{2\pi}\,\ln\sigma_g}\exp\left[-\frac{1}{2}\left(\frac{\ln(D/D_g)}{\ln\sigma_g}\right)^{2}\right], \tag{E3}$$

where N is the total number concentration, D is the particle diameter, and D g and σ g are the geometric mean diameter and geometric standard deviation. Under this definition, the Aitken mode describes aerosol particles of diameter smaller than approximately 0.1 µm with a median diameter of 0.03 µm, while the accumulation mode encompasses the diameter range of 0.1 to 2.5 µm with a median diameter of 0.3 µm (Binkowski, 1999). The coarse mode describes particles of diameter 0.3 to about 10 µm, with a median diameter of 6 µm.
It should be noted that there is uncertainty in the exact size distributions within CMAQ, dependent on emissions parameters, so the median diameters within modes are approximate (Elleman and Covert, 2010). Particle nucleation and growth result in changes to the mode diameter and can result in the transfer of particle number, surface area, and mass to the next larger mode (e.g., Aitken to accumulation). Outside of particle growth and nucleation, the approximate median diameters are unchanged. The size ranges for each mode are based on Whitby (1978); the geometric standard deviations were originally based on Whitby (1978) as well but have been updated to those of Elleman and Covert (2010). Three integral properties of the aerosol size distribution are calculated in CMAQ, the zeroth (M 0 ), second (M 2 ), and third moments (M 3 ), where the kth moment of the size distribution is calculated as

$$M_k = \int_{-\infty}^{\infty} D^{k}\, n(\ln D)\, d(\ln D) = N D_g^{k} \exp\left[\frac{k^{2}}{2}\ln^{2}\sigma_g\right]. \tag{E4}$$

In this representation, N = M 0 , S a = π M 2 , and V = (π/6) M 3 , where V is the total aerosol volume (Binkowski, 1999; Binkowski and Roselle, 2003). Though M 2 is utilized in CMAQ's aerosol subroutines, it is multiplied by π prior to use in main CMAQ routines, such that it is identified as a modal surface area (Binkowski and Roselle, 2003).
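The relation between the lognormal parameters and the moments can be sketched in a few lines of Python. The sketch below assumes the standard closed form for the moments of a lognormal distribution, M k = N D g ^k exp[(k²/2) ln² σ g ], together with S a = π M 2 and V = (π/6) M 3 as stated above. The function names and the number concentration of 1000 cm −3 are hypothetical; D g = 0.3 µm and σ g = 2 are used only as example inputs.

```python
import math


def lognormal_moment(k, number, d_g, sigma_g):
    """k-th moment of a lognormal mode: N * D_g**k * exp(k**2 / 2 * ln(sigma_g)**2)."""
    return number * d_g ** k * math.exp(0.5 * k ** 2 * math.log(sigma_g) ** 2)


def modal_surface_area(number, d_g, sigma_g):
    """Surface area of spherical particles in one mode, S_a = pi * M_2."""
    return math.pi * lognormal_moment(2, number, d_g, sigma_g)


def modal_volume(number, d_g, sigma_g):
    """Volume of spherical particles in one mode, V = (pi / 6) * M_3."""
    return math.pi / 6.0 * lognormal_moment(3, number, d_g, sigma_g)


# Assumed example mode: N in cm^-3, D_g in um, so S_a is in um^2 cm^-3.
N_EXAMPLE, DG_EXAMPLE, SIGMA_EXAMPLE = 1000.0, 0.3, 2.0
print(f"S_a = {modal_surface_area(N_EXAMPLE, DG_EXAMPLE, SIGMA_EXAMPLE):.0f} um^2 cm^-3")
print(f"V   = {modal_volume(N_EXAMPLE, DG_EXAMPLE, SIGMA_EXAMPLE):.1f} um^3 cm^-3")
```

For a fixed N and D g , widening the mode (larger σ g ) increases M 2 , which is one reason the assumed geometric standard deviation of emitted particles matters for the modeled surface area.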
The time rate of change of each moment is calculated for each grid box and time interval as

$$\frac{\partial M_k}{\partial t} = P_k - L_k, \tag{E5}$$

where P and L represent the production and loss of M k in each aerosol mode. With respect to S a (S a = πM 2 ), neglecting transport terms, P 2 includes new particle formation (Aitken mode only), condensational growth, and primary emissions, and L 2 includes intramodal coagulation and dry and wet deposition. Primary aerosol emission rates are sourced from the 2011 EPA National Emissions Inventory, which characterizes emissions based on source type and location. Within CMAQ, all primary aerosol emissions, independent of source type, are parameterized with modal size distributions per Elleman and Covert (2010) (see the Supplement). In the version of CMAQ used here, new particle formation is based on classical, binary homogeneous nucleation (Kulmala et al., 1998). Particle growth is described by Binkowski and Roselle (2003) and by the secondary organic aerosol (SOA) schemes in Carlton et al. (2010).
In the interpretation of model S a , the following model-specific details should be considered. (1) Fine particles (Aitken and accumulation modes) do not coagulate with coarse-mode particles, and coarse-mode particles do not coagulate with each other (Binkowski and Roselle, 2003). (2) The size distribution for primary PM 2.5 emissions is assumed to have a geometric mean diameter (D g = 0.3 µm) and geometric standard deviation (σ g = 2), and >99 % of PM 2.5 emissions are assigned to the accumulation mode (Binkowski and Roselle, 2003), which may have consequent effects on the aerosol surface area distribution. (3) Particles are assumed to be spherical.
The condensation of water is accounted for in the chemical evolution of M 2 ; thus M 2 is inherently the wet second moment (M w 2 ), which is used in the calculation of heterogeneous chemical reactions. In addition to M w 2 , a dry second moment (M d 2 ) is calculated as a function of the third moment (M 3 ). In the following analyses, we concentrate on the comparison of modeled and measured S a to evaluate the relative uncertainty associated with model descriptions of heterogeneous kinetic mechanisms (i.e., reactive uptake coefficients, γ) and aerosol particle size distributions (i.e., aerosol surface area, S a ) that, combined, dictate the fate of reactive gas-phase molecules.

Previous model-measurement comparisons of aerosol surface area
Evaluation of regional air quality models has largely focused on criteria air pollutants such as ozone (O 3 ) and particle mass (e.g., PM 2.5 ) (Appel et al., 2021). Previous model evaluations of aerosol particles have focused on an array of metrics including mass concentration (Gantt et al., 2012; Spak and Holloway, 2009; Wang et al., 2009), number concentration (Park et al., 2006; Ranjithkumar et al., 2021; Wang et al., 2009; Zhang et al., 2010b), size distribution (Kelly et al., 2011; Nolte et al., 2015; Park et al., 2006; Zhang et al., 2010b), composition (Knote et al., 2011; Nolte et al., 2015; Prank et al., 2016), and aerosol optical depth (Ghan et al., 2001; Knote et al., 2011). There is a long and detailed history of CMAQ evaluation of PM 2.5 , including ground-based (Baker et al., 2018; Fan et al., 2005; Ghim et al., 2017; Hogrefe et al., 2009, 2015; Liu and Zhang, 2011; Prank et al., 2016; Smyth et al., 2006; Wang et al., 2021; Yu et al., 2012, 2008b, 2008a; Zhang et al., 2019, 2006, 2010c), ship-based (Yu et al., 2012), and aircraft-based measurements (Baker et al., 2018; Chen et al., 2020; Yu et al., 2012). Of 15 studies comparing ground-based measurements of PM 2.5 to CMAQ outputs between 1999 and 2018, 10 saw an underestimation of PM 2.5 by the model ranging between 6 % and 75 % (Ghim et al., 2017; Liu and Zhang, 2011; Prank et al., 2016; Wang et al., 2021; Yu et al., 2008a, 2012, 2008b; Zhang et al., 2019, 2006, 2010c), dependent on pollution events and rural versus urban location, while 4 found that CMAQ predicted observations well (Baker et al., 2018; Fan et al., 2005; Hogrefe et al., 2009; Smyth et al., 2006), matching general trends in the observational data, and 1 saw an overestimation of observational data (Hogrefe et al., 2015).
Of the three aircraft studies, two saw significant underestimation of PM 2.5 aloft (Baker et al., 2018; Chen et al., 2020), while one saw overestimation in some PM 2.5 compositional components and underestimation in others (Yu et al., 2012). Particle surface area specifically is not regulated as a criteria air pollutant, as standards of measurement and air quality controls are determined on a mass per unit volume basis. However, particle surface area indirectly affects the concentration of PM 2.5 and O 3 as it can serve to regulate the lifetime of nitrogen oxides (Chang et al., 2011; Geyer and Stutz, 2004; Portmann et al., 1996; Stadtler et al., 2018) and hydrogen oxides (George et al., 2013; Lakey et al., 2015; Martin et al., 2003; Thornton et al., 2008; Thornton and Abbatt, 2005), the production rate of secondary organic aerosol (Gaston et al., 2014), and new particle formation and growth rates, as the preexisting aerosol surface area serves as a condensation sink for low-volatility gas-phase compounds (Donahue et al., 2014; Trump et al., 2014). There are few reports of model-measurement comparisons of particle surface area, and those that have been reported in the literature have focused on comparisons of heavily spatially and temporally averaged concentrations (e.g., field campaign averages). For example, Simon et al. (2010) compared ground-based aerosol surface area concentrations calculated in the CAMx model to measurements made aboard the RV Ronald H. Brown in the Gulf of Mexico and the Houston Ship Channel with two differential mobility particle sizers and an aerodynamic particle sizer (Bates et al., 2008). The results of these studies are given in Table 1. Model prediction of median S a in the Gulf of Mexico was similar to the measurement data (S a,mod /S a,meas = 0.96), with median values and ranges shown in Table 1.
In comparison, model prediction of median S a in the Houston Ship Channel, where there is large spatial and temporal fluctuation in S a relative to the smaller interquartile range seen in the Gulf of Mexico, yielded S a,mod /S a,meas = 1.6. Modeled S a was also compared to measured S a aloft on two research flights from the TexAQS II/GoMACCS field study in September and October 2006. The range of measured S a values (<600 µm 2 cm −3 ) matched well with the average values predicted in the CAMx model, though the maximum modeled values were much larger than those measured (4000-8000 µm 2 cm −3 compared to <600 µm 2 cm −3 ), consistent with the RV Ronald H. Brown comparisons. Overall, it should be noted that on a regional scale, modeled values agree well with measurements of aerosol S a ; however maximum modeled values were larger than those measured both for ground-based measurements and those aloft.
More recently, modeled dry surface area concentrations were assessed over the northeast US during the 2015 Wintertime INvestigation of Transport, Emissions, and Reactivity (WINTER) aircraft campaign. While quantitative assessment of aerosol surface area was not the focus of this study, dry aerosol S a was calculated by combining dry aerosol size distribution observations from a passive cavity aerosol spectrometer probe and ultra-high-sensitivity aerosol spectrometer and comparing these to the GEOS-Chem chemical transport model. Two versions of the model, a reference and improved model, were compared to the observations within 13 altitude bins, ranging from surface to 4.5 km; here we focus on the improved model results. The GEOS-Chem model medians were encompassed by the observed interquartile ranges in each altitude bin. The improved model showed excellent agreement with measurements when compiled over large spatial and temporal scales, where S a,mod /S a,meas was 1.25 and 0.68 for the surface and 4.5 km comparisons, respectively.
Given the importance of accurate model representation of aerosol surface area to multiple atmospheric processes, and the limited number of prior studies conducted in urban environments, we revisit this comparison using a regional air quality model and commensurate aircraft observations conducted in an urban environment.

Aerosol evaluation in the CMAQ model
CMAQ simulations were performed as described by Abel et al. (2018, 2019) and Harkey et al. (2021), with carbon bond 5 chemistry, anthropogenic emissions from the 2011 National Emissions Inventory, and input meteorology from the Weather Research and Forecasting (WRF) version 3.2.1 (Skamarock et al., 2008), constrained to the North American Regional Reanalysis (NARR; Messinger et al., 2006; Abel et al., 2018, 2019; Harkey et al., 2021). The CMAQ simulation utilized here employed CMAQ version 5.2.1 (Byun and Schere, 2006; Nolte et al., 2015) and was run with 25 vertical layers from the surface to 100 hPa, a 12 × 12 km grid, and hourly temporal resolution. Anthropogenic emissions and emissions from fires (both prescribed and not) were based on the 2011 National Emissions Inventory version 2, with in-line estimates of NO and NO 2 produced by lightning, boundary conditions from the Model for Ozone and Related Chemical Tracers version 4 (MOZART; Emmons et al., 2010), and biogenic emissions from WRF output in the Model of Emissions of Gases and Aerosols from Nature version 2.1 (MEGAN; Guenther et al., 2012). The model was run from 20 May through 31 August 2011, to include 11 d of spin-up (Harkey et al., 2021). The dataset utilized in this analysis is only a subset of the model dataset originally run at UW-Madison.
CMAQ was also run for the time period of the 2015 WINTER field campaign for comparison of modeled and measured N 2 O 5 uptake coefficients. This CMAQ simulation also employs input meteorology constrained to NARR, calculated using WRF version 3.8.1 (Skamarock et al., 2008). Anthropogenic emissions were taken from the 2016 National Emissions Inventory Collaborative, version 1 (NEIC, 2019). Emissions from fires and boundary conditions were taken from the EPA Air QUAlity TimE Series (EQUATES) project (https://www.epa.gov/cmaq/equates, last access: 29 November 2021). Biogenic emissions and lightning NO x emissions were both calculated in-line. This simulation employed CMAQ version 5.3.2 (Appel et al., 2021), with carbon bond 6 chemistry (Emery et al., 2015; Luecken et al., 2019). The WINTER period simulation was run from 21 January through 16 March 2015, to include 11 d of spin-up, with hourly output on a 12 × 12 km grid and on 35 vertical levels from the surface to 100 hPa.
The CMAQ simulation for the 2011 DISCOVER-AQ (Deriving Information on Surface Conditions from COlumn and VERtically Resolved Observations Relevant to Air Quality) period employed the "AERO6" aerosol module (Binkowski and Roselle, 2003; Carlton et al., 2010; Foley et al., 2010; Sonntag et al., 2014), where primary and secondary aerosols are characterized by bimodal lognormal size distributions, and the total size distribution is the sum of three aerosol size modes, Aitken, accumulation, and coarse modes, with (Pye et al., 2015, 2017; Qin et al., 2021; Xu et al., 2018). The CMAQ simulation for the DISCOVER-AQ period employed the default heterogeneous N 2 O 5 uptake (Davis et al., 2008), while the simulation covering the WINTER period employed a N 2 O 5 uptake modified per Bertram and Thornton (2009). Because CMAQ represents aerosols modally, we calculate each aerosol parameter separately for each mode and then combine the modal values into a total that can be directly compared to the DISCOVER-AQ observational data. The total CMAQ dry surface area is computed as the sum of the modal dry surface areas. The variable SRF is an output of CMAQ and is defined as SRF = π M d 2 . SRF is a modal variable like each moment, such that total surface area = SRFATKN + SRFACC + SRFCOR (dry surface area in the Aitken, accumulation, and coarse modes, respectively).

DISCOVER-AQ 2011 campaign
Research flights conducted during the NASA DISCOVER-AQ campaigns were designed to measure the vertical and spatial distribution of key air pollutants in urban environments, with a focus on connecting surface measurements with vertically integrating satellite observations. The first DISCOVER-AQ campaign, conducted aboard the NASA P-3B aircraft during July 2011, comprised 14 science flights in the Baltimore and Washington, D.C. area (Crawford and Pickering, 2014; NASA, 2012). The 2011 DISCOVER-AQ campaign was the first in a series of deployments with the objective of narrowing the gap between satellite observations and near-surface air quality using near-surface air pollution measurements. Science flights concentrated on high-time-resolution measurements of atmospheric composition in the convective boundary layer. Here, we focus on measurements of dry aerosol surface area concentration (S a ), determined from high-time-resolution (1 Hz) size distributions made using an ultra-high-sensitivity aerosol spectrometer (Droplet Measurement Technologies, UHSAS) integrated over 60 < d p < 1000 nm, which captures the peak of the surface area distribution, as shown in Fig. 1. The UHSAS measures the particle size from optical light scattering and was calibrated during DISCOVER-AQ using NIST-traceable polystyrene latex spheres, whose refractive index may differ slightly from that of real-world aerosols, which may result in a slight under-sizing bias (Moore et al., 2021). Ambient air was sampled through an isokinetic inlet, allowing the aerosol temperature to quickly equilibrate to that of the cabin of the P-3B aircraft. Along with additional ram pressure heating in the inlet during flow deceleration, particles reach an RH below 40 %-50 %, meaning the aerosol is considered to be dry. The term "dry" here indicates that the particle hydration state is greatly reduced compared to that of the unperturbed ambient air.
It is important to note that aerosol water is not accounted for in the following model-measurement comparison, though there would be some aerosol water present at the higher end of the 40 %-50 % RH threshold, which may impact the comparison of the measured aerosol to dry aerosol in the model. The particle mobility size from 10-310 nm diameters was measured with a TSI scanning mobility particle sizer (SMPS) with 45 s time resolution, and the particle aerodynamic size from 500-4000 nm diameters was measured with a TSI aerodynamic particle sizer (APS) at 1 Hz. On the representative day shown in Fig. 1, SMPS data showed that approximately 5.8 % of the surface area fell below the 60 nm threshold of the UHSAS measurement for the surface area distribution averaged over all altitudes, while the APS indicated that supermicron particles did not contribute to the particle surface area. The distribution comparison between the model and measurement differs in shape at different altitudes. Figure 1b shows a comparison of the near-surface data, below 1 km, which gives a very similar distribution to that of the average of all altitudes. However, Fig. 1c, which shows data above 3 km altitude, has a different shape and one that is more comparable between model and measurement. It should be noted that the axis scales in Fig. 1b and c do not match, showing a discrepancy in distribution calculation between CMAQ and DISCOVER-AQ that will be discussed in later sections. For simplicity, we choose to focus exclusively on the UHSAS size distribution data given their high frequency and wide size range of particle diameter, but it is important to note that not all of the particle surface area is captured by the UHSAS instrument. UHSAS data are available to the public at the NASA Langley Atmospheric Science Data Center and Distributed Active Archive Center (https://doi.org/10.5067/Aircraft/DISCOVER-AQ/Aerosol-TraceGas).
In the following analysis we utilize observations of nitric oxide (NO) and nitrogen dioxide (NO 2 ) measured with the NCAR four-channel chemiluminescence instrument (Ridley and Grahek, 1990) and carbon monoxide (CO) measured via differential absorption CO measurement (DACOM) (Sachse et al., 1987) to assess differences in modeled and measured aerosol surface area.
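The integration of a measured size distribution to a dry surface area concentration over the 60-1000 nm window can be sketched as follows. This is a minimal illustration, not the campaign's processing code: it assumes spherical particles, per-bin number concentrations in cm −3 (rather than dN/dlogD p ), and bin midpoint diameters in nm. The function name and the example bins are hypothetical.

```python
import math


def surface_area_from_bins(diam_nm, number_cm3, d_min_nm=60.0, d_max_nm=1000.0):
    """Sum pi * D**2 * dN over bins inside [d_min_nm, d_max_nm]; result in um^2 cm^-3."""
    total = 0.0
    for d_nm, dn in zip(diam_nm, number_cm3):
        if d_min_nm <= d_nm <= d_max_nm:
            d_um = d_nm / 1000.0  # nm to um
            total += math.pi * d_um ** 2 * dn
    return total


# Hypothetical bins: the 30 nm bin falls below the 60 nm cut and is excluded.
diameters_nm = [30.0, 100.0, 300.0]
numbers_cm3 = [5000.0, 1000.0, 200.0]
print(f"S_a = {surface_area_from_bins(diameters_nm, numbers_cm3):.1f} um^2 cm^-3")
```

Applying the same function with the lower cut removed gives a quick estimate of the fraction of surface area that a fixed size window misses.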

Model-measurement comparison
To compare measured and modeled S a , we sample the hourly 12 km × 12 km CMAQ output at the time and location of each DISCOVER-AQ sampling point. The spatial resolution of CMAQ, relative to the DISCOVER-AQ flight, is shown in Fig. 2 for the 28 July 2011 research flight, where the color corresponds to the modeled surface-level dry S a (in µm 2 cm −3 ) at noon EST. Indexing and analysis of the two aforementioned datasets were completed in MATLAB. Each 1 s data point from the DISCOVER-AQ 2011 campaign was mapped to the nearest (as described below) 4D index (time of day, latitude, longitude, and altitude) in the lower time and spatial resolution CMAQ model for direct comparison. The nearest 1 h averaged CMAQ time point was selected based on the time window that encompassed the aircraft flight time; i.e., a flight time of 09:16:00 was mapped to the CMAQ time period of 09:00-09:59. The nearest 12 km × 12 km CMAQ grid box was likewise the grid box that encompassed the aircraft location at the time of sampling. The nearest CMAQ altitude (or layer) was identified as the one of the 25 model layers that encompassed the aircraft altitude. With all four CMAQ indexes assigned to each data point, the 1 s DISCOVER-AQ dataset could be fully mapped and compared to that from the CMAQ model. It should be noted that there are far more data points in the observed data than in the model due to resolution constraints, and thus the model is being oversampled. The four indexes were concatenated in the order defined in CMAQ, namely latitude, longitude, layer (altitude), and time. This process was applied to each data point of an entire DISCOVER-AQ flight and was then replicated for each subsequent flight. The result of this approach is shown in Fig. 3, for the comparison of modeled and measured carbon monoxide (CO).
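The mapping of one aircraft sample to an enclosing model cell can be sketched as below. The original analysis was carried out in MATLAB; this Python sketch is only an illustration under simplifying assumptions: the sample position is already expressed in the model's projected coordinates (km), the grid is regular with a known origin, and the layer top heights are known. The function name, the grid origin, and the layer tops are all hypothetical.

```python
from bisect import bisect_right


def map_to_model_index(x_km, y_km, alt_m, seconds_since_midnight, *,
                       x0_km, y0_km, layer_tops_m, cell_km=12.0):
    """Return (row, col, layer, hour) of the model cell that encloses one sample."""
    col = int((x_km - x0_km) // cell_km)    # grid column containing x
    row = int((y_km - y0_km) // cell_km)    # grid row containing y
    layer = bisect_right(layer_tops_m, alt_m)  # first layer whose top lies above alt_m
    hour = int(seconds_since_midnight // 3600)  # 09:16:00 falls in the 09:00-09:59 window
    if col < 0 or row < 0 or layer >= len(layer_tops_m):
        raise ValueError("sample lies outside the model domain")
    return (row, col, layer, hour)


# Hypothetical layer tops (m) and a sample taken at 09:16:00.
tops = [50.0, 150.0, 300.0, 600.0, 1000.0]
t = 9 * 3600 + 16 * 60
print(map_to_model_index(25.0, 40.0, 420.0, t, x0_km=0.0, y0_km=0.0, layer_tops_m=tops))
```

The returned tuple follows the latitude (row), longitude (column), layer, and time ordering described above, so it can serve directly as a key when grouping samples by model cell.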
The coefficient of determination (r 2 ) for the linear regression of modeled vs. measured CO concentration (CO mod /CO meas ) was 0.44 with a slope of 1.0499 ± 0.0007. The large variance highlights the spatial and temporal mismatch of model sampling and measurement, while the near-unity slope indicates that on average modeled and measured CO agree. This agreement implies that the model-measurement comparison of many well-understood parameters should be accurate and that there is not a fundamental issue in comparing modeled and measured data between the two datasets.
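The two regression diagnostics used here, slope and r 2 , carry different information: the slope reflects average agreement, while r 2 reflects scatter about the fit. A minimal ordinary least-squares sketch is given below. It assumes a fit with a free intercept, which may differ from the fitting choice made in the original analysis; the function name and the example values are hypothetical.

```python
def linear_fit(x, y):
    """Ordinary least-squares fit of y on x; returns (slope, intercept, r_squared)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


# Hypothetical measured (x) and modeled (y) values with scatter about a 1:1 line.
measured = [100.0, 120.0, 140.0, 160.0, 180.0]
modeled = [110.0, 105.0, 150.0, 145.0, 190.0]
slope, intercept, r2 = linear_fit(measured, modeled)
print(f"slope = {slope:.2f}, intercept = {intercept:.1f}, r2 = {r2:.2f}")
```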

Campaign-averaged comparison of aerosol surface area concentrations
First, we assess general agreement between campaign-averaged modeled and measured surface area concentrations. The observational data are from the 13 DISCOVER-AQ flights during July 2011 (1-29 July). The final research flight consisted of a dual highway leg conducted south along the Baltimore-Washington Parkway and north along I-95 at low altitude to compare the two roadways. Given the proximity to a large point source, this research flight was not included in the following analysis. The observational data from the UHSAS included number, surface area, and volume measurements, though the focus of this study is primarily surface area (S a ).
In this analysis, we compare dry surface area concentrations as the DISCOVER-AQ measurements were made at an RH below 40 %-50 % and are thus considered dry, and we do not have direct measurements of particle growth factors for comparison of wet S a . However, it is important to note that S a used in E2 is the surface area concentration at ambient humidity, and any uncertainty in modeled aerosol hygroscopicity will propagate to the aerosol surface area concentration used in E2. Figure 4 shows the campaign-averaged vertical profile of both the measured dry UHSAS surface area (S a,meas ) and the modeled dry CMAQ surface area (S a,mod ), along with the interquartile ranges separated into 1 km altitude bins. Since the UHSAS measurement frequency is 1 Hz, the CMAQ modeled data are at 1 h time resolution, and the model samples a full domain in 12 km grid boxes compared to the smaller domain sampled by aircraft, there are many more measurement data points (N = 330 204) than comparable modeled data points (N = 5196) over the course of the flight campaign. In Fig. 4, the UHSAS measurements have been averaged to the spatial and temporal resolution of the model, such that the number of observational points is the same as the number of model points. The light gray error bars shown in Fig. 4a reflect the standard deviation of the data from the mean at that point in time and space. It should be noted that the error bars on this dataset are large, due to the spatial and temporal mismatch between model and measurement in a highly heterogeneous sampling domain. For both model and measurement, the surface area increases towards the surface, as is to be expected, and decreases with altitude. The vertical profile is well captured by the CMAQ model; however, there is a larger range of measured surface area concentrations than is seen in the corresponding model altitude bin.
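Averaging the 1 Hz measurements to the model resolution amounts to grouping samples by their 4D model index and taking the mean within each group. A minimal sketch follows, assuming each sample already carries a hashable index such as a (row, column, layer, hour) tuple; the function name and the example data are hypothetical.

```python
from collections import defaultdict
from statistics import fmean


def average_by_cell(indices, values):
    """Average all samples that share the same model index; returns {index: mean}."""
    grouped = defaultdict(list)
    for index, value in zip(indices, values):
        grouped[index].append(value)
    return {index: fmean(cell_values) for index, cell_values in grouped.items()}


# Hypothetical 1 Hz samples: three fall in one model cell, one in a neighbouring cell.
sample_indices = [(3, 2, 3, 9), (3, 2, 3, 9), (3, 2, 3, 9), (3, 3, 3, 9)]
sample_sa = [150.0, 210.0, 180.0, 90.0]
print(average_by_cell(sample_indices, sample_sa))
```

After this step each model point is paired with exactly one averaged observation, which is what allows a one-to-one regression of modeled against measured S a .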
As shown in Fig. 4, measured S a is on average larger than modeled S a , particularly at low altitude, where the ratio of median modeled S a to median measured S a near the surface (z < 1 km) is 0.47. This contrasts with what has been reported previously in the literature. For example, both Jaeglé et al. (2018) and Simon et al. (2010) found that the median modeled near-surface S a was consistently larger than measured S a (S a,mod /S a,meas = 1.04-1.6).
The comparison between modeled and measured S a is also shown with histograms in Fig. 5 for all altitudes (5a, b) and for the surface-level measurements (0-1 km; 5c, d). While the number of points is not consistent between the model and measurement datasets due to the 12 km grid box constraint and time frequency in CMAQ, differences in the range in surface area concentrations are observed. Measured surface area concentrations range between 0-1.87 × 10 3 µm 2 cm −3 , with the vast majority of data below 420 µm 2 cm −3 , while modeled S a ranges between 0-300 µm 2 cm −3 .

Direct model-measurement comparison
A linear regression of CMAQ modeled dry aerosol surface area concentration and measured aerosol surface area concentration is shown in Fig. 6. The measured data have been averaged to the space and time domain of CMAQ (latitude, longitude, altitude, and time). The coefficient of determination (r 2 ) for the linear regression of modeled and measured S a was 0.52 with a slope of 0.437 ± 0.004, indicating that the measured S a is on average approximately twice the modeled value.
The histogram of the surface area ratio (S a,mod /S a,meas ) throughout the campaign in Fig. 7 shows that the model underpredicts the measured surface area in 81 % of the comparison points. The model underpredicts S a by a factor of 2 or more 44 % of the time. In the following section, we explore potential causes for model-measurement disagreement, including model-measurement spatial and temporal differences, the spatial distribution of primary emissions, and/or treatment of secondary aerosol formation.
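The two fractions quoted above can be computed directly from paired modeled and measured values. The sketch below is an assumption-laden illustration: it treats "underpredicts by a factor of 2" as a ratio of 0.5 or less, skips points with non-positive measured values, and uses a hypothetical function name and made-up data.

```python
def underprediction_fractions(modeled, measured):
    """Fractions of points with model/measurement ratio < 1 and with ratio <= 0.5."""
    ratios = [m / o for m, o in zip(modeled, measured) if o > 0]
    n = len(ratios)
    below_one = sum(r < 1.0 for r in ratios) / n
    below_half = sum(r <= 0.5 for r in ratios) / n
    return below_one, below_half


# Hypothetical paired values (um^2 cm^-3), giving ratios of 0.25, 0.5, 0.75, and 2.5.
mod_sa = [50.0, 100.0, 150.0, 500.0]
meas_sa = [200.0, 200.0, 200.0, 200.0]
frac_under, frac_factor2 = underprediction_fractions(mod_sa, meas_sa)
print(f"underpredicted: {frac_under:.0%}, by a factor of 2 or more: {frac_factor2:.0%}")
```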

Discussion
In this section, we explore the source of model-measurement discrepancy in S a discussed in Sect. 3. We begin by investigating the dependence of S a,mod /S a,meas on altitude and proximity to primary aerosol sources. We then investigate the role of temporal and spatial resolution as CMAQ has a much coarser resolution, both spatially and temporally, than the measured data. Finally, we investigate the possibility of impacts on S a,mod /S a,meas from anthropogenic and biogenic indicators as they are tied to aerosol emissions. It is also important to acknowledge that some of the model-measurement disagreement could be due to processes not considered in the model such as phase separation, viscosity changes of aerosols, and direct modeling of clouds impacting cloud processing of aerosols, though the impacts of these processes are not investigated further in this work. The lack of a fourth mode below the Aitken mode for nucleation of particles and growth to the Aitken mode also impacts the accuracy of the size distribution within CMAQ and may explain a portion of the model-measurement disagreement, though it is known that improving the default parameteriza-

Figure 4. Average vertical profile of (a) measured DISCOVER-AQ aerosol surface area concentration (S a,meas ) and (b) CMAQ aerosol surface area concentration (S a,mod ) over the entirety of the DISCOVER-AQ campaign. Each measured point is an average of the points included in that 4D index corresponding to CMAQ. The overlaid box plots show the median (red line within blue box) and interquartile ranges (blue box with the 25th percentile at the left end and 75th percentile at the right end) in 1 km altitude bins. The labels on the altitude axis lie at the midpoint of the 1 km altitude bin, and red crosses indicate outliers from the majority of the dataset at that altitude portion.

Dependence of S a,mod /S a,meas on altitude
Given the strong dependence of S a on altitude as shown in Fig. 4, we first explore whether part of the variance in S a,mod /S a,meas shown in Fig. 6 can be explained by altitude. In Fig. 8, we show S a,mod /S a,meas as a function of altitude. As shown, there is an altitude dependence in S a,mod /S a,meas , where the mean, median, and interquartile range (25th to 75th percentile) are given in Table 2 for the 1 km altitude bins from 0-5 km. Model-measurement discrepancy in S a is largest at low altitude, where particle number concentrations are highest, particle sources are closest, and heterogeneity in particle number concentrations is largest.
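Altitude-binned statistics of this kind can be computed with a short grouping routine. The sketch below assumes 1 km bins starting at the surface and uses the default quartile convention of Python's `statistics.quantiles`, which may differ from the percentile convention used to build Table 2. The function name and the example data are hypothetical.

```python
from collections import defaultdict
from statistics import fmean, median, quantiles


def binned_ratio_stats(altitudes_m, ratios, bin_width_m=1000.0):
    """Mean, median, and quartiles of the ratio in each altitude bin (keyed by bin number)."""
    bins = defaultdict(list)
    for altitude, ratio in zip(altitudes_m, ratios):
        bins[int(altitude // bin_width_m)].append(ratio)
    stats = {}
    for bin_number in sorted(bins):
        values = bins[bin_number]
        if len(values) >= 2:
            q1, _, q3 = quantiles(values, n=4)
        else:
            q1 = q3 = values[0]  # quartiles are undefined for a single point
        stats[bin_number] = {"mean": fmean(values), "median": median(values),
                             "q1": q1, "q3": q3}
    return stats


# Hypothetical samples: three in the 0-1 km bin and one in the 1-2 km bin.
example_alt = [100.0, 200.0, 300.0, 1500.0]
example_ratio = [0.2, 0.4, 0.6, 1.0]
for bin_number, s in binned_ratio_stats(example_alt, example_ratio).items():
    print(f"{bin_number}-{bin_number + 1} km: median = {s['median']:.2f}")
```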

Dependence of S a,mod /S a,meas on spatial and temporal resolution
The 12 km × 12 km spatial resolution and 1 h temporal resolution of CMAQ are significantly coarser than the spatial and temporal resolution of the aircraft data, resulting in an inherent contrast in resolution between model and measurement that may play a role in the variance in S a,mod /S a,meas . Within any individual 12 km × 12 km model pixel in the Baltimore-Washington sampling area, there is heterogeneity in S a , as shown in Fig. 9. Sub-grid-scale variability in S a would lead to increased variance in S a,mod /S a,meas but likely with a mean and median close to 1 if the domain sampling was not biased, comparable to what is observed in the CO comparison (Fig. 3), where the histogram of the CO data shows a clear center around 1, with very few data points beyond a CO mod / CO meas value of 2.

Figure 8. Ratio of modeled to measured aerosol surface area concentration (S a,mod /S a,meas ) including median (red line within blue box), and interquartile ranges (blue box with the 25th percentile at the left end and 75th percentile at the right end) in 1 km altitude bins. The red crosses outside of the bounds of the plot denote outliers. The labels on the altitude axis lie at the midpoint of the 1 km altitude bin.

Table 2. Mean, median, and interquartile range (range of 25th to 75th percentile) surface area ratio (S a,mod /S a,meas ) for each 1 km altitude bin from 0-5 km.

| S a,mod /S a,meas | 0-1 km | 1-2 km | 2-3 km | 3-4 km | 4-5 km |
| --- | --- | --- | --- | --- | --- |
| Mean | 0.56 | 0.97 | 0.89 | 1.05 | 1.42 |
| Median | 0.47 | 0.51 | 0.63 | 0.82 | 0.75 |
| Interquartile range | 0.33-0.65 | 0.33-0.80 | 0.37-0.97 | 0.56-1.31 | 0.53-2.14 |
To investigate the discrepancy more quantitatively, we compare the probability density functions (PDFs) of the model-to-measured ratios of CO, NO x , particle number concentration, and particle surface area concentration. We use the PDFs to characterize the population of data based on the standard deviation and mean, which provides a quantitative and comparable assessment of the variability in the comparison. Assuming that the research flights sampled the CMAQ model domain in an unbiased way (i.e., flights did not target or avoid point sources), we would expect that the PDFs of the modeled-to-measured ratio in CO, NO x , number concentration, and surface area concentration would all center at 1 (or log 10 (1) = 0 as shown in Fig. 10), and the standard deviation of the distribution (σ ) would reflect heterogeneity in the scalar concentration at scales smaller than the model spatial or temporal resolution. The histogram, PDF, and cumulative distribution function (CDF) for log 10 (S a,mod /S a,meas ) are shown in Fig. 10. The PDF of log 10 (S a,mod /S a,meas ) has a mean (µ) of −0.26 and standard deviation (σ ) of 0.34. Comparison of the peak and width of the PDFs of the model-to-measured ratios of CO, particle number concentration (N), and NO x provides an objective measure for assessing the impact of spatial and temporal resolution on the comparison. As shown in Fig. 11 and Table 3, the means of the PDFs are −0.0029, 0.047, and −0.14 for CO, N, and NO x , respectively. Each of these values is significantly closer to 0 than that for S a (−0.26), suggesting that the methodology for assessing model-measurement agreement should not be significantly impacted by model resolution, especially given the large range in atmospheric lifetimes for CO, N, and NO x . However, without utilizing a finer model resolution to directly test for impacts of the grid size, resolution issues cannot be fully ruled out. Interestingly, the mean N mod /N meas is close to 1 (10^0.047 = 1.11), where a value closer to 1 indicates agreement between model and measurement and a value closer to zero indicates a large discrepancy between datasets. The mean N mod /N meas is significantly different from that observed for S a,mod /S a,meas (10^−0.26 = 0.55), perhaps suggesting that the model-measurement disagreement is related to the shape of the size distribution, either due to the a priori emissions size distribution or secondary aerosol processes.

Figure 9. Flight path from 28 July with overlaid UHSAS S a data within the 10th layer of CMAQ altitude (∼ 850-1000 m) and gridded CMAQ S a from layer 10 in the background. The CMAQ data are specifically at noon EST (16:00 UTC).

Figure 10. Normalized histogram, probability density function (PDF), and cumulative distribution function (CDF) for the modeled-to-measured aerosol surface area (S a,mod /S a,meas ). The histogram and PDF serve to indicate the median and spread of the dataset, while the CDF indicates the percentage of data encompassed at a certain data threshold.
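The log-space statistics quoted above (µ and σ of the log 10 model-to-measurement ratio, and the back-transformed value 10^µ) can be computed along the following lines. This is a sketch under stated assumptions: the function name `log_ratio_summary` and the sample arrays are hypothetical, and the sample standard deviation (`ddof=1`) is an assumed choice, since the text does not say how σ was obtained. Note that 10^µ is the geometric mean of the ratio, not its arithmetic mean.

```python
import numpy as np

def log_ratio_summary(model, measured):
    """Mean and standard deviation of log10(model/measured), plus 10**mean.

    Non-positive values are dropped because their logarithm is undefined.
    """
    model = np.asarray(model, dtype=float)
    measured = np.asarray(measured, dtype=float)
    valid = (model > 0) & (measured > 0)
    log_ratio = np.log10(model[valid] / measured[valid])
    mu = float(log_ratio.mean())
    sigma = float(log_ratio.std(ddof=1))  # assumed: sample standard deviation
    return {"mu": mu, "sigma": sigma, "central_ratio": 10.0 ** mu}

# Hypothetical values for illustration only: ratios of 0.1, 1, and 10.
summary = log_ratio_summary([1.0, 10.0, 100.0], [10.0, 10.0, 10.0])
print(summary)

# Back-transforming the log-space means reported in the text.
sa_central = 10.0 ** -0.26   # surface area ratio
n_central = 10.0 ** 0.047    # number concentration ratio
print(round(sa_central, 2), round(n_central, 2))
```

Working in log space treats a factor-of-2 overestimate and a factor-of-2 underestimate symmetrically, which is why an unbiased comparison is expected to center at 0 rather than at 1.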
Also shown in Fig. 11 and Table 3, the standard deviations (σ ) of the PDFs for the CO, N, and NO x model-to-measurement ratios are 0.15, 0.27, and 0.34, respectively. The PDF of [CO] mod / [CO] meas is the narrowest, likely reflecting the longer lifetime of CO and a damping of sub-grid-scale variability of CO in each pixel. The widths of the N and S a ratio distributions are comparable, again highlighting that the deviation of S a,mod /S a,meas from 1 may reflect differences between the modeled and measured aerosol size distributions (as shown in Fig. 1). Collectively, this analysis suggests that there is not a significant bias on average in the methodology based on model resolution and that the apparent differences in the number and surface area model-measurement ratios are most likely driven by the shape of the underlying aerosol size distribution.

Figure 11. Normalized probability density functions for the log 10 of the model-to-measurement ratio in particle number concentration (yellow), carbon monoxide (red), NO x (purple), and particle surface area concentration (blue).
It is interesting to note that the model-measurement agreement in particle number concentrations is significantly better than that of particle surface area concentrations, implying that the differences in S a may be related to the shape of the aerosol size distribution. There have been numerous model-measurement comparisons of aerosol number concentration and size distributions specific to CMAQ (Elleman and Covert, 2009a, 2010; Kelly et al., 2011; Zhang et al., 2010b). Elleman and Covert (2009a) compared the 4 km CMAQ v4.4 model's size distributions to measurement data from the 2001 Pacific Northwest and Pacific field campaigns. The Pacific Northwest field campaign (PNW2001) was conducted in August 2001 with both airborne and ground-based measurements of pollution in the Puget Sound urban area around Seattle, Washington, and included northwest Oregon, western Washington, and southwest British Columbia. PNW2001 was conducted to complement the Pacific 2001 field campaign, a major regional air pollution study in the Lower Fraser Valley of metropolitan Vancouver, British Columbia, which focused on ground-based observations and ran from 10 August to 2 September 2001. Analyses of these two campaigns and model predictions found that CMAQ underpredicted airborne particle number concentrations by a factor of 10-100 and was least accurate in the smallest size mode: the Aitken mode (Elleman and Covert, 2009a). The underprediction was consistent between measurement studies and did not depend on time and location. Zhang et al. (2010b) compared CMAQ v4.4 to the 1999 Southern Oxidants Study and corroborated the findings of Elleman and Covert (2009a) that the Aitken mode was significantly underpredicted in total number concentration (varying by up to 3 orders of magnitude), yielding an overall underprediction of PM 2.5 in Atlanta.
In a follow-up analysis, Elleman and Covert (2010) ran CMAQ with updated emissions size distributions for the same summer 2001 case study used in the original base case, which comprised airborne and surface measurements from Pacific 2001 and PNW2001 during August 2001. With the updated emission size distributions, CMAQ still underpredicted the observable aerosol number concentrations by about 1 order of magnitude, an improvement on the previous 1-2 orders of magnitude that nevertheless pointed to issues within the model's prediction of aerosol number. Kelly et al. (2011) then ran CMAQ with both the updated emissions size distributions from Elleman and Covert and the original distributions and compared the results to the 1998 California Regional PM 10 / PM 2.5 Air Quality Study (CRPAQS). The simulated number size distributions from the improved-emission simulation were about 20 % lower than the observations, while those from the standard-emission simulation were about a factor of 5 lower, confirming that the updated emissions improved model-measurement agreement. The observed shape of the distributions also better matched the updated-emissions simulation. This improvement in model-measurement agreement demonstrates the need for accurate emissions and emission size distributions within CMAQ and their impact on S a .
4.3 Dependence of S a,mod /S a,meas on secondary aerosol production

Two potential reasons for the discrepancy between mean S a,mod /S a,meas (0.55) and mean N mod /N meas (1.11) are (1) uncertainty in the size distribution of primary aerosol particles and (2) uncertainty in secondary aerosol production (i.e., the condensation of low-volatility material to existing aerosol particles). To distinguish between these two potential sources, we examine the response of S a,mod /S a,meas to photochemical age. We start by looking at the response of S a,mod /S a,meas to the NO x /HNO 3 ratio (Fig. 12), where high NO x /HNO 3 in this sampling region is indicative of air masses near an anthropogenic source, similar to a NO x /NO y clock (Kleinman et al., 2008; Pan et al., 2015; Tie et al., 2009). If the aerosol surface area of primary emissions is underestimated in the model, we would expect S a,mod /S a,meas to be biased low at high NO x /HNO 3 . If the condensation rate of low-volatility anthropogenic species is underestimated in the model, we would expect S a,mod /S a,meas to decrease with a decreasing NO x /HNO 3 ratio as the air mass ages. As shown in Fig. 12, S a,mod /S a,meas is remarkably constant over a wide span of NO x /HNO 3 ratios (0.5-10), before tending to larger values at low NO x /HNO 3 . This trend is also seen in the dependence of N mod /N meas on NO x /HNO 3 , suggesting a potential discrepancy between the modeled and measured lifetime of aerosol or in the treatment of background aerosol particles in the region. This trend suggests that an underestimate in the condensation of low-volatility gas-phase compounds of anthropogenic origin is not a significant driver of the model-measurement discrepancy in S a . Rather, the persistent underestimate of S a in the model at high NO x /HNO 3 points to uncertainty in the size distribution of primary emissions or in secondary aerosol formed at the early stages of oxidation.
To address secondary aerosol formation more generally, we also assessed the response of S a,mod /S a,meas to temperature, as equilibrium partitioning of gas-phase species, which depends on temperature and RH, is a primary driver of secondary aerosol formation. No statistically significant trend in S a,mod /S a,meas was found over the range of temperatures observed during DISCOVER-AQ.
To further investigate secondary aerosol formation as a factor in driving the discrepancy in modeled S a , we assess the response of S a,mod /S a,meas to isoprene oxidation products in the aerosol phase as an example of biogenic VOC oxidation. As shown in Fig. 13, there does not appear to be a trend with concentration of isoprene SOA. Though we cannot test for all biogenic oxidation products, the lack of a trend with isoprene SOA in the aerosol phase may mean that the discrepancy in S a,mod /S a,meas is not biogenic in nature. There is also no trend with the total SOA concentration parameterized in CMAQ, shown in Fig. 13b.

Implications for the treatment of heterogeneous reactions in air quality models
As shown in Eq. (2), the rate constant for the heterogeneous loss of gas-phase compounds to aerosol (k het ) is linearly dependent on both aerosol surface area concentration (S a ) and the reactive uptake coefficient (γ ). In Sect. 3, we showed that the average S a,mod /S a,meas , determined from the regression of the average model and measurement S a , was 0.437, which would result in an underestimate of approximately a factor of 2 in k het . A similar underestimation has been seen previously in select ground-based (Ghim et al., 2017; Liu and Zhang, 2011; Prank et al., 2016; Wang et al., 2021; Yu et al., 2008a, 2012, 2008b; Zhang et al., 2019, 2006, 2010c) and aircraft-based (Baker et al., 2018; Chen et al., 2020) studies of CMAQ prediction of PM 2.5 , which may point to a larger issue in model representation of particle mass. For some heterogeneous reactions where the reactive uptake coefficients are well parameterized in models (e.g., extremely low-volatility species), uncertainty in S a likely determines uncertainty in k het . To assess the dominant source of uncertainty in model-derived k het , we focus on the N 2 O 5 system as an example. Recently, McDuffie et al. (2018) assessed the accuracy of model parameterizations of γ (N 2 O 5 ) using ambient observations from the WINTER campaign. In Fig. 14a, we show the histogram and PDF of the ratio of γ (N 2 O 5 ) mod , calculated in CMAQ using the Bertram and Thornton (2009) parameterization for the WINTER campaign, to γ (N 2 O 5 ) meas , which was determined in McDuffie et al. (2018) from an observationally constrained box model solved along the flight path. The PDF of the log 10 of the model-to-measurement ratio is centered above zero (µ = 0.22 or γ (N 2 O 5 ) mod /γ (N 2 O 5 ) meas = 1.65) (Fig. 14, Table 3).
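The linear dependence of k het on S a can be illustrated numerically. The sketch below assumes that Eq. (2) takes the standard form k het = γ ω S a / 4 (valid in the absence of gas-phase diffusion limitations) and that ω is the kinetic-theory mean molecular speed, sqrt(8RT / (πM)). The values of γ, S a , and temperature are hypothetical and chosen only for illustration; 0.437 is the model-to-measurement S a ratio quoted above.

```python
import math

R_GAS = 8.314462618  # molar gas constant, J mol^-1 K^-1

def mean_molecular_speed(temperature_k, molar_mass_kg_mol):
    """Kinetic-theory mean molecular speed in m s^-1."""
    return math.sqrt(8.0 * R_GAS * temperature_k / (math.pi * molar_mass_kg_mol))

def k_het(gamma, speed_m_s, sa_m2_m3):
    """First-order heterogeneous loss rate constant in s^-1 (assumed form: gamma*omega*Sa/4)."""
    return gamma * speed_m_s * sa_m2_m3 / 4.0

# Hypothetical inputs for illustration only.
gamma_n2o5 = 0.02          # reactive uptake coefficient (unitless)
sa_ref = 200.0 * 1.0e-6    # 200 um^2 cm^-3 converted to m^2 m^-3
speed = mean_molecular_speed(298.15, 0.10801)  # N2O5, M = 108.01 g mol^-1

k_ref = k_het(gamma_n2o5, speed, sa_ref)
k_biased = k_het(gamma_n2o5, speed, 0.437 * sa_ref)  # S a scaled by the reported ratio

print(f"omega = {speed:.1f} m/s")
print(f"k_het = {k_ref:.2e} 1/s, with biased S_a = {k_biased:.2e} 1/s")
print(f"k_het ratio = {k_biased / k_ref:.3f}")
```

Because k het is linear in S a , the ratio of the two rate constants equals the S a ratio regardless of the γ, temperature, or absolute S a chosen, which is why a factor-of-2 low bias in S a carries straight through to k het .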
Interestingly, since k het (N 2 O 5 ) is proportional to the product of S a and γ (N 2 O 5 ), the underestimate in model S a is compensated for by an overestimate in γ (N 2 O 5 ) in the mean state if the underestimate in model S a is consistent for WINTER, though that is not necessarily the case. While the width of the PDF of log 10 (γ (N 2 O 5 ) mod /γ (N 2 O 5 ) meas ) for WINTER is similar to that seen for S a for DISCOVER-AQ, it should be noted that neither the histogram of the γ (N 2 O 5 ) ratio nor that of the S a ratio is easily fit to a Gaussian peak shape. As shown in Fig. 14, the histogram of log 10 (γ (N 2 O 5 ) mod /γ (N 2 O 5 ) meas ) for WINTER has a broader range of values than that of S a in this study. Collectively, this analysis highlights that while model uncertainty in k het (N 2 O 5 ) is largely a function of the quality of the γ (N 2 O 5 ) parameterization, future improvements in modeled surface area concentrations, particularly in urban environments, will also result in more accurate representations of heterogeneous chemical reactions.
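The mean-state compensation described above can be made explicit with a short worked calculation. It is illustrative only: it assumes k het is proportional to γ S a , and it combines the log-space means from two different campaigns (DISCOVER-AQ for S a , WINTER for γ), which holds only if the S a bias found here also applies to WINTER.

```latex
\begin{align*}
\log_{10}\!\left(\frac{k_{\mathrm{het,mod}}}{k_{\mathrm{het,meas}}}\right)
  &= \log_{10}\!\left(\frac{S_{a,\mathrm{mod}}}{S_{a,\mathrm{meas}}}\right)
   + \log_{10}\!\left(\frac{\gamma_{\mathrm{mod}}}{\gamma_{\mathrm{meas}}}\right) \\
  &\approx -0.26 + 0.22 = -0.04, \\
\frac{k_{\mathrm{het,mod}}}{k_{\mathrm{het,meas}}}
  &\approx 10^{-0.04} \approx 0.91 .
\end{align*}
```

Under these assumptions the two biases nearly cancel in the mean, but only in the mean: for any individual grid cell and time the errors in S a and γ need not offset each other.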

Summary
This study examined the ability of the CMAQ model to accurately predict aerosol surface area, which directly affects heterogeneous chemistry within the model. The CMAQ data were compared to dry aerosol surface area measured with a UHSAS during the 2011 DISCOVER-AQ campaign. Modeled and measured dry aerosol surface area concentrations, S a,mod and S a,meas , respectively, are modestly correlated (r 2 = 0.52) and on average agree to within approximately a factor of 2 (S a,mod /S a,meas = 0.44) over the course of the 13 research flights. In contrast, there was a strong correlation between measured and modeled number concentration (N mod /N meas = 0.87, r 2 = 0.63). When looking into possible sources of the discrepancy, there was not a strong dependence on photochemical age or secondary biogenic aerosol concentration. The strong agreement in aerosol number concentration may indicate that the modeled size distribution contributes to the observed discrepancy, though the exact source of the discrepancy is outside the scope of this study.

Figure 14. Normalized probability density functions for the log 10 of the model-to-measurement ratio in (a) γ (N 2 O 5 ) from the WINTER campaign and (b) particle surface area concentration, S a , from this study.
The discrepancy in aerosol surface area was also compared to that of the reactive uptake coefficient of N 2 O 5 during the 2015 WINTER campaign because the uptake coefficient also directly impacts heterogeneous reaction rates. The uncertainty in the modeled heterogeneous chemistry remains primarily driven by that of the uptake coefficient, as the uncertainty in those values is larger than that seen for S a . Model improvements to aerosol surface area concentrations along with improvements to the parameterization of reactive uptake coefficients will greatly impact the accuracy of heterogeneous chemistry within regional models.  (Bertram and Bergin, 2022).