Intercomparison of biomass burning aerosol optical properties from in-situ and remote-sensing instruments in ORACLES-2016

The total effect of aerosols, both directly and on cloud properties, remains the biggest source of uncertainty in anthropogenic radiative forcing on the climate. Correct characterization of intensive aerosol optical properties, particularly in conditions where absorbing aerosol is present, is a crucial factor in quantifying these effects. The Southeast Atlantic Ocean (SEA), with seasonal biomass burning smoke plumes overlying and mixing with a persistent stratocumulus cloud deck, offers an excellent natural laboratory to make the observations necessary to understand the complexities of aerosol-cloud-radiation interactions. The first field deployment of the NASA ORACLES (ObseRvations of Aerosols above CLouds and their intEractionS) campaign was conducted in September of 2016 out of Walvis Bay, Namibia. Data collected during ORACLES-2016 are used to derive aerosol properties from an unprecedented number of simultaneous measurement techniques over this region. Here we present results from six of the eight independent instruments or instrument combinations, all applied to measure or retrieve aerosol absorption and single scattering albedo. Most but not all of the biomass burning aerosol was located in the free troposphere, in relative humidities typically ranging up to 60%. We present the single scattering albedo (SSA), total and absorbing aerosol optical depth (AOD and AAOD), and absorption, scattering, and extinction Ångström exponents (AAE, SAE, EAE) for specific case studies looking at near-coincident and -colocated measurements from multiple instruments, and SSAs for the broader campaign average over the monthlong deployment. For the case studies, we find that SSA agrees within the measurement uncertainties between multiple instruments, though, over all cases, there is no
Atmos. Chem. Phys. Discuss., https://doi.org/10.5194/acp-2019-142 Manuscript under review for journal Atmos. Chem. Phys. Discussion started: 20 February 2019 © Author(s) 2019.
CC BY 4.0 License.

the different instruments, and since the retrievals of AOD and SSA are not necessarily uncoupled, AOD can be a useful diagnostic. However, as we focus here on intensive aerosol properties, the climatological AOD values are not discussed. Finally, we discuss the Ångström exponents from each retrieval method (AAE, SAE, and EAE for the absorption, scattering, and extinction Ångström exponents, respectively). Ångström exponents are given by the log-space slope of absorption, scattering, or extinction aerosol optical depths versus wavelength, and are frequently used to characterize atmospheric aerosol. As AAE is primarily (though not entirely) determined by aerosol composition (Bahadur et al., 2012) and SAE is primarily associated with aerosol size, these parameters can be instructive in understanding the nature of the aerosol in question.
EAE is shown as well, to place these results in the context of other remote sensing results which measure extinction, though EAE closely follows SAE since extinction is dominated by scattering for almost all atmospheric aerosols. Accurate representation of the magnitude and variability of these aerosol properties on a regional scale has significant implications for aerosol radiative effects calculated using climate models and/or satellite data. As different instruments (such as those incorporated into this work) may rely on different physical measurement principles, each with different considerations and limitations (Table 1), an understanding of how distinct observations compare to one another is a critical piece in gaining an understanding of our observational limitations for key parameters and hence in calculating aerosol effects and their uncertainties. This paper presents data from the NASA ORACLES (ObseRvations of Aerosols above CLouds and their intEractionS) cam-
Compared with other regions of the world, there have been relatively few studies measuring aerosol properties (microphysical or radiative) either directly over the southeast Atlantic or near their source in sub-Saharan Africa. Nonetheless, there is still a good deal of previous work which can help to place the ORACLES observations in context. A key observational dataset is from the Southern AFricAn Regional science Initiative (SAFARI 2000) campaign, which used aircraft to measure aerosol properties over and in close proximity to the coast of southern Africa in September 2000, including both aged aerosol and fresh biomass plumes (e.g., Haywood et al., 2003; Schmid et al., 2003; Leahy et al., 2007; Russell et al., 2010).
SAFARI 2000 also resulted in the establishment of several sites of the AErosol RObotic NETwork (AERONET) on the African continent; these allow for longer-term climatological analysis near emission sources (e.g., Queface et al., 2003; Magi and Hobbs, 2003; Swap et al., 2003; Eck et al., 2003, 2013). Passive and active satellite observations have also been used to detect and quantify aerosol above clouds over the SEA region (e.g., Chand et al., 2008, 2009; Waquet et al., 2013; Jethva et al., 2014; Torres et al., 2012; Liu et al., 2015), though such studies typically need to assume (through lookup tables or models) cloud and aerosol properties as well as their relative geometry. In this context, the ORACLES aircraft-based dataset, designed to sample the region of highest cloud cover and BB smoke concentration, provides important and heretofore unique observations of both aerosol and cloud over the southeast Atlantic Ocean, owing both to improved instrumentation since SAFARI-2000 and to the ORACLES focus on regions farther off the southern African coast than previously measured.
The previous studies give a somewhat limited yet still useful view of the temporal and seasonal trends in SSA for a limited set of locations within this region (Figure 1). In the SAFARI 2000 campaign, aircraft instrumentation was used to sample both the aged aerosol plume (a few days old) as well as fresh biomass aerosol (a few minutes old) over Namibia and the coastal SEA. The mean SSA of the aged haze was reported by Haywood et al. (2003) as 0.91, 0.90, and 0.87 at 450, 550, and 700 nm using a combination of an in situ aircraft-based Particle Soot Absorption Photometer (PSAP) and nephelometer. However, Leahy et al.
(2007) reported a lower "best estimate" (campaign-average) SSA_550nm of 0.85 ± 0.02 using the SAFARI airborne flux radiometry and in situ measurements combined with ground-based AERONET retrievals (the individual flux radiometry estimate is described in Bergstrom et al. (2003) and Russell et al. (2010) and included separately in Figure 1). For a SAFARI flight specifically targeting fresh biomass burning smoke, the reported SSA was lower, at 0.86, 0.84, and 0.80. Thus, even within a single campaign, past work has shown a sizable range in BB aerosol properties. It should be noted that the SAFARI over-ocean flights were conducted within a more southern region (generally 15°-25°S) than the heart of the seasonal aerosol plume, typically described as extending approximately 0°-15°S (Zuidema et al., 2016). The ORACLES sampling area spans both these latitude ranges (0°-25°S), but frequently sampled westward of the SAFARI region. SAFARI also made many measurements over the continent, closer to biomass burning sources, whereas the ORACLES measurements were made entirely over the ocean. It is therefore likely that the SAFARI measurements were of generally younger aerosol than the ORACLES measurements, including the aerosol identified as 'aged' within Haywood et al. (2003).
An AERONET-based climatology by Dubovik et al. (2002) indicated that the African BB site (Zambia, 15°15'S, 23°09'E) had the lowest SSA (i.e. strongest absorption) and the strongest SSA spectral dependence (i.e., steepest slope, SSA_440nm = 0.88 to SSA_1020nm = 0.78) among the four geographical BB regions considered. This was attributed to the greater flaming versus smoldering characteristics of these fires compared with other regions. Using data from the same long-term AERONET site at Mongu, Zambia, Eck et al. (2013) found a seasonal progression of increasing full-column SSA (decreasing relative absorption) over the July-to-November burning season based on data from 1997 to 2005. SSA increased from SSA_440nm = 0.84 in July to SSA_440nm = 0.93 in November (Figure 1). This same pattern was observed in a study of near-surface in situ SSA measurements at Ascension Island, showing monthly mean SSA_529nm increasing from 0.78 in August 2016 to 0.83 in October 2016 (Zuidema et al., 2018). We note this site is substantially westward (downwind) of the majority of ORACLES flights, and thus may represent more aged aerosol. Eck et al. (2013) also reported an average September SSA similar to that reported by Dubovik et al. (2002) for Zambia: 0.88, 0.83, 0.81, and 0.79 at 440, 675, 870, and 1020 nm, respectively. They also showed that within a given month, a south-to-north increase in SSA_388nm is observed, as derived from the satellite-based OMI (Ozone Monitoring Instrument) retrievals. This was hypothesized to be due to shifts in fuel type over the BB season due to both anthropogenic factors (i.e. timing/practices of agricultural burning) and environmental factors (i.e. relating to moisture variability of potential fuel types throughout the season). Variations in atmospheric humidity have also been shown to affect aerosol optical properties, including both aerosol scattering and absorption (e.g., Langridge et al., 2011; Lack and Cappa, 2010).
Figure 1 summarizes the previous work in this region, and indicates that the SSA over southern Africa increases through the burning season (monthly averages from Eck et al., 2013), possibly due to changes in biomass burning fuel composition, and also that SSA may increase with distance from fire, i.e. as the aerosol ages (fresh versus aged plume of Haywood et al., 2003). However, we emphasize that the "aging" time scales observed in SAFARI-2000 were much shorter (e.g. ∼5 hours in Abel et al. (2003); within a few days in Haywood et al. (2003)) than those seen in ORACLES (∼2-15 days, e.g. Dobracki et al., 2019). This may account for the opposite sign (higher SSA for younger aerosol) shown in the latter work. Thus, it is clear that multiple physical factors may be responsible for variability in aerosol optical properties.
Further discussion of previous studies of SEA aerosol in the context of the ORACLES results is found in Section 4.

1.2 Impacts of SSA on aerosol radiative effects
Accurate representation of scattering versus absorption within biomass burning aerosol scenes has implications in subsequent calculations of aerosol radiative effects over the SEA. The range of SSA values shown in Figure 1 encompasses the possibility of either a positive (smaller values of SSA) or a negative (higher values of SSA) net direct radiative effect when this aerosol is above the partly-cloudy skies of the SEA. The specific value of SSA where the radiative effect changes sign will depend on the cloud cover fraction; for example, in Chand et al. (2009) the aerosol direct effect was positive for a mid-visible SSA of 0.85 as long as cloud fraction was greater than 40%. Wilcox (2012) found that for a perturbation to SSA of ±0.03, the local direct aerosol radiative effect was changed by 10-20 W/m², though this also depends on AOD and cloud albedo conditions. A perturbation of this magnitude encompasses only half the range between the lowest and highest September SSA values shown in Figure 1. As described above, some of this range may reflect seasonal variations in SSA, but even for September alone, the previously measured values of SSA_550nm spanned 0.84-0.90. An open question is to what degree the range in previous observations represents real variability in the SSA of smoke in this region or whether a significant part of the range is due to differences in measurement techniques and in measurement conditions; for example, the AERONET and flux radiometry retrievals are for ambient-RH aerosol, whereas the in-situ (SAFARI) measurements are of dry/low-RH aerosol. We endeavor to explore these questions in the following sections.

ORACLES overview
The overarching goal of ORACLES is to make high-quality airborne observations of aerosols and clouds in the SEA to gain a better understanding of the complex processes (direct, indirect, and semi-direct) by which BB aerosols, notable for their strong absorption of solar energy, affect radiation both directly and through their impacts on clouds (Zuidema et al., 2016). The project included three field deployments of approximately one month each: September 2016 based out of Walvis Bay, Namibia; August 2017 based out of São Tomé, São Tomé and Príncipe; and October 2018 again based out of São Tomé. The ORACLES study area time zone spans UTC and UTC+1. In the current paper we focus on the first field deployment of ORACLES in September of 2016 out of Walvis Bay, Namibia. This deployment included two NASA aircraft: a P-3 for full atmospheric profiling and low/mid-level in situ sampling, and a high-altitude ER-2 for remote sensing observations. The P-3 aircraft was flown with a suite of in situ and remote-sensing aerosol, cloud, radiation, and meteorological instruments, while the ER-2 carried only remote-sensing instrumentation. Data were collected over 15 P-3 and 12 ER-2 flights, each 7-9 hours in duration. The 2017 and 2018 deployments included the P-3 only. Hence, the simultaneous deployment of both the P-3 and the ER-2 in the 2016 deployment created a unique testbed for evaluating remote sensing retrievals of aerosol and cloud properties from a variety of instruments that have potential for future space flight.
Of the in situ and remote-sensing instruments included in ORACLES, eight teams (including the complementary AERONET sites, several of which were established to coordinate with the ORACLES deployments) observe or derive aerosol absorption (either locally or as column AAOD) and the related SSA parameter (Table 1). All considered remote-sensing instruments report an AOD product, with the exception of the Solar Spectral Flux Radiometer (SSFR), which uses AODs from the Spectrometers for Sky-Scanning, Sun-Tracking Atmospheric Research (4STAR) as input (each of these instruments is described below).
Remote-sensing SSAs are reported as column-integrated values; for in-situ measurements, the SSA presented is the extinction-weighted profile-average value (i.e. SSA calculated at each altitude and weighted by the profile of extinction; Section 2.1.3).
Unless otherwise noted, reported Ångström exponents are calculated using a logarithmic fit of the AOD versus wavelength using all available wavelengths between 440 and 675 nm (inclusive), to have the most comparable quantity between instruments and to reduce uncertainty compared with a simple 2-wavelength calculation of AE. While previous studies have examined the agreement between the retrievals of a few of these instruments at a time (e.g., Sedlacek and Lee, 2007; Leahy et al., 2007; Knobelspiesse et al., 2011), an intercomparison including so many methods within one campaign has not previously been performed, due to the logistics of assembling such a comprehensive suite of instruments including several newly-developed algorithms. Specific instrument details are given in the following section. A more detailed overview of the campaign design and goals may be found in Zuidema et al. (2016), and an overview of the results will be the focus of a future paper.
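The logarithmic fit described above can be illustrated with a minimal sketch. It assumes the conventional sign, in which the Ångström exponent is the negative of the least-squares slope of ln(AOD) versus ln(wavelength); the function name `angstrom_exponent`, its arguments, and the synthetic AOD values are hypothetical and are not taken from the ORACLES processing code.

```python
import math

def angstrom_exponent(wavelengths_nm, aods, lo_nm=440.0, hi_nm=675.0):
    """Negative least-squares slope of ln(AOD) versus ln(wavelength).

    Only wavelengths inside [lo_nm, hi_nm] (inclusive) enter the fit,
    mirroring the 440-675 nm window quoted in the text.
    """
    pts = [(math.log(w), math.log(a))
           for w, a in zip(wavelengths_nm, aods)
           if lo_nm <= w <= hi_nm and a > 0]
    if len(pts) < 2:
        raise ValueError("need at least two wavelengths in the fit window")
    n = len(pts)
    mean_x = sum(x for x, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    return -sxy / sxx

# Synthetic power-law AOD spectrum (illustrative values, not measurements).
wl = [440.0, 500.0, 550.0, 675.0, 870.0]
aod = [0.5 * (w / 500.0) ** -1.8 for w in wl]
ae = angstrom_exponent(wl, aod)  # close to 1.8 for this synthetic input
```

Fitting all wavelengths in the window, rather than a single pair, lets the noise in any one channel average down, which is the motivation the text gives for preferring the fit over a 2-wavelength calculation.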

Instruments and data
In this section we offer descriptions of the instruments and data included in this paper, as summarized in Table 1.

Table 1. Overview of ORACLES instruments as used to quantify SSA during the ORACLES campaigns, and the method used. Not all measurement approaches are included in this paper; some are still in process and will be presented in future publications. Instruments/methods presented in this paper are indicated in bold. Full archival data citations are given under Acknowledgments.

sun photometer which can make direct-beam (sun-tracking mode) measurements for retrieval of column AOD and trace gases (Dunagan et al., 2013; Shinozuka et al., 2013; Segal-Rosenheimer et al., 2014) or below-cloud measurements of transmittance for derivation of cloud optical properties (zenith mode). Under certain level-flying conditions, 4STAR can also perform AERONET-like sky scans in either the principal plane or the almucantar (sky scanning mode), which provide the data used here.
The 4STAR sky scans are processed using a modified version of the Version 2 AERONET retrieval algorithm described in Dubovik and King (2000), which retrieves aerosol size distributions, refractive indices, SSA, and AAOD, among other parameters. All scans are run through the algorithm with the minimum scattering angle set to 3 degrees to avoid stray light from the sun entering the 4STAR optical aperture. Scene (i.e., surface plus atmosphere) albedo is provided by SSFR measurements (described below). A notable modification of the 4STAR retrievals compared with AERONET retrievals is in the input wavelengths. While AERONET uses radiances measured at specific and discrete wavelengths (440, 675, 870, and 1020 nm), with the hyperspectral 4STAR we are able to use AODs and sky radiances measured at a different (or larger) selection of wavelengths.
Due to suspected stray light contamination within the 4STAR spectrometer around 440 nm, a particular sensitivity to the 440 nm channel was observed, which in some cases resulted in an anomalously low SSA (high AAOD) at shorter wavelengths compared with retrievals run without 440 nm. To avoid this issue, the results presented in this paper use a modified set of inputs at wavelengths of 400, 500, 675, 870, and 995 nm, with 400 and 500 nm replacing 440 nm. Note the longest wavelength of 995 nm replaces the AERONET 1020 nm due to the wavelength limits of the 4STAR visible spectrometer.
4STAR executed a total of 174 sky scans in ORACLES-2016, of which 38% (66) met the following quality control (QC) criteria (adapted from the AERONET QC available at https://aeronet.gsfc.nasa.gov/new_web/PDF/AERONETcriteria_final1. manual inspection of the retrieval output for reasonable residual sky radiance error as a function of scattering angle (i.e. uniform aerosol conditions, no cloud contamination).
Note that, unlike in the AERONET archive, we consider principal plane as well as almucantar scans when the above criteria are met. We do this because, owing to the timing of ORACLES flights, 4STAR sky scans were largely near solar noon, limiting the angular range available in almucantar scans. In addition to the scans meeting the above QC measures, another 16 (9% of the scans) were included based on manual QC inspection. This generally involved cases with AOD between 0.2 and 0.4. We retain these lower-AOD, manually QC'd scans to explore the reliability of the retrievals under conditions of lower aerosol loading.
This QC procedure retained 82 sky scans in total (47% of all scans) which produced credible retrievals. In the present study, we focus on a further subset of 75 lower-altitude (< 3 km), QC-screened sky scans. This is because our current interest is in retrievals of aerosol properties through the entire aerosol plume, and thus the low-altitude sky scans are most comparable to the other instruments presented here. Of these QC-passed scans, 49 corresponded to valid in situ data and are used in the aggregate comparison figures; an additional 26 did not correspond in space and time to the other instruments, but are included in the broader analysis of Section 4. The available sky scan retrievals and their co-location with other instruments are summarized in Table 2.
The 4STAR uncertainties presented here are quantified by a sensitivity test based on AOD and radiance uncertainties. AOD uncertainties used are the archived wavelength-dependent uncertainties, and uncertainties in the sky radiances have been quantified through laboratory calibration using a NIST-traceable 12-lamp 36-inch integrating sphere (Brown et al., 2005). AOD uncertainties are dependent on wavelength, time, and solar zenith angle (geometrical air mass factor), as well as potential window contamination in some cases. These values were typically between 0.01 and 0.02, ranging from a low of 0.008 to a high of 0.037 in an extreme case. Radiance uncertainties are wavelength-dependent, but are constant over the entire campaign, ranging between 1.0% and 1.2% for 470-995 nm. To test the impact of these two types of errors, the sky scan inversion code is run separately for an addition or subtraction case for each of these two parameters (i.e. four cases), and the result is added in quadrature for each of the upper and lower bounds. Note that an increase in AOD (without perturbing radiances) results in a lower SSA and higher AAOD, while an increase in radiance (without perturbing AOD) results in a higher SSA and lower AAOD. Uncertainties in SSA are dominated by the AOD terms, with smaller contributions from the uncertainty in the measured sky radiances. Other sources of uncertainty are not explicitly quantified in the present work.
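One plausible reading of the four-case sensitivity test is sketched below. It assumes that deviations above the baseline retrieval are combined in quadrature into the upper bound and deviations below it into the lower bound; the function `quadrature_bounds` and all SSA values are hypothetical illustrations, not the 4STAR inversion code or ORACLES results.

```python
import math

def quadrature_bounds(base, perturbed):
    """Combine perturbed retrievals into asymmetric (lower, upper) uncertainties.

    Assumption: each perturbed run contributes to the upper bound if it
    lies above the baseline and to the lower bound if it lies below.
    """
    deltas = [p - base for p in perturbed]
    upper = math.sqrt(sum(d * d for d in deltas if d > 0))
    lower = math.sqrt(sum(d * d for d in deltas if d < 0))
    return lower, upper

# Hypothetical SSA values for the baseline and the four perturbed runs.
ssa_base = 0.85
ssa_runs = {
    "aod_plus": 0.82,        # higher AOD gives lower SSA
    "aod_minus": 0.88,
    "radiance_plus": 0.86,   # higher radiance gives higher SSA
    "radiance_minus": 0.84,
}
ssa_lower, ssa_upper = quadrature_bounds(ssa_base, ssa_runs.values())
```

In this illustration the AOD terms contribute three times the radiance terms before squaring, so they dominate the combined bound, consistent with the behaviour the text describes.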

AirMSPI
The Airborne Multi-angle SpectroPolarimeter Imager (AirMSPI) is an imaging polarimeter which has flown on the ER-2 aircraft since October 2010, including in ORACLES-2016 (Diner et al., 2013). The instrument contains a pushbroom camera inside a programmable gimbal for along-track view angles between ±67°, which is typically used for the observation of a 10 × 10-km target from 9 discrete view angles ("step-and-stare mode"; 10-m resolution), or a 100-km-long target under a continuously changing view angle ("sweep mode"; 25-m resolution). Further instrument details may be found in Diner et al. (2013). The sweep view mode was adopted for cloud and above-cloud aerosol observations during the ORACLES field campaign. The AirMSPI data presented here are from a coupled stratocumulus cloud and above-cloud aerosol retrieval based on an optimization approach (Xu et al., 2018). The retrieval is run by fitting polarized radiance in a wide scattering angular range (e.g. from ∼90° to 180°) at three spectral bands centered at 470, 660, and 865 nm. The retrieved above-cloud aerosol properties include refractive index, size distributions, and aerosol total volume concentration. The retrieved cloud properties include cloud-top droplet size distribution, cloud-top height, and cloud optical thickness (cloud optical thickness is derived by fitting the radiance in the three polarimetric bands). Non-spherical particles are not accounted for in the current retrievals.
The column effective AOD and SSA are calculated using Mie theory. Retrieval uncertainties are reported at the polarimetric wavelengths, and are determined by propagating the instrument errors into the retrieval uncertainties. For example, to get the retrieval uncertainty for above-cloud AOD and SSA, the fitting residual plus instrument bias are multiplied with the inverse of the Fisher matrix evaluated at the retrieved solution. Then the chain rule is applied to propagate the error of the retrieved aerosol properties in the solution vector to AOD and SSA, which further involves the use of a Jacobian matrix containing derivatives of AOD and SSA with respect to aerosol properties (Xu et al., 2018, 2019).
The Hawaii Group for Environmental Aerosol Research (HiGEAR) operated several in-situ instruments on the P-3. Total and sub-micrometer aerosol light scattering coefficients (σ_scat) were measured onboard the aircraft using two TSI model 3563 3-wavelength nephelometers (at 450, 550, and 700 nm) corrected according to Anderson and Ogren (1998). In addition to the TSI nephelometers used in the present work, two single-wavelength nephelometers (at 550 nm, Radiance Research, M903) were operated in parallel to study the increase in light scattering as a function of relative humidity (RH). The humidified M903 nephelometer was operated near 80% RH while the dry unit was maintained below 40% (Howell et al., 2006). Discussion of the impacts of aerosol humidification in the context of comparison with remote-sensing retrievals at ambient RH is found in Section 4.2.
Light absorption coefficients (σ_abs) at 470, 530, and 660 nm were measured using two Radiance Research particle soot absorption photometers (PSAPs). The humidity within the PSAP was not explicitly controlled, but the PSAP optical block was heated to approximately 50 °C to reduce artifacts which would result from a changing RH; this had the effect of reducing relative humidity in this instrument to much lower than the 40% within the nephelometers. The PSAP absorption corrections were performed according to an updated algorithm (Virkkula, 2010). Instrumental noise levels are 0.5 Mm⁻¹ for a 240-300 s sample average, comparable to values reported previously (Anderson et al., 2003; McNaughton et al., 2011). In this paper, we primarily present results calculated with the wavelength-averaged (as opposed to the wavelength-specific) correction factors presented in Virkkula (2010). Further discussion of this decision and of the differences between the two corrections is found in Appendix A1.
SSA was calculated using the measured PSAP absorption combined with dried (RH<40%) TSI nephelometer scattering interpolated to PSAP wavelengths. In the comparison cases (i.e. column values), SSA is the extinction-weighted (extinction = scattering + absorption) profile average according to the following procedure. To reduce noise, the reported 1-second scattering and absorption data, corrected to ambient temperature and pressure, were first averaged to 30 s box-car averages. Time-averaged data are then filtered to reject cases where σ_scat,530nm,30s < 10 Mm⁻¹ to assure an adequate signal-to-noise ratio. Data are also discarded if >20% of the archived 1 s SSAs are undefined over the averaging period, to account for manual quality flagging of SSA due to, e.g., calibration periods. Applying different averaging times (10-60 s averages) did not result in appreciably different SSA and Ångström exponents for the column-average. SSA is then calculated as SSA = σ_scat/(σ_scat + σ_abs), and then arithmetically weighted by its extinction in computing a profile average. In the AOD proxy shown in Section 3.1, due to the vertical integration involved, scattering and absorption data were instead averaged into equal 100-m vertical bins (approximately 15 s of flight time) and integrated over the full profile; in this specific case, only profiles with altitudes spanning at least 1.6 km to 5.1 km are considered (Table 2).
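The extinction-weighted profile average can be sketched as follows, assuming the inputs are already 30 s averages of scattering and absorption at a common wavelength. The function name `extinction_weighted_ssa`, the `min_scat` parameter, and the sample values are hypothetical; the undefined-data screening and the interpolation of scattering to PSAP wavelengths are not reproduced here.

```python
def extinction_weighted_ssa(scat, absn, min_scat=10.0):
    """Profile-average SSA weighted by extinction (scattering + absorption).

    scat, absn: averaged scattering and absorption coefficients (Mm^-1)
    at a common wavelength. Samples whose scattering falls below
    min_scat (Mm^-1) are rejected, following the threshold in the text.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for s, a in zip(scat, absn):
        if s < min_scat:
            continue  # inadequate signal-to-noise ratio
        ext = s + a
        ssa = s / ext
        weighted_sum += ssa * ext  # SSA at this level times its weight
        weight_total += ext
    if weight_total == 0.0:
        raise ValueError("no samples passed the scattering threshold")
    return weighted_sum / weight_total

# Illustrative values only (Mm^-1); the last sample is rejected by the filter.
scat_530 = [100.0, 50.0, 5.0]
abs_530 = [20.0, 5.0, 1.0]
ssa_profile = extinction_weighted_ssa(scat_530, abs_530)
```

Because each level's SSA is multiplied by its own extinction, the weighted average reduces algebraically to total scattering divided by total extinction over the retained samples, so optically thick layers dominate the profile value.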
Unless otherwise specified, the "in situ" data reported in this paper are from the PSAP+Nephelometer combination, with Virkkula wavelength-averaged corrections applied. Due to the uncertain nature of the impacts of humidification on each of the scattering and absorption components individually and how they affect the resulting SSA, we leave these in situ data as dried. The basis for and implications of this decision are discussed in more detail in Section 4.2.

PTI + Nephelometer
The second in situ measurement of aerosol absorption uses data from the airborne photothermal interferometer (PTI). The PTI measures aerosol light absorption by combining photothermal spectroscopy and laser interferometry (Sedlacek, 2006; Sedlacek and Lee, 2007). The hallmark of the PTI, and other photothermal-based techniques, is a complete immunity to light scattering. Due to these performance impacts, the 2016 PTI data are flagged as suspect. As such, the PTI data are archived as 30 s averages, with an uncertainty of ∼4 Mm⁻¹ subjected to the 1s-measurement noise floor, and with data available only for some periods and for a subset of P-3 flights. As these data are included in the ORACLES 2016 archive, we include a discussion of them in this comparison study, though with the above caveats. In the comparison case study which has PTI data available (Section 3.1.2), the extinction-weighted SSA could only be calculated from the PTI absorption using data within the time period of the P-3 comparison case (profile and in-plume leg), rather than a complete profile as with the PSAP+Neph SSA. Due to the limitations with these data as described above, we do not show the PTI+Neph SSA for this case, but discuss it in qualitative terms in Section 3.1.2. A direct comparison of the absorption as measured by the two in-situ instruments (PTI and PSAP) is shown in Appendix A2.
A redesign of the PTI, based in part upon lessons learned during the 2016 campaign, saw improved performance for the 2018 ORACLES campaign and will be the subject of another paper.

RSP
The NASA GISS Research Scanning Polarimeter (RSP) is a multi-angle, multi-spectral polarimeter aboard the ER-2 that measures the Stokes parameters I, Q, and U at ∼150 angles between ±∼60° in the along-track direction, in 9 spectral channels centered at 410*, 469*, 555*, 670*, 864*, 960, 1594*, 1880, and 2264* nm (Cairns et al., 1999). The seven channels denoted by an asterisk have negligible or weak and correctable water vapor absorption and were used in the microphysical aerosol properties from polarimetry (MAPP) retrieval. The RSP MAPP algorithm (Stamnes et al., 2018) was adapted for ORACLES observations of aerosols above water by incorporating into the retrieval an aerosol profile consisting of two layers (a top layer of fine mode aerosol located at 2.25-5.5 km, and a base layer of coarse mode (sea salt) aerosol located at 0-1 km) as approximately identified by the High Spectral Resolution Lidar (HSRL-2), which was also on the ER-2 aircraft. The aerosols are modeled as a bimodal population of spherical fine- and coarse-mode aerosols, with each mode defined by a lognormal size distribution. The fine mode aerosol effective radius, effective variance, and complex refractive index are then retrieved. The coarse mode aerosol is assumed to consist of nonabsorbing spherical particles with complex refractive index equal to that of water, except the real part was multiplied by a factor of 1.01. The maximum allowed wind speed for the one-dimensional Cox-Munk ocean was increased to 12 m/s, to allow retrieval of high wind speeds consistent with MERRA2 profiles.
Since the RSP fine and coarse modes are retrieved separately, the total AOD is the sum of the fine and coarse mode optical depths. Due to the assumption that coarse mode aerosol is nonabsorbing sea salt, the AAOD and Ångström exponent values are provided for the fine mode only, to allow for more direct comparisons to the smoke properties retrieved by the remote sensors above clouds.

SSFR
The Solar Spectral Flux Radiometer (SSFR) is a moderate resolution radiative flux (irradiance) spectrometer covering the wavelength range from 350 to 2100 nm (Schmidt and Pilewskie, 2012). The downwelling (zenith) and upwelling (nadir) solar radiation is collected by light collectors mounted to the skin of the aircraft. In the past, aerosol absorption, SSA, and asymmetry parameter were derived from irradiance pairs collected along collocated horizontal legs above and below the layer (Schmidt et al., 2010). For ORACLES, this approach was impractical because of the underlying albedo variability in the presence of clouds. The alternative is to measure the irradiances in a vertical profile, realized as a spiral.
This was not an option for previous experiments where the zenith light collector was fix-mounted to the aircraft, introducing uncertainties due to the changing aircraft attitude that cannot be corrected for after the fact. Specifically for ORACLES, an Active Leveling Platform (ALP) was built for the zenith light collector on the P-3 aircraft. By controlling its angular position, SSFR is able to obtain zenith (downwelling) irradiance measurements throughout the vertical profile if the spirals include short (∼9-30 s) straight segments (typically offset by 90° in heading). This spiral profile maneuver with short straight legs is referred to as a square spiral. The nadir (upwelling) irradiance measurements are affected by the underlying cloud and its variability, as well as by the aerosol between the cloud top and the nadir light collector. To separate the aerosol signal from that of clouds, SSFR uses the upwelling irradiance at 1.6 µm, where the signal is dominated by clouds, and filters the data such that only points within one standard deviation of the mean are included for final processing. The impact of the aerosol layer on the downwelling and upwelling irradiance is then quantified throughout the spiral by plotting spectral irradiance profiles with 4STAR-reported above-aircraft AOD at 532 nm as the vertical coordinate. A linear fit is performed on both upwelling and downwelling irradiance, using AOD as the vertical coordinate. The absorption is then derived from the difference of the net irradiance at the top and the bottom of the layer (Cochrane et al., 2019). Since this approach uses data throughout the vertical profile, it is a more robust and accurate method than deriving the absorption from only the irradiance pairs above and below the layer, as in a radiation wall. More importantly for measurements above clouds, this method minimizes the impact of cloud variability on the sampling of upwelling irradiances through the filtering approach described above.
Any deviations from a linear relationship between AOD and irradiances are attributable to changes in the underlying cloud albedo. The filtering technique allows separation of these influences from those originating from the aerosol layer. SSA is retrieved with an algorithm that iteratively changes SSA and asymmetry parameter until the modeled irradiance profiles (based on 4STAR AOD and SSFR-derived cloud albedo) match the measurements (Schmidt et al., 2010). This SSA retrieval is done independently for each wavelength, without applying spectral smoothness constraints. Uncertainties in SSFR SSA as reported here reflect the 1-sigma uncertainty as calculated from the probability of the SSA and asymmetry parameter pair within the retrieval. A description of the algorithm and the uncertainty analysis may be found in Cochrane et al. (2019).
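
The filtering, fitting, and differencing steps described above can be illustrated with a minimal sketch. It is not the SSFR processing code: the function names, the use of a population standard deviation, and the choice of the smallest and largest above-aircraft AOD as layer top and bottom are all assumptions made here for illustration.

```python
import math


def linear_fit(x, y):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def layer_absorption(aod, f_down, f_up, f_up_1600):
    """Absorbed irradiance in the layer, in the units of the inputs.

    aod        above-aircraft AOD at each sample (the vertical coordinate)
    f_down     downwelling irradiance at the wavelength of interest
    f_up       upwelling irradiance at the wavelength of interest
    f_up_1600  upwelling irradiance at 1.6 um, used only as a cloud proxy
    """
    # Step 1: keep samples whose 1.6 um upwelling irradiance lies within
    # one (population) standard deviation of its mean.
    n = len(f_up_1600)
    mean = sum(f_up_1600) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in f_up_1600) / n)
    keep = [i for i, v in enumerate(f_up_1600) if abs(v - mean) <= std]
    aod_kept = [aod[i] for i in keep]

    # Step 2: linear fits of irradiance against AOD.
    slope_dn, icpt_dn = linear_fit(aod_kept, [f_down[i] for i in keep])
    slope_up, icpt_up = linear_fit(aod_kept, [f_up[i] for i in keep])

    # Step 3: net (down minus up) irradiance from the fits at layer top
    # (smallest AOD above the aircraft) and layer bottom (largest AOD).
    def net(tau):
        return (slope_dn * tau + icpt_dn) - (slope_up * tau + icpt_up)

    return net(min(aod_kept)) - net(max(aod_kept))
```

Dividing the returned value by the incident irradiance would give a fractional absorption; the SSA retrieval itself requires a radiative transfer model and is not sketched here.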

Instrument intercomparison conditions
Comparison cases were selected based on the available instrument data and the flight path for a given day. Identified cases included at least two of the following conditions to facilitate comparison between instruments:
1. Operational nephelometer plus PSAP and/or PTI in situ data during an aircraft profile (either a ramp or a square spiral as described in Section 2.1.6);
2. ≥1 sky scan (4STAR) at or below the bottom of the plume (i.e. measuring the total-column aerosol properties);
3. a square-spiral profile through the full column (SSFR);
4. an ER-2 overpass of the P-3 location (RSP and/or AirMSPI).
Due to the different flight patterns necessary for the different instruments to measure/retrieve aerosol properties, this yielded 24 potential comparison case studies over 12 (out of 14) flight days (Figure 2). Within this set of cases, temporal separation between measurements from different instruments varies between 10 minutes and 2 hours; the inter-measurement spatial spread was within approximately 1 degree in either latitude or longitude (ideally 100km or less, but up to 130km in select cases). Note that not all instruments were available for each comparison (Table 2). In selecting comparison periods, spatial coincidence was given priority over temporal coincidence. All comparison cases included at least one 4STAR sky scan; this was reduced to 20 cases when we required QC'd sky scans below 3km (considered to be below the bulk of the aerosol plume), 19 of which were coincident with an in-situ profile. Of the 24 cases, 14 had full profiles and 9 had partial profiles only, for a total of 23 of the 24 with in situ observations of SSA.
Below we focus on two specific case studies, on 12 and 20 September 2016. The case on September 12 (blue star in Figure 2) was a mostly cloudless scene and included RSP retrievals from an ER-2 overpass. For the case on September 20 (orange star in Figure 2), valid SSFR retrievals were run using data from a P-3 square spiral maneuver through a cloudy scene overlaid with substantial aerosol loading. Each of the twelve ER-2 flights (including transits to/from Namibia) yielded AirMSPI ACA retrievals somewhere over the SEA; nine of the P-3 comparison cases included ER-2 overpasses, and each of these had at least one AirMSPI ACA retrieval co-located with P-3 observations. Three of these cases (including the case from 20 September discussed in Section 3) had retrievals with the highest confidence (labeled "primary" in Table 2); retrievals were run for an additional 6 cases, which are included in Section 4 to allow for broader comparison. However, these second-tier retrievals have somewhat increased potential for retrieval biases, due to small scattering angle coverage and/or broken cloud conditions.

Successful above-cloud aerosol retrievals for data from the RSP have been processed for one of the cases thus far (the case study on 12 September). While the present work is thus limited to a subset of the 24 cases, future comparative analysis may be able to expand the number of cases to incorporate potential newly-available data (Table 1).
It is important to note that due to the different instrument methodologies, exact spatiotemporally coincident measurements are not possible, if for no reason other than the different viewing geometries alone; the comparisons presented here are chosen for their potential to obtain measurements of reasonably similar aerosol properties from different perspectives (e.g. below- versus above-aerosol remote sensing, and remotely sensed versus in situ; Figure 3). In our analysis, we first present the two individual case studies (Section 3.1) before discussing results from the aggregation of all coincident measurements from three of the instruments (Section 3.2).

Multi-instrument case studies
We next show two specific case studies, from flights on 12 September and 20 September 2016, where 6 of the methods can be compared.

The P-3 flight on 12 September was a radiation-targeted flight of opportunity, with two potential comparison cases identified.
Both cases are included in the analysis in Section 3.2, but in this section we focus on the second case, at approximately 18.0°S, 8.0°E, between 13:44 and 14:34 UTC (Figure 2). This case starts with two consecutive 4STAR sky scans at approximately 1 km altitude (above-cloud-level; also referred to below as "plume only" within this section, for reasons which are explained below) over a broken cloud scene with an albedo of 0.1, followed by a short descent and two scans at 80m, which is below typical cloud level but was, in this case, in a cloudless area (scene albedo approximately 0.05). The P-3 then flew a ramped ascent from 80m to 5.8 km, ending above the top of the aerosol plume. Measured relative humidity (RH) throughout the plume increased from near 0% below the plume level (up to ∼2.5km) to a maximum of around 50% at plume top (5km), or generally below the RH=40% "dry" threshold for the in situ instrumentation. This corresponds to a roughly constant water vapor mixing ratio: around 5000ppmv (3.1g/kg) through the plume. The first pair of sky scans (at 1km altitude) are separated from the second pair (at 80m, at the base of the ramp) by approximately 20 minutes and 130 km. While this is slightly outside our desired spatial constraints, which may give slightly poorer scene agreement for this comparison case, we believe it instructive to retain the two above-boundary-layer sky scans in examining this case. First, this facilitates better comparison with RSP (which retrieves above-cloud fine-mode aerosol separately) and second, since the bulk of the ORACLES-2016 data consists of plume-only aerosol without boundary-layer influence, the inclusion of these scans allows for better contextualization of this specific case.

The ER-2 overpass of the scene, occurring during the P-3's ascending ramp, resulted in 20 RSP retrievals between 14:17 and 14:21 UTC. There were no AirMSPI retrievals during this period. Figure 4 shows the SSA for each of the available instruments for this case. The most notable feature is in the 4STAR sky scans; while the retrievals agree quite well at short wavelengths (within ∼0.01 and well within 4STAR's SSA uncertainty range determined by AOD and radiance uncertainties, as described in Section 2.1.1), at the longer wavelengths the two sets of scans diverge markedly. It may be noted that the uncertainty increases as well at these longer wavelengths, and there is overlap in the two sets of uncertainty estimates. The two sky scans performed near the surface (at 80m altitude), immediately before the start of the ascending ramp, agree well with one another and give a higher long-wavelength SSA than the two scans performed above the boundary layer (at 1km altitude). However, the 1km scans are more representative of the typical SSA spectral shape observed from 4STAR for the ORACLES-2016 campaign as a whole (i.e., monotonically decreasing for wavelengths ≥500nm).
[...] in these altitudes, suggesting purely scattering sea salt. This is additionally corroborated by the HSRL-2 retrievals of aerosol type, which identified marine aerosol below the smoke plume around this time. The AERONET climatology of aerosol from "desert dust and oceanic" sites (Bahrain/Persian Gulf, Cabo Verde, and Lanai, HI sites) by Dubovik et al. (2002) shows high, largely spectrally-flat or slightly increasing SSA between 440nm and 1020nm. Indeed, LeBlanc et al. (2019) also found lower extinction Ångström exponents for "full column" (i.e. below cloud level) 4STAR AOD measurements than for those taken above the boundary layer, which is consistent with the differences seen in this case (Figure 5).
In situ measurements show a similar spectral SSA compared with the other two instruments. We note that if we apply the wavelength-specific, rather than wavelength-invariant, Virkkula corrections to the PSAP data, the PSAP+Neph SSA shows a different spectral shape, with a small maximum in SSA at 530nm (Appendix A1). It is also important to note that the in situ measurements will exclude a significant portion of the coarse-mode aerosol due to poor inlet passing efficiency of larger aerosol particles (a 50% size cut around ∼4 microns). This means that nominally there are twice as many 4µm particles observed by remote sensing instruments compared with in situ, and perhaps 10 times as many at larger sizes (10-20 µm), and thus even "full column" in situ values may be missing larger aerosol. Thus, the SSA and AEs derived from the in situ instruments will have a greater contribution from the biomass burning aerosol than will the values retrieved from 4STAR (and in the later comparison case, AirMSPI and SSFR), which include all ambient aerosol (as noted above, RSP models fine and coarse mode aerosol separately). The differences in SSA values are also within the instrument uncertainty (4STAR) and variability (in situ) ranges. Figure 5 shows the AAOD and AOD for this same case. Of the measurements from 4STAR, the two 80m cases have slightly higher AOD than the 1km measurements at 400nm (0.45 versus 0.42; AOD uncertainty ±0.018), but significantly higher AOD at 995nm (0.18 versus 0.11; uncertainty ±0.02). Thus, the 80m values have markedly lower Ångström exponents: a difference of 0.4 for both SAE and EAE. This is again consistent with a significant coarse-mode component in the aerosol between 80m and 1km (i.e. the boundary-layer aerosol), which for this region is very likely sea salt.
Comparison with the 1km 4STAR sky scans is additionally instructive for this case: due to the very low absorption (and consequent low signal-to-noise) in the boundary layer, the SSA values were not reported for these altitudes; thus the in situ values in Figure 4 are effectively averaged over only plume altitudes, contributing to the lower "AOD" from in situ reported here. Given that RSP and 4STAR both measure the full column (including this coarse mode), we expect the retrieved column SSA to be somewhat higher than that from the measurements/retrievals of SSA for the aerosol at the BB plume altitudes only; this could be contributing to the higher SSA from RSP and to a lesser degree from 4STAR as compared with in situ instrumentation (Figure 4). Based on this we might also expect the 4STAR retrieved SSA to be higher for the retrievals from 80m than from 1km, and this is not apparent except at wavelengths 870nm and longer. However, this could potentially be explained simply by a smaller relative contribution of boundary-layer aerosol to the total aerosol loading, particularly at the shorter wavelengths (i.e. the smaller difference between the two sets of sky scans at shorter wavelengths).
The AAE and EAE are given by the slopes of the AAOD and AOD versus wavelength, and the SAE can be inferred from the two (Figure 5). In comparing with the other instruments in this case study, the 1km altitude 4STAR EAEs and SAEs are more comparable to those derived from the in situ instruments. The 4STAR SAE and EAE values from 80m (including sea salt) are lower than the corresponding RSP Ångström exponents of AOD_fine+coarse. The 4STAR 1km values (plume only) agree with the RSP AOD_fine only for mid-visible values and the two diverge for wavelengths greater than 700nm, which may somewhat be expected given the lower signal at longer wavelengths. While RSP observes slightly higher AOD (compared with 4STAR) at shorter wavelengths, at the longer wavelengths the RSP AOD is less than the 80m values (i.e. 4STAR measurements with additional coarse mode aerosol), but still greater than the 1km values. The in situ AOD proxy is again using dried values for consistency with the SSA values shown. The in situ AOD is notably lower than that from the remote-sensing methods, as may be expected: while this profile extended above the plume top height (as seen by HSRL-2), thus including the bulk of the aerosol, the altitude limitations may give integrated AODs which are slightly lower than the remote sensing instruments. In addition to this, the drying of the in situ scattering and absorption will also reduce the calculated optical depths, though for this case, the plume RH is largely less than 40%, with a brief maximum of 49% at plume top. As such, we expect the effect of humidification on the scattering to be less than 5% (based on the dry/wet nephelometer data), which has an effect of increasing the in situ AODs by 0.01-0.02. The lower in situ AOD values are likely primarily due to the inlet limitations described above; these instruments are likely missing larger particles seen by the remote sensing instruments.

Figure 5. Spectral a) AAOD and b) AOD from 4STAR, RSP, and the in situ calculations for the case on 12 September 2016, as well as the AAE, SAE, and EAE values (derived from the slopes in log-log space of AAOD and AOD). RSP EAE and SAE include fine+coarse mode AOD, whereas RSP AAE are for fine mode only. Note that the in situ vertical profile for this case extended from 80m to 5.8km and may have undersampled the coarse-mode sea salt at lower altitudes due to inlet efficiency, whereas the two remote sensing instruments give full-column values above the P-3 (4STAR) to top-of-atmosphere, or below the ER-2 (RSP), extending to the surface. For the given 4STAR Ångström exponents, the first set refer to the sky scans from 1km, and the second refer to the scans from 80m, immediately before the aircraft ascent.
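
As an illustration of how Ångström exponents follow from log-log slopes, the sketch below fits ln(optical depth) against ln(wavelength) and takes the negative slope. The function names and the synthetic spectra are hypothetical and are not campaign data; computing the SAE from the difference AOD minus AAOD is one reading of "inferred from the two" and is an assumption here.

```python
import math


def angstrom_exponent(wavelengths_nm, optical_depths):
    """Negative least-squares slope of ln(optical depth) vs ln(wavelength)."""
    x = [math.log(w) for w in wavelengths_nm]
    y = [math.log(t) for t in optical_depths]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return -sxy / sxx


def spectral_summary(wavelengths_nm, aod, aaod):
    """EAE, AAE, SAE and spectral SSA from total and absorbing optical depth."""
    # Scattering optical depth is the part of the AOD that is not absorption.
    sod = [t - a for t, a in zip(aod, aaod)]
    return {
        "EAE": angstrom_exponent(wavelengths_nm, aod),
        "AAE": angstrom_exponent(wavelengths_nm, aaod),
        "SAE": angstrom_exponent(wavelengths_nm, sod),
        "SSA": [s / t for s, t in zip(sod, aod)],
    }


# Synthetic power-law spectra (illustrative values only).
wl = [400.0, 500.0, 675.0, 870.0, 995.0]
aod = [0.5 * (w / 500.0) ** -1.8 for w in wl]
aaod = [0.06 * (w / 500.0) ** -1.2 for w in wl]
summary = spectral_summary(wl, aod, aaod)
```

With these inputs the fit recovers the prescribed EAE and AAE, and the SAE comes out larger than the EAE because absorption falls off more slowly with wavelength than extinction does.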

Case study: 20 September 2016 (cloudy scene)
The flight on September 20 (P-3 RF11) was focused on measuring atmospheric radiation with two parallel N-S flight lines along 9°E and 10.5°E. We again identify two cases which are suitable for the instrument comparison (one on each longitude line, 10.5°E and 9°E). Here we focus on the second of these two cases (P-3 RF11.2), which allowed for comparison of SSA measured by the in-situ instruments and 4STAR on the P-3 with retrievals of SSA from both AirMSPI (on the ER-2) and SSFR (on the P-3) (as with RF08, both cases are included in the analysis in Section 3.2). This case was centered at approximately 16.7°S, 9°E and began at 10:45 UTC with a partial profile descent (ramp) from plume level (4.3km) to above-cloud (600m), followed by six sky scans at above-cloud/below-plume altitude between 10:59 and 11:08 UTC, with below-aircraft scene albedo between 0.45 and 0.62. Of these, two scans passed QC (the others were excluded due to high solar elevation on almucantar scans and/or high error in the retrieval results). After the above-cloud leg, there was one ascending full-profile square spiral maneuver [...] saw dust during this profile, and indicated that <10% of scattering was due to dust. For this case, the above-BL RH was somewhat greater and also more varied with altitude compared with the case from 12 September: between 10% and 80% through the atmospheric profile (water vapor mixing ratios ranging from 3500 to 13500 ppmv, or 2.2 to 8.4g/kg), though RH remained below 40% except for the plume maximum between 4.2km and 5.6km. Again we note that while the remote-sensing (4STAR, AirMSPI, and SSFR) SSAs are measured at ambient humidity, the in-situ (PSAP+Neph and PTI+Neph) values are for aerosol dried to RH<40%. While this difference was minimal for the first case (12 September) due to the generally lower RH, it has the potential to be higher for the second case (20 September). We discuss the implications of aerosol humidification in more detail in Section 4.2.
Figure 6 shows the spectral SSA from each instrument available for this case. We first note the general agreement between most instruments (within the stated uncertainties). While 4STAR and AirMSPI both show SSA decreasing at longer wavelengths, 4STAR reports slightly lower SSA values (particularly at longer wavelengths) than AirMSPI. This is a common feature seen in most of the comparison cases (e.g. Figures 8a-c). The in-situ observations cover a narrower wavelength range than 4STAR and AirMSPI, but again show a decrease in SSA at the longer wavelengths. The rate of decrease from 530nm to 660nm is seen to be greater than from 470nm to 530nm (this is also seen in the Ångström exponents calculated from 2-wavelength pairs, discussed in Section 3.2). As in Figure 4, the wavelength-specific Ångström exponents give a different SSA spectral shape with a small maximum at 530nm; this artifact (Figure A2) was more pronounced here than in the case in Figure 4, possibly due to the greater aerosol loading in this case.
As was discussed in Section 2.1.4, the 2016 PTI absorption data are considered more suspect than the other instrument retrievals, and thus we exclude the PTI+Neph-derived SSA from this case to focus our analysis on the more robust methods.
We briefly note that PTI+Neph SSA 532nm for the available cases (including this one) is higher than the PSAP+Neph SSA.
This result is expected in light of the lower reported absorption by the PTI (Figure A3). Additionally, due to limited PTI data at lower (below-plume) altitudes, the values going into the average value are likely biased towards the higher altitudes of the plume itself. This is relevant as the PSAP+Neph vertically-resolved data consistently show SSA values which were higher at higher altitudes, and thus this sampling pattern of the PTI (i.e. preferentially at higher-SSA altitudes) may also contribute somewhat to the PTI's higher SSA relative to the column-average extinction-weighted PSAP+Neph. The vertical structure of SSA, its spatial patterns, and relation to other ORACLES data are not explicitly discussed here, but will be the focus of future papers.
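
To make the notion of a column-average, extinction-weighted SSA concrete, the following sketch integrates profiles of scattering and absorption coefficients over altitude. The trapezoidal integration, the Mm^-1 input units, and the function names are assumptions for illustration and do not reproduce the processing used for the in situ data.

```python
def trapezoid(z, values):
    """Trapezoidal integral of values over altitude z (z ascending)."""
    return sum(
        0.5 * (values[i] + values[i + 1]) * (z[i + 1] - z[i])
        for i in range(len(z) - 1)
    )


def column_properties(alt_m, scat_Mm, abs_Mm):
    """Column AOD proxy, AAOD and extinction-weighted SSA from a profile.

    alt_m    altitude in metres, ascending
    scat_Mm  scattering coefficient in Mm^-1
    abs_Mm   absorption coefficient in Mm^-1
    """
    per_m = 1.0e-6  # 1 Mm^-1 = 1e-6 m^-1
    sod = trapezoid(alt_m, scat_Mm) * per_m
    aaod = trapezoid(alt_m, abs_Mm) * per_m
    aod = sod + aaod
    # Ratio of integrals = extinction-weighted mean of the layer SSA values.
    return {"AOD": aod, "AAOD": aaod, "SSA": sod / aod}
```

Because the column SSA is a ratio of integrals, layers with more extinction contribute more; an instrument that samples mainly the higher-SSA upper plume would therefore tend to report a higher average than the full-column value.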
The SSFR retrieval largely agrees with 4STAR, AirMSPI, and the in-situ data (again within the given instrument uncertainties) for mid-visible wavelengths, though it reports lower SSA for the shorter (<440nm) wavelengths. The spectral shape is somewhat more spectrally flat but is consistent for all retrieved wavelengths to within the instrument uncertainty. Note that the retrieval at each individual wavelength is performed separately for SSFR. This is in contrast to 4STAR, AirMSPI, and RSP which involve assumptions about aerosol size distributions and particle shape. In addition, for these instruments the spectral refractive indices are retrieved by fitting all wavelengths simultaneously: a spectral smoothness constraint is imposed on the real and imaginary parts of aerosol refractive index to improve the retrievals. For 4STAR, this means that the inversion algorithm may converge to a different result based on small perturbations in the wavelength-dependent AOD and radiance values used as inputs; this range is what we attempt to encapsulate within the stated uncertainty bars. For AirMSPI, the polarized radiances measured in the 470, 660, and 865 nm bands have less sensitivity to aerosol refractive index and coarse mode aerosol size than to aerosol loading and cloud microphysical properties. As a result, the AirMSPI retrieval accuracy of refractive index and coarse mode aerosol size distribution are subjected to greater measurement errors than the AOD and cloud microphysical part of the retrieval, which potentially leads to errors with SSA (cf. the error bars with SSA). In contrast, the SSFR retrieval could be considered a somewhat more "direct" derivation of SSA, in that it retrieves SSA directly and individually for each wavelength on the basis of absorbed irradiance and 4STAR-measured AODs alone, with minimal constraints and assumptions on aerosol (e.g. size distribution, shape, and mixing state).
At the same time, we note that the wavelength-dependence of SSA and asymmetry parameters which are retrieved from the SSFR measurements are not necessarily consistent with a physically realizable microphysical model, which may well be a drawback in trying to generalize these retrievals for related radiative transfer calculations, though particularly in complex combined aerosol-cloud scenes such as these, the positives may outweigh the potential negative aspects of this retrieval method. Regardless of the potential uncertainties of each method, the fact that retrievals using such different measurement approaches agree so well is encouraging. Figure 7 shows the spectral optical depths for this same case, by instrument. Note that the AOD input to the SSFR retrieval (AOD_400nm = 0.78) is obtained by 4STAR at the location of the square spiral, while the AOD reported as 4STAR (Figure 7b) are from the times of the sky scans on the southbound leg directly preceding the spiral, as described above. The first scan (AOD_400nm = 1.05) was approximately 50 km from the square spiral location, which accounts for the difference in AOD between the two. The second scan (AOD_400nm = 1.24) is approximately 110 km from the spiral location (8 minutes after the previous scan in flight time). While this is on the higher end of our desired spatial spread, we show both scans here to show

[Figure 6: spectral shape of SSA, case of 0920. Legend: 4STAR, AirMSPI, PSAP+Neph, PTI+Neph.]
[Figure 7 caption, opening words missing] [...] and PSAP absorption, and integrated over the vertical column. This case had a full profile from 367m to 7.1km (shown) and additional partial profiles which appear in Figure 6. Note that SSFR uses 4STAR AOD as input; this coincident AOD is labeled as SSFR for clarity; thus, the differences between these three AOD spectra are due to spatial variability in the total amount of aerosol loading, likely due to variations in plume top height (∼5.5-6km) as observed by the HSRL-2.
the good agreement in the spectral shape of the aerosol optical properties (i.e. spectral SSA and Ångström exponents, Figures 6 and 7) between the two retrievals even with differences in aerosol column loading (AOD and, consequently, AAOD) (Figure 7). Indeed, the Ångström exponents for this case vary within only 0.2 for all five instruments, which is better agreement than the three instruments shown in the previous case. The outlier again is the 4STAR AAE values of approximately 0.9, lower than any other retrieval. This is discussed in more detail in the following section.

Campaign-wide ORACLES-2016 instrument comparisons
The two cases shown in the previous section are intended to give an idea of the range of observed cases in clear and cloudy skies, while allowing comparison with available RSP and SSFR retrievals. Next, we broaden scope to consider instrument comparisons for the campaign as a whole. We focus on in situ versus 4STAR comparisons, as these are the two methods with enough coincident measurements to allow for a statistical examination. We also include a more limited number of comparisons using AirMSPI data where available. Figure 8 shows scatter plots of several parameters (SSA, EAE, SAE, and AAE) as measured by pairs of instruments (4STAR, in situ, and AirMSPI) across the full set of comparison cases. As one case may have multiple retrievals (i.e. multiple vertical profiles, sky scans, and/or sweeps), the case-average data are indicated by filled black markers. All data (i.e. multiple individual retrievals included in a comparison case) are shown as small x-marks, with grey lines indicating the range of retrievals within a given case. The top row of Figure 8 shows scatter plots of SSA_530nm reported by 4STAR, in situ, and AirMSPI for the aggregation of all comparison cases where two of the three can be compared (Table 2). 4STAR SSA is generally lower than the in situ SSA (Figure 8a), and AirMSPI SSA is higher than both 4STAR and in situ (Figure 8b,c). While 4STAR and in situ generally track one another, the relationship is not statistically significant (R=0.23, p=0.33). Similar results are seen for EAE and SAE (Figure 8d-i), though the correlation in SAE between 4STAR and in situ is the only combination that could be considered robust: R_SAE=0.66 with p<0.01, whereas R_EAE=0.44 and p=0.06.
It is interesting to note that the one outlier in Figure 8d and g (low 4STAR SAE and EAE) is the case from 12 September described earlier (Figs 4 and 5), where the two 4STAR sky scans at lower altitude (i.e. immediately before the in situ profile) are those which result in the very low AE values which depress the average for that case. As seen in Figs 4 and 5, there is no such divergence in SSA_530nm or AAE for this case. We also note that the correlation described above between 4STAR and in situ EAE disappears when this outlier case is excluded.
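
The sensitivity of a correlation to a single case can be checked with a leave-one-out calculation of the Pearson coefficient, sketched below. This is a generic illustration with made-up numbers rather than the analysis code behind Figure 8, and the function names are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


def leave_one_out(x, y):
    """Pearson R recomputed with each point removed in turn (x, y are lists)."""
    return [
        pearson_r(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        for i in range(len(x))
    ]


# Made-up example: four uncorrelated points plus one far-off point.
x = [1.0, 2.0, 1.0, 2.0, 10.0]
y = [2.0, 1.0, 1.0, 2.0, 10.0]
r_all = pearson_r(x, y)
r_without_last = leave_one_out(x, y)[4]
```

Here the full set is strongly correlated while the set without the last point is not, which is the pattern described for the EAE comparison. A p-value would additionally require a Student's t distribution, available for example through `scipy.stats.pearsonr`.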
In contrast to the other parameters, the 4STAR AAE shows, if anything, a negative relationship with in-situ-derived AAE (Figure 8j), though this relationship is again not statistically significant (R_AAE = −0.37, p=0.12). We additionally note that the 4STAR-reported AAE values are low in general, often <1, and are split into two populations, one in fairly good agreement with in situ (R=0.50, p=0.08) and one substantially offset to lower 4STAR AAEs relative to in situ, with a similar but less robust correlation (R=0.59, p=0.22). As a whole, the full 4STAR AAE data set is negatively correlated with the in situ values.
The relationships between AAE from AirMSPI and either of the other two instruments (Figure 8k,l) are also not statistically significant, due to the small range in AAE values retrieved from AirMSPI. The variability in AAE for aerosol within a given case (date and location) is also often quite large; this, combined with the lack of correlation between instruments, may simply reflect a higher uncertainty in derived absorption Ångström exponents from all methods.
As the 4STAR retrievals frequently show AAE values less than 1, this bears some discussion. Notably, Bahadur et al. (2012) also derived very low values of AAE (as low as 0.55 for pure BC) using the same AERONET retrieval algorithm. While some previous observational studies have allowed for AAE values less than 1 (e.g., Bergstrom et al., 2007; Lack et al., 2008), these results may be due to measurement artifacts or instrument uncertainties. While pure BC is typically considered to have an AAE of 1, theoretical and practical studies have shown this to vary based on particle size (e.g., Lack and Cappa, 2010; Wang et al., 2016), and it may be either higher or lower than 1, particularly when various coatings are applied (e.g., Gyawali et al., 2009).
However, brown carbon typically has AAE greater than that of BC, and the 4STAR AAE values are in the lower range of the values expected from theory even for pure BC (e.g., Schnaiter et al., 2005; Gyawali et al., 2009; Lack and Cappa, 2010). Thus, values of AAE for a mixed aerosol (smoke) that are less than 1 (for the wavelength range 440-675nm) may be considered suspect in that they run counter to many observations of ambient-aerosol (i.e. black carbon plus brown carbon) AAE, including in this region (Bergstrom et al., 2007). As such, these results should be treated with caution. Another factor to consider is the compounding uncertainties (i.e., as the slope of another derived property) inherent in the derivation of AAE by any method. As [...]
The first two panels (Figure 9a,b) show the dependence of the difference between AAE from in situ and 4STAR measurements as a function of AAE. This shows a fairly strong correlation between higher values of in situ AAE and greater differences (in situ minus 4STAR), with the expected opposite correlation observed for high 4STAR AAE versus AAE difference (R=0.76 and -0.89, respectively, both p<0.001). In other words, as AAE increases as measured by the in-situ observations, the difference between the in situ and 4STAR AAEs increases; higher in situ AAE values correspond with lower 4STAR AAE values, resulting in a greater difference between the two (Figure 8j). The fact that the difference between the two is correlated with each suggests that this is not a clear case of one versus the other driving the large differences.
A weaker correlation of the same sign is observed for the difference in SAE and in situ- or 4STAR-measured SAE (R=0.48 and -0.57, respectively, both p<0.001, though the latter shrinks to R=-0.22, p=0.04 when removing the outlier shown) (Figure 9d,e). While there is a weak (R=0.50) but significant correlation (p<0.001) between AAOD and the AAE difference between the two instruments (Figure 9c), there appears to be no correlation between total aerosol loading (4STAR AOD) and SAE (Figure 9f). This suggests that total aerosol loading does not affect the instrument agreement for this parameter, but higher values of absorbing aerosol may bias one or both of the instruments; while the filter-based PSAP instruments have well-known artifacts, there is also a weak negative correlation between AAOD_530nm and AAE_4STAR (R=-0.34, p=0.001).

[Figure 9 caption, opening words missing] [...] 4STAR AAE, and c) 4STAR AAOD. The bottom panel shows the same, but for SAE differences versus d) in situ SAE, e) 4STAR SAE, and f) 4STAR AOD. Note that multiple individual points for one instrument (profiles or sky scans) may be shown for comparison cases which included >1 sample per instrument (i.e. x-marks in Figure 8). This is an indication of the variability within a specific case.

[...] spread between the AAE calculated between the two shortest wavelengths (470 to 530nm; blue circles) and the two longest wavelengths (530 to 660nm; red triangles), with the former showing the largest difference between in situ and 4STAR as well as the largest values of in situ AAEs. The same is true for the in situ SAE, except with opposite sign: the SAEs calculated using the longest wavelengths have the highest values. The differences in AAE are largely positive regardless of the wavelengths used: in other words, in situ gives higher AAE than 4STAR, with the 4STAR AAE values anomalously low relative to previous estimates of AAE, as discussed earlier.
For SAE (and also EAE), the differences are more symmetrical, with negative values (4STAR greater than in situ) for the shortest wavelengths, and positive values (in situ greater than 4STAR) for the longest wavelengths. It is also worth noting that the difference between values for these two wavelength ranges (shorter minus longer) is between 0.1 and 0.4 for AAE and up to -0.5 for SAE from in situ, and up to -0.3 for AAE and SAE from 4STAR, which is a substantial range, given that the instrument AEs over all the comparison cases were largely within 0.5 of one another. The significant variability seen here between values simply calculated from different wavelengths again suggests caution not to overinterpret Ångström exponents, particularly those calculated using only two wavelengths.
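
A short numerical sketch shows why two-wavelength Ångström exponents depend on the pair chosen whenever the spectrum is not a pure power law. The two-component absorption spectrum below is entirely hypothetical (one term with exponent 1, one weaker term with exponent 4) and is not fitted to any ORACLES data.

```python
import math


def pair_exponent(wl_1, tau_1, wl_2, tau_2):
    """Two-wavelength Angstrom exponent: -ln(tau_1/tau_2) / ln(wl_1/wl_2)."""
    return -math.log(tau_1 / tau_2) / math.log(wl_1 / wl_2)


def absorption(wl_nm):
    # Hypothetical two-component absorber: one term with exponent 1 and a
    # weaker term with exponent 4 that matters mostly at short wavelengths.
    return 10.0 * (wl_nm / 530.0) ** -1.0 + 2.0 * (wl_nm / 530.0) ** -4.0


# Exponents from the shorter and the longer wavelength pair.
aae_short = pair_exponent(470.0, absorption(470.0), 530.0, absorption(530.0))
aae_long = pair_exponent(530.0, absorption(530.0), 660.0, absorption(660.0))
spread = aae_short - aae_long
```

For this curved spectrum the shorter pair yields the larger exponent, so the result depends on the wavelength pair even with no measurement noise; noise in either optical depth would add to that spread.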
In contrast to the AE results, there is no dependence in the differences in mid-visible SSAs between the in-situ measurements and 4STAR retrievals: for AOD_470nm > 0.4 and AAOD_470nm > 0.05 (Figure 10) the differences are within ±0.03, within the expected uncertainties. At lower AOD/AAOD the differences are more pronounced, tending to higher values for 4STAR at lower loadings. This is consistent with the minimum AOD threshold value defined in the AERONET QC procedure (AOD_440nm > 0.4).

Campaign-wide measurements of SSA from multiple instruments
We now turn to the full set of SSA data from ORACLES-2016. Figure 11a shows campaign-wide averages of SSA for 4STAR, AirMSPI, and PSAP+Nephelometer. As these are campaign-wide values, they are not strictly comparable to one another; for example, the spatial coverage of the AirMSPI retrievals is larger than that of the other two instruments, due to the greater spatial range of the ER-2 versus P-3 flights (Figure 11a, inset). Also, the AirMSPI data considered here are from 6 flight days, compared with 13 flight days for 4STAR and 14 days for the in situ measurements. In addition, the P-3-based measurements (PSAP+Neph and 4STAR) were able to sample more coastal aerosol, which may be more influenced by variability in local aerosol sources and thus composition (compared with far-from-coast flights which would sample the upper-level plume of more uniform origin).
Further discussion of the 4STAR-observed temporal and spatial variability of aerosol loading and size in ORACLES-2016 may [...] individual 4STAR sky scans is also higher at these longer wavelengths. In situ data are not available at these wavelengths.
Figure 11b shows the ORACLES spectral SSA values compared with those from previous studies, as presented in Figure 1. SSA values from the three ORACLES measurements are within the range of previous observations, with the difference in SSA between the SAFARI "fresh" and "aged" plumes bounding these observations. It is important to note that this "fresh" plume is based on a single flight directly over a terrestrial emission source (13 September 2000), whereas the "aged" values are the mean values from the remaining 8 flights, ostensibly sampling aerosol at least 2-3 days old ([...] Table 2). While the "aged" SAFARI values should be more comparable to the expected age of ORACLES-sampled aerosol from oceanic overflights (compared with the "fresh" plume), in-field experience with aerosol model forecasts indicated that the ORACLES-sampled free-tropospheric aerosol was typically older than four days.
Both of these ORACLES and SAFARI values are based on in situ measurements (PSAP+Neph), whereas the SAFARI result shown in Russell et al. (2010) (and also described in Bergstrom et al., 2003) is from an SSFR-centered retrieval combined with data from a precursor to 4STAR, AATS-14 (the Ames Airborne Tracking Sunphotometer at 14 wavelengths), similar to the retrieval used in this paper. The retrieval is from a radiation wall within a single flight (6 September 2000). By way of comparison, flight-average SSA from the PSAP+Neph measurements on that same flight was given as 0.87, 0.86, and 0.83 at 450, 550, and 700 nm, somewhat lower than the "aged" average of 0.91, 0.90, and 0.87 and only 0.01 lower than the SSFR values for the same flight. In the same work, SSA was also derived using filter measurements integrated over PCASP size distributions; the PCASP-derived values for this particular flight are given as 0.92, 0.89, and 0.87, closer to the high end of the range. Thus, with the available data, it is difficult to say definitively whether the variability among previous measurements is predominantly due to systematic differences between measurement methods, or whether it is purely a function of natural variability, whether from seasonal changes in emission factors, evolution in aerosol properties after emission, or spatial variability such as the dependence of SSA on altitude seen in ORACLES.
Figure 11. a) SSA from all measurements/retrievals over ORACLES-2016, from 4STAR, AirMSPI, and the nephelometer and PSAP in situ data. The campaign-wide median is indicated by the thick solid line, 25th-75th percentiles are indicated by thin solid lines, and 10th-90th percentiles are indicated by the dashed lines. Note that the medians of the different instruments are not strictly comparable, as the campaign-wide averages are calculated using more data than just the coincident cases. The inset shows the spatial distribution (lat/lon) of measurements by the three instruments. b) The median SSA by instrument as shown in a) compared with SSA from previous studies in this same region (Figure 1).

Impacts of humidification on aerosols
In this work we have presented SSA as calculated from in situ measurements of scattering (from a nephelometer) and absorption (from a PSAP and a PTI); all are for aerosol dried to RH<40%. It is well known that an increase in RH will result in an increase in scattering, which, if it is the sole consequence of the humidification, will act to increase the SSA. However, humidification of aerosol will impact both scattering and absorption, likely not to the same degree, and thus will affect the resulting SSA, potentially in competing directions. For this reason, we have presented in situ measurements as measured (i.e. dried) in this work.
Aerosol absorption is likely affected by humidification, but the magnitude of this effect is much less well understood, due in large part to the difficulty in measuring humidified absorption directly (e.g., Arnott et al., 2003). However, several studies have attempted to address this question. One theory-based study using a BC core/shell model (Redemann et al., 2001) found that the absorption enhancement at RH=80% was approximately a factor of 1.1, resulting in a decrease in SSA of 0.02 at 550nm. They also found that the degree to which humidification enhanced absorption is dependent on the aerosol size distribution, with absorption enhancement as high as a factor of 1.75 for certain size and humidity conditions (up to 99.5%). In a later lab-based study measuring total extinction and nephelometer-measured scattering, Brem et al. (2012) found enhancement of absorption from biomass-based OC aerosol to be significant at 467 and 530nm for RH>85% (the 660nm data were within the instrument noise). Between 32% and 95% RH, the absorption at 467 and 530nm increased by factors of 2.2±0.7 and 2.7±1.2, greater than that found by Redemann et al. (2001); combined with the observed enhancement in scattering, this corresponded to a change in SSA on the order of 0.06 for 470nm and 0.03 for 530nm, though the authors acknowledge that their method is subject to large uncertainties. We further note that the majority of studies considering the effect of humidification on aerosol absorption (and scattering) consider the more extreme RH values of 85% or higher, which were a rare occurrence in the BB plume sampled in ORACLES-2016 (observed plume RH was a maximum of 80% and often lower, as described in earlier sections). The magnitude of these effects is thus likely to be some factor smaller than the values reported in the literature.
The unequal enhancement factors at different wavelengths suggest that the absorption Ångström exponents would also be affected, adding to the uncertainty in this parameter. As the effects of humidification on absorption have been seen to depend on aerosol age and composition (i.e., hydrophobic vs hydrophilic; coated vs uncoated aerosol) [...] can estimate how much it might be affecting scattering, and bound the upper limit on SSA accordingly (i.e. assuming zero effect on absorption). Based on a campaign-average scattering enhancement of 1.4 for a plume RH of 80%, and making the coarse assumption of no effect of humidity on absorption, we would expect typical humidification to increase instantaneous SSA (at 530nm) by at most 0.03-0.05 through its effect on scattering. However, in reality, this value will be lower: first, due to the variability of RH with altitude as described in Section 3.1, which will have a lesser impact at lower altitudes and thus on the column-averaged values considered here; and second, due to the competing impacts of humidification on absorption as described above, which likely have the opposite effect of lowering SSA. Due to the high uncertainties surrounding these competing effects, we leave it to future work to provide a more detailed quantification of humidification effects on aerosol absorption and the SEA biomass burning SSA.
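The upper-bound estimate above follows from the definition of SSA as scattering divided by extinction. As a minimal sketch, assume that humidification simply multiplies the dry scattering and dry absorption by separate enhancement factors. The function name is hypothetical; the dry SSA of 0.86 is the campaign-median in situ value at 530nm reported in this paper, 0.80 is an illustrative lower value, and the absorption factor of 1.1 is the Redemann et al. (2001) value at RH=80% used purely as an example.

```python
def humidified_ssa(dry_ssa, scattering_factor, absorption_factor=1.0):
    """SSA after scaling dry scattering and absorption by enhancement factors.

    With extinction normalized to 1, dry scattering is dry_ssa and
    dry absorption is 1 - dry_ssa.
    """
    if not 0.0 < dry_ssa < 1.0:
        raise ValueError("dry_ssa must lie strictly between 0 and 1")
    if scattering_factor <= 0 or absorption_factor <= 0:
        raise ValueError("enhancement factors must be positive")
    scattering = dry_ssa * scattering_factor
    absorption = (1.0 - dry_ssa) * absorption_factor
    return scattering / (scattering + absorption)


SCATTERING_FACTOR = 1.4  # campaign-average enhancement quoted for RH = 80%

for dry in (0.80, 0.86):
    # Upper bound: humidity is assumed to have no effect on absorption.
    upper = humidified_ssa(dry, SCATTERING_FACTOR)
    # Illustrative competing effect: absorption also enhanced by 1.1.
    damped = humidified_ssa(dry, SCATTERING_FACTOR, absorption_factor=1.1)
    print(f"dry SSA {dry:.2f}: upper bound +{upper - dry:.3f}, "
          f"with absorption enhancement +{damped - dry:.3f}")
```

For dry SSA between 0.80 and 0.86 this yields an upper-bound increase of roughly 0.04 to 0.05, in line with the 0.03-0.05 range quoted above, and any absorption enhancement reduces the increase.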

In this work we present new measurements of absorbing aerosol optical properties over the southeast Atlantic Ocean, a region with a significant and persistent seasonal biomass burning plume overlying stratocumulus clouds and which, up to now, has had a dearth of observations. For specific comparison cases, the retrievals from remote sensing (4STAR, AirMSPI, RSP, and SSFR) and in situ (derived from PSAP, Nephelometer, and PTI) agree within given uncertainty ranges, though with some indications of systematic differences between the different methods. Specifically, the modified AERONET retrievals applied to 4STAR data typically produce the lowest SSA, while the AirMSPI polarized retrieval generally yields the highest SSA. There are a number of potential causes behind these patterns, including the different information content in the measurements from different instruments (e.g., the AirMSPI polarized radiances are less sensitive to aerosol refractive index versus aerosol loading), viewing geometry (e.g., upward-looking versus downward-looking observations may have different information using transmitted versus reflected light), retrieval assumptions (e.g. aerosol model used, a priori assumptions, smoothness constraints), as well as simply the degree of coincidence which was achievable for these comparison cases. Correlations between individual instruments over an aggregate of cases (using between 9 and 19 available comparison cases for different instrument pairs) were not significant in most cases, with the exception of a weak positive correlation between 4STAR- and in situ-derived SAE.
AAE is the least certain of the retrieved absorption properties, as it shows a weak but negative correlation between the two instruments considered, with a significant portion of the data reporting AAE values less than 1. Again, we can only speculate as to the causes of the differences; each retrieval comes with a unique set of a priori assumptions and slightly different measurement inputs, which may explain the small range in AAE values from AirMSPI, and the generally lower AAEs from 4STAR. Finally, we note that this work represents a starting point for many subsequent ORACLES-based analyses. These will include a more detailed comparison of the methodological differences between different instrument algorithms; a discussion of the SSA spatial variability and its potential causes; and an exploration of the effect of humidification on aerosol absorption.
In terms of the ORACLES-2016 dataset as a whole, we find median SSAs from 4STAR to be 0.87, 0.85, 0.82, 0.79, and 0.78 at 400, 500, 675, 870, and 995nm; from AirMSPI to be 0.88, 0.87, and 0.84 at the retrieved wavelengths of 470, 660, and 865nm; and from in situ measurements to be 0.86, 0.86, and 0.84 at 470, 530, and 660nm. Campaign-wide data variability (5th-95th percentiles) is roughly equivalent for 4STAR and in situ measurements in the mid-visible (±0.03), and is greater at the longer 4STAR wavelengths (±0.05 at 870nm). The AirMSPI data exhibit less variability at all wavelengths (±0.015 at 470nm and ±0.02 at 865nm). While these are not directly comparable to one another due to differences in spatial and temporal sampling of the different instruments, they give an indication of the best estimate and range of biomass burning SSAs over the southeast Atlantic Ocean. The range of SSA values reported between different instruments during ORACLES is consistent with the range reported among previous observational studies over this region, but slightly higher than those reported downwind on Ascension Island (Zuidema et al., 2018). In Section 1.2, we discussed the radiative forcing implications of SSA; finally, we note that the range of SSAs observed from each instrument (SSA±0.03) is of the magnitude expected to change local direct radiative effects by 10-20 W/m², which may give an indication of the impact of the results. Any studies which rely on a prescribed set of aerosol properties, such as SSA, as input should thus consider a realistic spatiotemporal distribution of aerosol optical properties in order to best capture the reality of the aerosol conditions over this region. It is important to take into account the impacts of the spatial variability and uncertainty from a given instrument, as they may affect resulting determinations of radiative effects of biomass burning aerosols.
Appendix A: Additional discussion of in situ measurements

A1 PSAP Virkkula corrections
As discussed in the above text, the PSAP absorption corrections were applied as given in Virkkula (2010). Corrections of this nature must be applied due to the limitations inherent in filter-based measurements. Virkkula (2010) gives both wavelength-specific (470, 530, and 660nm) corrections, and a wavelength-averaged correction (his Table 1). We choose the latter due to the discovery during the LASIC campaign (Zuidema et al., 2018) that using wavelength-specific values resulted in an unphysical jump in AAE upon changing of the filters. The difference between the absorption coefficients, and the resulting SSA, for the two methods is shown in Figure A1. Note that the wavelength-specific values generally show SSA at 470nm lower than SSA at 530nm due to higher reported absorption at 470nm. Figure A2 highlights the difference in wavelength-averaged versus wavelength-specific SSA for the individual profiles in the 20 September case, as discussed in the text. Note that for the analysis below in Section A2, the Virkkula corrections likely do not contribute to the observed differences between PTI and PSAP, as the absorption at the 530nm wavelength is less impacted by the choice of correction factor.

A2 Comparison of absorption from in situ instruments
As ORACLES-2016 had two measures of aerosol absorption available, we think it useful to discuss these two in-situ instruments as they compare to one another, though with an eye to the technical difficulties of the PTI in that year's deployment.
Figure A3 shows 30-second averages of absorption from the PTI and from the PSAP for all available coincident data for the 5 flights with the PTI operational (flights on September 10, 12, 14, 20, and 24). The effect of the high noise floor applied to the reported PTI values is evident in the data variability, and the PTI absorption is seen to be generally lower than PSAP absorption over all days, consequently with a smaller dynamic range: 5th-95th percentiles of 20.3 to 28.8 Mm⁻¹ (median 22.8 Mm⁻¹) for PTI compared with 21.8 to 40.9 Mm⁻¹ (median 26.6 Mm⁻¹) for the coincident PSAP measurements. The generally lower PTI absorption could be due to the mechanical difficulties in the 2016 deployment, as was described in Section 2.1.4. From a physical perspective, the difference between the two instruments may additionally be enhanced by an artificially-low-biased PTI signal driven by latent heat of evaporation under high-humidity aerosol conditions. On the part of the PSAP measurement, the multiple scattering from this filter-based measurement could potentially give an artificially-high-biased PSAP signal. Improvements in the PTI instrument design for the 2018 deployment will likely facilitate better comparative analysis between the two measurement techniques at a future date.

Figure A1. Normalized nephelometer scattering (left), PSAP absorption (center), and resulting SSA (right) for 470, 530, and 660nm, using wavelength-specific (top) and wavelength-averaged (bottom) Virkkula corrections. Values are normalized to highlight the relative spectral shape. Blue squares indicate the median of all observations (black). Note that while the scattering and absorption both exhibit an overall decrease with wavelength, the much sharper decrease in absorption results in a small maximum in SSA at 530nm under the standard wavelength-dependent Virkkula corrections applied to the PSAP. This spectral feature disappears in the average Virkkula case.
Figure A2. Profile-averaged extinction-weighted SSA for the case study shown on 20 September 2016. The impact on the 470nm SSA (and consequently, the spectral shape) is the most notable difference. We attribute this spectral shape to the stronger increase in aerosol absorption at the shortest wavelength: while scattering also increases, it exhibits a smaller wavelength dependence than does absorption (Figure A1).
Figure A3. Comparison between PTI and PSAP aerosol absorption coefficient (at 532 and 530nm, respectively) for all available flight data. Lines show a 1:1 relationship (red dashed) and a total-least-squares fit (green dashed) through the data. The slope of the green line is 0.445 with an intercept of 10.9. There was no discernible distinction in the fits or correlations between the two instruments either between in-plume versus profile, or by altitude.
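A total-least-squares line of the kind mentioned in the Figure A3 caption can be sketched with the closed-form orthogonal-regression solution, which minimizes perpendicular distances to the line and so treats errors in both instruments symmetrically. This is a minimal illustration under the assumption of equal error variance in both variables; the function name is hypothetical, it is not necessarily the fitting routine used for Figure A3, and the data points below are synthetic values placed exactly on the reported line rather than measurements.

```python
import math


def total_least_squares_line(x, y):
    """Return (slope, intercept) of the orthogonal-regression line y = a*x + b."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Centered sums of squares and cross-products.
    s_xx = sum((xi - x_mean) ** 2 for xi in x)
    s_yy = sum((yi - y_mean) ** 2 for yi in y)
    s_xy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    if s_xy == 0:
        raise ValueError("slope is undefined when x and y are uncorrelated")
    # Closed-form slope for equal error variances in x and y.
    diff = s_yy - s_xx
    slope = (diff + math.sqrt(diff ** 2 + 4.0 * s_xy ** 2)) / (2.0 * s_xy)
    intercept = y_mean - slope * x_mean
    return slope, intercept


# Synthetic PSAP-like absorption values (Mm^-1) and PTI-like values lying
# exactly on the line quoted in the caption (slope 0.445, intercept 10.9).
psap = [22.0, 26.0, 30.0, 35.0, 41.0]
pti = [0.445 * value + 10.9 for value in psap]

slope, intercept = total_least_squares_line(psap, pti)
print(f"slope = {slope:.3f}, intercept = {intercept:.1f}")
```

Unlike ordinary least squares, swapping the two instruments in this fit yields the reciprocal slope, which is a useful property when neither measurement can be treated as the error-free reference.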