AEROCOM/AEROSAT AAOT & SSA study, part I: evaluation and intercomparison of satellite measurements

Global measurements of absorptive aerosol optical depth (AAOD) are scarce and mostly provided by the ground network AERONET (AErosol RObotic NETwork). In recent years, several satellite products of AAOD have appeared. This study’s primary aim is

model and measurement and a regularization term using a priori estimates of values of some of the retrieved parameters. The algorithm, including an application to PARASOL measurements over ocean, is described in Hasekamp et al. (2011). More recent refinements are described by Stap et al. (2015); Wu et al. (2015); Lacagnina et al. (2015); Fu and Hasekamp (2018); Fu et al. (2020). Retrieval results from the SRON algorithm have been used for aerosol type determination by Russell et al. (2014), in studies related to aerosol absorption and direct radiative effect by Lacagnina et al. (2015, 2017), and for aerosol-cloud interactions by Hasekamp et al. (2019). Currently, the algorithm has been applied to one year (2006) of global aerosol data.
GRASP (Generalized Retrieval of Aerosol and Surface Properties) is a unified retrieval algorithm for atmospheric properties from diverse remote sensing observations (Dubovik et al., 2011, 2014), based on earlier work by Dubovik and King (2000) and Dubovik et al. (2002, 2006) for AERONET inversions. In the current paper, retrievals from the so-called "models" dataset (here: GRASP-M) are presented. The aerosol is assumed to be an external mixture of five different aerosol components, which are retrieved together with spectral parameters of the surface BRDF and BPDF. The aerosol is assumed to be a mixture of spherical and non-spherical particles. Each fraction is characterized by particle size distributions similarly to AERONET retrievals. The non-spherical component is modeled as a mixture of randomly oriented spheroids with a fixed shape distribution (Dubovik et al., 2006). The actual inversion uses multi-pixel retrieval (Dubovik et al., 2011) where horizontal pixel-to-pixel variations of aerosol and day-to-day variations of surface reflectance are enforced to be smooth.
The full archive of POLDER/PARASOL observations was retrieved using GRASP and can be found at https://www.graspopen.com. In addition to the "models" dataset, two other datasets are available ("improved" and "high-precision") that use slightly different assumptions in the retrieval. The dataset used in this paper is considered the most applicable for a wide range of circumstances.
Another issue is that POLDER-GRASP-M provides aggregate AOD and SSA for slightly different samplings (there is an additional minimum AOD threshold for the calculation of the AAOD that will be aggregated and the resulting aggregated SSA).
We assumed that this SSA nevertheless represents the same scene as the AOD aggregate and recalculated an AAOD from that AOD and SSA. Consequently, the POLDER-GRASP-M AAOD presented in this paper is different from the AAOD found in the official L3 product. The latter shows a high bias vs. AERONET due to the same aforementioned minimum AOD threshold.
Note that in-situ measurements (Delene and Ogren, 2002; Andrews et al., 2011, 2017; Schmeisser et al., 2018) have suggested a change in SSA at lower AOD, so our SSA assumption may introduce additional biases.
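The recalculation of AAOD from an aggregated AOD and SSA can be sketched as follows. This is a minimal illustration that assumes the standard relation AAOD = AOD × (1 − SSA) at a single wavelength; the function name is hypothetical and not part of any GRASP product or tool.

```python
def aaod_from_aod_ssa(aod: float, ssa: float) -> float:
    """Absorption AOD implied by an AOD and an SSA that describe the same scene.

    Assumes AAOD = AOD * (1 - SSA), with both inputs at the same wavelength.
    """
    if aod < 0.0:
        raise ValueError("AOD must be non-negative")
    if not 0.0 <= ssa <= 1.0:
        raise ValueError("SSA must lie in [0, 1]")
    return aod * (1.0 - ssa)


# Example: an aggregated AOD of 0.4 with an aggregated SSA of 0.9
example_aaod = aaod_from_aod_ssa(0.4, 0.9)
```

If the aggregated SSA was computed from a subset of the pixels that entered the AOD aggregate, the result inherits any difference in sampling between the two, which is the caveat raised in the text.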
For this study the L3 GRASP data were filtered based on the FittingResidual field which was required to be smaller than 0.05 (over Land) or 0.1 (over Ocean). This subset evaluates substantially better for AOD retrievals and somewhat better for AAOD retrievals than the full dataset.
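The residual filter described above amounts to a surface-dependent threshold test. The sketch below assumes that a land/ocean flag is available for each grid-box; the function and constant names are hypothetical, and only the two threshold values come from the text.

```python
LAND_MAX_RESIDUAL = 0.05   # threshold over land, from the text
OCEAN_MAX_RESIDUAL = 0.1   # threshold over ocean, from the text


def passes_residual_filter(fitting_residual: float, is_land: bool) -> bool:
    """Return True if a retrieval's fitting residual is below the threshold."""
    threshold = LAND_MAX_RESIDUAL if is_land else OCEAN_MAX_RESIDUAL
    return fitting_residual < threshold
```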

AERONET
AERONET (Holben et al., 1998) DirectSun V3 L2.0 (Giles et al., 2019; Smirnov et al., 2000) and Inversion V3 L1.5 & 2.0 data were downloaded from https://aeronet.gsfc.nasa.gov, logarithmically interpolated to values at 550 nm and aggregated by averaging over 30 minutes. The DirectSun dataset contains only AOD (at multiple wavelengths). These observations are based on direct transmission measurements of solar light and have a low uncertainty of ±0.01 (Eck et al., 1999; Schmid et al., 1999) at 400 nm and larger. The Inversion dataset contains both AOD and AAOD (at multiple wavelengths) and these observations are based on measurements of scattered solar light from multiple directions. This inversion uses radiative transfer calculations (Dubovik and King, 2000) and yields larger errors than the DirectSun measurements. In particular, Dubovik et al. (2000) showed that SSA errors decrease with increasing AOD and estimated 440 nm SSA errors of ±0.03 for water-soluble aerosol at 440 nm AOD ≥ 0.2, although for dust and biomass burning aerosol a higher AOD ≥ 0.5 was needed.
These error estimates were based on numerical calculations. A recent in-depth estimate of the uncertainty in Inversion V3 data (Sinyuk et al., 2020) suggested those thresholds to be 440 nm AOD > 0. Since an individual AERONET site cannot be expected to be representative for a 1° × 1° grid-box, satellite evaluation may be negatively affected. To select only sites with high representativity we use a list published in Kinne et al. (2013) as described in Schutgens et al. (2020), where we also describe some tests for its suitability (based on 14 satellite AOD products). The Kinne list was developed with the AERONET Direct Sun product (i.e. AOD) in mind but a high-resolution modelling study by Schutgens (2019) suggests that representativity for AOD and AAOD observations can differ substantially for individual sites.
We chose to use the Kinne list because it also includes information on maintenance quality, likely more important for Inversion than Direct Sun retrievals.
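The logarithmic interpolation of AERONET values to 550 nm mentioned earlier in this section can be illustrated as follows. This is a sketch under the assumption that interpolation is linear in log(AOD) versus log(wavelength) between two bracketing wavelengths; the choice of 500 and 675 nm in the example and the function name are illustrative assumptions, not details given in the text.

```python
import math


def interp_aod_loglog(wl1, aod1, wl2, aod2, wl_target=550.0):
    """Interpolate AOD linearly in log(AOD) versus log(wavelength).

    wl1, wl2: bracketing wavelengths (nm); aod1, aod2: AOD at those wavelengths.
    """
    if min(wl1, wl2, wl_target, aod1, aod2) <= 0.0:
        raise ValueError("wavelengths and AODs must be positive")
    if wl1 == wl2:
        raise ValueError("the two wavelengths must differ")
    # Angstrom exponent implied by the wavelength pair
    alpha = -math.log(aod2 / aod1) / math.log(wl2 / wl1)
    return aod1 * (wl_target / wl1) ** (-alpha)


# Example with assumed AOD values at 500 and 675 nm
aod_550 = interp_aod_loglog(500.0, 0.20, 675.0, 0.15)
```

The same function applies to AAOD, with the absorption Ångström exponent taking the place of the extinction one.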
2.1.6 How independent are these satellite products?
An interesting question is how independent these satellite products are. The FL-MOC product uses OMAERUV AAOD as input over land, while the POLDER products share very similar treatment of surface reflectance. It should be noted that FL-MOC only uses OMAERUV AAOD as an a-priori with sizeable uncertainty. CALIOP backscatter is expected to provide a constraint on SSA. As a matter of fact, our analysis shows that FL-MOC and OMAERUV exhibit rather low correlation in either AAOD or SSA. This suggests that the OMAERUV a-priori does not lead to a strong dependency of FL-MOC on OMAERUV. On the other hand, it also suggests that at least one of these products contains sizeable errors.
https://doi.org/10.5194/acp-2020-1207 Preprint. Discussion started: 7 December 2020. © Author(s) 2020. CC BY 4.0 License.

The POLDER retrievals use the same mathematical function for the BRDF over land (Litvinov et al., 2011) but estimate the parameters to this function independently.

Collocation & analysis methodology
To evaluate and intercompare the remote sensing datasets, they will need to be collocated in time and space to reduce representation errors (Colarco et al., 2014; Schutgens et al., 2016b, 2017). In practice this collocation is another aggregation (performed for each dataset individually) to a spatio-temporal grid with slightly coarser temporal resolution (1 or 3 hours, the spatial grid-box size remains 1° × 1°). This is followed by a masking operation that retains aggregated data only if they exist in the same grid-boxes for all involved datasets. More details can be found in Appendix A.
We need to allow some flexibility in the time separation between data (here 3 hours) to ensure sufficient numbers of collocated data pairs for further analysis. Schutgens et al. (2020) showed that shorter time separations greatly limited the number of pairs but did not substantially alter the correlation of satellite AOD with AERONET. On the other hand, longer time separations appear to negatively affect the correlation of satellite AAOD with AERONET, see Fig. 2. The analysis shows that satellite AOD correlation with AERONET Inversion data slowly decreases as the collocation criterion is relaxed from 3 to 24 hours. However, satellite AAOD shows a sharp drop in correlation with AERONET at 6 hours (OMAERUV is the exception: its correlation is low and barely changes). We surmise this is due to plumes of absorbing aerosol drifting over the sites, requiring tight temporal constraints on collocation. Consequences of this finding will be further discussed in Sect. 7.
As the FL-MOC dataset, based on CALIOP measurements, is smaller than the other satellite datasets, we were compelled to collocate FL-MOC with AERONET within 2° instead of 1°. Even so, the data count for the FL-MOC evaluation is low and this results in significant statistical noise.
After spatio-temporally collocating two or more datasets, the data may be further averaged in space and/or time for analysis purposes. Spatio-temporally averaged SSA is always derived from averaged AOD & AAOD: SSA = (⟨AOD⟩ − ⟨AAOD⟩)/⟨AOD⟩, where ⟨·⟩ denotes the spatio-temporal average.
During the evaluation of products with AERONET, a distinction will be made between either land or ocean grid-boxes in the common grid. A high resolution land mask was used to determine which 1° × 1° grid-box contained at most 30% land (designated an ocean box) or water (designated a land box). Most ocean boxes with AERONET observations will be in coastal regions, with some over isolated islands.
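Deriving an averaged SSA from averaged AOD and AAOD is not the same as averaging individual SSA values, because the former weights each observation by its AOD. The sketch below illustrates this, assuming the standard relation SSA = 1 − AAOD/AOD; the function name and the sample numbers are hypothetical.

```python
def ssa_from_means(aod, aaod):
    """SSA derived from the mean AOD and mean AAOD of paired observations."""
    if len(aod) != len(aaod) or not aod:
        raise ValueError("need two non-empty series of equal length")
    mean_aod = sum(aod) / len(aod)
    mean_aaod = sum(aaod) / len(aaod)
    if mean_aod <= 0.0:
        raise ValueError("mean AOD must be positive")
    return 1.0 - mean_aaod / mean_aod


# Two scenes: a thin, absorbing one (SSA 0.80) and a thick, brighter one (SSA 0.95)
aod = [0.1, 1.0]
aaod = [0.02, 0.05]

ssa_weighted = ssa_from_means(aod, aaod)                         # about 0.936
ssa_plain = sum(1.0 - b / a for a, b in zip(aod, aaod)) / len(aod)  # 0.875
```

The AOD-weighted value sits closer to the SSA of the optically thick scene, where the retrieval is also expected to be more reliable.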

Taylor diagrams
A suitable graphic for displaying multiple datasets' correspondence with a reference dataset ('truth') is provided by the Taylor diagram (Taylor, 2001). In this polar plot, each data point (r, φ) shows basic statistical metrics for an entire dataset. The distance from the origin (r) represents the internal variability (standard deviation) in the dataset. The angle φ through which the data point is rotated away from the horizontal axis represents the correlation with the reference dataset, which is conceptually located on the horizontal axis at radius 1 (i.e. every distance is normalised to the internal variability of the reference dataset). It can be shown (Taylor, 2001) that the distance between the point (r, φ) and this reference data point at (1, 0) is a measure of the Root Mean Square Error (RMSE, unbiased). A line extending from the point (r, φ) is used to show the bias versus the reference dataset (positive for pointing clock-wise). The distance from the end of this line to the reference data point is a measure of the Root Mean Square Difference (RMSD, no correction for bias).
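The quantities shown in a Taylor diagram can be computed directly from paired data. The following is a minimal sketch, assuming population (1/n) statistics and the usual convention that the correlation equals cos φ; the function name and dictionary keys are hypothetical.

```python
import math


def taylor_stats(data, ref):
    """Return Taylor-diagram metrics of `data` against `ref`.

    All distances are normalised by the standard deviation of `ref`.
    """
    if len(data) != len(ref) or len(ref) < 2:
        raise ValueError("need two equally long series with at least 2 values")
    n = len(ref)
    mean_d = sum(data) / n
    mean_r = sum(ref) / n
    anom_d = [d - mean_d for d in data]
    anom_r = [r - mean_r for r in ref]
    sd_d = math.sqrt(sum(a * a for a in anom_d) / n)
    sd_r = math.sqrt(sum(a * a for a in anom_r) / n)
    if sd_d == 0.0 or sd_r == 0.0:
        raise ValueError("both series must have non-zero variance")
    cov = sum(a * b for a, b in zip(anom_d, anom_r)) / n
    # Clamp to guard against rounding slightly outside [-1, 1]
    corr = max(-1.0, min(1.0, cov / (sd_d * sd_r)))
    # Unbiased (centred) RMSE: computed on anomalies
    crmse = math.sqrt(sum((a - b) ** 2 for a, b in zip(anom_d, anom_r)) / n)
    # RMSD without bias correction: computed on raw values
    rmsd = math.sqrt(sum((d - r) ** 2 for d, r in zip(data, ref)) / n)
    return {
        "r": sd_d / sd_r,                 # radial coordinate
        "corr": corr,                     # cos(phi)
        "phi": math.acos(corr),           # angle from the horizontal axis
        "bias": (mean_d - mean_r) / sd_r,
        "crmse": crmse / sd_r,            # distance to the reference point (1, 0)
        "rmsd": rmsd / sd_r,
    }
```

With these definitions the law of cosines gives crmse² = r² + 1 − 2 r cos φ, and rmsd² = crmse² + bias², which is why the bias line and the distance to the reference point together encode the RMSD.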

Uncertainty analysis using bootstrapping
Our estimates of error metrics are inherently uncertain due to finite sampling. If the sampled error distribution is sufficiently similar to the underlying true error distribution, bootstrapping (Efron, 1979) can be used to assess uncertainties in e.g. biases or correlations due to finite sample size. Bootstrapping uses the sampled distribution to generate a large number of synthetic samples by random draws with replacement. For each of these synthetic samples, a bias etc. can be calculated and the distribution of these biases provides measures of the uncertainty, e.g. a standard deviation, in the bias due to statistical noise. Bootstrapping has been shown to be reliable even for relatively small sample sizes (that is the size of the original sample, not the number of bootstraps), see Chernick (2008). In this study, the uncertainty bars in some figures were generated by bootstrap analysis.
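The bootstrap procedure described above can be sketched in a few lines. This is an illustrative example only: the function name, the number of bootstraps and the sample of errors are assumptions, not the settings used for the figures.

```python
import random
import statistics


def bootstrap_bias(errors, n_boot=1000, seed=0):
    """Estimate the bias (mean error) and its uncertainty by bootstrapping.

    errors: differences between a product and the reference.
    Returns (mean of bootstrap biases, standard deviation of bootstrap biases).
    """
    if len(errors) < 2 or n_boot < 2:
        raise ValueError("need at least 2 errors and 2 bootstraps")
    rng = random.Random(seed)  # fixed seed for reproducibility
    n = len(errors)
    biases = []
    for _ in range(n_boot):
        # Synthetic sample: n random draws with replacement
        sample = rng.choices(errors, k=n)
        biases.append(statistics.fmean(sample))
    return statistics.fmean(biases), statistics.stdev(biases)


# Hypothetical satellite-minus-AERONET differences
errors = [-0.05, 0.01, 0.03, 0.08, -0.02, 0.04, 0.00, 0.06]
bias, bias_uncertainty = bootstrap_bias(errors)
```

The same loop works for a correlation or a standard deviation by resampling data pairs and replacing the mean with the statistic of interest.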
If the sampled error distribution is different from the true error distribution, bootstrapping will likely underestimate uncertainties. Sampled error distributions may be different from the true error distribution because the act of collocating satellite and AERONET data favours certain conditions. E.g. the effective combination of two cloud screening algorithms (one for the satellite product, the other for AERONET) may favour clear sky conditions and limit sampling of errors in case of cloud contamination. This uncertainty due to sampling is unfortunately hard to assess (see e.g. Schutgens et al. (2020)).
As an example of uncertainty due to sampling, we present Fig. 3 in which an evaluation of the current satellite AOD data with Inversion L2.0 data shows substantial shifts compared to Direct Sun L2.0. As the uncertainty ranges indicate, the changes in biases are not due to statistical noise. Neither are they due to differences in collocated DirectSun and Inversion L2.0 AOD values, which agree very well. Rather, the issue is that AERONET Inversion data are an unrepresentative subsample of the DirectSun data. It is unclear what this means for the AAOD and SSA evaluation but readers should be aware of this unaccounted-for sampling issue that may introduce biases.

4 A first look at the satellite products
Multi-year averages of satellite AAOD and their differences are shown in Fig. 4. The AAOD maps can only be compared with some caution, as they are derived from products with different temporal sampling. The differences, on the other hand, are based on collocated data and confirm major features. The products all agree on a major AAOD hotspot from (likely) African Savannah biomass burning. Three products agree on known polluted regions like India and China also being AAOD hotspots (OMAERUV, which is relatively featureless, is the exception). POLDER-GRASP-M and OMAERUV show a clear AAOD hotspot due to Amazonian biomass burning. POLDER-GRASP-M estimates relatively high values over land, and the ocean at high northern latitudes. OMAERUV shows relatively low AAOD over land but high over the entire ocean. FL-MOC clearly estimates higher AAOD over the Sahara than either POLDER-GRASP-M or OMAERUV. POLDER-SRON estimates relatively high AAOD over the Rocky Mountains, the Andes and Australia. Unfortunately, even in multi-year averages significant differences in regional AAOD between the products are observed, in excess of 50%. Figure S1 shows the corresponding SSA maps. As expected, POLDER-GRASP-M has relatively low SSA and OMAERUV relatively high SSA over land. FL-MOC has the highest SSA over ocean of all products.
One caveat is that AAOD and SSA retrievals are likely to be better (more accurate and precise) at high AOD. In the above analysis, no account was taken of AOD levels and the products were discussed as they are. The impact of AOD will be discussed later.

Evaluation of satellite products with AERONET
Taylor plots of the performance of the satellite products are shown in Fig Correlations for AAOD and SSA are lower than for AOD suggesting that it is more challenging to retrieve absorptive qualities.
Interestingly, POLDER-SRON's SSA correlates significantly better with AERONET than POLDER-GRASP-M's but this is a sampling effect: once both products are collocated, POLDER-GRASP-M's SSA correlation with AERONET increases from 0.41 to 0.69.

The impact of statistical noise on the AAOD evaluation is explored in Fig. 6. Using a bootstrapping technique, the spread in correlation and standard deviation was explored. For most datasets, the results seem fairly robust, except for FL-MOC which uses only 24 data points. A proper intercomparison of products requires collocation (of all the satellite data), which reduces available cases even further. Figure S2 shows that results are not very different from a sense of perspective, 53 data points represent less than 0.0008% of the total POLDER-GRASP-M data amount used in this paper.

Evaluation and intercomparison of AOD
In Fig. 7, we provide more detail on the satellite AOD products and their evaluation against AERONET Direct Sun L2.0 AOD.
In the central column, we show the products themselves, averaged over several years. Note that the products exist for different years and even within the same years have different samplings so comparisons should be made with caution. In the left and right column, we show satellite data collocated with AERONET. On the left-hand side is a scatterplot of the raw data (with associated statistics provided) and on the right-hand side is a map of multi-year difference with AERONET (provided at least 32 data points were available per site).
The scatter plots show good correlation with AERONET. The POLDER products show higher correlations and slopes closer to 1 than FL-MOC and OMAERUV. Nevertheless, differences in evaluation seem rather small, which unfortunately cannot be said for the global distributions of AOD. POLDER-GRASP-M has rather high AOD over land and OMAERUV has rather high AOD over ocean (note that the satellite data themselves are not collocated). The multi-year differences with AERONET suggest that OMAERUV overestimates everywhere except in some regions with strongly absorbing aerosol. An intercomparison of satellite AOD with Aqua-DT is presented in Fig. S3 and suggests typically higher estimates over (Southern Hemisphere) Land for the POLDER products and over Ocean for OMAERUV. Note that Aqua-DT is not without significant regional biases, see Schutgens et al. (2020).
The right-most column in Fig. 9 shows SSA difference as a function of (AERONET) AOD. To ensure the largest possible range in AOD values, Inversion L1.5 instead of L2.0 is used. Especially at lower AOD, this dataset will have larger errors in AAOD and SSA than L2.0. Interestingly, as AOD increases, all satellite products seem to agree better with AERONET (for FL-MOC, the bin with largest AOD values is affected by a very low data count). This is of course as one would expect. For smaller AOD, there is increasingly more spread although the difference distribution remains fairly unbiased. The exception is POLDER-GRASP-M which shows increasingly lower SSA than AERONET at low AOD. We suggest that it is rather unlikely that three different satellite products have a similar SSA bias at low AOD as AERONET (and hence show no bias in the difference with AERONET), and therefore that the low bias in the POLDER-GRASP-M analysis is real. However, a better understanding of the nature of errors (bias vs. random) in AERONET SSA at low AOD is desirable.
Summarizing, there is skill in satellite AAOD and SSA but compared to AOD the correlations with AERONET are substantially lower. POLDER-SRON is the exception, with similar and fairly high correlations (∼ 0.75) for all three parameters. However, it seems to underestimate AAOD by ∼ 25% at high AAOD (slope of 0.76 in the AAOD scatter plot). OMAERUV appears to show the largest deviations from AERONET (low correlations and slopes) but its overall error statistics (mean and standard deviation) are not too different from those of the other products. Results for FL-MOC may be a statistical fluke due to the low data count. POLDER-GRASP-M shows quite high correlations for AOD (0.86) and AAOD (0.6) with reasonable slopes but has a very low correlation with AERONET for SSA (0.41), although this seems to depend strongly on sampling as discussed at the start of this Section. In addition, it appears to systematically underestimate SSA at low AOD. Yet another aspect of this dataset (not visible in any of the analysis shown) is that it appears to have a hard cut-off, as SSA values larger than 0.99 do not occur.
A profound problem is the paucity of data. Even for POLDER-GRASP-M, we can only evaluate its performance (against AERONET) for less than 0.006% of the total number of available observations. Is this sufficient to make meaningful statements about the performance of a product at large? In Schutgens et al. (2019a), we showed that the process of collocation can skew error statistics (by changing the sampling) to the point that it becomes hard to meaningfully distinguish performance of several products. That study was done for AOD which allows much higher numbers of collocated data with AERONET than AAOD.
To elucidate this, we compare the difference in SSA between the two POLDER products (collocated within 3 hours) for three different samplings. First, we look at global POLDER SSA statistics. Secondly, we look at POLDER SSA statistics over AERONET sites only. Thirdly, we look at POLDER SSA statistics that are collocated with AERONET observations. Figure 10 shows the associated difference distributions. Using various statistical tests (Mann-Whitney U, Student's t, Kolmogorov-Smirnov) we can show that the distribution means for the first and third sampling are fundamentally different.
Not only that, but the mean difference in SSA for the first sampling is 2.6 times as large (-0.043 vs. -0.017) as for the third sampling (and is statistically significant). As POLDER-SRON is biased high and POLDER-GRASP-M is biased low vs AERONET, the corollary to this is of course that at least one of the products has a larger bias vs the truth globally than can be seen in the AERONET observations. Conversely this suggests that the AERONET Inversion dataset does not allow a truly global evaluation of satellite datasets: it provides a sub-sample with skewed statistics of SSA. Incidentally, it is the temporal sub-sampling enforced by collocation with AERONET observations that causes the largest shift in the difference distribution (POLDER measurements over AERONET sites show a similar SSA distribution as the global dataset). It is possible that the SSA difference is partly driven by cloud contamination which we know is present in these satellite datasets (Schutgens et al., 2020) and may be ameliorated when a third cloud masking (from AERONET) is applied (through the collocation of data).

Intercomparison of satellite AAOD and SSA
To get a better appreciation of the satellite products, we now present a global intercomparison. To start with, Fig. 11 shows SSA differences between two products as a function of their mean AOD. As in Fig. 9, these differences become smaller (i.e. show still exhibit random differences of 0.03 or larger for AOD 1, as also confirmed by the AERONET evaluation. In addition, substantial biases remain. The previous analysis was global but substantial differences can be seen between land and ocean scenes. For instance, the SSA bias between the POLDER products over land does not decrease at lower AOD but remains fairly constant. A more detailed analysis can be found in Fig. 12 which shows biases, correlations and regression slopes for different products. Unsurprisingly, correlations and slopes tend to improve with minimum AOD, while biases may remain fairly constant (POLDER products), decrease (OMAERUV vs POLDER-GRASP-M) or even increase (FL-MOC). As a consequence, it should be challenging to determine an AOD threshold above which products can be expected to perform within certain parameters. A similar analysis for AAOD can be found in Fig. S4.
A final analysis concerns multi-year averages of these products. Model evaluation will be done on such averages and it may be useful to better understand the agreement (or lack thereof) between products in that case, even though the aforementioned biases are unlikely to be much reduced. Figure 13 shows an intercomparison of three products (FL-MOC is excluded due to its low data count). The analysis shows statistics of the intercomparison of multi-year averages of SSA, as a function of two thresholds: a minimum AOD and a minimum number of super-observations during three years (per 1° × 1° grid-box).
The underlying super-observations were always collocated (to within 3 hours) before temporal averaging took place. We see that in general correlations increase and standard deviations of the difference decrease when either threshold increases. The improvement with increasing AOD has already been discussed and is due to better signal-to-noise conditions for the retrieval schemes. The improvement with increasing number of observations (used in the temporal averaging) can be interpreted as a significant random error in either product being lessened through averaging. In general, the AOD threshold has a more profound impact but the number of observations threshold allows more flexibility (by choosing a longer time-series to work with, smaller SSA differences (up to a point!) may be achieved).
However, biases between products can be quite robust as is particularly clear for the POLDER products. The decreasing bias for OMAERUV vs. POLDER-SRON (and, incidentally, the sudden jump in correlation for AOD > 0.4) is not really a sign of a better agreement between products at high AOD. Under these conditions, most observations come from the African dust and biomass burning regions. POLDER-SRON retrieves very reflective dust and very absorptive biomass burning aerosol while OMAERUV retrieves fairly reflective dust and fairly absorptive biomass burning aerosol. Consequently, global SSA bias decreases due to a balancing of very different biases over these regions while similar spatial patterns yield high correlations.
Maps of the SSA difference between the POLDER products as a function of minimum AOD can be seen in Fig. S5. A higher minimum AOD mostly constrains data to a smaller portion of the globe but does not affect local biases greatly.

Appendix A: Generic aggregation and collocation
The aggregation of satellite L2 products into super-observations in this paper, and the subsequent collocation of different datasets for intercomparison and evaluation, used the following scheme.
Assume a homogeneous L2 dataset with times and geo-locations and observations of AOD. Homogeneous means that AOD and AAOD are available for the same times, geo-locations and wavelengths. Each observation has a known spatio-temporal footprint, e.g. in the case of satellite L2 retrievals that would be the L2 retrieved pixel size and the short amount of time (less than a second) needed for the original measurement. Satellite L2 data are aggregated into super-observations as follows. A regular spatio-temporal grid is defined as in Fig. A1.
The spatio-temporal size of the grid-boxes (here 30 min × 1° × 1°) exceeds that of the footprint of the L2 data that will be aggregated. All observations are assigned to a spatio-temporal grid-box according to their times and geo-locations. Once all observations have been assigned, observations are averaged by grid-box. It is possible to require a minimum number of observations to calculate an average. Finally, all grid-boxes that contain observations are used to construct a list of super-observations as in Fig. A2. Only times and geo-locations with aggregated observations are retained. As the original L2 dataset was homogeneous, so is the resulting L3 dataset.
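The aggregation into super-observations described above can be sketched as follows. This is a simplified illustration, not the IDL or CIS code used for the paper: the function name, the tuple layout of an observation and the use of integer grid indices as keys are assumptions made for the example.

```python
import math
from collections import defaultdict


def aggregate(observations, dt_hours=0.5, dlat=1.0, dlon=1.0, min_count=1):
    """Average observations per spatio-temporal grid-box.

    observations: iterable of (time_hours, lat, lon, value) tuples.
    Returns a dict mapping (time index, lat index, lon index) to the mean
    value; empty grid-boxes are simply absent from the result.
    """
    boxes = defaultdict(list)
    for t, lat, lon, value in observations:
        # Assign each observation to a grid-box by its time and geo-location
        key = (math.floor(t / dt_hours),
               math.floor(lat / dlat),
               math.floor(lon / dlon))
        boxes[key].append(value)
    # Keep only grid-boxes with enough observations and average them
    return {key: sum(vals) / len(vals)
            for key, vals in boxes.items()
            if len(vals) >= min_count}


# Hypothetical L2 AOD retrievals: (hours since start, latitude, longitude, AOD)
l2 = [
    (0.1, 10.2, 20.3, 0.2),
    (0.4, 10.8, 20.9, 0.4),
    (0.6, 10.5, 20.5, 0.3),
    (0.2, -0.5, 20.5, 0.1),
]
super_obs = aggregate(l2)
```

A homogeneous dataset with several variables (e.g. AOD and AAOD) could be handled the same way by storing a tuple of values per observation and averaging each component.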
Station data are similarly aggregated over 30 min × 1° × 1°. Point observations will suffer from spatial representativeness issues (Sayer et al., 2010; Virtanen et al., 2018; Schutgens et al., 2016a), but the representativity of AERONET sites for 1° × 1° grid-boxes is fairly well understood (Schutgens, 2019), see also Section 2.1.5. These aggregated L3 AERONET and MAN data will also be called super-observations.
Different datasets of super-observations can be collocated in a very similar way. Again a regular spatio-temporal grid is defined as in Fig. A1 but now with grid-boxes of larger temporal extent (typically 3 hr × 1° × 1°). Because this temporal extent is short compared to satellite revisit times, either a single satellite super-observation or none is assigned to each grid-box. A single AERONET site however may contribute up to 6 super-observations per grid-box (in which case they are averaged).

After two or more datasets are thus aggregated individually, only grid-boxes that contain data for both datasets will be used locations and are called collocated datasets. By choosing a larger temporal extent of the grid-box, the collocation criterion can be relaxed.
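Collocation as a second aggregation followed by masking can be sketched as below. The example assumes super-observations stored as a dictionary keyed by (time index, latitude index, longitude index) on a 30-minute grid, so that a factor of 6 yields 3-hour grid-boxes; the function names and this data layout are hypothetical.

```python
def coarsen(super_obs, factor=6):
    """Re-aggregate super-observations to a grid `factor` times coarser in time."""
    boxes = {}
    for (it, ilat, ilon), value in super_obs.items():
        boxes.setdefault((it // factor, ilat, ilon), []).append(value)
    return {key: sum(vals) / len(vals) for key, vals in boxes.items()}


def collocate(*datasets, factor=6):
    """Coarsen each dataset, then keep only grid-boxes present in all of them."""
    coarse = [coarsen(d, factor) for d in datasets]
    common = set(coarse[0]).intersection(*coarse[1:])
    return [{key: c[key] for key in sorted(common)} for c in coarse]


# Hypothetical 30-minute super-observations of AOD
satellite = {(0, 10, 20): 0.30, (7, 11, 20): 0.50}
aeronet = {(1, 10, 20): 0.20, (3, 10, 20): 0.40, (20, 11, 20): 0.10}

sat_coll, aer_coll = collocate(satellite, aeronet)
```

In this example the two AERONET super-observations that fall in the same 3-hour grid-box are averaged, and only the one grid-box that both datasets share survives the masking. Increasing `factor` relaxes the collocation criterion.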
As the super-observations are on a regular spatio-temporal grid and collocation requires further aggregation to another regular but coarser grid, the whole procedure is very fast. It is possible to collocate all 7 products from afternoon platforms over three years using an IDL (Interactive Data Language) code (that served as a prototype for CIS) and a single processing core in just 30 minutes. This greatly facilitates sensitivity studies.
Starting from super-observations, a 3-year average can easily be constructed by once more performing an aggregation operation but now with a grid-box of 3 yr × 1° × 1°. If two collocated datasets are aggregated in this fashion, their 3-year averages can be compared with minimal representation errors. This allows us to construct global maps of e.g. multi-year AOD difference between two sets of super-observations.
Figure 11. Difference in satellite product SSA as a function of AOD (averaged over both products). Two vertical axes are used: the left-hand side is used for individual data points (sub-sampled), the right-hand axis is used for the grey-scale distribution (9, 25, 50, 75, 91% quantiles) and the median difference (blue line). Data were collocated within 3 hours.
Figure A1. A regular spatio-temporal grid in time, longitude and latitude. Such a grid is used for the aggregation operation that is at the heart of the collocation procedure used in this paper. Grid-boxes may either contain data or be empty. Note that data may refer to any combination of observations, e.g. AOD at multiple wavelengths or AOD and AAOD at 550 nm. However, the dataset is homogeneous. Reproduced from Watson-Parris et al. (2016).
Figure A2. A list of data. Such a list is the primary data format used for both observations and model data in this paper. Note that data may refer to any combination of observations, e.g. AOD at multiple wavelengths or AOD and AAOD at 550 nm. However, the dataset is homogeneous. Reproduced from Watson-Parris et al. (2016).