Site representativity of AERONET and GAW remotely sensed AOT and AAOT observations

Abstract. Remote sensing observations from the AERONET (AErosol RObotic NETwork) and GAW (Global Atmosphere Watch) networks are intermittent in time and have a limited field-of-view. A global high-resolution simulation (GEOS5 Nature Run) is used to conduct an Observing System Simulation Experiment (OSSE) for AERONET and GAW observations of AOT (Aerosol Optical Thickness) and AAOT (Absorbing Aerosol Optical Thickness), and to estimate the spatio-temporal representativity of individual sites for larger areas (from 0.5° to 4° in size). The GEOS5 NR and the OSSE are evaluated and shown to have sufficient skill, although daily AAOT variability is significantly underestimated while the frequency of AAOT observations is overestimated (both resulting in an underestimation of temporal representativity errors in AAOT). Yearly representation errors are provided for a host of scenarios: varying grid-box sizes, temporal collocation protocols and site altitudes are explored. Monthly representation errors are shown to correlate strongly throughout the year, with a pronounced annual cycle. The collocation protocol for AEROCOM (AEROsol Comparisons between Observations and Models) model evaluation (using daily data) is shown to be sub-optimal, and the use of hourly data is advocated instead. A previous subjective ranking of site spatial representativity (Kinne et al., 2013) is analysed and a new objective ranking is proposed. Several sites are shown to have yearly representation errors in excess of 40%. Lastly, a recent suggestion (Wang et al., 2018) that AERONET observations of AAOT suffer a positive representation bias of 30% globally is analysed, and evidence is provided that this bias is likely an overestimate (the current paper finds 4%) due to methodological choices.


Introduction
As the temporal sampling of observations is often intermittent and their field-of-view limited, the ability of observations to represent the weather or climate system is negatively affected (Nappo et al., 1982). This adverse effect can be described through a representation error, which allows comparison to, e.g., observational errors or model errors.
Section 2 describes the high-resolution simulation data and AERONET observations used in this study. The OSSE for estimating representation errors is briefly explained in Sect. 3, but more details can be found in S17. An evaluation of the high-resolution simulation, with a particular focus on its use in an OSSE, is given in Sect. 4. While the simulation shows deviations from AERONET observations, the agreement is deemed sufficient to study representation errors. Representation errors in AERONET AOT and AAOT are studied in Sect. 5. A ranking of AERONET sites in terms of their representativity is given in Sect. 6. The paper finishes with a summary of the conclusions (Sect. 7).

GEOS-5 Nature Run
The GEOS-5 Nature Run (G5NR hereafter) is a 2-year global, non-hydrostatic simulation from June 2005 to May 2007 at a 0.0625° grid resolution (∼ 7 km near the equator). Besides standard meteorological parameters (wind, temperature, moisture, surface pressure), G5NR includes tracers for common aerosol species (dust, sea-salt, sulfate, black and organic carbon) and several trace gases: O3, CO and CO2. The simulation is driven by prescribed sea-surface temperature and sea-ice, daily volcanic and biomass burning emissions, as well as monthly high-resolution inventories of anthropogenic sources (Putman et al., 2014).
Aerosols in GEOS-5 are calculated using the Goddard Chemistry, Aerosol, Radiation, and Transport (GOCART) module (Chin et al., 2002), which uses 15 tracers to describe externally mixed species of organic carbon, black carbon, sulphate, sea-salt and dust. Biomass burning emissions are obtained from QFED (Quick Fire Emissions Dataset) (Suarez et al., 2013), with a diurnal cycle imposed online. Anthropogenic emissions of organic and black carbon use EDGAR-HTAP (Emissions Database for Global Atmospheric Research-Hemispheric Transport of Air Pollution) emissions (Janssens-Maenhout et al., 2012), which were rescaled to match AEROCOM Phase II emissions. Non-shipping anthropogenic SO2 emissions come from EDGAR v4.1.
Evaluation (Gelaro et al., 2015) against the NASA/GMAO MERRA (Modern-Era Retrospective analysis for Research and Applications) Aerosol Reanalysis (da Silva et al., 2012) suggests that global organic carbon, black carbon and sulphate AOT are underestimated by 30 to 40%, while dust AOT is overestimated by ∼ 50%. Global sea-salt AOT is within 10% of MERRA. (Note that Castellanos et al. (2019) derived global rescaling factors for speciated aerosol AOD in G5NR. How such scaling factors would affect AAOD is unknown. True scaling factors are unlikely to be global, and the representation errors in this paper are relative anyway. In this paper the original, i.e. not rescaled, model data will be used.) Comparison with AEROCOM models shows that G5NR sulphate lifetimes are quite short (2.7 days), while those of the other species agree fairly well with the AEROCOM multi-model mean. Clouds in G5NR show reasonable cloud fractions compared to CERES-SSF (Clouds and the Earth's Radiant Energy System-Single Scanner Footprints), although in the equatorial/sub-tropical region (30° S to 30° N) G5NR has a deficit of partially cloudy scenes. In addition, there are too few clouds off western continental coasts and the southern branch of the ITCZ is too strong. CALIOP (Cloud-Aerosol Lidar with Orthogonal Polarization) data suggest that G5NR cloud fractions are too low, especially over equatorial/sub-tropical land in the Northern Hemisphere, and too high in the northern polar region.
For this study, the hourly G5NR data for 2006 listed in Table 1 were obtained. ... averaged over an hour. For all years starting in 1992, geolocation data were obtained for all sites (1144 in total).

GAW geolocations
GAW geolocation data were obtained from NILU (Norwegian Institute for Air Research). Two networks were used: the GAW-AOT network, which comprises 29 sun-tracking photometers that measure AOT, and the GAW-ABS network, which comprises 81 surface-based filter instruments. While GAW-ABS is not capable of providing (A)AOT measurements, here we assume a potential remotely sensed columnar product, similar to AERONET (A)AOT, and consider its representation errors.

Method: analysis of representation errors
The representation error is defined as the difference between a perfect observation (i.e. without observational error) and a truth value (the area average). Here, a self-consistent high-resolution simulation is used to generate both observation and truth (a so-called OSSE), as was first described in S16b and extended in S17. The representation error may refer to instantaneous values or time averages. This work is mostly concerned with yearly averages (and some monthly averages). For instantaneous and daily error values, see S16b and S17. The mapping from G5NR data to the data used in this study is given in Table 2.
Perfect observations are generated from the high-resolution simulation by choosing the data at the location of an AERONET or GAW site and sub-sampling those data in time according to conditions on solar zenith angle (SZA), cloud fraction and AOT. The truth is generated from the high-resolution simulation by averaging AOT and AAOT over a large area (0.5° to 4° grid-boxes) and further averaging in time. Here we distinguish three different protocols, depending on how one intends to use the observations (see Table 4). In the case of a gridded climatology derived from observations, the truth should be an average over a continuous long-term time range (say a year). In the case of model evaluation, it is possible to resample model data to the times of the observations. For example, within the AEROCOM community a daily collocation protocol is often used, in which daily model data are used for days with observations only (irrespective of the temporal sampling of those observations throughout the day). To assess representation errors in this case, the truth needs to be sampled to days with observations before yearly averages are determined. The same protocols were also explored in S17.
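The three collocation protocols can be illustrated with a short sketch. It is a minimal illustration under stated assumptions, not the study's code: the function name `yearly_representation_errors` and its inputs (hourly site values, hourly grid-box means, and an hourly "observation available" mask) are hypothetical, and the observed yearly mean is taken as the average over all observation hours, which is one of several reasonable choices.

```python
from statistics import fmean

HOURS_PER_DAY = 24


def yearly_representation_errors(site, area, observed):
    """Relative yearly representation errors (%) for three collocation protocols.

    site     -- hourly (A)AOT at the site's simulation pixel (perfect observation)
    area     -- hourly grid-box mean (A)AOT (truth before temporal sampling)
    observed -- hourly booleans, True where an observation would be made
    All sequences cover whole days and have equal length (hypothetical layout).
    """
    obs_hours = [h for h, ok in enumerate(observed) if ok]
    if not obs_hours:
        raise ValueError("no observations available")
    # Assumption: the observed yearly mean averages all observation hours.
    obs_mean = fmean(site[h] for h in obs_hours)

    # Yearly protocol (gridded climatology): continuous average over all hours.
    truth_yearly = fmean(area)

    # Daily protocol (AEROCOM-style): daily grid-box means on observation days only.
    obs_days = sorted({h // HOURS_PER_DAY for h in obs_hours})
    truth_daily = fmean(
        fmean(area[d * HOURS_PER_DAY:(d + 1) * HOURS_PER_DAY]) for d in obs_days
    )

    # Hourly protocol: grid-box values at exactly the observation hours.
    truth_hourly = fmean(area[h] for h in obs_hours)

    def rel(truth):
        return 100.0 * (obs_mean - truth) / truth

    return {"yearly": rel(truth_yearly), "daily": rel(truth_daily),
            "hourly": rel(truth_hourly)}


if __name__ == "__main__":
    # Synthetic, spatially uniform field with a diurnal cycle (1 in the morning,
    # 2 in the afternoon); observations only on the afternoon of day 0.
    field = [1.0 if h % 24 < 12 else 2.0 for h in range(48)]
    observed = [h < 24 and h % 24 >= 12 for h in range(48)]
    print(yearly_representation_errors(field, field, observed))
```

Because the synthetic field is spatially uniform, any error left is purely temporal: the yearly and daily protocols report a positive error, while hourly collocation removes it entirely.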

The current methodology differs slightly from S17 in that:
1. a different model is used to construct the OSSE;
2. previously, SZA was assumed to allow observations for a fixed fraction of the day (10 hours); in the current work, SZA is calculated from downwelling TOA SW radiation and will vary with geo-location and time-of-day;
3. previously, the truth was generated for grid-boxes centered on the observations; in the current work, those grid-boxes are regularly spaced from 0° to 360° longitude and −90° to 90° latitude, and the AERONET and GAW sites can be located anywhere within those grid-boxes (at their real geo-location);
4. previously, the high-resolution simulation had a constant grid size of about 10 km; in the current work, the grid size varies but has a constant angular size of 0.0625° (∼ 7 km at the equator).
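Point 2 relies on the geometric relation F = S0 · f · cos(SZA) between downwelling TOA shortwave flux F and the solar constant S0 (f being the Earth-Sun distance factor). The sketch below inverts that relation; it is an assumption about how such a conversion can be done, not a description of the exact procedure used here. The solar-constant value and the function name are illustrative, and because G5NR output is hourly-averaged, the result should be read as an effective hourly SZA.

```python
import math

# Approximate total solar irradiance in W m-2 (assumed value for illustration).
SOLAR_CONSTANT = 1361.0


def sza_from_toa_sw(sw_down_toa, solar_constant=SOLAR_CONSTANT, earth_sun_factor=1.0):
    """Solar zenith angle (degrees) from downwelling TOA shortwave flux.

    Inverts F = S0 * f * cos(SZA), where f = (mean distance / distance)**2.
    A flux of zero (night) maps to 90 degrees.
    """
    mu = sw_down_toa / (solar_constant * earth_sun_factor)
    mu = max(-1.0, min(1.0, mu))  # guard against rounding outside [-1, 1]
    return math.degrees(math.acos(mu))


if __name__ == "__main__":
    for flux in (1361.0, 680.5, 0.0):
        print(flux, round(sza_from_toa_sw(flux), 2))
```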
The last point implies that the zonal extent of the simulation grid-box used for the observation shrinks towards zero as we approach the poles.
Since this is clearly undesirable (a real field-of-view will remain on the order of several kilometres), we limit our analysis to latitudes below 60°.
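The shrinking field-of-view is simple spherical geometry: the east-west width of a cell with a fixed angular size scales with cos(latitude). A small sketch, assuming a spherical Earth with a mean radius of 6371 km (the function name is illustrative):

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical-Earth assumption


def zonal_width_km(lat_deg, dlon_deg=0.0625):
    """East-west width (km) of a grid cell of constant angular size dlon_deg."""
    km_per_deg = 2.0 * math.pi * EARTH_RADIUS_KM / 360.0  # about 111.2 km
    return dlon_deg * km_per_deg * math.cos(math.radians(lat_deg))


if __name__ == "__main__":
    for lat in (0, 30, 60, 80):
        print(f"{lat:2d} deg: {zonal_width_km(lat):.2f} km")
```

At the equator this gives about 7 km for the 0.0625° G5NR cell, and only half that at 60°. With `dlon_deg=4.0` the same function yields roughly 445 km, in line with the ∼ 450 km quoted for 4° grid-boxes.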
Our methodology allows separation of the factors that determine the representation error: the spatial extent of the grid-box, and observational intermittency due to low SZA, high cloud fraction or low AOT. We will not present such a causal analysis in this paper (see S17 instead) but will refer to it to explain results.

Statistical parameters
To show the distributions of representation errors, box-whisker plots using the 2, 9, 25, 75, 91 and 98% quantiles are used in this paper. For a normal distribution, these quantiles are approximately equally spaced (at roughly ±2/3, ±4/3 and ±2 standard deviations from the mean). Any skewness or extended wings in a distribution will be readily visible. In addition to quantiles, the mean error and the mean sign-less error will be provided. The mean sign-less error (or mean absolute error) is deemed more relevant than the standard deviation because 1) it includes biases; 2) the errors are seldom normally distributed, and a standard deviation is very sensitive to larger errors (outliers). For a normal distribution with a mean of zero and a standard deviation of one, the mean sign-less error is ∼ 0.8.
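Both statements about the normal distribution can be checked numerically with the Python standard library. The sketch below computes where the six box-whisker quantiles fall for a standard normal distribution and compares a Monte Carlo mean sign-less error with the analytic value sqrt(2/π) ≈ 0.798. The helper names are illustrative.

```python
import math
import random
from statistics import NormalDist, fmean

# Quantiles used for the box-whisker plots in this paper.
QUANTILES = (0.02, 0.09, 0.25, 0.75, 0.91, 0.98)


def normal_quantiles_in_sigma():
    """Positions (in standard deviations) of the plot quantiles for N(0, 1)."""
    nd = NormalDist()
    return [nd.inv_cdf(q) for q in QUANTILES]


def mean_signless_error(errors):
    """Mean absolute ('sign-less') error; unlike the std it also reflects a bias."""
    return fmean(abs(e) for e in errors)


# Analytic mean absolute value of a standard normal variable: sqrt(2 / pi).
MAE_STD_NORMAL = math.sqrt(2.0 / math.pi)

if __name__ == "__main__":
    print([round(x, 2) for x in normal_quantiles_in_sigma()])
    rng = random.Random(0)
    sample = [rng.gauss(0.0, 1.0) for _ in range(100_000)]
    print(round(mean_signless_error(sample), 3), round(MAE_STD_NORMAL, 3))
```

The printed quantile positions are about ±0.67, ±1.34 and ±2.05, so neighbouring whisker ends are roughly 0.7 standard deviations apart.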

Evaluation of G5NR and OSSE
In this section, G5NR is evaluated against real AERONET observations of AOT and AAOT, with a special focus on its usefulness in an OSSE. As G5NR generates its own meteorology, which deviates from that of 2006, one might expect differences between simulation ... as nicely with the observations, but there is still a correlation of ∼ 0.45 (the evaluation of AAOT will, of course, be affected by large measurement errors). The agreement in standard deviation suggests that simulated and observed AOT and AAOT show similar temporal variation. The global agreement also suggests that the simulation captures spatial variation rather well.
This is also true on shorter length scales, as an analysis by region shows (Table 5). Europe appears to be the exception, but this is mostly due to a few southern sites: removing them from the analysis significantly increases the correlation. This may be related to the overestimation of dust and underestimation of carbonaceous and sulphate aerosol in G5NR (Gelaro et al., 2015), which will affect north-south gradients in AOT over Europe. DRAGON (Distributed Regional Aerosol Gridded Observation Networks) campaigns might allow evaluation of the spatial distribution of simulated AOT at even smaller length scales (tens of kilometres), but they did not start until 2010. ... (which now uses a minimum of 30 observations per site) yields a similar but slightly poorer result for G5NR (see Fig. 2), and over a shorter range of values.
Figure 3 shows mean values per site of the daily difference between maximum and minimum AOT. Again, good agreement for simulated AOT is seen, but AAOT compares rather poorly. However, its correlation is still above 0.6, and it is clear that the simulation underestimates daily AAOT variation. AAOT measurement errors are not expected to have a big impact on daily variation (which is the difference between two measurements).
Figure 4 evaluates the OSSE and shows temporal coverage (or frequency of observation) per site as a function of latitude.
G5NR's simulated coverage is calculated using the limitations described in Table 3 (and explained in Sect. 3). This coverage would be 100% if observations were available 24 hours a day, 365 days a year. In practice, it cannot be higher than 50% due to the day-night cycle, and it will be lower due to cloudiness or low AOT.
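Temporal coverage, as defined above, is the fraction of all hours in the year with a valid (simulated) observation. The sketch below shows the calculation with hypothetical thresholds (`sza_max`, `cloud_max`, `aot_min` are placeholders standing in for the actual limits of Table 3) and confirms that a year with the sun up half the time and clear skies gives exactly 50%.

```python
def temporal_coverage(sza_deg, cloud_frac, aot, sza_max=80.0, cloud_max=0.5, aot_min=0.0):
    """Fraction of hours with a valid simulated observation.

    The default thresholds are hypothetical placeholders; the study's actual
    limits are those listed in Table 3.
    """
    valid = sum(
        1
        for s, c, a in zip(sza_deg, cloud_frac, aot)
        if s < sza_max and c < cloud_max and a >= aot_min
    )
    return valid / len(sza_deg)


if __name__ == "__main__":
    hours = range(24 * 365)
    # Sun well above the horizon from 06:00 to 18:00, below it otherwise.
    sza = [45.0 if 6 <= h % 24 < 18 else 100.0 for h in hours]
    aot = [0.2] * len(sza)
    clear = [0.0] * len(sza)
    # Every third day fully overcast.
    cloudy = [1.0 if (h // 24) % 3 == 0 else 0.0 for h in hours]
    print(temporal_coverage(sza, clear, aot), temporal_coverage(sza, cloudy, aot))
```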
The bimodal structure that is visible in both the simulation and observations is due to SZA variation (which reduces coverage towards the poles) and cloudiness (which reduces coverage near the equator). Simulated and real coverage per site are not expected to agree well due to meteorological differences. Still, the results suggest that the OSSE predicts a similar frequency of Direct Sun observations as actually observed.
However, the OSSE also simulates more Inversion observations in the Northern Hemisphere than actually occur. This suggests there are limiting factors in observational coverage that are not accounted for in Table 3. One factor is that real Inversion measurements are simply attempted less frequently (several times per day) than Direct Sun measurements (several times per hour). Other factors may include inversion failure at low SZA (real observations show that Inversion data generally have larger SZA than Direct Sun data, even though Inversion data are generally closer to the equator) and the overestimation of dust AOT in G5NR (the largest overestimates of coverage occur for Sahara and Saudi Arabia sites). Finally, instrument malfunction and maintenance are not taken into account, which will explain some of the discrepancy.
Overall, it seems that G5NR can realistically simulate spatial and temporal variation in AOT and AAOT, although there is some underestimation of daily AOT variation and significant underestimation of daily AAOT variation. G5NR can also be used to simulate the frequency of observation (temporal coverage) fairly realistically, although it will overestimate it for the Inversion products in the Northern Hemisphere. Further evidence for G5NR's applicability in an OSSE is given in Fig. 12, where it is shown that the present study agrees with an earlier analysis by Kinne et al. (2013) on the most representative AERONET sites.
Results so far suggest that daily collocation is a significant improvement over yearly collocation. This is in contrast to S17 (Fig. 7), where the representation errors for daily and monthly collocation were found to be similar. Further analysis of the data suggests that the absence of diurnal (anthropogenic) emission profiles in G5NR may cause an underestimation of daily collocation errors.

Representation errors in AOT
Representation errors for AOT do not differ much between the Direct Sun L2.0 and Inversion L1.5 products (Fig. 9). However, the condition of a minimum AOT (≥ 0.25) for valid observations causes large, if unsurprising, errors for the Inversion L2.0 product. This issue with the Inversion L2.0 data is well known, but the current analysis may be the first realistic estimate of the incurred errors. Figure 9 also shows results for two sensitivity studies in which observational coverage in the Northern Hemisphere was artificially lowered (see the discussion in the last paragraph of Sect. 4), but this has no clear impact, as temporal coverage is quite low anyway.
AERONET was not designed with representativity in mind, but the GAW network was. Nevertheless, Fig. 10 suggests that GAW sites exhibit slightly larger representation errors than AERONET sites. In particular, GAW error statistics are strongly skewed towards negative values. In the G5NR OSSE, GAW sites are located at higher altitudes, and more often on isolated mountains, than AERONET sites. A high-altitude site on an isolated mountain will observe a shorter atmospheric column than the surrounding grid-box, which will cause a negative representation error (Fig. 11). Actual site altitudes vary from −410 to 5320 m. G5NR site altitudes correlate very well with these (r = 0.98) but underestimate them by 28 m on average, with a random error of 171 m.
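The sign of the altitude effect can be illustrated with an idealised model. The sketch assumes horizontally uniform aerosol whose extinction decays exponentially with height (scale height H), so that the AOT above altitude z is proportional to exp(−z/H). Both the profile and the 1500 m scale height are hypothetical choices for illustration, not G5NR properties. A site above the mean terrain of its grid-box then always sees less AOT than the grid-box average.

```python
import math
from statistics import fmean


def altitude_error_pct(site_alt_m, terrain_alts_m, scale_height_m=1500.0):
    """Relative representation error (%) due only to column truncation.

    Assumes (hypothetically) horizontally uniform aerosol with extinction
    decaying as exp(-z / H); the AOT above altitude z is then tau0 * exp(-z / H),
    and tau0 cancels in the relative error.
    """
    site = math.exp(-site_alt_m / scale_height_m)
    area = fmean(math.exp(-z / scale_height_m) for z in terrain_alts_m)
    return 100.0 * (site - area) / area


if __name__ == "__main__":
    flat = [0.0] * 100
    # Isolated mountain: 5% of the grid-box pixels at 1500 m, the rest at sea level.
    mountain = [0.0] * 95 + [1500.0] * 5
    print(altitude_error_pct(0.0, flat), altitude_error_pct(1500.0, mountain))
```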
Previous work by Kinne et al. (2013) ranked AERONET sites according to their representativity, see Table 6.This ranking is subjective in that it is non-quantitative, based on personal knowledge of the sites and only defines representativity in broad terms.The ranking is only available for sites that had at least 5 months of data before 2008.Using the methodology of this paper, representation errors were calculated for all sites of a certain ranking, see Fig.

Representation errors at monthly time-scales
Surprisingly, monthly representation errors are not much larger than yearly errors (Fig. 14). If monthly errors for the same site were independent and random, one would expect them to be about √12 ≈ 3.5 times larger than yearly errors, but that is clearly not the case. In fact, monthly errors are strongly correlated from month to month throughout the year (Fig. 15).
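The √12 argument, and how correlation breaks it, can be reproduced with a Monte Carlo sketch. The monthly errors at each synthetic site are modelled (as a hypothetical assumption) as a stationary AR(1) process with lag-1 correlation `rho`; the yearly error is their mean. For independent months (`rho = 0`) the ratio of RMS monthly to RMS yearly error approaches √12 ≈ 3.46. For `rho = 0.5` and `rho = 0.9` the analytic values are roughly 2.1 and 1.2.

```python
import math
import random
from statistics import fmean


def monthly_to_yearly_ratio(rho, n_sites=20000, sigma=1.0, seed=0):
    """RMS monthly error divided by RMS yearly error over synthetic sites.

    Each site's 12 monthly errors follow a stationary AR(1) process with
    lag-1 correlation rho and standard deviation sigma (a hypothetical model);
    the yearly error is the mean of the 12 monthly errors.
    """
    rng = random.Random(seed)
    innovation_sd = sigma * math.sqrt(1.0 - rho * rho)
    monthly_sq, yearly_sq = [], []
    for _ in range(n_sites):
        e = rng.gauss(0.0, sigma)  # January drawn from the stationary distribution
        months = [e]
        for _ in range(11):
            e = rho * e + rng.gauss(0.0, innovation_sd)
            months.append(e)
        yearly = fmean(months)
        monthly_sq.extend(m * m for m in months)
        yearly_sq.append(yearly * yearly)
    return math.sqrt(fmean(monthly_sq) / fmean(yearly_sq))


if __name__ == "__main__":
    for rho in (0.0, 0.5, 0.9):
        print(rho, round(monthly_to_yearly_ratio(rho), 2))
```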
The increase in correlation with January after September is probably due to yearly cycles in meteorology and emissions, and is very likely a realistic aspect of representation errors. The implication is that multi-year averages may not reduce representation errors as strongly as one would hope.
This analysis also raises the question whether representation errors (per site) should be seen mostly as biases or as random errors with strong correlations (see also Schwarz et al. (2018)). Our preliminary analysis suggests that at the monthly scale both cases can occur. Figure 16 shows both maximum and minimum monthly errors by site as a function of yearly error. Many sites show large variations in monthly representation errors but significantly reduced yearly errors, suggesting that the errors are essentially random. However, some sites show very similar monthly maxima, minima and yearly errors, suggesting that these errors are better interpreted as biases. Further discussion of this can be found in Sect. 6.

Representation errors in AAOT
Representation errors for the Inversion L1.5 AAOT product are shown in Fig. 17. Strikingly, daily collocation yields very similar errors to hourly collocation. This is very likely due to a limitation of the OSSE: the daily variation of AAOT is strongly underestimated by G5NR (see Sect. 4 and Fig. 3), possibly due to the absence of diurnal anthropogenic emission profiles.
Regionally, there is some variation in representation errors but not a lot, see Fig. ??. The exception is the yearly collocation protocol, which allows significant biases for sites in South America and Africa. This is related to the AOT criterion for valid observations and the dominant influence of episodic biomass burning on these two continents: outside the burning season far fewer observations are made. Consequently, the observations will favour the absorbing biomass burning aerosol.
A comparison between AERONET and GAW (Fig. 19) shows error distributions that are positively skewed for AERONET and negatively skewed for GAW-AOT. The smaller bias for GAW than for AERONET is due to a balancing of the positive bias from the AOT criterion for valid observations and the negative bias from site altitude (see also Fig. 10 and its discussion).
An analysis of the impact of the site rankings by Kinne et al. (2013) shows similar results for AAOT as for AOT (Fig. 20).

A ranking of representativity for the AERONET sites
A ranking of AERONET and GAW sites in terms of their spatial representativity for AOT and AAOT can be found in Schutgens (2019). Only sites below 60° latitude are considered, and the temporal sampling of observations is ignored. The latter was done for two reasons: 1) as discussed in Sects. 2 and 4, the temporal sampling of observations is modelled less accurately by the OSSE than spatial variability; 2) both S17 and the current study show that once hourly collocation is used, the remaining representation error is similar to, although slightly larger than, the spatial representation error.
Relative representation errors are classed according to the bin boundaries 0%, 5%, 10%, 20% and 40% (and up). Using a block bootstrap method on the time-series per site, the uncertainty in yearly representation errors was assessed. Typically, more than 85% of all resampled time-series yield a representation error in the same class as the original time-series. For large grid-boxes (4°) and small errors (< 10%), this may drop to 66% of the resampled time-series. In any case, yearly relative spatial representation errors can be classed robustly. Of course, G5NR and the OSSE are not perfect, which introduces an uncertainty into the ranking that cannot currently be assessed.
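The robustness check can be sketched as a moving-block bootstrap. Blocks of consecutive (observation, truth) pairs are resampled with replacement, which preserves short-term autocorrelation, and the fraction of resamples whose yearly error lands in the original class is counted. This is a minimal sketch under assumptions: the block length, the number of resamples, the use of the error magnitude for classing, and all function names are hypothetical choices, not the settings used for the published ranking.

```python
import bisect
import random

# Upper class boundaries (%) from the text: 0-5, 5-10, 10-20, 20-40, >= 40.
BOUNDARIES = [5.0, 10.0, 20.0, 40.0]


def error_class(rel_error_pct):
    """Class index 0..4 for the magnitude of a relative error in percent."""
    return bisect.bisect_right(BOUNDARIES, abs(rel_error_pct))


def relative_error_pct(obs, truth):
    """Yearly relative representation error: (mean obs - mean truth) / mean truth."""
    m_obs = sum(obs) / len(obs)
    m_truth = sum(truth) / len(truth)
    return 100.0 * (m_obs - m_truth) / m_truth


def block_bootstrap_agreement(obs, truth, block_len=24, n_resamples=500, seed=0):
    """Fraction of moving-block resamples whose error falls in the original class.

    obs and truth are paired, equally long series (e.g. hourly collocated values).
    """
    n = len(obs)
    if n < block_len:
        raise ValueError("series shorter than one block")
    rng = random.Random(seed)
    ref_class = error_class(relative_error_pct(obs, truth))
    n_blocks = -(-n // block_len)  # ceiling division
    same = 0
    for _ in range(n_resamples):
        idx = []
        for _ in range(n_blocks):
            start = rng.randrange(n - block_len + 1)
            idx.extend(range(start, start + block_len))
        idx = idx[:n]  # trim to the original length
        o = [obs[i] for i in idx]
        t = [truth[i] for i in idx]
        if error_class(relative_error_pct(o, t)) == ref_class:
            same += 1
    return same / n_resamples


if __name__ == "__main__":
    rng = random.Random(1)
    truth = [0.1 + 0.05 * rng.random() for _ in range(24 * 60)]
    obs = [1.15 * t + rng.gauss(0.0, 0.02) for t in truth]
    print(block_bootstrap_agreement(obs, truth))
```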
Compared to the subjective ranking by Kinne et al. (2013), the new ranking is objective because the rank is related to a well-defined representation error that is quantified bottom-up from known emission sources and calculated meteorology. That in itself is, of course, no guarantee of accuracy.
Inspection of the rankings turns up several interesting points. Analysis in the previous sections determined a few "rules" for the behaviour of representation errors (e.g. errors decrease as grid-box size decreases), but these can easily be "broken" for specific sites: a smaller grid-box may actually lead to larger representation errors (e.g. AOE_Baotou, Ascension_Island, Aras_de_los_Olmos), and monthly errors may be substantially larger than yearly errors (e.g. ARM-Darwin, BORDEAUX). Also, representation errors for AOT and AAOT may be very different: Bayfordbury shows small yearly representation errors for AOT but large errors for AAOT, while Mace_Head shows the opposite.

Conclusions
This work extends previous work on temporal representation with global low-resolution models (Schutgens et al., 2016b) to spatio-temporal representation. It also extends previous work on spatio-temporal representation with regional high-resolution simulations (Schutgens et al., 2016a, 2017) to the global domain. The current work is more limited in scope than the previous studies and only considers ground-based remote sensing observations. For satellite remote sensing, see Schutgens et al. (2016b) and Schutgens et al. (2017). For in-situ measurements, see Schutgens et al. (2016a) and Schutgens et al. (2017).
G5NR and the OSSE are evaluated and found to show significant skill. AERONET mean AOT per site, as well as yearly and daily variability, were estimated correctly to within a factor of less than 2. Considering that G5NR generates its own meteorology, it correlated very well (r ≈ 0.75) with the observations. Similarly, the OSSE was surprisingly good at simulating the overall pattern of observational coverage (frequency of AOT observation). Results were not as good for AAOT but still impressive. Yearly AAOT variability was slightly underestimated, while daily AAOT variability was severely underestimated.
The latter is possibly related to the absence of diurnal anthropogenic emission profiles in G5NR. For representativity studies that take diurnal variations into account, see Schutgens et al. (2016a, 2017). In addition, the OSSE tended to overestimate the frequency of AAOT observations per site (although this was shown to have no impact on representation errors).
Both yearly and monthly representation errors are provided for observations from ground sites that attempt to represent larger areas (from 0.5° to 4° in size). The monthly representation errors are shown to be strongly correlated throughout the year. For some sites this is an expression of a bias, but that is not universally the case. In any case, monthly representation errors cannot be treated as independent, and this has (negative) consequences for the reduction of representation errors in multi-year averages. Other conclusions are: 1) AERONET-derived climatologies allow for substantial representation errors (yearly collocation allows errors of typically 20% globally); 2) the AEROCOM evaluation protocol is sub-optimal (daily collocation allows errors of typically 25% in coherent regional patterns), and hourly collocation is advocated instead. Also, the representativity of AERONET and GAW sites was shown to be not very different, although AERONET sites seem to be more affected by nearby sources, while GAW sites seem more affected by their altitude. Finally, a subjective ranking (Kinne et al., 2013) of the spatial representativity of sites was analysed and shown to broadly agree with the current study, although it appears to overestimate represented spatial domain sizes and judges several sites as less representative than the current analysis does. A new objective ranking is also presented.
https://doi.org/10.5194/acp-2019-767 Preprint. Discussion started: 23 September 2019. © Author(s) 2019. CC BY 4.0 License.
... and observations. Simulated data were nevertheless collocated to the time of the observations (within the hour) to ensure the same temporal sampling throughout the days, the months and the year. The mean and standard deviation in AOT and AAOT per site are shown in the top row of Fig. 1. In general, simulated AOT shows good agreement with the observations, with correlations around 0.75 and slopes around 0.84. Simulated AAOT does not agree ...
The top row of Fig. 1 was created using only sites that provide a minimum of 100 real observations throughout 2006. The lower row shows how this criterion affects results. As the minimum number of observations per site increases, so do the correlations, probably due to a reduction in statistical noise (partly due to different simulated and actual meteorologies). However, the overall bias also increases. This criterion selects for sites with lower cloudiness (a higher number of observations), until predominantly northern African and Saudi Arabian sites are left for a minimum of 500 observations per site. The increase in bias is thus likely due to the overestimation of dust AOT that was mentioned earlier. Note that AAOT is evaluated here with L1.5 data. The L2.0 data have a minimum AOT threshold of ∼ 0.25, which results in fewer observations and fewer available sites overall. Although L1.5 is considered a less reliable product, the evaluation with L2.0 ...

Figure 5 shows yearly representation errors for AERONET Direct Sun L2.0 AOT observations for three different collocation protocols (see Table 4), as a function of model grid-box size. Hourly collocation yields the smallest representation errors, and this is more pronounced for smaller grid-box sizes. As grid-box size changes from 4° to 0.5°, hourly collocation errors are more than halved, from 13% to 5%, while those for daily collocation change only from 17% to 12%. By construction, hourly ...
... 12. For large grid-boxes of 4° (∼ 450 km near the equator), the impact of the ranking on representation error is quite small. While there is a visually arresting change in the error distribution for r > 1 (wide flanks are changed into a broader centre), the mean sign-less error barely changes. This suggests that Kinne et al. (2013) overestimated the size of the domains (≥ 500 km for r > 1) for which their sites were representative. On the other hand, for a grid-box of 1°, a substantial reduction in representation error can be seen for r > 1 sites. However, this only occurs for hourly collocation: Kinne et al. (2013) did not consider the temporal sampling of the observations, which causes large representation errors. A new ranking of representativity will be introduced in Sect. 6.
So far, the area averages used to calculate representation errors have been derived for the entire grid-box, both the clear and cloudy parts. Under certain circumstances, it may be more realistic to use only the clear part. Examples are the evaluation of aggregated satellite products with AERONET (like AERONET, satellites cannot observe aerosol when there are clouds), or the evaluation of certain models that explicitly calculate clear-sky AOT (usually by estimating clear-sky humidity from grid-box averaged humidity). Representation errors for the clear-sky parts of grid-boxes are reduced for the yearly and daily collocation protocols (Fig. 13).
As for AOT, representation errors decrease with decreasing grid-box size, although the decrease is small for yearly collocation. Significant positive biases can be seen for all protocols and large grid-box sizes. These biases are partly due to the AOT > 0.25 criterion for valid observations, which translates into an AAOT = (1 − SSA) AOT > 0.025 criterion for SSA = 0.9. However, other reasons for the positive bias are the proximity of AERONET sites to sources of absorbing aerosol and the impact of orography (see e.g. Schutgens et al. (2017) and Sect. 6). Unsurprisingly, the hourly collocation protocol shows the smallest positive bias and reduces it fastest with decreasing grid-box size.
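The arithmetic of the threshold and the direction of the resulting bias can be made concrete. With AAOT = (1 − SSA) AOT, the AOT ≥ 0.25 cut becomes AAOT ≥ 0.025 for SSA = 0.9. Keeping only days above a threshold can only raise the sampled mean, so the yearly observed AAOT is biased high. The lognormal daily AOT distribution and the constant SSA below are hypothetical choices for illustration only.

```python
import random
from statistics import fmean

AOT_MIN = 0.25  # minimum AOT for valid Inversion L2.0 observations (from the text)


def aaot_threshold(ssa, aot_min=AOT_MIN):
    """AAOT = (1 - SSA) * AOT, so an AOT cut maps onto an AAOT cut."""
    return (1.0 - ssa) * aot_min


def threshold_bias_pct(n_days=365, ssa=0.9, seed=1):
    """Relative bias (%) of the thresholded-mean AAOT versus the all-days mean."""
    rng = random.Random(seed)
    # Hypothetical daily AOT: lognormal with a median of about 0.15.
    aot = [rng.lognormvariate(-1.9, 0.8) for _ in range(n_days)]
    aaot = [(1.0 - ssa) * a for a in aot]
    valid = [x for a, x in zip(aot, aaot) if a >= AOT_MIN]
    return 100.0 * (fmean(valid) - fmean(aaot)) / fmean(aaot)


if __name__ == "__main__":
    print(aaot_threshold(0.9), round(threshold_bias_pct(), 1))
```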

Comparison to recent results from Wang et al. '18
Recently, Wang et al. (2018) suggested that the observed underestimation of AAOT by AEROCOM models (Bond et al., 2013) may be due to spatial representation errors. Their analysis found that AERONET Inversion L1.5 AAOT representation errors exhibit a global bias of 30% for 2° × 2° model grid-boxes, which would help explain the aforementioned underestimation by the global models. As AERONET sites need to be serviceable, they are often found near roads and urban build-up, i.e. near sources of absorbing aerosol. Compared to the larger area of global model grid-boxes, these sites would quite naturally observe larger AAOT. Thus, Wang et al. (2018) concluded that at least part of the underestimation of modelled AAOT is an artefact created by the location of the AERONET sites. Wang et al.'s idea is quite persuasive, and indeed one can see evidence of such positive representation errors in Fig. 21, where sites in major cities like London, Paris, Madrid and Barcelona clearly exhibit positive representation errors. (For another example, see Fig. 3b in S17, concerning surface black carbon concentrations.) But Wang et al.'s study found such biases for the majority of AERONET sites, not just a few located in big cities. In fact, the current study shows no evidence of this global bias of 30%. Instead, it finds a global bias of only 9%, dominated by a few sites with large positive representation errors (median bias over all sites: 4%).
Wang et al. (2018) performed an analysis very much like the one in this study, with one crucial difference. As they did not have a global simulation at high resolution like G5NR, they downscaled results from a standard global simulation at 2.5° × 1.27° resolution. The downscaling was accomplished with the help of a high-resolution (0.1° × 0.1°) black carbon emission map (Wang et al., 2016). It is possible to simulate this procedure using the high-resolution G5NR black carbon emission maps and AAOT simulations, and to explain the different results in Wang et al. (2018) and the current study.
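Why a global mean bias of 9% can coexist with a median of 4% is easiest to see with numbers: a handful of sites with large positive errors pulls the mean up but barely moves the median. The per-site biases below are entirely hypothetical (they do not reproduce the study's values), and the exclusion threshold in `summarize_site_biases` is an illustrative parameter.

```python
from statistics import fmean, median


def summarize_site_biases(biases_pct, exclude_above=None):
    """Return (mean, median) of per-site relative representation biases (%).

    If exclude_above is given, sites with a bias above that value are dropped
    first (a hypothetical stand-in for excluding sites near strong sources).
    """
    if exclude_above is not None:
        biases_pct = [b for b in biases_pct if b <= exclude_above]
    return fmean(biases_pct), median(biases_pct)


if __name__ == "__main__":
    # Hypothetical biases: most sites small, three sites near black carbon sources.
    biases = [2, -1, 4, 3, 5, 0, 6, 4, -2, 3, 5, 4, 1, 7, 60, 45, 80]
    print(summarize_site_biases(biases))
    print(summarize_site_biases(biases, exclude_above=30))
```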

Figure22shows AAOT spatial representation errors as estimated by the current study and by Wang's methodology as simulated with G5NR data.A global bias of 25%, not very different from the original 30% mentioned inWang et al. (2018), https://doi.org/10.5194/acp-2019-767Preprint.Discussion started: 23 September 2019 c Author(s) 2019.CC BY 4.0 License.Remote sensing observations from the AERONET and GAW networks are intermittent in time and have a limited field-ofview.Consequently such observations have limited ability to represent AOT or AAOT over larger areas.The resulting spatiotemporal representation error is here analysed using a high-resolution simulation of global aerosol (GEOS5 Nature Run, ∼ 7 km resolution near equator).Using G5NR, an OSSE was constructed that simulates the frequency of AERONET observations taking SZA, cloud fraction and AOT values into account.
9%, which is strongly influenced by a few sites with large positive representation errors due to their proximity to black carbon sources. Judiciously excluding those sites reduces the bias significantly further. The large positive representation errors found by Wang et al. (2018) are shown to be due to methodological choices that limit the realism of their OSSE.
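Why a few sites can dominate the global bias is easy to see with a minimal Python sketch. It assumes, as a simplification, that the global bias is the mean of the per-site relative representation errors (the study may aggregate differently); the site names and error values are made up for illustration and are not results of this study.

```python
from statistics import fmean, median


def bias_summary(site_errors, exclude=()):
    """Mean and median of per-site relative representation errors.

    `site_errors` maps a site name to its relative error (0.1 = +10%);
    sites listed in `exclude` are left out of both statistics.
    """
    kept = [err for site, err in site_errors.items() if site not in exclude]
    return fmean(kept), median(kept)


# Hypothetical network: four well-behaved sites and one close to a strong source.
errors = {"site_a": 0.02, "site_b": 0.03, "site_c": 0.04, "site_d": 0.05, "site_e": 0.60}

mean_all, median_all = bias_summary(errors)
mean_kept, median_kept = bias_summary(errors, exclude={"site_e"})
print(f"all sites:      mean {mean_all:.1%}, median {median_all:.1%}")
print(f"without site_e: mean {mean_kept:.1%}, median {median_kept:.1%}")
```

With these made-up numbers the mean drops from about 15% to 3.5% once `site_e` is excluded, while the median hardly moves (4% to 3.5%), which is why the median over all sites is a useful companion to the global bias.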

Figure 1. Evaluation of the G5NR simulation of AOT and AAOT with AERONET data. The top row shows evaluation against three different datasets. Each dot represents statistics for a single AERONET site (with at least 100 observations in 2006); the mean value is shown in red and the standard deviation in blue. The coloured text summarizes the statistics over all data points in the figure. In the bottom row, the impact of the minimum required number of observations per site is shown.

Figure 3. Evaluation of the G5NR simulation of daily variation in AOT and AAOT with AERONET data.Each dot represents statistics for a single AERONET site (with at least 100 observations in 2006).The grey text summarizes the statistics over all data points in the figure.

Figure 4. Evaluation of the observational coverage predicted by the OSSE with AERONET observations. Each dot represents statistics for a single AERONET site (with at least 100 observations in 2006, at least 30 observations for Inversion L2.0). The grey dots are real AERONET data, the red dots are simulated by the methodology described in Sect. 3. The numbers in the graph are temporal coverages estimated by hemisphere.

Figure 21. Black carbon emissions over France, Europe, with the representation errors in AAOT from Inversion L1.5 AERONET superimposed. The representation errors use the same colour bar as in Fig. 7, which runs from −25% (blue) to +25% (red).

Figure 22. Yearly representation errors for AAOT from Inversion L1.5 AERONET, as estimated in this paper or using the methodology of Wang et al. (2018), for a model grid-box size of 2°. The representation error shown is the spatial representation error (Schutgens et al., 2017), i.e. the temporal sampling of observations is ignored.

Table 2. Mapping from G5NR data to the AERONET data files used in this study. The maximum cloud fraction was slightly tuned to obtain a temporal coverage of observations similar to that of real AERONET data (see Sect. 4 and Fig. 4), but the impact is small.

Table 3. Conditions for valid AERONET observations as simulated in this study.

Table 5. Correlation between modelled and observed yearly site AOT.

Table 6. … AERONET sites in Kinne et al. (2013).
… representation errors at any site be decomposed into a bias and a random error (possibly with temporal correlations over several months)? 5) What are representation errors like in multi-year averages?
Tao, S.: Estimation of global black carbon direct radiative forcing and its uncertainty constrained by observations, Journal of Geophysical Research, 121, 5948–5971, https://doi.org/10.1002/2015JD024326, 2016.
Evaluation of the G5NR simulation of AAOT with L2.0 AERONET data. Each dot represents statistics for a single AERONET site (with at least 30 observations in 2006); the mean value is shown in red and the standard deviation in blue. The coloured text summarizes the statistics over all data points in the figure.