Characterization of satellite-based proxies for estimating nucleation mode particles over South Africa

Proxies for estimating nucleation mode number concentrations and further simplifications for their use with satellite data have been presented in Kulmala et al. (2011). In this paper we discuss the underlying assumptions for these simplifications and evaluate the resulting proxies over an area in South Africa based on a comparison with a suite of ground-based measurements available from four different stations. The proxies are formulated in terms of sources (concentrations of precursor gases (NO2 and SO2) and UVB radiation intensity near the surface) and a sink term related to removal of the precursor gases due to condensation on pre-existing aerosols. A-Train satellite data are used as input to compute proxies. Both the input data and the resulting proxies are compared with those obtained from ground-based measurements. In particular, a detailed study is presented on the substitution of the local condensation sink (CS) with satellite aerosol optical depth (AOD), which is a column-integrated parameter. One of the main factors affecting the disagreement between CS and AOD is the presence of elevated aerosol layers. Overall, the correlation between proxies calculated from the in situ data and observed nucleation mode particle number concentrations (Nnuc) remained low. At the time of the satellite overpass (13:00–14:00 LT) the highest correlation is observed for SO2/CS (R² = 0.2). However, when the proxies are calculated using satellite data, only NO2/AOD showed some correlation with Nnuc (R² = 0.2). This can be explained by the relatively high uncertainties related especially to the satellite SO2 columns and by the positive correlation that is observed between the ground-based SO2 and NO2 concentrations. In fact, results show that the satellite NO2 columns compare better with in situ SO2 concentration than the satellite SO2 column does.
Despite the high uncertainties related to the proxies calculated using satellite data, the proxies calculated from the in situ data did not better predict Nnuc. Hence, overall improvements in the formulation of the proxies are needed.


Introduction
Aerosol particles are key constituents in the Earth-atmosphere system that can alter climate through their direct and indirect effects on the Earth's radiation budget. Aerosols affect the radiation budget directly by scattering and absorbing solar radiation and indirectly by acting as cloud condensation nuclei or ice nuclei and modifying clouds' radiative properties and lifetimes. However, the quantification of the aerosol effects on climate is very complex and large uncertainties still exist due to the high spatial and temporal variability of aerosol mass and particle number concentrations (e.g. IPCC, 2013). Besides the climatic effects, aerosols affect human life by reducing the air quality and visibility as well as affecting human health, especially in urban areas. Particulate air pollution has been associated with cardiovascular and pulmonary diseases and even with rises in the numbers of deaths among older people (e.g. Seaton et al., 1995; Utell et al., 2000; Schnelle-Kreis, 2009).
Primary aerosol particles, e.g. sea spray aerosol, desert dust, aerosol generated from biomass burning, and fossil fuel combustion, are emitted directly into the atmosphere. Secondary particles are formed from precursor gases through gas-to-particle conversion. The formation of new particles is strongly connected to the presence of sulphuric acid and other vapours of very low volatility, as well as the magnitude of solar radiation (e.g. Kulmala et al., 2005; Kulmala and Kerminen, 2008). However, pre-existing aerosol particles act as a sink for the vapours, inhibiting new aerosol formation (e.g. Kulmala and Kerminen, 2008). These new nanometre-size aerosol particles grow through condensation and coagulation to sizes where they may act as cloud condensation nuclei (particle diameter Dp ≳ 50 nm) or where they are large enough (Dp ≳ 100 nm) to scatter solar radiation and thus affect the Earth's radiation budget.
Several studies have shown that nucleation occurs frequently in the continental boundary layer and free troposphere from clean to polluted environments (Kulmala et al., 2004; Kulmala and Kerminen, 2008, and references therein). Laakso et al. (2008) and Vakkari et al. (2011) have studied new particle formation over moderately polluted savannah ecosystems in South Africa and found that nucleation takes place in the boundary layer almost every sunny day throughout the year, with a frequency as high as 69 % of all analyzed days (Vakkari et al., 2011). Hirsikko et al. (2012) extended the studies in South Africa to a polluted measurement site and found an even higher frequency for the nucleation event days (86 %), which is among the highest event frequencies reported in the literature so far. Hirsikko et al. (2013) also studied the causes for two or three consecutive daytime nucleation events, followed by subsequent particle growth during the same day. They concluded that the multiple events were associated with SO2-rich air from industrial sources.
Satellite instruments have been providing global observations of the Earth's atmosphere for three decades (e.g. Lee et al., 2009; Kokhanovsky and de Leeuw, 2009; Burrows et al., 2011). Information about the spatial distribution of aerosols and trace gases can be obtained from multiple instruments with various temporal and spatial resolutions and coverage. Passive remote sensing instruments such as NASA's Ozone Monitoring Instrument (OMI) on-board the AURA platform or the Moderate Resolution Imaging Spectroradiometer (MODIS) on-board the Terra and Aqua platforms use solar radiation to detect either trace gases or aerosol and cloud properties. Trace-gas remote sensing techniques using OMI are based on the trace-gas absorption features in the UV region (wavelength λ ∼ 200-400 nm), whereas the remote sensing of aerosol particles is mainly based on measurements in the UV/visible and near-infrared regions (λ ∼ 500-2000 nm). Since these aerosol measurements are sensitive only to particles that are optically active at solar wavelengths, the detectable aerosol sizes are limited to particles with diameters greater than about 100 nm. Nucleation mode particles (smaller than about 25-30 nm in diameter), therefore, cannot be detected directly using satellite instruments. Kulmala et al. (2011) introduced proxies, i.e. parameterizations, for estimating the number concentrations of nucleation mode particles (Nnuc), simplified for use with satellite data. These simplifications were made assuming that in situ parameters could be replaced with satellite-based observations. Their study was the first attempt to estimate the global nucleation mode aerosol concentrations using data derived from satellite measurements. The proxies were defined in terms of sources and sinks. The nucleation source terms consist of precursor gas column densities (NO2 or SO2) and UV-radiation intensity near the surface (all from OMI as opposed to in situ data in the initial proxies), whereas the sink term, i.e. the condensation sink (CS) in the original proxy formulation related to the aerosol surface area concentration, is assumed to be proportional to the aerosol optical depth (AOD, from MODIS). More recently Crippa et al. (2013) formulated a new proxy algorithm for ultrafine particle number concentrations based on satellite-derived parameters. They used a multivariate linear regression approach to derive the proxy, in which the source terms consisted of SO2, UV (from OMI), and NH3 (from the Tropospheric Emission Spectrometer). The sink term was formulated using MODIS (collection 5.0) AOD and the Ångström exponent, which expresses the spectral dependence of AOD on the wavelength. However, there are issues with the Ångström exponent (e.g. Mielonen et al., 2011), and thus this parameter is no longer included in the most recent MODIS collection 6.0 land parameters (Levy et al., 2013).
In this work we evaluate the simplifications and underlying assumptions of the method introduced in Kulmala et al. (2011) to estimate the number concentration of nucleation mode particles from satellite-derived data. The study area is the north-eastern part of South Africa (25-28° S, 25.5-30.5° E, Fig. 1). Even though the area is not very large, it contains strong contrasts from the emission point of view: the cities of Johannesburg and Pretoria, as well as highly industrialized areas especially east of the cities, vs. a very clean background in the western part of the study area. The study period considered is January 2007-December 2010. Four measurement stations are located within the region of interest, where observations of various in situ parameters were available.
This work comprises two parts:
1. A detailed investigation of replacing the condensation sink (defined below in Eq. 8), a local parameter evaluated from in situ observations, with the AOD, a column-integrated aerosol property available from satellite.
2. The estimation of how well satellite data can be used to compute proxies for nucleation mode particle number concentrations. This comprises the analysis of both the satellite- and in-situ-based proxy components and the proxies, as well as the comparison of the proxies with the measured concentration of nucleation mode particles. The influence of the uncertainties in the satellite-derived quantities on the proxy is also evaluated.

Data
In this study, a variety of data from satellite instruments and ground-based stations was used (see Table 1 for a summary). The satellite data originate from NASA's Afternoon-Train (A-Train) constellation. The A-Train constellation consists of seven satellites that are on the same polar-orbiting track and follow each other closely, enabling near-simultaneous observations of a variety of atmospheric parameters. The equatorial overpass for the A-Train satellites is around 1:30 p.m. local time. In this study we use OMI Level 2 products, i.e. the NO2 tropospheric column (Bucsela et al., 2013), the SO2 planetary boundary layer (PBL) product (Krotkov et al., 2008), and the 310 nm irradiance (UVB) at the surface at local noon (Tanskanen et al., 2006). It is noted that the OMI SO2 PBL product describes the SO2 concentration integrated over the whole atmospheric column, and PBL refers to the a priori profile assumed in the retrieval of this product. The OMI L2 products are provided with a nominal spatial resolution of 13 × 24 km². For the current study they were re-gridded onto a 3 km × 3 km geographical grid as in Fioletov et al. (2011). In this way the effective spatial resolution could be increased even though the instrument resolution is coarser than the grid. For NO2 and SO2, only those observations where the (radiative) cloud fraction was below 20 % were used. According to Lamsal et al. (2014), and references therein, the uncertainty in the OMI NO2 tropospheric column concentrations is about 0.75 × 10¹⁵ molec cm⁻², whereas Krotkov et al. (2008) report that the SO2 PBL product could be associated with noise as high as 1.5 DU. However, averaging the SO2 columns over a longer period and/or over a larger spatial area could reduce the noise to 0.3-0.6 DU. For OMI UVB irradiance the relative uncertainty is on average 7 % but could be higher, e.g. due to some episodic aerosol plumes (Tanskanen et al., 2006).
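The cloud screening and re-gridding step can be illustrated with a minimal sketch. It assumes simple bin averaging on a regular latitude-longitude grid, which is not necessarily the procedure of Fioletov et al. (2011); the function name `grid_cloud_free`, the grid step of 0.03° (roughly 3 km), and the pixel values are hypothetical and chosen only for illustration.

```python
import math
from collections import defaultdict


def grid_cloud_free(observations, lat0, lon0, step_deg, max_cloud_fraction=0.2):
    """Average cloud-screened pixel values on a regular lat-lon grid.

    observations: iterable of (lat, lon, value, cloud_fraction) tuples.
    Returns {(row, col): mean value}, with rows and columns counted
    from the grid origin (lat0, lon0).
    """
    cells = defaultdict(list)
    for lat, lon, value, cloud_fraction in observations:
        if cloud_fraction >= max_cloud_fraction:
            continue  # keep only pixels with cloud fraction below the threshold
        row = math.floor((lat - lat0) / step_deg)
        col = math.floor((lon - lon0) / step_deg)
        cells[(row, col)].append(value)
    return {cell: sum(vals) / len(vals) for cell, vals in cells.items()}


# Hypothetical pixels: (latitude, longitude, column value, cloud fraction)
pixels = [
    (-26.010, 28.010, 4.0, 0.05),
    (-26.015, 28.015, 6.0, 0.10),
    (-26.010, 28.010, 9.0, 0.50),  # too cloudy, dropped
    (-25.500, 29.000, 2.0, 0.00),
]
gridded = grid_cloud_free(pixels, lat0=-28.0, lon0=25.5, step_deg=0.03)
```

Averaging many overpasses into cells that are finer than the pixel footprint is what allows the effective resolution to exceed the nominal one, provided that enough observations accumulate in each cell.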
The AOD used in this study is the MODIS Aqua collection 6.0 AOD product at 3 km spatial resolution (Levy et al., 2013). The uncertainty of the MODIS AOD over land is reported as ±(0.05 + 15 %), i.e. an absolute term of 0.05 plus 15 % of the AOD value. For selected cases, vertical aerosol extinction profiles from the Cloud-Aerosol Lidar and Infrared Pathfinder Satellite Observation (CALIPSO) (Winker et al., 2007) are also used.
The in situ data used in this study are collected at four different stations in South Africa: Elandsfontein (ELA), Marikana (MAR), Botsalano (BOT), and Welgegund (WEL). All of these stations are located in the north-eastern part of the country shown in Fig. 1. Depending on the station, the measured parameters included e.g. particle size distribution, extinction coefficient, and trace-gas concentrations. More detailed descriptions of the in situ measurements can be found e.g. in Venter et al. (2012) for the Marikana station, in Beukes et al. (2013) for the Welgegund station, in Laakso et al. (2012) for the Elandsfontein station, and in Vakkari et al. (2013) for the Botsalano station. Data from the Aerosol Robotic Network (AERONET, http://aeronet.gsfc.nasa.gov, Holben et al., 1998) at the Elandsfontein station are also used. AERONET is a global ground-based sunphotometer network providing publicly available observations of aerosol optical, microphysical, and radiative properties. The aerosol optical properties in the total atmospheric column are derived from the direct and diffuse solar radiation measured by the Cimel sunphotometers.

Proxies
Kulmala et al. (2011) derived the Nnuc proxies for regional-scale nucleation and nucleation from primary emissions. The proxies were determined as the ratio of a source and a sink term. Regional-scale nucleation is associated with photochemistry and typically occurs over a spatial scale of hundreds of kilometres, whereas nucleation from primary emissions occurs in the vicinity of local sources such as industrial or urban areas (see Kulmala et al., 2011, and references therein). On a regional scale it was assumed that sulphuric acid acts as the driver of the regional nucleation process. Sulphuric acid is formed by oxidation of sulphur dioxide (SO2) with the hydroxyl radical (OH), which in turn is mainly formed via photolysis of ozone by UV radiation. The main sink for sulphuric acid is collisions with pre-existing aerosols. Petäjä et al. (2009) derived the proxy for the ambient sulphuric acid as UV·[SO2]/CS, which was considered as the source term in the regional-scale nucleation proxy. Taking into account that in addition to sulphuric acid, the pre-existing aerosols are also the sink for the newly formed particles (Nnuc), the regional-scale nucleation proxy is determined as

$$\frac{\mathrm{UV}\cdot[\mathrm{SO_2}]}{\mathrm{CS}^2}, \qquad (1)$$

where CS denotes the condensation sink of pre-existing aerosols. Nucleation from primary emissions can be an extremely rapid process. The source term of the corresponding proxy is related to the concentration of nitrogen dioxide (NO2) or sulphur dioxide while the sink term is determined by the condensation sink. For nucleation from primary emissions, two proxies are defined as

$$\frac{[\mathrm{NO_2}]}{\mathrm{CS}} \quad \text{and} \quad \frac{[\mathrm{SO_2}]}{\mathrm{CS}}. \qquad (2, 3)$$

In each of the proxies the source terms are estimated from the satellite measurements by replacing the SO2 and NO2 concentrations at the surface with the column densities from the satellite. The amount of global UV radiation is also available from satellite measurements, e.g. as a local noon irradiance at 310 nm wavelength (UVB radiation) at the surface. For the sink parameter (CS), Kulmala et al. (2011) proposed to use the AOD, which describes the total aerosol extinction in the atmospheric column. The relation between the CS and the AOD will be discussed in the following section. By replacing CS with AOD the simplified proxies for using satellite data for primary nucleation become

$$\frac{\mathrm{NO_2}}{\mathrm{AOD}} \quad \text{and} \quad \frac{\mathrm{SO_2}}{\mathrm{AOD}}. \qquad (4, 5)$$

For regional nucleation the proxy expressed in terms of satellite data becomes

$$\frac{\mathrm{UVB}\cdot\mathrm{SO_2}}{\mathrm{AOD}^2}. \qquad (6)$$

In addition we also considered

$$\frac{\mathrm{UVB}}{\mathrm{AOD}^2} \qquad (7)$$

as a potential proxy for the number concentration of nucleation mode particles. This proxy corresponds to the case shown in Kulmala et al. (2011), where the sulphur dioxide concentration was assumed to be constant. In this work the proxy defined in Eq. (7) is considered mainly to study how large an effect the satellite-based SO2 has on the performance of the regional-scale nucleation proxy.
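The source-over-sink structure of the proxies described in this section can be written as a minimal sketch. The function names and the numerical inputs are hypothetical; the same functions serve for the in situ case (sink = CS) and the satellite case (sink = AOD), and no unit handling is attempted.

```python
def _check_sink(sink):
    """The proxies divide by the sink, so it must be strictly positive."""
    if sink <= 0:
        raise ValueError("sink (CS or AOD) must be positive")


def primary_proxy(gas, sink):
    """Primary-emission proxy: precursor gas (NO2 or SO2) divided by the sink."""
    _check_sink(sink)
    return gas / sink


def regional_proxy(uv, so2, sink):
    """Regional-scale proxy: UV * SO2 divided by the squared sink."""
    _check_sink(sink)
    return uv * so2 / sink ** 2


def uv_only_proxy(uv, sink):
    """Regional-scale proxy with SO2 treated as constant: UV / sink^2."""
    _check_sink(sink)
    return uv / sink ** 2


# Illustrative, made-up inputs (arbitrary units)
no2_over_aod = primary_proxy(gas=1.0, sink=0.25)
so2_uv_over_aod2 = regional_proxy(uv=2.0, so2=3.0, sink=0.5)
uv_over_aod2 = uv_only_proxy(uv=2.0, sink=0.5)
```

Because the sink enters squared in the regional proxies, a given relative error in AOD or CS affects them twice as strongly as it affects the primary-emission proxies.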

Condensation sink and aerosol extinction
As indicated in the previous section, Kulmala et al. (2011) proposed AOD as a substitute for CS. Both parameters are roughly proportional to the aerosol surface area distribution. According to e.g. Lehtinen et al. (2003) the condensation sink is defined as

$$\mathrm{CS} = 4\pi \rho_{\mathrm{diff}} \int_0^{\infty} D_p\, \beta_M(D_p)\, n(D_p)\, \mathrm{d}D_p, \qquad (8)$$

where Dp is the particle radius, n(Dp) is the particle number size distribution function, ρdiff is the diffusion coefficient of the condensing vapour, and βM(Dp) is the transitional correction factor for mass flux (Fuchs and Sutugin, 1971). Aerosol optical depth describes quantitatively the column-integrated extinction of solar light caused by atmospheric aerosols and is one of the standard aerosol parameters retrieved from the satellite radiance observations. At a height z and for a wavelength λ the aerosol extinction is defined as

$$\sigma_{\mathrm{ext}}(\lambda, z) = \int_0^{\infty} \pi D_p^2\, Q_{\mathrm{ext}}(m, D_p, \lambda)\, n(D_p, z)\, \mathrm{d}D_p, \qquad (9)$$

where Qext is the extinction efficiency describing the aerosols' ability to scatter and absorb solar light. At a fixed wavelength the extinction efficiency is a complex function of aerosol size and complex refractive index m (which in turn depends on the aerosol particle composition). The particle shape also affects Qext somewhat, but this is not considered in this study.
If the particles are assumed to be spherical, Qext can be calculated using a computer code based on the Lorenz-Mie theory (Mishchenko et al., 2002). AOD is obtained by integrating σext over the total atmospheric column.

Figure 2.
The sensitivity of CS and the aerosol extinction coefficient (σext) to different particle sizes. The left panel shows the aerosol size distribution that is used to calculate CS and σext. The right panel shows the contribution of each particle size to the total CS and σext. The σext is calculated for two wavelengths (0.55 and 0.45 µm) assuming spherical particles with a refractive index of m = 1.48 + 0.001i.
The differences between CS and σext (at a certain height) as a function of particle size are illustrated in Fig. 2. Both parameters are derived using the same aerosol size distribution (Fig. 2, left panel). The σext is calculated using a refractive index of m = 1.48 + 0.003i and wavelengths of 0.55 and 0.45 µm. As Fig. 2 shows, particles with Dp about 0.05-0.1 µm have the largest contribution to CS, whereas for σext the largest contribution comes from particles with Dp about 0.2-0.8 µm. The notable difference between the two quantities is that particles with Dp < 0.1 µm can have a contribution to CS which is several orders of magnitude larger than that to σext. However, σext is significantly more sensitive to particles with Dp > 1.0 µm than CS. It is clear that a large change in the number concentration of the smaller particle sizes would change the value of total CS when integrated over the size distribution but would have a minor effect on the value of σext, and vice versa; if e.g. the number concentration of large particles increased, there would be little effect on CS. It is noted that in addition to the theoretical differences the possibility of elevated aerosol layers could affect the column-integrated values of σext, i.e. the AOD, which must be considered when comparing the satellite-based AOD with in situ CS.
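The size dependence of CS discussed above can be explored with a minimal sketch that sums the contribution of each size bin. It is written in terms of particle diameter, for which the prefactor is 2π (equivalent to a radius-based integral with prefactor 4π), and it uses the Fuchs-Sutugin transition-regime correction in its commonly quoted form. The vapour mean free path, the diffusion coefficient, the accommodation coefficient, and the three-bin size distribution are illustrative assumptions, not values from this study; the extinction counterpart would additionally require a Mie code and is not attempted here.

```python
import math


def fuchs_sutugin(dp_m, mean_free_path_m=1.0e-7, alpha=1.0):
    """Transition-regime correction factor for mass flux.

    dp_m: particle diameter (m); alpha: mass accommodation coefficient.
    The mean free path of the condensing vapour is an assumed value.
    """
    kn = 2.0 * mean_free_path_m / dp_m  # Knudsen number
    return (1.0 + kn) / (
        1.0 + (4.0 / (3.0 * alpha) + 0.377) * kn + 4.0 / (3.0 * alpha) * kn ** 2
    )


def condensation_sink(diameters_m, concentrations_m3, diffusion_m2_s=1.0e-5):
    """Return (total CS in s^-1, list of per-bin contributions).

    diameters_m: bin diameters (m); concentrations_m3: bin number
    concentrations (m^-3). The diffusion coefficient is an assumed value.
    """
    terms = [
        2.0 * math.pi * diffusion_m2_s * dp * fuchs_sutugin(dp) * n
        for dp, n in zip(diameters_m, concentrations_m3)
    ]
    return sum(terms), terms


# Hypothetical three-bin distribution: 20 nm, 100 nm and 1 µm particles
diameters = [20e-9, 100e-9, 1000e-9]
concentrations = [5.0e9, 1.0e9, 1.0e6]
cs_total, cs_terms = condensation_sink(diameters, concentrations)
cs_fractions = [term / cs_total for term in cs_terms]
```

With these made-up numbers the 100 nm bin dominates CS, and the sparse 1 µm bin contributes least, which mirrors the qualitative behaviour described for Fig. 2.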
The response of σext to changes in the particle size distribution depends to a certain extent on the particle composition and the measurement wavelength. If the particle absorption is high (i.e. the imaginary part of m ∼ 0.1i), the contribution of particles with Dp < 0.1 µm to σext would be somewhat higher than in Fig. 2. Shorter wavelengths increase the sensitivity to smaller particles, but as Fig. 2 illustrates, a 0.1 µm decrease in wavelength does not improve the sensitivity significantly. Much shorter wavelengths would be needed to increase the sensitivity of σext to particles with Dp < 0.1 µm, but such measurements could not be carried out in a real atmosphere.

Figure 3.
Comparison between condensation sinks derived from particle size distributions, as described in the text, and nephelometer scattering coefficients measured at Elandsfontein station in 2010 for the warm (January-April, November-December) and cold (May-October) seasons. CS has been corrected to the ambient relative humidity but the scattering coefficient was measured from dry particles. The data are colour-coded according to ambient relative humidity, whose strong influence on the relation between CS and scattering coefficient is evident.

Figure 4.
Comparison between AOD at 500 nm available from AERONET (see text) and in situ scattering coefficients measured at the Elandsfontein station. The AOD is the column-integrated value of aerosol extinction (scattering + absorption) obtained from sunphotometer measurements. The in situ scattering coefficient is measured with a nephelometer.

Results
The proxies as defined in Sect. 3 are formulated in terms of parameters which are obtained either from ground-based in situ measurements (Eqs. 1-3) or from satellite data (Eqs. 4-7). In this section the performance of these proxies is critically evaluated and in particular each of the satellite-based parameters is examined.

Comparison of condensation sink and aerosol optical depth
Replacing CS with AOD is perhaps the most crucial assumption when determining the proxies using satellite data, as indicated in Kulmala et al. (2011). Apart from the sensitivity of these parameters to different particle sizes discussed in Sect. 3.1, other differences play a role, such as the vertical variation of the aerosol concentrations, the particle size range considered, and the dependence of aerosol particle size on relative humidity. CS is determined from measured dry particle size distributions with a correction for ambient humidity. CS at Botsalano and Marikana has been estimated from submicron size distributions while at Elandsfontein size distributions up to 10 µm were used. In contrast, the AOD is an integrated quantity with contributions from all optically active aerosols throughout the whole atmospheric column. To assess the effect of these different factors on the relation between the AOD and CS, the following comparisons are made:
- in situ CS with nephelometer aerosol scattering coefficient
- in situ nephelometer aerosol scattering coefficient with AOD from AERONET
- in situ CS with AOD from both AERONET and satellite measurements.
Coincident measurements of size distributions to derive the CS and aerosol scattering coefficients from a nephelometer are only available from the Elandsfontein measurement station. The comparison between CS and scattering coefficient serves to eliminate effects of the vertical variation of the aerosol concentrations on the comparison. The nephelometer measures the dry particle scattering at 0.525 µm wavelength and the results are presented at standard temperature and pressure for the atmosphere. The maximum particle size is limited to Dp ∼ 10 µm. It is noted that the nephelometer considers only aerosol scattering and not the total extinction, which would also require information on absorption. However, the contribution of absorption to the total aerosol extinction is generally much smaller than that of scattering. Laakso et al. (2012) reported that at Elandsfontein absorption increased during the coldest months (May-October) due to biomass burning (domestic burning of coal for heating and cooking), contributing about 15-20 % to the total aerosol extinction, whereas during the warmer months (November-April) absorption contributed ∼ 10 % of the total aerosol extinction. To take the seasonal variation of absorption into account, the CS and the scattering coefficients were compared separately for the periods May-October and November-April. The results in Fig. 3 show that, for both periods, scattering coefficients and CS were well correlated with R² = 0.67 for November-April and R² = 0.71 for May-October. The R² values were somewhat higher than those from measurements at a clean continental boreal forest measurement site in Hyytiälä, southern Finland (R² = 0.62, Virkkula et al., 2011). The next step is to compare the nephelometer scattering coefficient to the AOD to evaluate effects of the possible occurrence of elevated aerosol layers and/or boundary layer mixing.
The presence of large dust particles might also have some effect on the comparison due to the limited particle size in the nephelometer inlet. In this comparison we first use AERONET measurements of AOD at Elandsfontein, which are more accurate than those retrieved from satellite data. As Fig. 4 shows, the correlations between the AERONET AOD and the in situ scattering coefficient (warm season R² = 0.46, cold season R² = 0.24) are lower than those between the CS and the scattering coefficient. This indicates that elevated aerosol layers and boundary layer mixing might affect the relation more than the theoretical differences do when the sink of pre-existing aerosols is estimated using the AOD.
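The seasonal comparison described above amounts to computing a squared Pearson correlation separately for the warm (November-April) and cold (May-October) months. A minimal sketch follows; the function names and the sample records are hypothetical, and the sketch says nothing about the quality control applied to the actual measurements.

```python
WARM_MONTHS = {1, 2, 3, 4, 11, 12}  # November-April; all other months are "cold"


def season(month):
    """Classify a calendar month (1-12) as 'warm' or 'cold'."""
    return "warm" if month in WARM_MONTHS else "cold"


def r_squared(x, y):
    """Squared Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)


def r_squared_by_season(records):
    """records: iterable of (month, x, y); returns {'warm': R2, 'cold': R2}."""
    groups = {"warm": ([], []), "cold": ([], [])}
    for month, x, y in records:
        xs, ys = groups[season(month)]
        xs.append(x)
        ys.append(y)
    # A correlation needs at least three points to be meaningful
    return {name: r_squared(xs, ys)
            for name, (xs, ys) in groups.items() if len(xs) >= 3}


# Made-up (month, CS, scattering coefficient) records
records = [(1, 1.0, 2.0), (2, 2.0, 4.0), (3, 3.0, 6.5),
           (6, 1.0, 3.0), (7, 2.0, 1.0), (8, 3.0, 2.0)]
seasonal_r2 = r_squared_by_season(records)
```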
For the comparison of CS with the AOD retrieved from MODIS, daily AOD values were used which are spatial averages of the observations within a 3 km radius from each measurement station. As Fig. 5 shows, the CS vs. satellite AOD data are scattered all over the graph and, although there is a tendency of increasing CS with increasing AOD, there is no apparent correlation (0.03 ≤ R² ≤ 0.06). As an alternative, a bivariate method (York et al., 2004) was applied to account for the uncertainties associated with both CS and MODIS AOD in the fitting. For CS the uncertainty was assumed to be 10 % (Petäjä et al., 2013) and for MODIS AOD an uncertainty of 0.05 + 15 % was used (Levy et al., 2013). This means that for low AOD the relative uncertainty is rather high; e.g. for AOD = 0.1 the relative uncertainty would be 65 %. As Fig. 5 shows, the bivariate method gave very different results from least squares linear fitting.
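The 65 % figure quoted for AOD = 0.1 follows directly from the stated uncertainty envelope of 0.05 + 15 %. A minimal sketch of that arithmetic is given below; the function names are hypothetical.

```python
def modis_aod_uncertainty(aod):
    """Absolute AOD uncertainty for the stated envelope 0.05 + 15 % of AOD."""
    return 0.05 + 0.15 * aod


def modis_aod_relative_uncertainty(aod):
    """Relative AOD uncertainty (fraction of the AOD value)."""
    if aod <= 0:
        raise ValueError("AOD must be positive")
    return modis_aod_uncertainty(aod) / aod


# (0.05 + 0.15 * 0.1) / 0.1 = 0.065 / 0.1 = 0.65, i.e. 65 %
rel_low = modis_aod_relative_uncertainty(0.1)
# For a higher AOD the constant term matters less: 0.125 / 0.5 = 0.25
rel_high = modis_aod_relative_uncertainty(0.5)
```

The constant 0.05 term is what makes the relative uncertainty grow rapidly as the AOD approaches zero, which matters for the clean background parts of the study area.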
At Marikana and Elandsfontein the largest observed AODs are not related to the largest CS, which could be due to the presence of elevated aerosol layers. In a recent study by Giannakaki et al. (2015) data from a ground-based lidar at Elandsfontein are analyzed and the results show that the mean contribution of elevated aerosol layers to the AOD is 46 %. To estimate the effect of elevated aerosol layers on the CS-AOD comparison at Marikana, CALIPSO observations of aerosol vertical extinction profiles are used. All CALIPSO daytime overpasses between 8 February 2008 and 17 May 2010 within 50 km from the Marikana station were considered. Due to the small CALIPSO swath width only 48 days of data are available. At Marikana the median MODIS AOD is 0.15 for the whole measurement period and, as Fig. 5 shows, the CS values are less scattered when AODs are smaller than the median. Therefore the vertical aerosol extinction profiles from CALIPSO are studied separately for the cases where MODIS AOD ≤ 0.15 and AOD > 0.15. As Fig. 6 shows, for higher AODs the median extinction profile indicates an elevated aerosol layer, which supports the result that high AODs at Marikana are also likely to be associated with an elevated aerosol layer.

Proxies defined from the in situ data and comparison with N nuc
The proxies are first computed using in situ measurements from Marikana and Elandsfontein following Eqs. (1)-(3) to evaluate how well each of them could predict the nucleation mode number concentration within our study area. It is noted that due to different instrumentation, Nnuc from Marikana consists of particles with Dp < 30 nm, but at Elandsfontein Nnuc consists of particles with Dp 10-30 nm. In addition, CS at Marikana is defined from submicron particles whereas at Elandsfontein CS is defined from particles with Dp < 10 µm. Figure 7 shows the diurnal variation of each of the in situ proxy components and the number concentration of nucleation mode particles. At Marikana the Nnuc median peaks at about 10 a.m. and at Elandsfontein about 1 h later. At the time of the satellite overpass the median of Nnuc is lower than before noon at both locations and of about the same order of magnitude. The diurnal variation of NOx-NO and SO2 concentrations shows somewhat different characteristics at Marikana than at Elandsfontein. The morning and evening peaks of NOx-NO at Marikana are most likely associated with household combustion and traffic, whereas the single SO2 peak in the morning is most likely related to the industrial emissions and the break-up of the inversion layers that form quite regularly in the South African Highveld (Venter et al., 2012). At Elandsfontein, where the major emission source is heavy industry, an increase in the NOx-NO and the SO2 concentration medians is seen at about 10 a.m. The median of SO2 concentration decreases in the late afternoon while the median of NOx-NO concentration does not vary much. At the time of the satellite overpass the NOx-NO and SO2 medians are much higher at Elandsfontein than at Marikana. Results also show that at the time of the satellite overpass NOx-NO and SO2 are positively correlated: at Elandsfontein R² = 0.58 and at Marikana R² = 0.32. At Elandsfontein CS does not show any clear diurnal variation and it is systematically lower than at Marikana. At Marikana the diurnal variation of the CS is also rather weak during the daytime but a peak in the median is seen in the evening.
Figure 8 shows the diurnal variation of the in situ proxies at Marikana and Elandsfontein. The comparison of the diurnal variation of the proxies and Nnuc indicates that the proxy-Nnuc relation depends on the time of the day. At the time of the satellite overpass (13:00-14:00 LT) the highest correlation with Nnuc at Marikana is obtained with the SO2/CS proxy (R² = 0.22, Fig. 9), but at Elandsfontein the correlation remains below 0.1. At Marikana the correlation of Nnuc with the SO2·UV/CS² proxy (Eq. 1) is not as good at the time of the satellite overpass, but at 9-10 a.m. R² = 0.25. The (NOx-NO)/CS and UV/CS² proxies do not perform well in predicting Nnuc. Also, it is noted that at the time of the satellite overpass all the proxy values show much higher median values at Elandsfontein than at Marikana while the median for Nnuc is about the same at both locations. At Elandsfontein somewhat better correlations with Nnuc are observed when only the source terms of the proxies are considered. For example, the values of R² between Nnuc and SO2·UV are 0.35 at 10:00-11:00 LT and 0.14 at 13:00-14:00 LT, but when the sink term CS² is included in the proxy there is no correlation. At Marikana CS does not have as high an influence on the proxy performance as at Elandsfontein.
This differs from the results reported for southern Finland in that SO2 in our study has a strong effect on the performance of the proxy: without SO2 the UV/CS² term does not correlate with Nnuc. Given that the satellite data are associated with much higher uncertainties than the in situ measurements, these in-situ-based results can be considered as an upper limit for the overall performance of the proxies computed using satellite data (Eqs. 4-7).

Spatial pattern of the satellite-based proxies
Each of the satellite-based parameters is analyzed from January 2007 to December 2010. Figure 11 shows the 4-year medians of SO2 and NO2 column densities obtained from the OMI instrument as well as the AOD at 550 nm from MODIS Aqua observations. Daily satellite data are used to define the satellite-based proxies over the study area (Eqs. 4-7). Figure 12 shows the 4-year median spatial patterns for the four satellite-based proxies. The spatial patterns of these four proxies are quite different and in particular there is a large difference between the spatial variation of the regional proxies and that of the proxies for nucleation from primary emissions. As expected, the latter strongly reflects the spatial distributions of the precursor gases with high concentrations over the Highveld industrial area, where the values of NO2 and SO2 columns are high and the sink (AOD) is low. For the NO2/AOD proxy, elevated values are also observed over the Johannesburg-Pretoria area while for the other proxies a local minimum occurs over these cities.
All four satellite proxies show larger values at Elandsfontein than at Marikana, which is consistent with the results obtained for the in situ proxies. Based on the in situ results, the SO2-related proxies are expected to predict Nnuc at the time of the satellite overpass better than the other proxies. A comparison of the spatial patterns of each satellite-based proxy in the vicinity of the in situ measurement stations shows that there is little difference between the spatial patterns of the SO2- and NO2-related proxies.
The relative uncertainty of the proxies calculated using satellite data can be estimated by comparing the uncertainties related to each satellite parameter (Sect. 3) with the observed median values shown in Fig. 10. On the one hand, over background areas where both AOD and SO2 are low, the SO2·UVB/AOD² proxy can have an uncertainty of over 90 %. On the other hand, over source areas where both NO2 and AOD are slightly elevated, the NO2/AOD proxy would have an uncertainty of about 50 %. Generally over South Africa the uncertainty in satellite-based proxies is high, especially over areas where low values of NO2, SO2, and AOD are frequently observed.
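How component uncertainties combine into a proxy uncertainty can be illustrated with a first-order error-propagation sketch. The text does not state how the uncertainties were combined, so the quadrature rule for independent errors used below is an assumption. The function name `proxy_relative_uncertainty` and the input relative uncertainties are hypothetical and are not values from this study.

```python
import math


def proxy_relative_uncertainty(rel_unc, exponents):
    """First-order relative uncertainty of a product of powers.

    For a proxy P = prod(x_i ** e_i) with independent errors (assumption),
    (dP / P) ** 2 = sum((e_i * dx_i / x_i) ** 2).
    """
    return math.sqrt(sum((exponents[k] * rel_unc[k]) ** 2 for k in exponents))


# Hypothetical relative uncertainties of the satellite inputs (illustration only).
rel_unc = {"SO2": 0.6, "NO2": 0.3, "UVB": 0.1, "AOD": 0.3}

# SO2 * UVB / AOD^2: exponents +1, +1 and -2.
so2_uvb_aod2 = proxy_relative_uncertainty(rel_unc, {"SO2": 1, "UVB": 1, "AOD": -2})

# NO2 / AOD: exponents +1 and -1.
no2_aod = proxy_relative_uncertainty(rel_unc, {"NO2": 1, "AOD": -1})

print(f"SO2*UVB/AOD^2: {so2_uvb_aod2:.2f}, NO2/AOD: {no2_aod:.2f}")
```

Because AOD enters the SO2·UVB/AOD² proxy squared, its relative uncertainty is counted twice as heavily there as in NO2/AOD, which is one reason the former proxy is more sensitive to low-AOD conditions.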

Comparison of satellite and in situ proxy components
Before evaluating the performance of the proxies using satellite data, the quality of the parameters used in these proxies should be examined. The CS/AOD comparison was discussed in Sect. 4.1. Here we compare satellite data for NO2, SO2 and UVB with in situ data at each of the measurement stations. The satellite data for each station are collected within a 12 km (NO2, SO2, UVB) or a 3 km (AOD) radius from the station, and the results are compared with hourly means of the in situ data extracted between 13:00 and 14:00 LT, i.e. within ±30 min of the approximate satellite overpass.
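The collocation described above can be sketched as follows. This is a minimal illustration, not the processing chain used in the study: the function names, the tuple-based data layout, the great-circle distance criterion and the station coordinates are all assumptions made for the example.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def collocate(pixels, station_lat, station_lon, radius_km):
    """Return the values of all (lat, lon, value) pixels within radius_km of the station."""
    return [
        value
        for lat, lon, value in pixels
        if haversine_km(lat, lon, station_lat, station_lon) <= radius_km
    ]


def overpass_hour_mean(samples, start_hour=13.0):
    """Mean of (local_decimal_hour, value) samples in [start_hour, start_hour + 1)."""
    values = [v for hour, v in samples if start_hour <= hour < start_hour + 1.0]
    return sum(values) / len(values) if values else None


# Hypothetical station location and satellite pixels (illustration only).
station = (-26.0, 28.0)
pixels = [(-26.0, 28.0, 1.0), (-26.05, 28.0, 2.0), (-26.5, 28.0, 9.0)]

gas_values = collocate(pixels, *station, radius_km=12.0)  # NO2, SO2, UVB radius
aod_values = collocate(pixels, *station, radius_km=3.0)   # AOD radius
in_situ = overpass_hour_mean([(12.5, 10.0), (13.0, 2.0), (13.5, 4.0), (14.0, 100.0)])
```

How the pixels inside the radius are aggregated (for example mean or median) is left open here, since the text only states that the data are collected within the radius.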
The satellite NO2 column densities and the in situ NOx-NO concentrations are reasonably well correlated, as are the satellite UVB irradiances and the global radiation measured at each station. The highest correlation for NO2 was obtained at Marikana (R² = 0.55) and the lowest at Elandsfontein (R² = 0.26). For UVB and global radiation the correlations were 0.61 ≤ R² ≤ 0.77. In Kulmala et al. (2011) a constant value was assumed for the satellite-based SO2 when defining the global proxy maps, because the SO2 product they used (middle tropospheric SO2) did not show a reasonable spatial pattern. In this study the middle-troposphere SO2 data were replaced by the OMI boundary layer product (Sect. 3), which improved the characterization of the SO2 spatial variation (Fig. 10). However, the relative uncertainty in the satellite-based SO2 remains high unless the data are averaged over a long time period or a large spatial area. At all three stations a lower correlation between the satellite- and in-situ-based SO2 measurements was obtained than for the other source parameters; at Marikana there is practically no correlation. Similar results were obtained when the satellite- and in-situ-based proxies were compared (Table 2, figures in the Supplementary Material). Overall, large differences exist between the satellite proxies and the in situ proxies.
Since the in situ data at Marikana and Elandsfontein showed a correlation between the NOx-NO and the SO2 concentrations, the satellite NO2 column density is also compared with the in situ SO2. The results show that the OMI NO2 in fact compares better with the in situ SO2 than the actual OMI SO2 product does. The correlation between the satellite NO2 column and the in situ SO2 concentration is R² = 0.25 at Elandsfontein and R² = 0.31 at Marikana.

Comparison of satellite-based proxies with Nnuc
Figure 12. Comparison between the number concentration of nucleation mode particles and NO2/AOD calculated from the satellite data at the Marikana and Elandsfontein stations. The number concentrations are 1 h averages (13:00-14:00 LT) representative of the satellite overpass time. Note that at Elandsfontein Nnuc represents particles with Dp 10-30 nm and at Marikana particles with Dp < 30 nm. Nobs denotes the number of coincident observations.

To further evaluate the performance of the satellite-based proxies, they are compared to the in situ Nnuc. Only data from Elandsfontein and Marikana are included in the comparison, since the number of coincident Nnuc and satellite proxy observations was too low at the other stations. As expected, neither of the two satellite-based SO2 proxies is able to predict Nnuc. Interestingly, the only case where a weak correlation is obtained between a proxy using satellite data and Nnuc is for NO2/AOD (Fig. 12). This result is very different from what is expected based on the comparison of the in situ proxies and Nnuc. In fact, the connection between NO2/AOD and Nnuc is most probably related to the correlation between the satellite NO2 column density and the in situ SO2 concentration. If the source term in the SO2·UVB/AOD² proxy were replaced by NO2·UVB, the correlation with Nnuc would be R² = 0.23 at Elandsfontein and R² = 0.06 at Marikana. This implies that over areas where SO2 and NO2 are affected by some common factors, e.g. emission sources, the satellite NO2 could be a better estimate for the source term than SO2.
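The proxy comparison in this section, including the substitution of NO2·UVB for the SO2·UVB source term, can be sketched as below. This is a minimal illustration under stated assumptions: the function names and dictionary keys are hypothetical, only the proxies named in this section are computed, and R² is taken as the squared Pearson correlation on linear values (the text does not specify whether a transformation such as a logarithm was applied).

```python
import math


def satellite_proxies(no2, so2, uvb, aod):
    """Proxies named in this section, from coincident satellite values.

    no2, so2: column densities; uvb: UVB irradiance; aod: aerosol optical depth.
    """
    return {
        "NO2/AOD": no2 / aod,
        "SO2*UVB/AOD^2": so2 * uvb / aod ** 2,
        # Variant with the SO2 source term replaced by NO2.
        "NO2*UVB/AOD^2": no2 * uvb / aod ** 2,
    }


def pearson_r2(x, y):
    """Squared Pearson correlation coefficient; NaN if either series is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return math.nan
    return sxy ** 2 / (sxx * syy)


def proxy_r2(observations, nnuc, name):
    """R^2 between one proxy and Nnuc over coincident observations.

    observations: list of dicts with keys no2, so2, uvb, aod (hypothetical layout).
    """
    proxy = [satellite_proxies(**obs)[name] for obs in observations]
    return pearson_r2(proxy, nnuc)


# Single hypothetical observation to show the arithmetic.
example = satellite_proxies(no2=2.0, so2=1.0, uvb=4.0, aod=0.5)
```

Calling `proxy_r2` once per proxy name and station would give numbers comparable in kind to the R² values quoted above, provided the same coincidence criteria are applied.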

Conclusions
This work explores the use of proxies based on satellite data to obtain information on the concentration of nucleation mode aerosol particles (Nnuc). These proxies have been formulated using relations derived from ground-based data on nucleation and precursor gases, which were simplified for use with satellite data in Kulmala et al. (2011). The simplifications and associated assumptions are critically examined. The study uses data over a part of South Africa where ground-based observations from four experimental sites are available, both for comparison with the satellite-based parameters used in the proxy formulations and for comparison of the proxies with ground-based measurements of the nucleation mode aerosol particle number concentrations. For the computation of the proxies, data from the A-Train satellites are used. The NO2, SO2, and UVB radiation are obtained from the OMI instrument and the AOD from the MODIS instrument. The NO2 and UVB data are the same as those used in Kulmala et al. (2011), but the AOD was upgraded to the newest Collection 6, 3 km product. Also, the SO2 product was changed to the planetary boundary layer product (OMI SO2 PBL), which represents the total column values with the a priori assumption that the emissions are mainly in the boundary layer. The satellite observations are also extensively compared with in situ data.
Based on the proxies derived from the in situ data, it is expected that the SO2-related proxies would be the best predictors of Nnuc within the study area at the time of the satellite overpass (13:00-14:00 LT). It is also noted that even though the in situ NO2/CS proxy did not do well in predicting Nnuc, a positive correlation between the SO2 and NO2 concentrations is found at the measurement stations (at 13:00-14:00 LT). The R² between in situ SO2/CS and Nnuc is 0.22, and this value could be considered an "upper limit" for the satellite proxies, for which uncertainties are much higher than for the in situ proxies. Using ground-based data, Kulmala et al. (2011) reported that SO2 had only moderate influence on the performance of the SO2·UV/CS² proxy in southern Finland. The overall correlation between this proxy and Nnuc over South Africa was even lower (R² = 0.13) than over southern Finland (R² = 0.29), yet our results clearly indicate a strong influence of SO2 on the performance of the proxy. If SO2 was excluded from the proxy, no correlation between the in situ proxies and Nnuc was found.

Kulmala et al. (2011) emphasized that the most crucial assumption in deriving the satellite-based proxies was the replacement of the CS with AOD. This assumption is further evaluated in the current study using several tests. A fundamental reason for differences between CS and AOD is their intrinsic dependence on different aerosol size ranges, with CS more sensitive to very small particles (smaller than about 200 nm) and AOD more sensitive to particles larger than that. Yet, good correlation is obtained between measured scattering coefficients for dry aerosol and CS evaluated from collocated particle size distribution measurements. When the in situ scattering coefficients or CS are compared with collocated AOD measurements, the correlation decreases. This may be due to several effects. In particular, the presence of elevated aerosol layers and/or large dust particles increases the AOD but does not affect the CS. However, overall the AOD is rather low (< 0.1) over the major part of the study area; this means that these values are also associated with substantial relative uncertainty, which needs to be accounted for when deriving the satellite-based proxies.
Even though the OMI SO2 PBL data product showed a distinct improvement in describing the spatial patterns of SO2 as compared to the data set used in Kulmala et al. (2011), the satellite-based SO2 did not describe the day-to-day variations at the measurement stations well. In addition, the observed SO2 column values were often close to the noise level associated with a single column retrieval reported by Krotkov et al. (2008). The only relation between a satellite-based proxy and Nnuc was obtained for NO2/AOD (R² = 0.24 at Elandsfontein and R² = 0.09 at Marikana). This result is different from what was expected based on the in situ proxies. The most probable explanation is the positive correlation between the ground-based NO2 and SO2 concentrations within the study area. In fact, the satellite NO2 column correlates better with the in situ SO2 concentration than the satellite SO2 column does, for which no correlation was found.
Overall, this study shows that the uncertainties related to the satellite products remain a major issue in this satellite-based proxy approach, especially over areas like South Africa, where the AOD and the SO2 and NO2 concentrations are generally relatively low. Throughout the whole study the relative uncertainties related to the satellite-based proxies were well above 50 %. For the NO2/AOD proxy the largest relative uncertainties were often related to AOD. Otherwise SO2 was clearly the most uncertain component in the proxies calculated using satellite data. Despite these uncertainties related to the satellite data, the in situ data did not do significantly better in predicting Nnuc within our study area. This indicates that overall improvements in the formulation of the proxies are needed.
The Supplement related to this article is available online at doi:10.5194/acp-15-4983-2015-supplement.