A statistical proxy for sulphuric acid concentration

Gaseous sulphuric acid is a key precursor for new particle formation in the atmosphere. Previous experimental studies have confirmed a strong correlation between the number concentrations of freshly formed particles and the ambient concentrations of sulphuric acid. This study evaluates a body of experimental gas phase sulphuric acid concentrations, as measured by Chemical Ionization Mass Spectrometry (CIMS) during six intensive measurement campaigns and one long-term observational period. The campaign datasets were measured in Hyytiälä, Finland, in 2003 and 2007, in San Pietro Capofiume, Italy, in 2009, in Melpitz, Germany, in 2008, in Atlanta, Georgia, USA, in 2002, and in Niwot Ridge, Colorado, USA, in 2007. The long term data were obtained in Hohenpeissenberg, Germany, during 1998 to 2000. The measured time series were used to construct proximity measures (“proxies”) for sulphuric acid concentration by using statistical analysis methods. The objective of this study is to find a proxy for sulphuric acid that is valid in as many different atmospheric environments as possible. Our most accurate and universal formulation of the sulphuric acid concentration proxy uses global solar radiation, SO2 concentration, condensation sink and relative humidity as predictor variables, yielding a correlation measure (R) of 0.87 between observed concentration and the proxy predictions. Interestingly, the role of the condensation sink in the proxy was only minor, since similarly accurate proxies could be constructed with global solar radiation and SO2 concentration alone. This could be attributed to SO2 being an indicator for anthropogenic pollution, including particulate and gaseous emissions which represent sinks for the OH radical that, in turn, is needed for the formation of sulphuric acid.
Correspondence to: S. Mikkonen (santtu.mikkonen@uef.fi)


Introduction
Sulphuric acid has been shown to be a key precursor for atmospheric particle nucleation (Weber et al., 1996; Kulmala et al., 2004; Kulmala and Kerminen, 2008; Kerminen et al., 2010; Kuang et al., 2010; Sipilä et al., 2010) and a major contributor to the growth of freshly formed particles (Fiedler et al., 2005; Stolzenburg et al., 2005; Wehner et al., 2005; Kulmala et al., 2006; Laaksonen et al., 2008) along with aminium salts and other organic compounds (Kuang et al., 2010; Smith et al., 2010). In the atmosphere, the number concentration of freshly nucleated particles is found to have a strong dependency on sulphuric acid levels (Weber et al., 1997; Riipinen et al., 2007; Kuang et al., 2008). In addition, recent work by Zhao et al. (2010) and Jiang et al. (2011) demonstrates the connection between sulphuric acid and the neutral nucleated clusters. A comprehensive understanding of the impacts of particle nucleation and growth on atmospheric chemical processes, geochemical cycles, and global climate is currently hampered by data availability, as gas phase sulphuric acid concentrations are difficult to measure. The first measurements of atmospheric gas-phase sulphuric acid were made on stratospheric balloons and research aircraft by MPIK-Heidelberg, using PACIMS (Passive Chemical Ionization Mass Spectrometry), an innovative method developed by the MPIK group (Arnold and Fabian, 1980; Arnold et al., 1982; Arnold and Bührke, 1983; Viggiano and Arnold, 1983; Heitmann and Arnold, 1983). The first measurements in lower troposphere air were made using active CIMS (Eisele and Tanner, 1993; Berresheim et al., 2000; Fiedler et al., 2005; Sorokin and Arnold, 2007). However, the Chemical Ionization Mass Spectrometers (CIMS) used in these lower troposphere measurements are still relatively rare. In addition, the challenges associated with these measurements combined with subtle differences between CIMS instruments have resulted in variations in the measurement results (Paasonen et al., 2010).
Several studies have provided evidence that high SO2 and radiation levels contribute significantly to particle formation (Hyvönen et al., 2005; Mikkonen et al., 2006; Paasonen et al., 2009; Petäjä et al., 2009) and growth (Boy et al., 2005; Sihto et al., 2006; Mikkonen et al., 2011), most probably due to their effect on the concentration of H2SO4 (Weber et al., 1997). Hamed et al. (2010) provided evidence that lowered SO2 concentrations reduced the frequency and intensity of new particle formation (NPF) events in Melpitz, Germany. In addition, Jaatinen et al. (2009) found that in polluted areas SO2 concentrations are higher on days when NPF occurs, which was attributed to SO2 being the main precursor of gaseous sulphuric acid. In contrast, their findings showed that in a clean environment, Hyytiälä, Finland, SO2 concentrations were lower on days when NPF occurred. In Hyytiälä, NPF usually appears to take place when the condensation sink is low, i.e. when the air is clean. Boy et al. (2005) introduced a pseudo-steady state chemical box-model to calculate sulphuric acid and OH concentrations. The model was described and successfully verified against measured sulphuric acid data in Hyytiälä. Petäjä et al. (2009) derived three proxies for the sulphuric acid concentrations by using EUCAARI (European Integrated project on Aerosol Cloud Climate and Air Quality Interactions) 2007 campaign data and found that measured concentrations correlated well with the derived proxies as well as with detailed pseudo-steady state chemical model results. However, the authors recognized that the proxies might be site-specific and should be verified against measurements prior to utilization in other environments.
The purpose of this study is to analyze data from six different measurement sites and find a single proxy for sulphuric acid concentration that can be applied over a greater range of environments than that developed by Petäjä et al. (2009). The robustness of the analysis results will be tested for different datasets in order to find a proxy that can be used in places where direct H2SO4 measurements have not been made.

Data
In total seven datasets, consisting of six campaign datasets and one long term dataset, were analyzed for this study. Locations of the measurement sites can be seen in Fig. 1 and exact coordinates and times of the campaigns are listed in Table 1.
Two of the datasets presented in this study were obtained at the SMEAR II (Station for Measuring Forest Ecosystem-Atmosphere Relations) located in Hyytiälä, Southern Finland. The site is situated in a boreal forest environment; detailed information about the continuous measurements and the infrastructure can be found in Hari and Kulmala (2005). The first set of measurements was made during the spring 2003 QUEST measurement campaign (Boy et al., 2005; Laaksonen et al., 2008). The second set of measurements was made as a part of the 2007 EUCAARI project field campaign (Kulmala et al., 2009; Petäjä et al., 2009).
San Pietro Capofiume is located in Northern Italy, in a flat rural area in the eastern part of the Po Valley (Hamed et al., 2007). The distance to the closest cities, Bologna and Ferrara, is about 40 km. The Po Valley is the largest industrial, trading and agricultural area in Italy, with a high population density and substantial anthropogenic gaseous and particulate emissions from diffuse sources such as industry, domestic heating and traffic. However, during the measurements reported here, uncommonly clean conditions were encountered with the frequent influence of air masses from the Adriatic Sea (Paasonen et al., 2010).
Melpitz is a rural atmospheric research site in eastern Germany, operated by the Leibniz-Institute for Tropospheric Research (IfT). The site is situated on flat meadow grasslands surrounded by agricultural pastures and forests. Even though Melpitz is a rural observation site, the levels of anthropogenic pollution such as sulphur dioxide are higher than, for instance, at Hyytiälä, Finland, or at the Hohenpeissenberg site (Hamed et al., 2010). The present set of sulphuric acid measurements was collected in May 2008 during the intensive measurement period of EUCAARI. The meteorological situation during that period was unusual in that continental air masses, containing high amounts of anthropogenic particles and trace gases, prevailed most of the time (Hamburger et al., 2011). This influx of pollution provides, however, a useful contrast to the other data sets, for example Hyytiälä. For more information on atmospheric measurements at Melpitz including the climatology of particle and trace gas concentrations, see Engler et al. (2007), Birmili et al. (2008), Birmili et al. (2009a), and Spindler et al. (2010).
The Meteorological Observatory Hohenpeissenberg (HPB) is a GAW (Global Atmosphere Watch) site operated by the German Weather Service (DWD). It is situated in rural southern Germany about 30 km north of the Alpine mountain ridge. The observatory stands on top of Hohenpeissenberg Mountain at an altitude of 985 m a.s.l. and about 300 m above the surrounding countryside. At night, HPB usually resides above the nocturnal surface layer inversion. In winter time, HPB may even reside above the daytime boundary layer. The surroundings of the mountain are mainly meadows and forests. For more information on previous aerosol and trace gas measurements at HPB, see Birmili et al. (2003) and Paasonen et al. (2009).
Niwot Ridge (NWR) is a forested station located on an east-west oriented ridge in the Front Range of the Rocky Mountains approximately 35 km west of Boulder, Colorado, USA, with the entire study site lying above 3000 m elevation. The site sits in a broad saddle bounded by low rounded hills and is flanked by an alpine tundra ecosystem. Winds are typically westerly at night (downslope drainage), bringing relatively clean air from the continental divide, whereas daytime heating creates easterly (upslope) flow, bringing air from the Denver-Boulder metropolitan area (Boy et al., 2008).
Atlanta, Georgia, USA, is an urban site where high relative humidity (RH) in the morning may influence sulphuric acid measurements (F. Eisele, personal communication, 2010). Measurements were made during the 2002 Aerosol Nucleation and Real-time Characterization Experiment (ANARChE) at Jefferson Street Station (JST), which is located about 4 km northwest of downtown Atlanta and about 9 km southeast of a coal-fired power station, the latter providing a rich source of H2SO4 (McMurry et al., 2005).
Key variables of the study are compared in Table 2. Atlanta is the most polluted site, with the highest SO2 concentrations (median 1.54 ppb) and condensation sink (CS, median 1.51 × 10^−2 s^−1), whereas the cleanest sites are NWR and Hyytiälä, where median values of both pollution markers are almost one order of magnitude lower than in Atlanta. In Melpitz, SO2 measurements between 8-27 May 2008 are removed from the analysis due to instrumental failure. The highest magnitudes of global radiation were measured in San Pietro Capofiume (SPC), with a median value of 376 W m^−2, whereas the Hyytiälä 2003 campaign was early in the spring so the median radiation was only 90 W m^−2. Only times when it was not completely dark were counted in the comparison of the Radiation entries. RH was highest in Melpitz (median 74 %) and lowest in NWR (median 52 %). Ozone concentrations were highest in Niwot Ridge (median 56.3 ppb) and lowest in Atlanta (median 30.8 ppb). Sulphuric acid concentrations were highest in Melpitz (median 2.94 × 10^6 molec cm^−3) and lowest in Hyytiälä 2007 (median 1.86 × 10^5 molec cm^−3). The uncertainty in sulphuric acid measurements, caused by differences in measurement procedures which in the worst case might lead to differences of up to 50 % between the instruments used at different sites (Paasonen et al., 2010), has to be taken into account when comparing the sulphuric acid concentrations. Note that the standard deviations of [H2SO4] and [SO2] are very large. This is due to the diurnal variation of [H2SO4] and occasional pollution events, either from local sources or from long range transport, which cause high peaks in [SO2].

Experimental
A proxy for sulphuric acid concentration is based on the currently accepted mechanism of atmospheric SO2 oxidation (Finlayson-Pitts and Pitts Jr., 2000):

$$\mathrm{OH + SO_2 \rightarrow HSO_3} \tag{R1}$$
$$\mathrm{HSO_3 + O_2 \rightarrow SO_3 + HO_2} \tag{R2}$$
$$\mathrm{SO_3 + 2\,H_2O \rightarrow H_2SO_4 + H_2O} \tag{R3}$$

First laboratory investigations of Reaction (R3), yielding a realistic quantitative rate coefficient and an identification of the product H2SO4, were conducted by the MPIK-Heidelberg group (Reiner and Arnold, 1993, 1994) using a CIMS method, which allows sensitive and fast measurements of the reagent gas-phase SO3 and the gas-phase product H2SO4. As Reactions (R1-R3) show, the production of sulphuric acid is defined by [OH] and [SO2]. It is mainly removed by condensation, so the time rate of change of sulphuric acid concentration can be written as

$$\frac{d[\mathrm{H_2SO_4}]}{dt} = k\,[\mathrm{OH}]\,[\mathrm{SO_2}] - \mathrm{CS}\,[\mathrm{H_2SO_4}] \tag{1}$$

where CS is the condensation sink (e.g. Pirjola et al., 1999; Dal Maso et al., 2002) and k is a temperature dependent reaction constant (DeMore et al., 1997; Sander et al., 2002). Integrating Eq. (1) gives the sulphuric acid concentration at a given time. The condensation sink is given by

$$\mathrm{CS} = 2\pi D \sum_i \beta_m(D_{pi})\, D_{pi}\, N_i \tag{2}$$

where D_pi describes the diameter of the particle in the size class i and N_i is the particle number concentration in the respective size class. D is the diffusion coefficient of the condensing vapour, and β_m the correction factor for the transition and the free molecular regimes (Fuchs and Sutugin, 1970). The reaction rate constant k is given by Eq. (3). To simplify the problem, it can be assumed that the H2SO4 production and loss are in steady-state. Validity of the assumption will be analyzed later in Sect. 3.1. Applying this assumption to Eq. (1) leads to a proxy function given by

$$[\mathrm{H_2SO_4}] = k\,[\mathrm{OH}]\,[\mathrm{SO_2}]\,\mathrm{CS}^{-1} \tag{4}$$

In order to find an easy-to-use proxy for sulphuric acid, it is not practical to explicitly include [OH] as it is even more difficult to measure than [H2SO4]. Recent studies have suggested that the OH radical concentration is strongly correlated with the intensity of ultraviolet radiation (Rohrer and Berresheim, 2006) despite the complex OH chemistry in the atmosphere. In datasets from Hyytiälä, where ultraviolet radiation was measured, it was found that UV correlated strongly with global radiation. Therefore, due to lack of UV data for most of the campaign data sets, we use the measurements of global radiation as a proxy for OH, which gives us the following function for the steady state proxy:

$$[\mathrm{H_2SO_4}] = k \cdot \mathrm{Radiation} \cdot [\mathrm{SO_2}] \cdot \mathrm{CS}^{-1} \tag{5}$$

The measured values of [H2SO4] peak at the same times when the intensity of radiation is at its highest, which supports the proposition that the main driving force of sulphuric acid production at all sites is radiation.
The aerosol condensation sink determines how rapidly molecules will condense onto pre-existing aerosols (Kulmala et al., 2005), but according to our tests it may not fully account for the losses of sulphuric acid. The CS used in this study is calculated from the dry mobility diameter of the particles. Laakso et al. (2004) and Birmili et al. (2009b) provided independent but equivalent formulae that correct the CS for hygroscopic particle growth. Both studies, however, are based on experiments made at Hyytiälä. Birmili et al. (2003) provided, on the basis of similar hygroscopicity measurements, another simple parameterisation for hygroscopic particle growth at HPB. However, such parameterisations are not available for all measurement sites to date. Also, hygroscopic particle growth is expected to differ between different measurement sites and is, moreover, a function of season and air mass. A careful consideration of the hygroscopic growth effect on CS would thus require considerable additional effort.
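A minimal sketch of how a dry condensation sink could be computed from a binned particle size distribution, following the summation over size classes described in the Experimental section. The Fuchs-Sutugin correction is written in its usual Knudsen-number form. The function names, the diffusion coefficient, the mean free path and the mass accommodation coefficient are illustrative assumptions, not values taken from this study.

```python
import math

def fuchs_sutugin(dp_m, mean_free_path_m=1.0e-7, alpha=1.0):
    """Transition-regime correction factor for a particle of diameter dp_m (m).

    mean_free_path_m and alpha (mass accommodation coefficient) are
    illustrative assumptions.
    """
    kn = 2.0 * mean_free_path_m / dp_m  # Knudsen number
    return (1.0 + kn) / (
        1.0 + (4.0 / (3.0 * alpha) + 0.377) * kn + 4.0 / (3.0 * alpha) * kn ** 2
    )

def condensation_sink(diameters_m, number_conc_m3, diff_coeff_m2s=1.0e-5):
    """CS in s^-1, computed as 2*pi*D * sum(beta_i * Dp_i * N_i).

    diameters_m: dry particle diameters per size class (m)
    number_conc_m3: number concentration per size class (m^-3)
    diff_coeff_m2s: assumed vapour diffusion coefficient (m^2 s^-1)
    """
    total = 0.0
    for dp, n in zip(diameters_m, number_conc_m3):
        total += fuchs_sutugin(dp) * dp * n
    return 2.0 * math.pi * diff_coeff_m2s * total
```

With SI inputs the result is in s^-1; a site-specific hygroscopic growth correction would enter by replacing the dry diameters with wet ones before the call.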
As an alternative solution, we used a CS calculated under dry conditions (i.e. as measured in the particle size spectrometers) in all datasets. Sensitivity tests for Hyytiälä data indicate that the hygroscopicity correction is too small to noticeably improve the calculated sulphuric acid proxies, i.e. the RH-corrected CS did not give significantly better results than the dry size CS. Figure 3 shows the connection between sulphuric acid and CS in all campaign datasets, and no statistically significant correlation can be seen. Taking into account the effect of relative humidity by multiplying CS by RH (data not shown) gives somewhat better correlations. For example, in Melpitz the Spearman correlation for [H2SO4] and CS is −0.03, but when CS is multiplied by RH the correlation strengthens to −0.38. Similar behaviour is seen in every other dataset except NWR, where the correlation between [H2SO4] and CS is 0.56 and weakens below 0.1 when CS is multiplied by RH.
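The Spearman correlations quoted above are rank correlations. A small standard-library sketch of the calculation is given here for illustration; the function names are hypothetical and this is not the code used in the study.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

To repeat the comparison in the text, one would call `spearman(h2so4, cs)` and `spearman(h2so4, [c * r for c, r in zip(cs, rh)])` on matching 10 min averages.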
Figure 4 [...] significantly. The only exception is NWR, where the average behaviour is exactly opposite. The behaviour can be attributed to the afternoon up-sloping urban air, which brings with it precursors of H2SO4 as well as increased aerosol loadings. In Atlanta, in addition to afternoon events, plume events were seen mostly in the morning, which is shown in Fig. 4f. When CS is multiplied by RH, the changes in the average curve seem to follow the sulphuric acid curve even more clearly, even in NWR. Figure 5 from Hyytiälä 2007 shows that on some days the condensation sink has almost the same diurnal cycle as sulphuric acid, but on many days there is no such connection. As the Spearman correlations suggested, in some cases multiplying CS by relative humidity (RH) gives better agreement between the fluctuations.

Proxy construction
First we made tests with a linear fitting procedure in order to test the different proxy functions introduced in Table 3; the results are shown in Table 4. In SPC and both Hyytiälä datasets, the best linear proxy was L3, with Radiation · [SO2]^0.5, where the correlation R between the observed [H2SO4] and the predicted values given by the proxy was 0.88, 0.74 and 0.86, respectively. The square root dependence on [SO2] suggests that it also acts as an indicator for particulate pollution, which acts as a sink of sulphuric acid or of the OH radical, which is needed in the formation of sulphuric acid. When all campaign datasets were combined, the Spearman correlation between [SO2] and CS was 0.57. Besides SO2, OH reacts with several different trace gases of anthropogenic origin, e.g. CO, VOCs, CH4 and NOx (e.g. Austin et al., 2002), which can be highly correlated with SO2 in certain urban settings. This suggestion is supported by the result that in NWR, where the air is the cleanest, the power of [SO2] in the best proxy is 1 (R = 0.67). In Atlanta, high relative humidity in the mornings may affect the sulphuric acid concentrations, which has to be taken into account in the proxy. Here the best prediction was gained with Proxy L5 (R = 0.82), but Proxy L3 also performed well (R = 0.80). In Melpitz, Proxy L4 with RH as the loss term gave the best prediction, but the results of L2 and L3 were not significantly weaker. Note that Proxy L4 outperformed Proxy L1 also in Hyytiälä, NWR and Atlanta, which suggests that in these data RH might be a better indicator for the removal process of [H2SO4] than CS.
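The linear fitting tests described above could be sketched as follows. The sketch assumes a least-squares fit through the origin, since the text only states that a slope is fitted to fixed-power proxies; the function names and the default k = 1 are illustrative assumptions.

```python
def proxy_l3(radiation, so2, k=1.0):
    """Predictor of the Radiation * [SO2]^0.5 type.

    k is an optional rate-constant factor; the default of 1.0 is a placeholder.
    """
    return [k * r * s ** 0.5 for r, s in zip(radiation, so2)]

def fit_slope_through_origin(x, y):
    """Least-squares slope B for y ~ B * x (no intercept; an assumption here)."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)

def correlation(x, y):
    """Pearson correlation R between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5
```

Given observed concentrations `h2so4`, the slope is `fit_slope_through_origin(proxy_l3(rad, so2), h2so4)` and R is the correlation between the observations and the scaled predictor.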
The observation that Proxy L3 gives the best overall approximation using this linear type fitting suggests that the steady state assumption could be somewhat unrealistic in atmospheric conditions, and thus the linear fitting procedure may not be optimal for proxy construction. However, based on simultaneous measurements of SO2, OH and H2SO4, Eisele and McMurry (1997) have shown that at least in remote areas away from urban sources the steady-state assumptions should hold. This observation is confirmed by our own simple box model simulations with the UHMA code (Korhonen et al., 2004) for several of the Hyytiälä 2007 campaign days. In these simulations, measured values of SO2, global radiation (used as a proxy for OH), particle size distribution as well as temperature and relative humidity were read into the model every 10 min, and the concentration of H2SO4 was calculated according to Eq. (1) (using a model time step of 1 s). These model runs indicated that the steady-state assumption holds well for typical atmospheric conditions.
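The steady-state check can be illustrated with a toy integration of the production-minus-condensation balance using a 1 s forward-Euler step. This is not the UHMA code; holding the production term and CS constant within each 10 min interval is an assumption made for the sketch, and the function names are hypothetical.

```python
def integrate_h2so4(production, cs, h2so4_0=0.0, duration_s=600.0, dt_s=1.0):
    """Forward-Euler integration of d[H2SO4]/dt = production - cs * [H2SO4].

    production: source term k*[OH]*[SO2] (molec cm^-3 s^-1), held constant
    cs: condensation sink (s^-1), held constant over the interval
    """
    conc = h2so4_0
    steps = int(round(duration_s / dt_s))
    for _ in range(steps):
        conc += dt_s * (production - cs * conc)
    return conc

def steady_state(production, cs):
    """Steady-state concentration obtained by setting the time derivative to zero."""
    return production / cs
```

The relaxation time is 1/CS, so with a sink of 5 × 10^-3 s^-1 the concentration starting from zero reaches roughly 95 % of its steady-state value within one 10 min interval; a weaker sink would need correspondingly longer.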
In order to find the optimal parameterization for the proxy, a nonlinear least squares fitting procedure (Bates and Watts, 1988) was applied to all datasets, with fit functions given in Table 5. Nonlinear regression is usually needed when there are physical reasons for believing that the relationship between the response and the predictors follows a particular functional form. The general form of the nonlinear regression model is given by

$$y_i = f(x_i, \beta) + \varepsilon_i \tag{6}$$

where y_i are the measured response observations, f is a known nonlinear function of the measured predictor variables x_i, β are the estimated parameters of the model, and ε_i are the random residuals of the model, which are usually assumed to be uncorrelated with mean zero and constant variance. The advantage of using the nonlinear approach is that the non-equilibrium conditions are taken into account by estimating individual powers for the proxy variables from the data.
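As an illustration of the nonlinear least squares fitting described above, the sketch below fits a power-law model of the form a · Radiation^b · [SO2]^c with an undamped Gauss-Newton iteration, started from an ordinary least-squares fit in log space. It is a standard-library toy, not the fitting routine used in the study; treating the rate constant as fixed and absorbed into a, and all function names, are assumptions of the sketch.

```python
import math

def solve_linear(a, b):
    """Solve a small dense system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def least_squares(jac, resid):
    """Least-squares solution of jac * delta ~ resid via the normal equations."""
    p = len(jac[0])
    jtj = [[sum(row[i] * row[j] for row in jac) for j in range(p)] for i in range(p)]
    jtr = [sum(row[i] * r for row, r in zip(jac, resid)) for i in range(p)]
    return solve_linear(jtj, jtr)

def fit_power_law(radiation, so2, h2so4, iterations=20):
    """Fit h2so4 ~ a * radiation**b * so2**c and return (a, b, c)."""
    logs = [[1.0, math.log(r), math.log(s)] for r, s in zip(radiation, so2)]
    # Starting values (ln a, b, c) from a linear fit in log space.
    theta = least_squares(logs, [math.log(y) for y in h2so4])
    for _ in range(iterations):
        pred = [math.exp(sum(t * xi for t, xi in zip(theta, x))) for x in logs]
        resid = [y - f for y, f in zip(h2so4, pred)]
        # Jacobian of the model with respect to (ln a, b, c).
        jac = [[f * xi for xi in x] for f, x in zip(pred, logs)]
        step = least_squares(jac, resid)
        theta = [t + d for t, d in zip(theta, step)]
    return math.exp(theta[0]), theta[1], theta[2]
```

Unlike the log-space fit used for the starting values, the Gauss-Newton refinement minimizes residuals on the original concentration scale, which matches the additive-error form of the regression model. An undamped iteration can diverge on noisy data, so a real analysis would rely on an established routine.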
In Table 5, a-f are parameters obtained from the fit to the data, and k is the temperature-dependent reaction constant given in Eq. (3), which is scaled by multiplying it by 10^12 in order to get more interpretable estimates for a. Again, all observations are 10 min averages of the variables and only data points with "Radiation" higher than 10 W m^−2 and [SO2] higher than 0.1 ppb were used in the analysis. The computation was made with R-software (R Development Core Team, 2010).
Atmos. Chem. Phys., 11, 11319-11334, 2011, www.atmos-chem-phys.net/11/11319/2011/
If the steady state assumption applies without any additional chemistry, then in Proxy N1 parameters b and c should be unity and d should be −1, and as seen from the results of Proxy L1, in some cases it turns out to be an adequate approximation. However, the fitting procedure results in Table 6 show that the powers vary considerably for the best predictive models and that they are quite far from the theoretical values; for Proxy N1 the powers b, c and d vary in the ranges 0.17 to 1.41, 0.48 to 0.88 and −0.58 to 0.41, respectively. Parameter a is mainly a scaling factor, which partly takes into account the use of global radiation instead of [OH], while including the uncertainty of the H2SO4 measurements, and thus varies greatly between sites. Power b for global radiation seems to be near unity in almost all datasets, varying, with few exceptions, between 0.8 and 1.4. This behaviour is independent of the other parameters in the model, as expected from the theory, and suggests that global radiation is a good indicator for OH concentration in this parameterization. This parameter can also be approximated with 1 without drastic reduction in the estimating ability of the proxy (results not shown here). Power c for [SO2] is less than unity in almost all of the cases, which strengthens the assumption that SO2 concentration acts also as an indicator of air pollution, i.e. factors that are sinks for OH and thus reduce the sulphuric acid production. The power c is nearest to unity in the Niwot Ridge data, where the air is the cleanest. At other sites, power c lies in the range 0.48-0.81 and in many cases it can be approximated with 0.5, as done in Proxy L3. According to theory, power d for CS should be −1, but in our datasets it seems to be closer to zero. Fitted values vary from −0.58 for Hohenpeissenberg to 0.41 for NWR, with a median value of −0.15. In addition, the prediction capability of Proxy N1, where the effect of CS is taken into account, is not significantly better than that of Proxy N2 with only radiation and SO2 included (Table 6). This fact indicates that CS is probably not the best possible sink term for our proxy. If power c is fixed to unity, then power d moves somewhat closer to the theoretical value of −1, ranging from −0.34 at Melpitz to −0.9 at Atlanta, except for NWR where d stays positive, but the predictive ability of the proxy is reduced in all datasets.
Relative humidity is an important factor in the loss process of sulphuric acid; high RH may increase the sticking probability of molecules to existing particles and it increases the CS because of uptake of water to the particles. Hamed et al. (2011) showed that RH is inversely correlated with radiation above 60 %, which may affect sulphuric acid formation. We first used RH as an individual term in Proxy N3 with power e, but it did not make the proxy significantly better. Table 6 shows that including RH in the sink term together with CS in Proxies N4 and N5 gives a stronger sink term to the proxy, and the measured and approximated values of [H2SO4] come closer to each other, with R varying between 0.68 for Hohenpeissenberg and 0.9 for SPC and Atlanta. Table 6 shows that taking RH into account in the proxy makes the fits significantly better; the best proxy for all datasets is given by Proxy N4, where RH is given an individual power e, but the difference between N4 and N5 is almost negligible.
Performance of the proxies varies slightly between the sites. The best correlations between observed and predicted sulphuric acid concentrations (R > 0.9) were found with Proxy N4 for SPC and Atlanta, where the air is the most polluted, and with Proxies N3-N5 for Melpitz. The lowest correlations in general were found for NWR, which is the cleanest of the sites and maybe the most difficult to model because it is impacted by advection of anthropogenic pollutants. Long term data from Hohenpeissenberg are the most difficult to predict due to seasonal variations of meteorological parameters and opposite seasonal variations of [H2SO4] and [SO2]: sulphuric acid concentration is at its highest in the summer, when solar radiation is at its highest, but SO2 concentrations are at their highest in winter time. Still, the correlation between observed and predicted values with Proxies N4 and N5 can reach almost 0.7, which can be considered a good result for a dataset spanning such a long time and containing such varying conditions. In the SPC, Melpitz and Hyytiälä 2003 datasets the differences between the predictive abilities of the proxies are negligible, which indicates that in these data Proxy N2, with only "Radiation" and [SO2], is capable of explaining most of the variation of the sulphuric acid concentration and no further parameters are needed.

Combining campaign datasets
A commonly recognized problem in sulphuric acid measurements is that the measurement procedure is not standardized, which causes variation in the measured values between different campaign datasets (e.g. Paasonen et al., 2010). This variation makes it complex to combine the datasets in order to define a common parameterization for all data. After several tests we found that the data-specific variability can be taken into account in the proxy construction by the addition of a constant. Equation (N5c) reflects this fact with the addition of the constant l_i to Eq. (N5), where l_i defines a specific constant value for each campaign dataset i used in the analysis.
This addition gives common powers a, b, c and d for all data, and the resulting proxy can be used in prediction for different datasets since in any other data l_i = 0. The correlation between observed and predicted values from Proxy N5c in the campaign data was 0.87, and Fig. 6 shows that the predictive ability of the combined proxy is excellent, except in data points where "Radiation" or [SO2] values are small, i.e. the proxy value is small. To see if the constructed proxy is able to predict sulphuric acid concentrations in other datasets, we used the proxies calculated with the combined campaign data (i.e. all datasets except the long term Hohenpeissenberg data set) to predict the [H2SO4] from Hohenpeissenberg. The correlations between observed and predicted values with the best two linear (L1 and L3) and best two nonlinear (N4 and N5) proxies are shown in Table 7. Examples of the nonlinear proxies used in the calculation are given in Eqs. (7) and (8), which are derived from Proxies N4 and N5, respectively. Note that the term l_i, introduced in Proxy N5c, is now zero, since the Hohenpeissenberg data were not used in proxy construction.
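How a combined parameterization with a dataset-specific constant might be applied can be sketched as follows. The functional form (a power-law product with a combined CS · RH sink term plus an additive constant l_i) follows the description in the text; the function name, argument names, dataset keys and all numerical values are hypothetical placeholders, not fitted values from this study.

```python
def combined_proxy(k, radiation, so2, cs, rh, params, dataset=None, offsets=None):
    """Evaluate a * k * Radiation^b * [SO2]^c * (CS * RH)^d + l_i.

    params: tuple (a, b, c, d) shared by all datasets
    offsets: optional dict mapping dataset name -> l_i; any dataset that is
             not listed (for example a new site) gets l_i = 0
    """
    a, b, c, d = params
    base = a * k * radiation ** b * so2 ** c * (cs * rh) ** d
    offset = 0.0 if offsets is None else offsets.get(dataset, 0.0)
    return base + offset
```

For a site that was not part of the fit, the call simply omits `offsets` (or passes an unknown dataset name), which reproduces the l_i = 0 convention used for the Hohenpeissenberg prediction.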
where k varied within the range 0.8959-1.1740. Proxy N5 was slightly better overall than N4 and was able to predict sulphuric acid concentrations in HPB quite well (Fig. 7), especially with higher values of radiation. The correlation between observed and predicted values was 0.64 for the whole Hohenpeissenberg data set (Table 7), which is almost as good as the prediction made using the individual proxy constructed for HPB. These correlations reached up to 0.78 in spring (March-May) and 0.81 in summer months (June-August). The correlation in winter (December-February) and in autumn (September-November) is worse than in summer and in spring due to the high radiation dependency of the proxy (b = 1.44), which causes an underestimation of sulphuric acid in low radiation seasons. Due to this underestimation and our earlier finding that it is possible to use a fixed value b = 1 for the campaign datasets without drastic weakening of the prediction ability of the proxy, a test was made with the proxy given in Eq. (9), where the dependency on "Radiation" is reduced.
Surprisingly, as seen in Table 7, this proxy gave the best predictions for almost all datasets, which indicates that the proxy with the lower radiation dependence is more robust to changes in atmospheric conditions. Proxy L1 yielded the best predictions for the seasonal subsets of winter and spring. This is most probably due to the lower radiation dependence and the higher [SO2] dependence. During the measurement period opposing seasonal changes between [H2SO4] and [SO2] were seen, which decreases the underestimation of sulphuric acid in months with high [SO2] and low "Radiation". Notable is the fact that Proxy L3, with only "Radiation" and the square root of [SO2] as predictors, performed almost as well as the best nonlinear proxies.
In order to estimate the uncertainty of the proxies we calculated the average absolute errors relative to the dependent-variable mean, given by Willmott et al. (2009),

$$\mathrm{RE} = \frac{\frac{1}{n}\sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|}{\bar{y}} \tag{10}$$

where n is the number of observations, y_i are the observed [H2SO4] values, ŷ_i are the predicted values given by the proxy and ȳ is the mean of the observed values. The relative errors for the proxies in the final inspection are shown in Table 8. Relative errors for the nonlinear proxies given by Eqs. (7-9) were 40 %, 40 % and 42 %, respectively, which can be considered good values. The relative errors for the linear Proxies L1 and L3 were 59 % and 48 %, respectively, which indicates that the uncertainty related to the linear type proxies is somewhat higher than for the nonlinear proxies.
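The relative error described above (mean absolute error divided by the mean of the observations) is straightforward to compute. The function name below is a hypothetical choice for this sketch.

```python
def relative_error(observed, predicted):
    """Mean absolute error divided by the mean of the observed values."""
    n = len(observed)
    mean_abs_err = sum(abs(y - p) for y, p in zip(observed, predicted)) / n
    return mean_abs_err / (sum(observed) / n)
```

For observations [1, 2, 3] and predictions [1, 1, 5], the mean absolute error is 1 and the observed mean is 2, giving a relative error of 0.5, i.e. 50 %.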

Conclusions
We were able to construct a proximity measure ("proxy") for tropospheric sulphuric acid concentrations using experimental data from multiple observation sites and spanning a wide range of conditions (10^4 < [H2SO4] < 4 × 10^8 molec cm^−3). The proxy which described the overall data set best is an expression based on global solar radiation, SO2 concentration, condensation sink and relative humidity. The best predictive proxy was given by

$$[\mathrm{H_2SO_4}] = \ldots \cdot (\mathrm{CS} \cdot \mathrm{RH})^{-0.13} \tag{11}$$

The proxy was additionally validated using long term measurement data from Hohenpeissenberg by comparing the measured sulphuric acid concentrations to predicted values given by the proxy. Note that the data from Hohenpeissenberg were not used in the proxy construction. Tests suggest that this universal proxy is suitable for the prediction of sulphuric acid concentration under a wide range of atmospheric conditions. Note, however, that the proxy has not been tested for e.g. marine, Arctic or desert conditions. Details of the conditions can be seen in Table 2. The correlation between predicted and observed concentrations was 0.66 in Hohenpeissenberg and higher than 0.9 for campaign data recorded in SPC. This proxy is especially useful in studies of new particle formation, since at times when new particles are formed there is enough radiation to ensure that the proxy is accurate; the proxy is constructed for data where Radiation > 10 W m^−2, but the predictive ability is significantly better when Radiation > 50 W m^−2. The lower predictive ability for the long term data at Hohenpeissenberg indicates that changes in atmospheric conditions caused by the changing of the seasons have to be taken into account in the analysis. An additional source of uncertainty in Hohenpeissenberg is the mountain site location of the station, where it is generally above the nocturnal boundary layer and in winter time also occasionally above the daytime boundary layer.
The reason for the better performance of proxies with the power c of [SO2] lower than 1 can only be speculated, but the most probable explanation is that the SO2 concentration acts also as an indicator for pollution or some other parameter involved in the process but not present in our data. This indicates that it has two roles in the proxy: as a production term and as a loss term. Surprisingly, it was also shown that it is possible to gain an approximation almost equal to the result of the best proxy with only radiation and SO2 concentrations, without the use of the condensation sink. However, the uncertainty related to the linear type proxies is slightly higher than for the nonlinear proxies. This simple version of the proxy can be written as [...]. Note that the reaction coefficient k is scaled in the same manner as done for the nonlinear proxies. Development of a proxy without a condensation sink term enables its use also for situations when no particle size distribution data are available.

Fig. 1. Locations of the six observation sites for tropospheric sulphuric acid.

Fig. 2. Average diurnal variation of sulphuric acid, SO2 (left axis) and Radiation (right axis) in local time.

Table 1. Measurement places and times of the campaigns.

Table 2. Mean, median, 5-95 % percentiles and standard deviation (sd) of 10 min averages of the key variables of the study. Detection limits for [H2SO4] and [SO2] measurements are 10^4 molec cm^−3 and 0.1 ppb, respectively. In global radiation entries, only times when it was not completely dark are counted.
[...] as the [H2SO4] production term of the proxy. Figure 2 illustrates how these terms follow the diurnal variation of [H2SO4]. In SPC, Melpitz, NWR and Atlanta, [SO2] has a significant diurnal cycle similar to the cycle of [H2SO4]. In urban areas the diurnal cycle of [SO2] is caused by traffic and industry, and in NWR it is due to up-slope flow, bringing air from the Denver-Boulder metropolitan area in daytime. In Hyytiälä, [SO2] starts to drop at the same time as [H2SO4] rises, which indicates that the air is diluted by the rise of the boundary layer.

Table 3. Proxy functions for the linear fitting procedure.

Table 4. Linear fit with fixed parameters. B is the slope of the proxy, R is the correlation between observed [H2SO4] and the predicted values given by the proxy and R^2 is the coefficient of determination calculated from the sums of squares in the linear fit procedure. Best correlations for each site are highlighted.

Table 6. Nonlinear fit results. Parameters a-f indicate the powers of the proxy functions and R is the correlation between observed [H2SO4] and the predicted values given by the proxy. Best correlations for each site are highlighted.

Table 7. Correlations of observed sulphuric acid concentrations in different datasets vs. values predicted with the proxy calculated from the combined campaign data.
Fig. 6. Observed [H2SO4] in combined campaign data vs. predicted values given by Proxy N5. Diagonal line represents the perfect fit.

Table 8. Relative errors of the proxies.

Table A1. Values l_i (10^6 m^−3) for the proxies derived with the combined dataset.