Evaluation of the Accuracy of Analysis Tools for Atmospheric New Particle Formation

Several mathematical tools have been developed in recent years to analyze new particle formation rates and to estimate nucleation rates and mechanisms at sub-3 nm sizes from atmospheric aerosol data. Here we evaluate these analysis tools using 1239 numerical nucleation events for which the nucleation mechanism and formation rates were known exactly. The accuracy of the estimates of particle formation rate at 3 nm (J3) showed significant sensitivity to the details of the analysis, i.e. the form of the equations used and the assumptions made about the initial size of nucleating clusters, with the fraction of events within a factor-of-two accuracy ranging from 43–97%. In general, the estimates of the actual nucleation rate at 1.5 nm (J1.5) were less accurate, and even the most accurate analysis set-up estimated only 59% of the events within a factor of two of the simulated mean nucleation rate. The J1.5 estimates were degraded mainly by the size dependence of the cluster growth rate below 3 nm, which the analysis tools do not take into account, but also by possible erroneous assumptions about the initial cluster size. The poor estimates of J1.5 can lead to large uncertainties in the nucleation prefactors (i.e. the constant P in the nucleation equation J1.5 = P × [H2SO4]^k). Large uncertainties were also found in the procedures that are used to determine the nucleation mechanism. When applied to individual events, the analysis tools clearly overestimated the number of H2SO4 molecules in a critical cluster for most events, and thus associated them with a wrong nucleation mechanism. However, in some conditions the number of H2SO4 molecules in a critical cluster was underestimated. This indicates that analysis of field data that implies a maximum of 2 H2SO4 molecules in a cluster does not automatically rule out a higher number of molecules in the actual nucleating cluster.
Our analysis also suggests that combining data from several new particle formation events into scatter plots of H2SO4 vs formation rates (J1.5 or J3) and determining the slope of the regression line may not give reliable information about the nucleation mechanism. Overall, while the analysis tools for new particle formation are useful for getting order-of-magnitude estimates of parameters related to atmospheric nucleation, one should be very cautious in interpreting the results. It is, for example, possible that …


Introduction
Recent ion cluster measurements have indicated that atmospheric new particle formation via nucleation initiates at a cluster size of ∼1.5 nm in diameter (Manninen et al., 2009). However, the majority of instruments measuring the size distribution of electrically neutral atmospheric aerosol can currently detect only particles larger than 3 nm. This limitation severely complicates the analysis of the first steps of new particle formation, since accurate quantification of nucleation rates at the initial cluster size, and of their dependence on the nucleating compounds, is crucial for identifying the atmospheric nucleation mechanism(s).
Motivated by this, previous studies have developed a set of analysis tools to estimate the actual nucleation rate (J1.5) based on the measured size distribution and gas phase data. The foundation of these tools, originally presented in Fiedler et al. (2005) and Sihto et al. (2006), lies in the observation that the diurnal profiles of sulphuric acid (H2SO4) concentration and nucleation mode particle concentration follow each other closely with a typical time shift of 0–4 h (Sihto et al., 2006; Riipinen et al., 2007; Kuang et al., 2008). Since H2SO4 is currently thought to be the key nucleating vapour, this time delay has been interpreted as the time it takes for a cluster formed at 1–1.5 nm to grow to the detectable size of 3 nm. This assumption makes it possible to estimate the cluster growth rate below 3 nm and, together with information about the coagulation scavenging of the clusters to background particles, it can be used to estimate the fraction of formed clusters that survive to the detectable sizes (Kerminen and Kulmala, 2002; Lehtinen et al., 2007). This information is in turn used to extrapolate the actual nucleation rate at 1.5 nm (J1.5) from the measured particle formation rate at 3 nm (J3) (Kerminen and Kulmala, 2002).
The J1.5 estimate has been used to provide information about the atmospheric nucleation mechanism. Based on the nucleation theorem, the exponent k in the equation

$$J_{1.5} = P \times [C]^{k} \tag{1}$$

is often interpreted as the number of molecules of vapour C in the nucleating cluster (Oxtoby and Kashchiev, 1994). In the analysis of field measurements, the exponent linking J1.5 and [H2SO4] is typically found to be between 1 and 2 (Weber et al., 1996; Sihto et al., 2006; Riipinen et al., 2007; Kuang et al., 2008).
In this study, we test the validity of these commonly used nucleation event analysis tools and their ability to identify the correct nucleation mechanism by applying them to output from aerosol microphysics model simulations. In these simulations the nucleation mechanism as well as the nucleation and new particle formation rates (J1.5 and J3, respectively) are known, and thus the predictions of the analysis tools can be directly evaluated.

Aerosol microphysics model
We used an aerosol microphysics box model to simulate new particle formation in a variety of atmospheric conditions. A fully moving sectional grid described the evolution of the particle size distribution through nucleation, condensation and coagulation. The pre-existing particle population at the beginning of the simulation was described with 100 sections, and a new section was created for the newly nucleated particles of diameter 1.5 nm at every nucleation time step (60 s). Since the new particle formation rate deviated from zero for 8 h during each run (i.e. 480 nucleation time steps), the number of size sections at the end of the simulation was 580.
The microphysical subroutines for condensation and coagulation were based on those in the previously published UHMA model (Korhonen et al., 2004), which has been successfully used in studies of new particle formation (Grini et al., 2005; Tunved et al., 2006; Komppula et al., 2006; Vuollekoski et al., 2009; Sihto et al., 2009). To capture the growth of sub-3 nm particles accurately, condensation and coagulation were solved with a time step of 10 s when particles smaller than 4 nm in diameter were present; otherwise the microphysical time step was 60 s (the same as the nucleation time step). These comparatively long time steps were chosen to balance the accuracy and computation time of the model. In a box model framework, the computation time is determined mainly by the number of size sections and the length of the time step in the coagulation routine. Comparison to sensitivity simulations that used shorter time steps (10 s for all aerosol processes; or a 30-s nucleation time step with a 5-s microphysics time step) indicated that the chosen time steps do not lead to significant inaccuracy and that the simulated J3 values are very close to the accurate solution.
Table 1 presents the parameters that were varied in the model simulations. We simulated four sulphuric acid nucleation mechanisms, i.e.

$$J_{1.5} = A\,[\mathrm{H_2SO_4}] \tag{2}$$
$$J_{1.5} = K\,[\mathrm{H_2SO_4}]^{2} \tag{3}$$
$$J_{1.5} = T\,[\mathrm{H_2SO_4}]^{3} \tag{4}$$
$$J_{1.5} = Q\,[\mathrm{H_2SO_4}]^{4} \tag{5}$$

where A, K, T and Q are constant prefactors called nucleation coefficients. All four mechanisms were simulated with five different nucleation coefficients whose values covered two orders of magnitude (Table 1). For the first two mechanisms, which are often called activation and kinetic nucleation, the chosen ranges of nucleation coefficients are consistent with the reported values from field measurements (Riipinen et al., 2007; Kuang et al., 2008).
The concentration profile of the nucleating vapour H2SO4 was a down-facing parabola peaking at noon and departing from zero from 08:00 a.m. to 04:00 p.m. Another condensing vapour, a non-specified organic compound, had either a constant concentration profile throughout the simulation, or showed parabolic time behaviour with the same constraints as described above for H2SO4. The peak concentrations of both of these vapours were varied over approximately one order of magnitude. Whereas H2SO4 was assumed totally non-volatile in all simulations, the organic vapour was given a saturation pressure in some of the model runs. All the simulations were carried out for three pre-existing aerosol distributions.
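As a rough illustration of the set-up described above, the sketch below builds a parabolic H2SO4 profile and evaluates power-law nucleation rates of the form J1.5 = P × [H2SO4]^k. The function names and the prefactor values are hypothetical (the coefficient values of Table 1 are not reproduced here); only the 08:00–16:00 window, the noon peak and the 8×10⁶ cm⁻³ peak concentration come from the text.

```python
import numpy as np

def parabolic_profile(t_hours, peak, start=8.0, end=16.0):
    """Down-facing parabola: `peak` at the midpoint, zero at and outside start/end."""
    t = np.asarray(t_hours, dtype=float)
    mid = 0.5 * (start + end)
    half_width = 0.5 * (end - start)
    conc = peak * (1.0 - ((t - mid) / half_width) ** 2)
    return np.clip(conc, 0.0, None)

def nucleation_rate(h2so4, prefactor, k):
    """Power-law nucleation rate J1.5 = P * [H2SO4]^k (cm^-3 s^-1)."""
    return prefactor * np.asarray(h2so4, dtype=float) ** k

t = np.arange(0, 24 * 6 + 1) / 6.0            # 10-min grid over one day (hours)
h2so4 = parabolic_profile(t, peak=8e6)        # cm^-3, one of the Table 1 peaks
j_activation = nucleation_rate(h2so4, prefactor=1e-6, k=1)   # hypothetical A
j_kinetic = nucleation_rate(h2so4, prefactor=1e-13, k=2)     # hypothetical K
```

Because the nucleation rate is a power of the vapour concentration, its peak coincides with the H2SO4 peak at noon for every mechanism; only the sharpness of the peak changes with k.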
Altogether, this resulted in 3240 simulations. However, to ensure that the simulated events were strong enough to form a distinct nucleation mode, events for which J3 did not reach the value 1 cm⁻³ s⁻¹ at any point of the model run were excluded from further analysis. Furthermore, we excluded all events for which J3 exceeded 100 cm⁻³ s⁻¹. This is because such high new particle formation rates have never been observed during regional nucleation episodes (Kulmala et al., 2004). After applying these two criteria, 1464 events were left for further analysis.

Table 1. Parameters varied in the model simulations.

Parameter                                  Values
H2SO4 concentration at noon (cm⁻³)         4×10⁶    8×10⁶    1.6×10⁷
Organic vapour
  concentration profile                    constant    parabola
  concentration at noon (cm⁻³)             2×10⁶    10⁷    5×10⁷
  saturation pressure (cm⁻³)               0    10⁵    10⁶
Pre-existing condensation sink (s⁻¹)

In Eq. (2), P corresponds to A and k = 1. In Eq. (3), P corresponds to K and k = 2. In Eq. (4), P corresponds to T and k = 3. In Eq. (5), P corresponds to Q and k = 4.
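The simulation count and the two exclusion criteria can be checked with a short sketch. The number of values per parameter is taken from the text (four mechanisms, five coefficients, three pre-existing distributions) and from Table 1; the dictionary keys and the `keep_event` helper are illustrative names, not part of the original model code.

```python
import math

# Number of values per varied parameter, as stated in the text and Table 1
n_values = {
    "nucleation_mechanism": 4,
    "nucleation_coefficient": 5,
    "h2so4_peak": 3,
    "organic_profile": 2,
    "organic_peak": 3,
    "organic_saturation_pressure": 3,
    "pre_existing_distribution": 3,
}
n_simulations = math.prod(n_values.values())

def keep_event(j3_series, j3_min=1.0, j3_max=100.0):
    """Keep an event if J3 reaches j3_min at some point but never exceeds j3_max.

    j3_series: J3 values (cm^-3 s^-1) over one model run.
    """
    peak = max(j3_series)
    return j3_min <= peak <= j3_max
```

The full factorial design reproduces the 3240 runs quoted above; the filter then removes events that are too weak to form a distinct mode or unrealistically strong.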
In each simulation, the nucleation rate (J1.5) was obtained from one of Eqs. (2)–(5). The new particle formation rate (J3) was calculated at each microphysics time step as the sum of the rates at which particles grew over the 3 nm threshold diameter due to coagulation and condensation. Of these two processes, coagulation was solved first.
The modelled size distribution, the vapour concentrations, and the J1.5 and J3 values (both instantaneous and 10 min averages) were output every 10 min. In order to evaluate the analysis tools in conditions that resemble atmospheric size distribution measurements as closely as possible, the size distribution in the range of 2.8–556 nm was regridded to 32 channels corresponding to the Differential Mobility Particle Sizer (DMPS) instrument at the Hyytiälä measurement station in Southern Finland. This regridded data is hereafter referred to as the DMPS-gridded distribution, and it is the size distribution data used as input in the analysis below. Figure 1a shows an example of a DMPS-gridded distribution from one model run. It is worth noting that while the simulated event resembles measured atmospheric events closely in most respects, the modelled data is much smoother and lacks the noise that is present in typical atmospheric data due to instrumentation and inhomogeneities in the measured air mass. The smoothness of the modelled data is evident also in Fig. 1b, which presents the simulated nucleation and particle formation rates together with the scaled concentration of 3–6 nm particles (N3−6). Note that while the modelled N3−6 is used as an input in the analysis described below, the simulated J1.5 and J3 are used only for comparison with the respective predicted values.

Baseline analysis of modelled events
Each simulated new particle formation event was analysed with the procedure commonly used to quantify nucleation rates and mechanisms from atmospheric measurement data. The baseline analysis follows for the most part the methods outlined in Sihto et al. (2006), in addition to which we performed several sensitivity tests detailed in Sect. 2.3. The baseline analysis consisted of the following 5 steps:
1. The time delay t_N3−6 was determined from the time shift between the N3−6 (number concentration of particles in the diameter range 3–6 nm) and [H2SO4]^b curves (0.1 ≤ b ≤ 10). It was obtained by a fit searching for the combination of time delay and exponent b that maximized the correlation coefficient between the curves N3−6 and [H2SO4]^b. The fitting procedure is illustrated in Fig. 1c, which depicts the simulated H2SO4 (blue line) and N3−6 (red line) concentrations. In this specific case, when the H2SO4 curve is delayed by 60 minutes and raised to the power 2.31 (black dashed line), it correlates very closely with the simulated N3−6. In the baseline analysis, the fitting was done over the whole time period when N3−6 was clearly above zero. The obtained time delay is interpreted as the time it takes for the newly formed clusters to grow to the detectable size of 3 nm.
2. The analysed particle formation rate at 3 nm (J3) was calculated from the DMPS-gridded distribution using the balance equation

$$J_3 = \frac{dN_{3-6}}{dt} + \mathrm{Coag}_4 \cdot N_{3-6} + \frac{1}{3\,\mathrm{nm}}\,\mathrm{GR}_6 \cdot N_{3-6} \tag{6}$$

Here Coag4 is the coagulation sink of 4 nm particles and was calculated from the simulated particle size distribution. The time derivative of N3−6 was obtained by fitting a parabola to the simulated N3−6 and by differentiating the obtained parabolic function. This approach is beneficial especially in the case of noisy field measurement data, as it smooths fluctuations in the N3−6 data and thus leads to a more stable derivative. The growth rate of 6 nm particles, GR6, was assumed to be the same as that of newly formed clusters in the 1.5 to 3 nm size range. This growth rate can be estimated using the equation

$$\mathrm{GR} = \frac{3\,\mathrm{nm} - 1.5\,\mathrm{nm}}{t_{N_{3-6}}} \tag{7}$$

where t_N3−6 is the time delay determined in step 1.
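3. The analysed nucleation rate at 1.5 nm (J1.5) was estimated from the analytical formula (Kerminen and Kulmala, 2002)

$$J_{1.5} = J_3 \exp\left[\gamma\,\frac{\mathrm{CS}}{\mathrm{GR}}\left(\frac{1}{1.5\,\mathrm{nm}} - \frac{1}{3\,\mathrm{nm}}\right)\right] \tag{8}$$

where CS is the condensation sink (in units m⁻²) and γ is a coefficient with value 0.23 m² nm² h⁻¹. Here GR was again calculated using Eq. (7).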
Note that Sihto et al. (2006) assumed, in accordance with the theoretical understanding of the time, that nucleation initiates at 1 nm and thus calculated J1 values. However, improvements in measurement techniques in recent years have indicated that the likely diameter of critical clusters is ∼1.5 nm, and therefore this value is used in the current study.
4. The best fit exponent b was calculated by determining the highest correlation coefficient between the modelled [H2SO4]^b (0.1 ≤ b ≤ 10) and the modelled N3−6 or the analysed J1.5 (from Eq. 8). Note that for N3−6 the best fit exponent was determined simultaneously with the time delay t_N3−6 (see step 1 and Fig. 1c). Based on the nucleation theorem, this best fit exponent is often interpreted as the number of H2SO4 molecules in a critical cluster.
5. The nucleation coefficients A and K for activation and kinetic type nucleation (as shown in Eqs. 2 and 3), respectively, were determined by a least-squares fit between the analysed J1.5 given by Eq. (8) and the modelled H2SO4 concentration to the power of 1 or 2. To double-check the obtained results, the same fitting for nucleation coefficients was done also for J3. Here the J3 estimated from the sulphuric acid concentration (using Eq. (8) in the reverse direction) was optimized against the J3 obtained from DMPS-gridded data (Eq. 6). The A and K coefficient estimates from these two fits were typically almost identical, and their mean value was taken as the nucleation coefficient presented below.
Note that the coefficients A and K were both fitted for all events irrespective of the simulated nucleation mechanism. This is because such fitting has been previously done for atmospheric data (Sihto et al., 2006; Riipinen et al., 2007; Kuang et al., 2008) without exact information about the nucleation mechanism. We will investigate both (a) how accurately the analysis predicts the coefficients when the assumption about the nucleation mechanism is correct, and (b) whether the correctness of the nucleation mechanism assumption affects the range of A and K values obtained from the fitting.
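Steps 1, 4 and 5 above can be sketched as a brute-force grid search followed by a one-parameter least-squares fit. This is a minimal illustration under stated assumptions, not the analysis code used in the study: the function names are hypothetical, the delay is restricted to whole output intervals (10 min), the search uses the whole series rather than only the period when N3−6 is clearly above zero, and the prefactor fit is assumed to be a linear-space fit through the origin.

```python
import numpy as np

def fit_delay_and_exponent(h2so4, n36, step_min=10, max_shift=24, exponents=None):
    """Grid search over (delay, b) maximising corr(N3-6, delayed [H2SO4]^b).

    Both series share one time grid with spacing step_min (minutes).
    Returns (delay in minutes, exponent b, correlation coefficient).
    """
    h2so4 = np.asarray(h2so4, dtype=float)
    n36 = np.asarray(n36, dtype=float)
    if exponents is None:
        exponents = np.arange(0.1, 10.0 + 1e-9, 0.1)   # 0.1 <= b <= 10
    n = len(h2so4)
    best = (0.0, np.nan, -np.inf)
    for shift in range(-max_shift, max_shift + 1):
        # Pair h2so4[i] with n36[i + shift]; negative shifts are allowed
        # so that zero or negative delays can be detected.
        if shift >= 0:
            x, y = h2so4[:n - shift], n36[shift:]
        else:
            x, y = h2so4[-shift:], n36[:n + shift]
        if np.std(y) == 0.0:
            continue
        for b in exponents:
            xb = x ** b
            if np.std(xb) == 0.0:
                continue
            r = np.corrcoef(xb, y)[0, 1]
            if r > best[2]:
                best = (shift * step_min, float(b), float(r))
    return best

def fit_prefactor(j_nuc, h2so4, k):
    """Least-squares prefactor P in J = P * [H2SO4]^k (fit through the origin)."""
    x = np.asarray(h2so4, dtype=float) ** k
    j = np.asarray(j_nuc, dtype=float)
    return float(np.dot(x, j) / np.dot(x, x))
```

A positive returned delay would feed into the growth rate estimate of step 2, whereas a zero or negative delay signals that the growth rate cannot be derived this way.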

Sensitivity tests
The analysis tools outlined in Sect. 2.2 follow the procedure presented in Sihto et al. (2006). However, some of the other previous analyses of atmospheric new particle formation events have used slightly modified versions of these tools, and therefore their results may not be directly comparable to each other. For example, Kuang et al. (2008) calculated the time delay used in Eq. (7) by fitting only over the duration of the nucleation event (i.e. the increasing part of the N3−6 curve) and concluded that their results were very sensitive to the length of the fitting time interval. Furthermore, they used slightly different versions of Eqs. (6) and (8) to calculate the new particle formation rate and the actual nucleation rate. Riipinen et al. (2007), on the other hand, obtained the growth rate of 6 nm particles (GR6) from lognormal fits to the DMPS data in the size range of 3–7 nm, instead of using the growth rate of 1 to 3 nm particles.
To test the sensitivity of the results to the assumptions of the procedure, the modelled events were reanalysed using the following three set-ups:
1. Set-up t_short tests how much the length of the interval over which t_N3−6 is fitted affects the analysed results. We recalculated t_N3−6 using two other definitions of fitting periods, i.e. fitting from the start of the event until one hour (t_short 1h) or two hours (t_short 2h) after the maximum N3−6 concentration was reached. Apart from the fitting interval, this set-up followed the procedure described in Sect. 2.2.
2. Set-up d_crit tests how sensitive the analysis is to knowing the exact size of the nucleating cluster. Previous analyses of field data have often assumed a 1 nm diameter for the critical cluster, whereas the most recent atmospheric measurements suggest a roughly 1.5 nm size. An incorrect assumption of the initial size affects the cluster growth rate calculation (Eq. 7) as well as the exponent term in Eq. (8). The analysis was repeated for two assumptions of the cluster size: 1 nm (d_crit = 1 nm) and 2 nm (d_crit = 2 nm). Note that the analysed model events were the same as in all the other set-ups (i.e. nucleation initiated at 1.5 nm size) and that in all other respects the set-up followed the procedure outlined in Sect. 2.2.
3. Set-up Kuang tests how sensitive the analysis is to the exact formulation of the equations predicting J3 and J1.5. In this set-up, we used the formulations suggested by Kuang et al. (2008) (instead of Eqs. 6 and 8), i.e. [Eq. (9)] and [Eq. (10)]. Here k_b is the Boltzmann constant, T temperature, ρ aerosol particle density and A_Fuchs is the Fuchs surface area calculated from [Eq. (11)], where c is the monomer mean thermal speed and D the vapour diffusivity. In all other respects, including the calculation of the time delay t_N3−6, this set-up followed the procedure described in Sect. 2.2. It is therefore important to note that this set-up does not strictly follow that of Kuang et al. (2008), since we calculate the time delay t_N3−6 over the whole peak of N3−6 whereas they calculated it only over the ascending part of N3−6.
The performance of the set-ups was measured by calculating (1) the fraction of analysed events for which the estimated quantity is not within a factor of two of the accurate simulated value (approximate measure of the relative accuracy of the set-ups), (2) the normalised mean absolute error

$$\mathrm{NMAE} = \frac{\sum_i |A_i - S_i|}{\sum_i S_i} \tag{12}$$

and (3) the normalised mean bias

$$\mathrm{NMB} = \frac{\sum_i (A_i - S_i)}{\sum_i S_i} \tag{13}$$

where A_i is the analysed value and S_i is the actual simulated value in case i. We use NMAE as a measure of the absolute accuracy of the set-ups and NMB as an indicator of low or high bias (i.e. overall under- or overestimation).
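The three performance measures can be written compactly as below. This is a sketch that assumes the usual sum-normalised definitions of NMAE and NMB (differences summed over all events and divided by the summed simulated values); the function names are illustrative.

```python
import numpy as np

def fraction_outside_factor_two(analysed, simulated):
    """Fraction of events whose analysed/simulated ratio is outside [0.5, 2]."""
    ratio = np.asarray(analysed, dtype=float) / np.asarray(simulated, dtype=float)
    return float(np.mean((ratio < 0.5) | (ratio > 2.0)))

def nmae(analysed, simulated):
    """Normalised mean absolute error: sum|A_i - S_i| / sum S_i."""
    a = np.asarray(analysed, dtype=float)
    s = np.asarray(simulated, dtype=float)
    return float(np.sum(np.abs(a - s)) / np.sum(s))

def nmb(analysed, simulated):
    """Normalised mean bias: sum(A_i - S_i) / sum S_i (positive = overestimate)."""
    a = np.asarray(analysed, dtype=float)
    s = np.asarray(simulated, dtype=float)
    return float(np.sum(a - s) / np.sum(s))
```

Because NMAE and NMB weight each event by its absolute deviation, events with small rates contribute little to them, whereas the factor-of-two fraction treats all events equally. The two kinds of measure can therefore rank the same set-up differently.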
Fig. 2. An example of a simulated activation nucleation event in which N3−6 peaks earlier in the day than H2SO4 and thus the analysis yields a negative time delay t_N3−6. Also shown are the simulated nucleation and new particle formation rates (J1.5 and J3, solid lines) as well as the estimates obtained using a cluster growth rate from lognormal fits to the 3–7 nm size range (dashed lines).

Time delay t N 3−6 and cluster growth rate
The cluster growth rate (Eq. 7) was calculated from the time delay between the N3−6 and [H2SO4]^b profiles. This approach assumes that N3−6 follows [H2SO4]^b with a time shift t_N3−6, which is the case if the growth from the initial nucleation size to 3 nm is dominated by condensation with a constant growth rate and if the coagulation sink of the clusters remains fairly constant for the duration of the event.
However, our aerosol model simulations indicate that the time delay approach can be problematic in the case of strong particle formation events that produce a high concentration of nucleation mode particles. This is because the nucleation mode (i.e. first formed clusters that have grown to detectable sizes above 3 nm) can act as a significant additional coagulation sink for the small clusters that form later during the event and thus prevent their growth to 3 nm. As a result, the N3−6 peak can be skewed to earlier in the day than in a case of purely condensation-controlled formation of N3−6, and can in some cases occur at the same time as or before the H2SO4 peak.
Figure 2 depicts one such case for activation nucleation. The H2SO4 concentration, and thus the nucleation rate J1.5, peaks at noon (red solid line). The initial increase in N3−6 (blue solid line) starts about 20 min after the increase in H2SO4; however, due to the additional coagulation sink from the growing nucleation mode, N3−6 peaks about 35 min before H2SO4. When fitting over the whole N3−6 peak (i.e. roughly 08:30 a.m. to 05:00 p.m.), an optimum fit between N3−6 and [H2SO4]^b is now obtained with a negative time delay.
All in all, the analysis yielded a zero or negative time delay for 15.3% of the 1464 analysed events. For these events the growth rate of the clusters could not be estimated using Eq. (7). For the case depicted in Fig. 2, we tried approximating the cluster growth rate with that of the nucleation mode in the detectable size region. This growth rate was obtained by fitting lognormal modes to the DMPS-gridded data in the size range of 3–7 nm (Riipinen et al., 2007). Figure 2 shows that this approach was not able to predict the timing or the magnitude of the J3 and J1.5 curves correctly (black and red dashed lines, respectively). This is because during strong particle formation events self-coagulation can significantly increase the growth rate of clusters smaller than 3 nm, while this effect is much weaker for larger nucleation mode particles. Therefore, using the growth rate of 3–7 nm particles underestimates the growth rate of sub-3 nm clusters, which can be seen from the later appearance of the J1.5 estimate peak compared to the actual J1.5. The underestimated cluster growth rate also explains the overestimation of the analysed J1.5 peak value. The slower the clusters grow, the larger the fraction of them that is scavenged by coagulation before reaching the detectable size range. Thus when the growth rate is underestimated, Eq. (8) overcorrects for the coagulation loss and yields too high an estimate for J1.5.
Since the cluster growth rate could not be reliably established for events for which the time delay t_N3−6 was zero or negative, we excluded these events from further analysis. As a result, the final analysis below consists of 1239 simulated events, out of which 289 are based on the nucleation mechanism represented by Eq. (2) (activation nucleation), 362 on that by Eq. (3) (kinetic nucleation), 334 on that by Eq. (4), and 254 on that by Eq. (5). Note that this set of events may still include cases in which coagulation of the clusters to the growing nucleation mode skews the N3−6 curve, as long as the time delay remains positive. In these cases the time delay is underestimated and the growth rate calculated from it is an overestimate of the simulated growth rate.
Following Sihto et al. (2006), we made the time delay fitting over the whole N3−6 peak. However, Kuang et al. (2008) found that their analysis of atmospheric new particle formation events was highly sensitive to the time period over which the time delay was fitted. Therefore, we repeated the fitting procedure for two other fitting periods: until one hour or two hours after the maximum N3−6 concentration (set-ups t_short 1h and t_short 2h, respectively). The baseline analysis and set-up t_short 2h gave the same time delay in 67.2% of the 1239 analysed cases. In all other cases apart from 18 events, the baseline analysis gave a longer time delay (maximum difference 30 min when using 10 min increments) and thus predicted a slower growth rate than the sensitivity set-up. On the other hand, out of the 18 events for which the baseline analysis gave a shorter time delay, the difference in the predicted time delays was over 30 min in 5 cases. Further shortening the fitting period to one hour after the maximum N3−6 concentration reduced the percentage of identical time delays to 34.4%. For the non-identical events, the baseline analysis again gave longer time delays apart from 25 cases. However, even now the absolute difference from the baseline analysis was ≤30 min in all but 39 cases (maximum difference 3 h 10 min).
It should be noted that even relatively small changes in time delay can lead to large changes in growth rate and thus deteriorate the predictions of J1.5 and J3. Unfortunately, it is impossible to give a general recommendation on the optimal length of the fitting period. A comparison of the actual simulated mean growth rates to those from the time delay analysis in 67 activation nucleation cases revealed that any of the three fitting periods (baseline, t_short 1h or t_short 2h) can give the most accurate, or alternatively a clearly inaccurate, growth rate estimate depending on the simulation conditions. Overall, however, the shortest fitting period (t_short 1h) gave worse growth rate estimates than the other two periods. Furthermore, the time delay between the J3 and H2SO4 curves (t_J3) should not be used to estimate the cluster growth rate, as it systematically overestimates the growth.
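The sensitivity to the time delay can be illustrated numerically. The sketch below assumes the Kerminen and Kulmala (2002) survival correction in its commonly quoted form, J_crit/J_3 = exp[γ (CS/GR) (1/d_crit − 1/3 nm)], with γ = 0.23 m² nm² h⁻¹ as given in the text; the condensation sink value of 10 m⁻² used in the usage note and the function names are hypothetical.

```python
import math

GAMMA = 0.23  # m^2 nm^2 h^-1, coefficient quoted for the extrapolation formula

def growth_rate(delay_h, d_crit=1.5, d_det=3.0):
    """Cluster growth rate (nm/h) implied by a time delay (h) from d_crit to d_det."""
    if delay_h <= 0:
        raise ValueError("growth rate is undefined for zero or negative delays")
    return (d_det - d_crit) / delay_h

def survival_correction(delay_h, cs, d_crit=1.5, d_det=3.0):
    """Ratio J(d_crit)/J(d_det); cs is the condensation sink in m^-2 (assumed form)."""
    gr = growth_rate(delay_h, d_crit, d_det)
    return math.exp(GAMMA * cs / gr * (1.0 / d_crit - 1.0 / d_det))

# Hypothetical condensation sink of 10 m^-2; delays differing by 30 min
for delay in (0.5, 1.0, 1.5):
    print(delay, growth_rate(delay), survival_correction(delay, cs=10.0))
```

Since the delay enters the exponent through 1/GR, a 30 min shift in the fitted delay changes the correction factor multiplicatively, and the effect grows with the condensation sink.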

Nucleation and new particle formation rates, J1.5 and J3
Next, we tested how well Eqs. (6) and (8) capture the simulated event mean values of new particle formation (J3) and nucleation rates (J1.5), respectively. Figure 3a shows that the predictions of J3 are fairly accurate, with 81.8% of all events within a factor-of-two margin of the accurate value in the baseline analysis. There is, however, a tendency to overestimate the mean formation rate J3, especially at the high end of the particle formation rates. Analysing one simulated event in detail, Vuollekoski et al. (2010) concluded that the single most significant factor deteriorating the prediction of J3 is the poor approximation of the size distribution function at 6 nm (n_6) in the last right-hand term of Eq. (6), i.e.

$$n_6 \approx \frac{N_{3-6}}{3\,\mathrm{nm}} \tag{14}$$
Following the suggestion of Vuollekoski et al. (2010), we reanalysed the new particle formation rates replacing Eq. (14) with

$$n_6 \approx \frac{N_{5-7}}{2\,\mathrm{nm}} \tag{15}$$

and thus using for the particle formation rate the equation

$$J_3 = \frac{dN_{3-6}}{dt} + \mathrm{Coag}_4 \cdot N_{3-6} + \frac{1}{2\,\mathrm{nm}}\,\mathrm{GR}_6 \cdot N_{5-7} \tag{16}$$

where N5−7 is the number concentration of particles in the diameter range 5–7 nm. This formulation improves our predictions of mean J3 significantly, with only 2.8% of events not falling within a factor of 2 of the accurate values (compared to 18.2% in the baseline analysis, Table 2). We therefore recommend using Eq. (16) over Eq. (6) in all future analyses of new particle formation; however, to be consistent with previous analyses of field data (Sihto et al., 2006; Riipinen et al., 2007), we continue to use Eq. (6) throughout the remainder of this study.
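The two ways of estimating J3 can be compared in a short sketch. It assumes a balance equation of the form J3 = dN3−6/dt + Coag4 · N3−6 + GR6 · n6, where the size distribution function at 6 nm (n6) is approximated either by N3−6/(3 nm) or by N5−7/(2 nm). The function name, argument names and units are illustrative choices, and the derivative is taken from a parabola fitted to N3−6 as described in the analysis procedure.

```python
import numpy as np

def formation_rate_j3(t_s, n36, coag4, gr6, n57=None):
    """J3 (cm^-3 s^-1) from a balance equation for the 3-6 nm size range.

    t_s: times (s); n36, n57: number concentrations (cm^-3);
    coag4: coagulation sink of 4 nm particles (s^-1);
    gr6: growth rate at 6 nm (nm s^-1).
    """
    t_s = np.asarray(t_s, dtype=float)
    n36 = np.asarray(n36, dtype=float)
    # Smooth time derivative: fit a parabola to N3-6(t) and differentiate it
    coeffs = np.polyfit(t_s, n36, 2)
    dn_dt = np.polyval(np.polyder(coeffs), t_s)
    if n57 is None:
        n6 = n36 / 3.0                              # n6 ~ N3-6 / (3 nm)
    else:
        n6 = np.asarray(n57, dtype=float) / 2.0     # n6 ~ N5-7 / (2 nm)
    return dn_dt + coag4 * n36 + gr6 * n6
```

Passing `n57` switches the growth term to the narrower 5–7 nm window, which samples the size distribution closer to the 6 nm boundary than the full 3–6 nm range does.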
As could be expected, the mean nucleation rate (J1.5) is predicted less accurately than J3 (Fig. 3b), with 40.8% of the events falling outside a factor-of-two margin of the simulated rate in the baseline analysis. Furthermore, the nucleation rate is underestimated by over an order of magnitude in 77 cases (6.2% of all events). Note that the largest discrepancies in J1.5 are underestimates, while the opposite is true for J3. Therefore, improvements in the prediction of J3 are likely to deteriorate the overall J1.5 prediction using Eq. (8). For example, the use of Eq. (16), which improves the J3 analysis, increases the fraction of J1.5 values outside a factor of 2 range from 40.8% to 46.2% (Table 2).
The reason for the poorer prediction capability of J1.5 lies in the built-in assumptions of Eq. (8). It is assumed that (1) intramodal coagulation in the nucleation mode is negligible, and (2) the growth rate between 1.5 and 3 nm is constant. The former has been found to be a good assumption as long as J1.5/Q < 10⁻², where Q is the formation rate of condensable vapours (Anttila et al., 2010). In our simulations this corresponds roughly to cases in which J1.5 is less than 10²–10³ cm⁻³ s⁻¹. Neglecting self-coagulation in Eq. (8) leads in theory to underestimation of J1.5, which is consistent with the results in Fig. 3b at high nucleation rates, when the effect should be the strongest. Note, however, that the majority of the very strong nucleation events were excluded from the analysis in Sect. 2.1 due to unrealistically high J3 values and in Sect. 3.1 due to negative time delays.
On the other hand, the assumption of a constant growth rate in the size range 1.5–3 nm is never strictly true. For nonvolatile vapours such as H2SO4, molecular effects lead to an enhancement of the condensation flux in the smallest particle sizes (Lehtinen and Kulmala, 2003; Sihto et al., 2009; Nieminen et al., 2010). For vapours whose saturation pressure deviates from zero (such as the organic vapour in most of our simulations), the Kelvin effect works in the opposite direction and decreases the growth rate of the smallest clusters. Furthermore, in our simulations the condensing vapour concentration is not constant, but H2SO4 has a parabolic time profile in all and the organic vapour in half of the simulations. These factors lead to a significant deviation from the constant growth rate assumption. Since the coagulation loss rate of the formed clusters is strongly dependent on their size, a lowered growth rate right after their formation leads to faster scavenging and thus to a smaller fraction of clusters that survive to the detectable size, and vice versa. Note also that while we simulate only sulphuric acid and one condensing organic compound, in the atmosphere there may be several others (e.g., amines, several organic compounds with different properties) contributing to the early stages of cluster growth (e.g., Smith et al., 2010). Their combined effect could cause an even stronger deviation from the constant growth rate assumption than simulated in this study.

www.atmos-chem-phys.net/11/3051/2011/ Atmos. Chem. Phys., 11, 3051–3066, 2011

Table 2. Performance metrics for the different analysis set-ups when estimating the mean new particle formation (J3) and actual nucleation rates (Jnuc). The columns show the percentage of analysed events for which the estimate is not within a factor of two of the simulated rate (>factor 2), the normalised mean absolute error (NMAE) and the normalised mean bias (NMB). Note that in sensitivity set-ups d_crit = 1 nm and d_crit = 2 nm the analysis tool calculates J1 and J2, respectively, and these values are compared to the simulated J1.5.
Table 2 summarises the performance of the sensitivity tests. All but the Kuang set-up give fairly large positive normalised mean bias (NMB) values for J3, i.e. generally overestimate the mean new particle formation rate. Set-up Kuang gives clearly lower normalised mean absolute error (NMAE) and NMB values (55.3% and −20.4%, respectively) compared to the baseline analysis (68.5% and 66.4%, respectively) but performs the worst out of all the set-ups in terms of events that are predicted within factor of 2 accuracy (56.7% of cases not meeting this criterion). This apparent discrepancy is due to the fact that the set-up underpredicts especially the lowest formation rates (<2 cm⁻³ s⁻¹), for which the absolute difference between analysed and simulated values (which is used to calculate NMAE and NMB) is very small. Shortening the fitting time window (set-ups t_short 2h and t_short 1h) deteriorates the accuracy of the results, especially in terms of absolute error and bias. On the other hand, the assumption of the critical cluster size has an even larger effect. Assuming too small an initial cluster size (set-up d_crit = 1 nm) clearly deteriorates the estimate, and assuming too large a cluster size (set-up d_crit = 2 nm) clearly improves it. This is because the baseline set-up tends to overestimate J3, and thus sensitivity set-ups that underestimate the growth rate (and thus the last term of Eq. 6), such as set-up d_crit = 2 nm, lead to a more accurate prediction, and vice versa.
The actual nucleation rate J 1.5 is captured most accurately in the baseline analysis and set-up t short 2h (Table 2). Further shortening the fitting time window (set-up t short 1h ) or using Eq. (16) instead of Eq. (6) to calculate J 3 slightly increases both the absolute and relative errors. The other set-ups perform clearly worse, especially in terms of events that are captured within a factor of 2. Note that the incorrect assumption that nucleation initiates at 1 nm size (set-up d crit = 1 nm) generally leads to overestimation (i.e. positive NMB) of the mean nucleation rate (in this sensitivity case assumed to be J 1 instead of J 1.5 ), while all the other set-ups tend to underestimate the actual nucleation rate. This is because set-up d crit = 1 nm overestimates the size range over which the cluster needs to grow to become detectable and thus overestimates the scavenging of sub-3 nm particles. As a result, Eq. (8) overcorrects for the coagulation loss and thus overestimates the nucleation rate.

Nucleation mechanism
Previous analyses of field data have used the method of least squares or calculated correlation coefficients between N 3−6 and [H 2 SO 4 ] b (Sihto et al., 2006; Riipinen et al., 2007) or between the nucleation rate and [H 2 SO 4 ] b (Kuang et al., 2008; Riipinen et al., 2007), and interpreted the exponent b giving the best fit as the number of sulphuric acid molecules in the critical cluster. Exponents falling close to 1 or 2, for example, have therefore been taken as evidence for activation and kinetic nucleation, respectively. Here we test the approach separately for the four simulated nucleation mechanisms.
Figure 4 shows the frequency distribution of the best fit exponents that were obtained in the baseline analysis by calculating the highest correlation coefficient between N 3−6 and [H 2 SO 4 ] b profiles (0.1 ≤ b ≤ 10). It is evident that for the majority of the events the analysis yields exponents that are clearly higher than the number of H 2 SO 4 molecules in the critical cluster. Depending on the nucleation mechanism, the predicted exponent falls into the roughly correct range (defined here as k ± 0.5, where k is the simulated nucleation exponent) in only 17.3-25.1% of the events (Table 3). In 58.7-82.7% of cases the exponent is overestimated. This result is consistent with the modelling study of Sihto et al. (2009), which found that the size dependence of the sub-3 nm particle growth rate often skews the best fit exponent for N 3−6 ∼ [H 2 SO 4 ] b high. Shortening the period over which the time delay is calculated (set-up t short ) shifts the predicted exponents to even higher values and thus deteriorates the analysis results (Table 3).
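The correlation-based search for the best fit exponent can be sketched as follows. This is a minimal illustration under stated assumptions and not the analysis code of this study: both time series are assumed to lie on the same uniform time grid, the time delay is scanned in whole grid steps, and the exponent grid uses a step of 0.1 purely for brevity (exponents such as 2.31 reported for the example event imply that a finer grid was used in practice). The function names `pearson` and `best_fit_exponent` and the synthetic data are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0  # a constant curve carries no correlation information
    return sxy / math.sqrt(sxx * syy)


def best_fit_exponent(n36, h2so4, max_shift_steps):
    """Scan b = 0.1 ... 10 and time shifts; return (b, shift, r) with the highest r."""
    exponents = [i / 10.0 for i in range(1, 101)]
    best_b, best_shift, best_r = None, None, -2.0
    for shift in range(max_shift_steps + 1):
        acid = h2so4[: len(h2so4) - shift]  # acid concentration at time t - shift
        particles = n36[shift:]             # particle concentration at time t
        for b in exponents:
            r = pearson([a ** b for a in acid], particles)
            if r > best_r:
                best_b, best_shift, best_r = b, shift, r
    return best_b, best_shift, best_r


# Synthetic test case: a parabolic acid profile and a particle curve that is
# exactly the squared acid curve delayed by 6 grid steps (60 min at 10 min resolution).
h2so4 = [1.0e6 * max(0.0, 1.0 - ((i - 20) / 20.0) ** 2) for i in range(60)]
n36 = [0.0] * 6 + [1.0e-9 * a ** 2 for a in h2so4[:-6]]
b, shift, r = best_fit_exponent(n36, h2so4, max_shift_steps=12)
print(b, shift, r)
```

For this idealised input the search recovers the imposed exponent and delay. The simulated events discussed above differ from it in that size-dependent growth and scavenging distort the N 3−6 curve, which is what skews the recovered exponent.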
Figure 5 shows the frequency distribution for the best exponent fit between analysed J 1.5 (from Eq. 8) and simulated [H 2 SO 4 ] b profiles (0.1 ≤ b ≤ 10) in the baseline analysis. Again, the analysis tends to overestimate the nucleation exponent, and places only 19.1-33.2% of the events in the correct exponent range. Now, however, the fraction of underestimated exponents is also significant at 10.7-41.3% (Table 4). Overall, the results are not very sensitive to the length of the fitting period or the assumption of the initial cluster size (Table 4). However, using the analysis equations in set-up Kuang (i.e. Eqs. 9 and 10 instead of Eqs. 6 and 8) shifts the distribution of best fit exponents to significantly larger values. Using this set-up, 56.3-82.4% of the cases are overestimated, and the fraction of events for which the exponent is predicted correctly either decreases or increases depending on the nucleation mechanism (Table 4). Note that our set-up Kuang differs from the baseline analysis only with respect to the equations used to calculate J 3 and J 1.5. Therefore, the higher nucleation exponents found in Kuang et al. (2008) compared to some other analyses (Sihto et al., 2006; Riipinen et al., 2007) are likely to be partly due to the different analysis equations used and not only the chosen fitting period.
Several points are worth noting. First, fitting J 1.5 ∼ [H 2 SO 4 ] b gives overall more accurate results than N 3−6 ∼ [H 2 SO 4 ] b despite the fact that J 1.5 is estimated using Eq. (8), which has several potential error sources, whereas N 3−6 is obtained directly from measurement data. Second, some previous studies have classified events based on the correlation coefficients of N 3−6 ∼ [H 2 SO 4 ] and N 3−6 ∼ [H 2 SO 4 ] 2, so that a larger coefficient for the former is interpreted as activation nucleation and for the latter as kinetic nucleation (Sihto et al., 2006; Riipinen et al., 2007). If this classification were applied to the events analysed here using N 3−6, 82.7% of the activation events would be classified as kinetic. Using J 1.5, on the other hand, would classify 56.1% of activation events as kinetic and 19.1% of kinetic events as activation. Third, Tables 3 and 4 show that under some conditions the best fit correlation exponent gives too low a number of molecules in the critical cluster. Therefore, field data that typically show correlation exponents in the range 1-2 do not automatically rule out more than two sulphuric acid molecules in a critical cluster.
In this study, we followed the procedure of Sihto et al. (2006) and determined the best fit exponents b based on the highest correlation coefficient. In some of the analysed cases several exponent values gave very similar correlation coefficients, thus complicating the determination of the best fit. In their modelling study, Sihto et al. (2009) attributed this to the smoothness of the simulated curves. Figure 6, which illustrates three nucleation events each simulated using nucleation mechanism J 1.5 = Q × [H 2 SO 4 ] 4 (Eq. 5), shows however that the flat peak of a correlation coefficient curve is typically a problem only in cases for which the best fit exponent is significantly overestimated (blue line), whereas in cases that are classified correctly (red line) or underestimated (black line) the curve has a distinct peak. Furthermore, even in the case of the flat curve (blue line) the correct exponent, i.e. b = 4, has a clearly lower correlation coefficient than the curve maximum.
Table 3 (caption, continued). The accuracy is given as percentage (%) of analysed events in each of the following three classes: events for which the analysis predicts roughly the correct nucleation mechanism (k − 0.5 ≤ b ≤ k + 0.5, where k is the nucleation exponent in the simulation and b is the best fit exponent from the analysis); events for which the exponent is clearly underestimated (b < k − 0.5); and events for which the exponent is clearly overestimated (b > k + 0.5).
Since the correlation method does not actually minimise the difference between the curves being fitted, we recalculated the time shift t N 3−6 and best fit exponents applying the method of least squares. With this method, we minimised the difference between the N 3−6 and [H 2 SO 4 ] b curves with respect to the exponent b and time delay t N 3−6, and between the J 1.5 and [H 2 SO 4 ] b curves with respect to the exponent b. The results obtained for the best fit exponents were very similar to those using the correlation method (not shown), and therefore we do not expect the chosen fitting method to affect the conclusions of this study.
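One possible form of the least-squares alternative for the J 1.5 ∼ [H 2 SO 4 ] b fit is sketched below. The text does not specify how the two curves were scaled before their difference was minimised, so the sketch assumes a free prefactor P that is solved in closed form for each trial exponent; that choice, the 0.1 exponent step, the function name `least_squares_exponent` and the example data are all assumptions made for illustration.

```python
def least_squares_exponent(j_rate, h2so4):
    """Return (b, prefactor, sse) minimising sum (J - P * [H2SO4]^b)^2 over b = 0.1 ... 10."""
    best = None
    for i in range(1, 101):
        b = i / 10.0
        powered = [a ** b for a in h2so4]
        denom = sum(p * p for p in powered)
        if denom == 0.0:
            continue  # all-zero acid curve, nothing to fit
        # For a fixed b the optimal prefactor follows from linear least squares
        # through the origin: P = sum(J * x) / sum(x * x), with x = [H2SO4]^b.
        prefactor = sum(j * p for j, p in zip(j_rate, powered)) / denom
        sse = sum((j - prefactor * p) ** 2 for j, p in zip(j_rate, powered))
        if best is None or sse < best[2]:
            best = (b, prefactor, sse)
    return best


# Hypothetical event following J = 1e-12 * [H2SO4]^2 exactly.
acid = [1.0e6, 2.0e6, 4.0e6, 8.0e6, 4.0e6, 2.0e6]
rates = [1.0e-12 * a ** 2 for a in acid]
b_ls, p_ls, sse_ls = least_squares_exponent(rates, acid)
print(b_ls, p_ls, sse_ls)
```

Unlike the correlation coefficient, the summed squared error depends on the absolute agreement between the curves, which is why a scaling choice such as the prefactor above has to be made explicitly.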
In addition to examining individual new particle formation events, previous studies have searched for indications of the nucleation mechanism by plotting several events in a logarithmic plot of H 2 SO 4 versus J 1.5 or of H 2 SO 4 versus J 3 (Sihto et al., 2006; Riipinen et al., 2007; Kuang et al., 2008). The slope of the regression line fitted to such a plot has been thought to give the number of H 2 SO 4 molecules in the critical cluster.
Table 4. Accuracy of best fit exponent b calculations when correlating J 1.5 ∼ [H 2 SO 4 ] b. The accuracy is given as percentage (%) of analysed events in the same three classes as in Table 3.
For the modelled data, we find that the obtained slope is very sensitive to the subset of events plotted. However, typical features for consistently selected subsets from the four nucleation mechanisms are that (1) the slope increases with the number of H 2 SO 4 molecules in the simulated critical cluster, and (2) the slope may correspond quite closely to the simulated cluster molecule number for one or two of the mechanisms, but not for all four. As an example, Fig. 7 shows the H 2 SO 4 versus J 1.5 plots separately for the four nucleation mechanisms but only for events that were simulated using the middle value of the five nucleation coefficients (Table 1) and assuming a non-volatile organic compound. While the obtained slope represents well the number of H 2 SO 4 molecules in the critical cluster in the case of activation nucleation (slope 1.1 versus 1 simulated molecule), for all the other nucleation mechanisms the slope clearly underestimates the critical cluster size (slope 1.6 versus 2 simulated molecules, 2.1 versus 3, and 2.6 versus 4). On the other hand, taking into account only events with the same nucleation coefficient but assuming that the organic saturation pressure is 10 5 cm −3 gives slopes of 2.9, 3.4, 3.6 and 4.1 for the four mechanisms, respectively. Furthermore, calculating the slope for all events of a certain nucleation type gives slopes of 1.4, 1.9, 2.2 and 2.6, respectively.
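The regression-line approach, and one way in which pooling events can bias it, can be illustrated with a short sketch. The function name `loglog_slope` and the four data points are hypothetical and are not taken from the simulations of this study; the sketch assumes an ordinary least-squares line through the log10-transformed values.

```python
import math


def loglog_slope(h2so4, j_rate):
    """Ordinary least-squares slope and intercept of log10(J) against log10([H2SO4])."""
    x = [math.log10(a) for a in h2so4]
    y = [math.log10(j) for j in j_rate]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Two hypothetical kinetic-type events, J = K * [H2SO4]^2, that differ only in K:
# the event at low acid concentration has a ten times larger coefficient.
acid_a, k_a = [1.0e6, 2.0e6], 1.0e-11
acid_b, k_b = [4.0e6, 8.0e6], 1.0e-12
j_a = [k_a * a ** 2 for a in acid_a]
j_b = [k_b * a ** 2 for a in acid_b]

slope_a, _ = loglog_slope(acid_a, j_a)
slope_b, _ = loglog_slope(acid_b, j_b)
slope_pooled, _ = loglog_slope(acid_a + acid_b, j_a + j_b)
print(slope_a, slope_b, slope_pooled)
```

Each event on its own gives a slope of 2, but the pooled regression gives a slope of roughly 0.67 because the prefactor varies systematically with the acid level. This constructed case only demonstrates that the pooled slope depends on which events are combined; it does not reproduce the slopes reported for Fig. 7.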
It is possible that the slope analysis using measured field data is not as sensitive to the selection of the subset of events as the analysis of modelled data. This is because at a given location it is likely that many of the environmental conditions, such as the condensing organic vapour properties (e.g., saturation pressure) and approximate level of background condensation sink, are relatively constant during nucleation event days. Furthermore, the fact that the modelled sulphuric acid concentration follows one of three prescribed parabolas limits the scatter of H 2 SO 4 in model-based plots such as Fig. 7 (resulting in vertical stripes), which may affect the slope from the modelled data. Despite these differences between the field and modelled data, our analysis suggests that the slopes from H 2 SO 4 versus J 1.5 or H 2 SO 4 versus J 3 plots should be interpreted with caution also in the case of field data.

Nucleation coefficients A and K
Finally, Fig. 8 compares the simulated nucleation coefficients A and K for activation and kinetic type nucleation (Eqs. 2 and 3) to the coefficients obtained by determining the best fit between analysed J 1.5 and simulated [H 2 SO 4 ] or [H 2 SO 4 ] 2 concentration profiles. In this figure the events are classified into activation and kinetic types according to the simulated (i.e. known) nucleation mechanism and not based on the classification given by the analysis (see Sect. 3.3).
For activation nucleation (Fig. 8a), the analysis estimates the coefficient A within a factor of 2 of the correct simulated value in 72.3% of the cases. Coefficient K for kinetic nucleation is analysed less accurately, with only 55.5% of the events within a factor of 2 (Fig. 8b). On the other hand, the coefficients are off by more than an order of magnitude in 4.8% of activation and 8.0% of kinetic events. The largest discrepancies are seen for the highest nucleation coefficients. As expected, these results closely follow those of the analysed J 1.5 (Sect. 3.2) from which they were calculated. The most accurate results are given by the baseline analysis and set-up t short, although the NMAE and NMB values for set-up t short 1h are deteriorated by 6 events whose absolute A value is greatly overestimated (Table 5). The other three set-ups give clearly poorer estimates, especially in terms of relative error, i.e. events outside a factor of 2 of the actual simulated nucleation coefficient. Apart from the estimation of the A coefficient with set-up t short 1h, set-up d crit = 1 nm is the only one that generally leads to overestimation of the coefficients (positive NMB). The reason for this behaviour is given in Sect. 3.2.
Note that in the atmosphere the actual nucleation mechanism is not known during the new particle formation analysis. However, A and K coefficients have still been calculated from the atmospheric data. Our results indicate that the range of nucleation coefficients obtained from the analysis is not highly dependent on the correctness of the nucleation mechanism assumption. The range of analysed A coefficients for all events (regardless of the simulated mechanism) was 8.4 × 10 −8 to 7.0 × 10 −5 s −1, whereas for the subset of activation type events following Eq. (2) it was 8.4 × 10 −8 to 1.3 × 10 −5 s −1 (actual simulated range 10 −7 to 10 −5 s −1 ). Similarly, the range of analysed K coefficients for all events was 5.7 × 10 −15 to 1.4 × 10 −11 cm 3 s −1, whereas for the subset of kinetic type events following Eq. (3) it was 1.9 × 10 −14 to 1.0 × 10 −11 cm 3 s −1 (actual simulated range 10 −13 to 10 −11 cm 3 s −1 ).

Conclusions
We have evaluated the accuracy of the mathematical tools commonly used to analyse atmospheric new particle formation events in 1239 cases in which the nucleation mechanism and rate as well as the particle formation rate at 3 nm were known. The simulated particle size distributions in the range 2.8-556 nm were gridded to a typical size and time resolution of DMPS instruments (i.e. 32 size channels and 10 min intervals) in order to mimic the analysis of atmospheric nucleation events as closely as possible.
We find that calculating the growth rate of sub-3 nm clusters from the time delay between H 2 SO 4 and N 3−6 curves can lead to overestimation of the growth rate during strong particle formation events. This is because coagulation scavenging of the formed clusters to the growing nucleation mode can skew the N 3−6 peak to earlier in the day. In extreme cases this can lead to apparent negative time delays; however, more problematic for the analysis are the cases in which the time delay remains positive but is shortened compared to the time delay corresponding to the actual growth rate. It is therefore recommended to exclude from the analysis events during which the coagulation sink caused by the nucleation mode is not negligible compared to the background sink.
Table 5. Performance metrics for the different analysis set-ups when estimating the nucleation factor A for activation events and factor K for kinetic events. The columns show the percentage of analysed events for which the estimate is not within a factor of two of the simulated rate (>factor 2), the normalised mean absolute error (NMAE) and the normalised mean bias (NMB).
The time delay obtained from the analysis was in many cases sensitive to the period over which it was fitted. While the differences in the estimates from the three fitting intervals in this study (over the whole N 3−6 peak, or from event start until 1 or 2 h after the N 3−6 maximum concentration) were ≤30 min in all but 24 cases, the corresponding differences in growth rates were as high as 7.5 nm h −1. While it is impossible to make a general recommendation on the optimal length of the fitting period, our overall results indicate that the fitting period should extend to at least two hours after the N 3−6 peak. On the other hand, the time delay between J 3 and H 2 SO 4 curves (t J 3 ) should not be used to estimate the cluster growth rate as it systematically overestimates the growth.
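Why a modest change in the fitted time delay can translate into a large change in growth rate can be seen from a minimal sketch. It assumes that the growth rate is simply the diameter interval between the initial cluster size and the 3 nm detection limit divided by the time delay; the function name `growth_rate_nm_per_h`, its default diameters and the example delays are illustrative assumptions.

```python
def growth_rate_nm_per_h(time_delay_min, d_initial_nm=1.5, d_detect_nm=3.0):
    """Constant growth rate (nm/h) implied by a time delay given in minutes."""
    if time_delay_min <= 0:
        raise ValueError("the time delay must be positive")
    return 60.0 * (d_detect_nm - d_initial_nm) / time_delay_min


# Hypothetical delays in minutes, on the 10 min grid of the gridded data.
for delay in (10, 20, 40, 60):
    print(delay, "min ->", growth_rate_nm_per_h(delay), "nm/h")
```

Because the growth rate is inversely proportional to the delay, a 10 min difference matters far more for short delays (9.0 versus 4.5 nm/h between 10 and 20 min) than for long ones.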
The new particle formation rate at 3 nm (J 3 ) was estimated most accurately in terms of both relative and absolute error with the formulation of Vuollekoski et al. (2010). We recommend this formulation to be used in all future analyses of new particle formation, with the reservation that improving J 3 estimates tends to deteriorate the analysis of actual nucleation rates (J 1.5 ). In our study, the accuracy of the J 1.5 analysis was only satisfactory, with 37-59% of events within a factor of two of the simulated value. The main factors deteriorating the estimates were the assumption of a constant cluster growth rate (currently made in all formulations) and possible erroneous assumptions concerning the initial size at which nucleation occurs. It is worth noting that several previous analyses of field measurements have assumed nucleation to initiate at 1 nm size, whereas recent ion instrument data suggest a size of ∼1.5 nm. In our analysis, this erroneous assumption about the initial cluster size increased the normalised mean absolute error (NMAE) from 65% to 135% and biased the nucleation rate values high (whereas a correct assumption about the size biased the rates low). It is therefore possible that the nucleation coefficients A and K derived in previous analyses of field data (Sihto et al., 2006; Riipinen et al., 2007; Kuang et al., 2008) overestimate the atmospheric values. On the other hand, all the analysis set-ups tested in this study resulted in an order-of-magnitude accuracy for at least 93% of the A coefficients and 89% of the K coefficients. This can be considered a reasonable accuracy since the coefficients derived from atmospheric data typically exhibit a variation of 1-3 orders of magnitude (Riipinen et al., 2007). Thus, it is likely that this high variation of observed A and K coefficients is not a consequence of inaccuracies in the analysis methods, but a real phenomenon caused by (so far unknown) environmental factors.
Large uncertainties were found when the analysis tools were used to determine the nucleation mechanism in terms of the number of H 2 SO 4 molecules in a critical cluster. When applied to individual events, the best fit exponents from both N 3−6 ∼ [H 2 SO 4 ] b and J 1.5 ∼ [H 2 SO 4 ] b fittings were clearly higher than the actual number of H 2 SO 4 molecules in the simulated critical cluster in the majority of the cases. Out of the two fitting approaches, the exponents from the N 3−6 fit were higher and thus typically more biased. Decreasing the length of the fitting period or using the analysis equations of Kuang et al. (2008) led to further overestimation of the nucleation exponent. This indicates that the higher exponents found in Kuang et al. (2008) compared to some other analyses (Sihto et al., 2006; Riipinen et al., 2007) may in part be due to different analysis equations, and not only to the chosen fitting period. Although our results suggest that in general the analysis tools tend to overestimate the number of H 2 SO 4 molecules in the critical cluster, significant underestimation was also found in up to 41% of the cases. This indicates that one cannot automatically rule out more than 2 sulphuric acid molecules in a critical cluster even if field data show nucleation exponents in the range 1-2.
Despite the general overestimation of nucleation exponents for individual events, the regression lines fitted to logarithmic plots of J 1.5 versus H 2 SO 4 of several events tend to underestimate the number of molecules in the critical cluster. However, we found the accuracy of the regression line analysis to be highly sensitive to the analysed subset of simulated events. It is not currently known how well this sensitivity of the modelled data reflects the situation with the field data. Overall, however, we conclude that the interpretation of the nucleation mechanism from J 1.5 ∼ [H 2 SO 4 ] b, N 3−6 ∼ [H 2 SO 4 ] b and regression line analyses contains many potential sources of error and should be done with great caution also for field measurements.
Overall, we conclude that the analysis tools have built-in assumptions which can cause uncertainties in the event analysis. While this uncertainty is in most cases within an acceptable order-of-magnitude limit, it is important to be careful when interpreting the data and drawing conclusions about, e.g., nucleation mechanisms or the temperature dependence of nucleation prefactors. Unfortunately, quantifying the error that the analysis tools have caused in previous analyses of atmospheric data is not straightforward since we do not know which of the simulated events most closely resemble the atmospheric ones. Since the tools perform very well for some individual simulated events and quite poorly for others, it is equally possible that the tools have introduced only minor error in atmospheric analyses or alternatively that they have misdirected our theoretical understanding regarding, e.g., the nucleation mechanism. Currently, we cannot know if either is the case; however, our study raises the point that large errors are possible and thus caution should be exercised when interpreting the atmospheric data.
Finally, it should be noted that this study investigated only the errors resulting from the mathematical analysis tools and used smooth simulation data as an input. In typical atmospheric measurements, on the other hand, variations in atmospheric conditions and in air mass directions as well as the measurement instruments themselves result in significant noise in the data. This noise is likely to cause further uncertainty in the analysis of atmospheric new particle formation events.

Fig. 1. An example of a simulated kinetic nucleation event. (a) DMPS-gridded size distribution. (b) The simulated nucleation (J 1.5 ) and new particle formation (J 3 ) rates together with the concentration of 3-6 nm particles (N 3−6 ). (c) Illustration of the fitting procedure for the time delay t N 3−6 and best fit exponent b (baseline analysis step 1). The simulated H 2 SO 4 concentration (here normalised by 2.5 × 10 3 ) and N 3−6 concentration are shown in solid lines. The highest correlation is obtained when the H 2 SO 4 curve is shifted 60 min in time and raised to the power of 2.31 (here normalised by 2.8 × 10 12 ) as shown by the dashed line. Thus for this event, the analysis yields t N 3−6 = 60 min and b = 2.31.

Fig. 3. Comparison of baseline analysis predictions of mean (a) new particle formation rates (J 3 ), and (b) nucleation rates (J 1.5 ) to the simulated values. All four nucleation mechanisms are included. Also shown are the 1:1 line (solid) as well as the 1:2 and 2:1 lines (dotted).

Fig. 6. Correlation coefficient as a function of exponent b when fitting N 3 ∼ [H 2 SO 4 ] b for three example cases each simulated using nucleation mechanism J 1.5 = Q × [H 2 SO 4 ] 4. The legend indicates the best fit exponent, i.e. the value of b that has the highest correlation coefficient, in each case.

Fig. 8. Comparison of predicted and simulated nucleation coefficients for (a) activation nucleation events only and (b) kinetic nucleation events only. Also shown are the 1:1 line (solid) as well as the 1:2 and 2:1 lines (dotted).

Table 1. Parameters used in the model simulations.

Table 3. Accuracy of best fit exponent b calculations when correlating