Refined classification and characterization of atmospheric new particle formation events using air ions

Atmospheric new particle formation (NPF) is a phenomenon observed worldwide that affects human health and the global climate. With the growing network of global atmospheric measurement stations, efforts towards investigating NPF have increased. In this study, we present an automated method to classify days into four categories: NPF events, non-events and two classes in between. The method ensures reproducibility and minimizes the hours spent on manual classification. We applied our automated method to 10 years of data collected at the SMEAR II measurement station in Hyytiälä, southern Finland, using a Neutral cluster and Air Ion Spectrometer (NAIS). In contrast to the traditionally applied classification methods, which categorize days into events, non-events and undefined (ambiguous) days, our method is able to classify the undefined days because it accesses the initial steps of NPF at sub-3 nm sizes. Our results show that on ~24% of the days in Hyytiälä, a regional NPF event occurred and was characterized by ‘nice weather’ and favorable conditions such as a clear sky and low condensation sink. Another class found in Hyytiälä is the transported event class, which appears to be NPF carried horizontally or vertically to our measurement location; it occurred on 17% of the studied days. Additionally, we found that an ion burst, where the ions apparently fail to grow to larger sizes, occurred on 18% of the days in Hyytiälä. The transported events and ion bursts were characterized by less favorable ambient conditions than regional NPF events, and thus experienced interrupted particle formation or growth. Non-events occurred on 41% of the days and were characterized by complete cloud cover and high relative humidity.
Moreover, for the regional NPF events occurring at the measurement site, the method identifies the start time, peak time and end time, which helps us focus on variables within an exact time window to better understand NPF at a process level. Our automated method can be modified to work in other measurement locations where NPF is observed.


1 Introduction

New particle formation (NPF) is an atmospheric phenomenon that adds substantially to the aerosol load in the global troposphere (Spracklen et

In order to comprehend the phenomenon of NPF in a specific location, we first need to understand its frequency and characteristics as well as the particle formation and growth rates associated with it. With the growing number of global stations (Kulmala, 2018), an automatic method is needed to classify days into events and non-events. In addition to minimizing the effort of manual event classification, an automated method also tends to reduce human error. In this study, we present an automated method which classifies days into four classes according to the observed characteristics of 2-4 nm sized air ions and 7-25 nm sized particles. The original classification of days as events, non-events and undefined days was proposed by Dal Maso et al. (2005) and later modified by Kulmala et al. (2012). It is based on particle measurements starting from about 3 nm in particle mobility diameter, and thus misses the initial steps of NPF. With the development of instrumentation, we are able to access sub-3 nm clusters and refine our classification method to account for the very initial steps of NPF. The classification proposed here divides days into regional events, transported events, ion bursts and non-events, thus excluding any 'undefined' days, which minimizes the number of days usually excluded from further data analysis. Furthermore, our automated method identifies the start, peak and end time of daytime regional events or ion bursts. By identifying the start and end times, we are able to concentrate on the conditions present during the actual NPF time window.
Our study focuses on the NPF occurring in Hyytiälä, a boreal forest site in southern Finland where the SMEAR II (Station for Measuring Forest Ecosystem-Atmosphere Relations) measurement station is located. The dataset collected at the station spans more than 22 years of particle, meteorological and gas data, making extensive analyses of NPF and related parameters possible. Besides studying NPF occurrence in Hyytiälä, our method can be applied to other locations where NPF is observed, enabling scientists studying particle formation to focus on the specific time windows during which active NPF occurs. Our specific aims in this study are i) to automatically classify days in Hyytiälä according to their initial NPF steps, ii) to minimize the number of undefined days by refining the classification, iii) to investigate different characteristics of classified days, iv) to identify the start, peak and end times of regional events and, thereby, v) to create a time series which allows us to focus on the exact time period during which a

Based on the concentrations of 2-4 nm ions, we are able to detect the initial steps of cluster formation (see Leino et al. (2016)), which would not be possible using the DMPS system alone and the traditional classification. This small size window available from the NAIS operating in ion mode gives an additional opportunity to investigate sub-3 nm clusters. Accordingly, we are able to estimate whether a regional NPF event occurred within the air mass in which the observations were made, or elsewhere and was then carried to our measurement location. Similarly, undefined days are identified based on their sub-3 nm characteristics. We present our refined classification decision tree in Figure 1 and apply it to Hyytiälä data in this study. To attain this classification, we rely on the initial steps of cluster formation and their further growth, which we monitor using an automatic method.
Since in our study we are interested in daytime NPF, we chose the time window between 06:00 and 19:00 when monitoring aerosol number concentrations. However, the automated method can be adjusted to include evening or night-time event classification in places where these event types are present.

Our decision tree (Figure 1) first examines 2-4 nm ion concentrations representing the initial step of new particle formation. A notable increase in their concentration is interpreted as ion clustering on site. To count as an increase, the number concentration of ions after 06:00 must rise above a relative threshold and persist for more than 1 hour. This threshold is calculated from the ion concentration averaged over the time period 00:00-04:00 multiplied by a scaling factor (Figure

ions/cm³ should be reached and should last for at least 1 hour. We chose the aforementioned value as it has been found to be an indicator for NPF in Hyytiälä (Leino et al., 2016). If this criterion is met, these ions are expected either to grow into bigger sizes and lead to regional NPF events (RE), or to fail to grow further, in which case the events are identified as ion bursts (IB) that do not form new particles.

To decide whether particle growth is observed, particle concentrations in the size range of 7-25 nm are examined. These particles represent the growth phase of freshly formed clusters. Since in Hyytiälä the growth rate of 4-7 nm particles is reported to lie between 0.8 and 17 nm/h (average 3.8 nm/h) (Yli-Juuti et al., 2011), we considered a time delay of 1 to 8 hours between the initial increase of ion (2-4 nm) concentrations and particle (7-25 nm) concentrations. To be considered an increase, the particle number concentration should exceed a relative threshold, which in this case is the number concentration averaged over the time period 03:00-05:00 (Figure 2B).
We determined the background time window by comparing the automatic method to a manual classification that we performed for the years 2013-2014 from our data set. The increase in concentration should last for ~1.5 hr (100 minutes) and reach a peak of at least 3000 particles/cm³. On the one hand, if both 2-4 nm ions and 7-25 nm particles are present, the time period is considered a regional event (RE). On the other hand, if the 2-4 nm ions are present but do not grow to form 7-25 nm particles, the time period is classified as an ion burst (IB). Moreover, if 2-4 nm ions are not present but we observe an increase in the particles, we assume that the NPF event did not occur at the measurement location but was carried horizontally or vertically to our site (Leino et al., 2018). The latter has previously been described as a tail event (Buenrostro Mazon et al., 2009) or a transported event (TE). However, if neither criterion is met, which means that neither 2-4 nm ions nor 7-25 nm particles are present in sufficient concentrations, the time period is classified as a non-event (NE).
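The four outcomes of the decision tree described above can be sketched as a small function. This is only an illustration: the function name `classify_period` and the two boolean inputs are assumptions introduced here, standing for "a qualifying 2-4 nm ion increase was found" and "a qualifying 7-25 nm particle increase was found". How those flags are derived from the data is outside the scope of this sketch.

```python
def classify_period(ion_increase: bool, particle_increase: bool) -> str:
    """Map the two detection flags onto the four classes of the decision tree.

    ion_increase      -- hypothetical flag: 2-4 nm ion criterion met
    particle_increase -- hypothetical flag: 7-25 nm particle criterion met
    """
    if ion_increase and particle_increase:
        return "RE"  # regional event: clusters form on site and grow
    if ion_increase:
        return "IB"  # ion burst: clusters form but do not grow
    if particle_increase:
        return "TE"  # transported event: growth seen without local clustering
    return "NE"      # non-event: neither criterion met


if __name__ == "__main__":
    for ions in (True, False):
        for particles in (True, False):
            print(ions, particles, classify_period(ions, particles))
```

Because the two flags are independent, every time period falls into exactly one class, which is why no 'undefined' category remains.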

2.4 Description of the automated method

Our automatic method selects the start time, peak time and end time of negative NAIS ions in the size range 2-4 nm. The growth to an event is confirmed by an accompanying peak in the 7-25 nm particles measured by the NAIS. The outcome of the automatic method is the classification of days into the four classes, as well as a time series that identifies the time period of regional events and ion bursts in Hyytiälä (pathways RE and IB in Figure 1). Once the ion and particle data are smoothed and the precipitation time stamps are eliminated, the new automated method generates the classified time series within a couple of minutes, whereas the manual method can take several hours and at least 2 people to classify one year of data.

First, to investigate the appearance of 2-4 nm ions, the precipitation time stamps are excluded from our analysis as they interfere with the ion data (Leino et al., 2016), resulting in misinterpretations. After that, the ion concentrations are smoothed using a Savitzky-Golay filter (Orfanidis, 1995). We then search for an increase in the ion concentration that lasts for 12 consecutive points (5 minutes each) above a threshold value and reaches values greater than 20 cm⁻³ (Leino et al., 2016). A maximum of 3 drops below the threshold value is allowed (Figure 2A). Finally, the method looks for a peak in the 7-25 nm particle concentration to identify the appearance of a growth phase (Figure 2B). The peak requires 15 consecutive points (5 minutes each) having concentrations larger than the threshold value and reaching a value larger than 3000 cm⁻³. Again, a maximum of 3 drops below the threshold value is allowed. Accordingly, each time stamp is classified.
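The search for a sustained increase can be sketched as follows. This is a minimal interpretation, not the authors' code: the function name `find_sustained_increase` and its parameters are hypothetical, the input is assumed to be already smoothed and free of precipitation time stamps, and a "drop" is taken to mean a single 5-minute point at or below the threshold inside the run. The ion criterion corresponds to `min_points=12, min_peak=20`; the particle criterion would use `min_points=15, min_peak=3000`.

```python
def find_sustained_increase(values, threshold, min_points=12, max_drops=3,
                            min_peak=20.0):
    """Return (start_index, end_index) of the first qualifying run, or None.

    A run starts at a point above `threshold`, tolerates at most `max_drops`
    points at or below it, must contain at least `min_points` points above
    it, and must reach a maximum larger than `min_peak`.
    """
    n = len(values)
    i = 0
    while i < n:
        if values[i] <= threshold:
            i += 1
            continue
        drops = 0
        above = 0
        last_above = i
        j = i
        while j < n:
            if values[j] > threshold:
                above += 1
                last_above = j
            else:
                drops += 1
                if drops > max_drops:
                    break  # too many drops: the run ends at last_above
            j += 1
        peak = max(values[i:last_above + 1])
        if above >= min_points and peak > min_peak:
            return i, last_above
        i = last_above + 1  # resume the search after the failed run
    return None


if __name__ == "__main__":
    # Synthetic 5-minute series: one long burst with a single drop inside it.
    series = [5.0] * 6 + [30.0] * 5 + [8.0] + [30.0] * 8 + [5.0] * 6
    print(find_sustained_increase(series, threshold=10.0))
```

Counting tolerated drops per run, rather than resetting on every dip, keeps a briefly interrupted burst from being split into several short runs that would each fail the duration criterion.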

Start time, peak time and end time determination
The start, peak and end times for regional events and ion bursts are defined based on the 2-4 nm ion concentration as follows: i) the start time is the first crossing of the threshold line which lasts for more than 12 consecutive points, ii) the peak time is when the concentration reaches its maximum and iii) the end time is the first trough after crossing the threshold line into lower concentrations which remains below the threshold for more than 3 consecutive points. An example day is shown in Figure 2A. The threshold is taken as the 2-4 nm ion concentration averaged over the time period 00:00-04:00 multiplied by a scaling factor of 7. Our scaling factor was determined by comparison with the manual classification of the data for the years 2013-2014.
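The three definitions above can be illustrated with a short sketch. It rests on several assumptions that are not in the text: the series holds one day of 5-minute values starting at midnight (so the first 48 points cover 00:00-04:00 and index 72 is 06:00), the end time is approximated by the first point of the below-threshold run rather than a true local minimum, and all names (`event_times`, `index_to_hhmm`) are hypothetical.

```python
from statistics import mean


def index_to_hhmm(index, step_min=5):
    """Convert a sample index into an HH:MM label (series starts at 00:00)."""
    minutes = index * step_min
    return f"{minutes // 60:02d}:{minutes % 60:02d}"


def event_times(conc, scale=7.0, night_points=48, search_from=72,
                min_above=12, min_below=3):
    """Return threshold and start/peak/end indices, or None if no start."""
    threshold = scale * mean(conc[:night_points])  # 00:00-04:00 mean x 7
    n = len(conc)

    # Start: first crossing that stays above for more than min_above points.
    start = None
    run = 0
    for i in range(search_from, n):
        run = run + 1 if conc[i] > threshold else 0
        if run > min_above:
            start = i - run + 1
            break
    if start is None:
        return None

    # End: first point of a run staying below for more than min_below points.
    end = n - 1
    below = 0
    for i in range(start, n):
        below = below + 1 if conc[i] <= threshold else 0
        if below > min_below:
            end = i - below + 1
            break

    # Peak: index of the maximum concentration between start and end.
    peak = max(range(start, end + 1), key=lambda k: conc[k])
    return {"threshold": threshold, "start": start, "peak": peak, "end": end}


if __name__ == "__main__":
    day = [1.0] * 288               # synthetic background of 1 cm^-3
    for k in range(96, 132):
        day[k] = 25.0               # burst from 08:00 to 10:55
    day[110] = 40.0                 # maximum at 09:10
    result = event_times(day)
    print({k: (index_to_hhmm(v) if k != "threshold" else v)
           for k, v in result.items()})
```

Tying the threshold to the same day's night-time mean makes the criterion relative, so days with a high background do not trigger a start time spuriously.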

3 Results and Discussion

Event Classification
Our classification categorizes the days in Hyytiälä into four different categories following the pathway chart in Figure 1. Type RE, or regional NPF events, are those which are initiated over a large area including the measurement location and in which the particles continue to grow to bigger sizes. Type TE, or transported events (also known as tail events by Buenrostro Mazon et al. (2009)), are events whose beginning is not detected as it does not occur in the immediate vicinity of our measurement site. Such events could be attributed to events that were initiated outside our measurement site and transported to Hyytiälä (Leino et al., 2018). This interpretation could explain the observation that TE typically occur at around midday or later in the afternoon, while RE tend to occur concurrently with sunrise. Type IB, or ion bursts, are attempts at NPF during which clusters form in Hyytiälä but do not grow beyond a few nanometers in diameter. Changes in atmospheric conditions that could cause the limited, or interrupted, growth of the clusters are assessed in more detail in section 3.3. Finally, non-events (NE) are days for which we observe neither a forming mode of 2-4 nm ions nor a growing mode of 7-25 nm particles.

half-hour averages of each variable between 7:00 and 12:00 during spring (March-May). We chose this season in order to capture the largest number of NPF events, and this time window in order to be consistent between all four studied classes. As expected, the median CS observed on RE was 1.7 x 10⁻³ s⁻¹, which is a factor of 2 lower than the CS observed on TE days or on NE days (3 x 10⁻³ s⁻¹). To our understanding, high CS inhibits NPF, so that its higher values during the days classified as TE prevent the initial formation of particles at the measurement site. IB, on the other hand, are potential regional events whose growth has been interrupted.
Since the median CS during IB was not high (2.5 x 10⁻³ s⁻¹), it does not explain the discontinuous growth of the clusters during these events. We proceed to study the effect of T on the occurrence of each class of events. Since the data in Figure 5 are measurements during spring, the median value of temperature (2-7 °C) was rather similar on all days and no specific trend or exception could be found.

In addition to CS and T, RH and cloudiness (P) play an important role in the occurrence of NPF (Dada et al., 2017; Hamed et al., 2011). A regional NPF event is more likely to occur on a clear-sky day than on a cloudy day. This conclusion is demonstrated in Figure 5, which shows that the median value of P was close to 0.8 on the RE days and closer to 0.3 on NE days. TE usually took place when the conditions within the boundary layer were not favorable for a regional NPF to occur. However, the particle growth was much less sensitive to environmental conditions: particle growth was often observed during all times of day and in every season, also on days (and nights) when NPF did not take place (Paasonen et al., 2018). Combined with a higher CS, the value of P was much lower on TE days than on RE days, describing a semi-cloudy day unfavorable for NPF to occur within the boundary layer, which could result in the occurrence of a TE originating in locations where the conditions are conducive enough to NPF. It is, however, important to mention that a regional NPF episode may take place simultaneously with a transported one; when the latter arrives it gets mixed with the regional NPF, so that this situation will be classified as a RE.
Since ion bursts are attempted events that fail to grow, an interrupted clear sky could explain this phenomenon: for instance, the sudden appearance of a cloud would result in the interruption of NPF (Baranizadeh et al., 2014), which then remains as an ion burst only. Finally, the RH, which in general correlates with cloudiness, showed a clear pattern between the event classes: RH was the lowest for RE and the highest for NE, and thus broadly reflects cloudiness.

Start time, peak time and end time of RE
Our method makes it possible to detect the start, peak and end times of every regional event classified during our study period. Although several previous studies state that the occurrence of NPF starts with sunrise and peaks around midday, very few investigations have considered occurrence times accurately. We derived the start, peak and end times from 2-4 nm ions automatically, as mentioned in sections 2.4 and 2.5. During spring, when most of the NPF events occur, our results (Figure 6) show that RE indeed occur after sunrise and prior to noon, with the largest number of events starting between sunrise and 5 hours after sunrise. The peak times of the events occurred most frequently 5 to 6 hours after sunrise, which is between 10:30 and 11:30 local time, supporting our previous assumption that NPF peaks before noon. Finally, the end times of the events occurred most frequently 9 to 11 hours after sunrise. During summer the events tend to start, peak and end later than in spring, and they show lower variability in comparison to spring. This observation could be attributed to longer daylight hours and fewer clouds. In autumn, by contrast, the events start, peak and end earlier than in spring. Exceptionally, during winter, ion concentrations might be affected by the accumulation of snow on or around the inlets. Overall, the variability of the event start, peak and end times can be affected by the solar cycle, degree of cloudiness and seasonality. Identifying the exact start and end times of the process helps to increase our understanding of the processes governing the NPF phenomenon. More specifically, these times allow us to form a time series where NPF is separated from non-event times, making it possible to compare the parameters responsible for the NPF process within appropriate time frames.

Comparison to previous classification
To assess the performance of our automatic method, it is crucial to compare our results with the previous classifications (

y-axis as a fraction of each original class. For example, 65% of the originally classified event days (event days make up 25% of the total days in Hyytiälä according to the original classification) were found to be RE, 10% were TE and 14% were IB. The remaining 11% were considered misclassified or bad data (by manual classification) and were excluded from the plot. In total, our automatic method was able to classify 89% of the original NPF events into one of the new event classes (RE, TE or IB). The original non-events (which made up 40% of the total days) were split between TE (20%), IB (19%) and NE (53%). The remaining 8% were bad data according to the manual classification and were excluded from the plot.

Finally, undefined days, which according to the traditional classification were 35% of the total days, were split between all the classes. Our results show that 17% of those were RE, 21% were TE, 19% were IB and 42% were non-events.

which all in all were related to unfavorable conditions for regional NPF. The interruption mechanisms may include the appearance of clouds (Baranizadeh et al., 2014; Dada et al., 2017), resulting in decreased radiation essential for particle formation and growth (Jokinen et al., 2017), or a change in the origin of arriving air masses from a clean to a rather polluted sector (Sogacheva et al., 2005). Our automated method sometimes fails as a result of the simultaneous appearance of an ion burst and a pollution plume. While the misclassification of these days as regional events is largely minimized by correcting for the background concentrations of 7-25 nm particles, erroneous classification is still possible in some cases.
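The comparison in this section expresses each refined class as a fraction of an original class. A minimal sketch of that bookkeeping is given below, assuming two equally long lists of per-day labels. The function name `reclassification_fractions` and the labels in the usage example are purely illustrative and are not the study's data.

```python
from collections import Counter, defaultdict


def reclassification_fractions(original, refined):
    """For each original class, return the fraction of days per refined class."""
    counts = defaultdict(Counter)
    for old_label, new_label in zip(original, refined):
        counts[old_label][new_label] += 1
    return {
        old_label: {new_label: c / sum(row.values())
                    for new_label, c in row.items()}
        for old_label, row in counts.items()
    }


if __name__ == "__main__":
    # Hypothetical labels for six days, for illustration only.
    original = ["event", "event", "event", "event", "undefined", "undefined"]
    refined = ["RE", "RE", "RE", "TE", "IB", "NE"]
    for old_label, row in reclassification_fractions(original, refined).items():
        print(old_label, row)
```

Normalizing within each original class, rather than over all days, makes the fractions directly comparable to the per-class percentages quoted in the text.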

4 Conclusions

Using 10 years of NAIS measurements at the SMEAR II station, we were able to create an automated method to classify days into 4 classes based on their ion (2-4 nm) and particle (7-25 nm) number concentrations: regional events, transported events, ion bursts and non-events. Our method minimizes the effort spent on manual day-by-day classification as well as the errors due to human bias. In addition, our method allows for the complete classification (sub-3 nm) of all days, i.e. it reduces the number of previously known 'undefined days', which have always been excluded from previous analyses.

Our results show that on ~40% of the days during spring in Hyytiälä, a regional NPF event occurs and is characterized by a set of favorable conditions, such as a clear sky, low condensation sink, medium temperature and low relative humidity. In contrast, NE made up ~25% of the days and were characterized by complete cloud cover, high RH and high CS. Interestingly, TE and IB fall in the category between RE and NE in this respect. While IB represent RE whose growth was interrupted, probably by a change to a polluted air mass or the appearance of a cloud, TE occurred on days when there was little chance for clusters to form at our measurement location, but they still had a chance to grow if they reached our site.