New particle formation event detection with Mask R-CNN

Abstract. Atmospheric new particle formation (NPF) is an important source of climate-relevant aerosol particles which has been observed at many locations globally. To study this phenomenon, the first step is to identify whether an NPF event occurs or not on a given day. In practice, NPF event identification is performed visually by classifying the NPF event or non-event days from the particle number size distribution surface plots. Unfortunately, this day-by-day visual classification is time-consuming and labor-intensive, and the identification process renders subjective results. To detect NPF events automatically, we regard the visual signature (banana shape) which has been observed all over the world in NPF surface plots as a special kind of object, and a deep learning model called Mask R-CNN is applied to localize the spatial layouts of NPF events in their surface plots. Utilizing only 358 human-annotated masks on data from the Station for Measuring Ecosystem–Atmosphere Relations (SMEAR) II station (Hyytiälä, Finland), the Mask R-CNN model was successfully generalized for three SMEAR stations in Finland and the San Pietro Capofiume (SPC) station in Italy. In addition to the detection of NPF events (especially the strongest events), the presented method can determine the growth rates, start times, and end times for NPF events automatically. The automatically determined growth rates agree with the manually determined growth rates. The statistical results validate the potential of applying the proposed method to different sites, which will improve the automatic level for NPF event detection and analysis. Furthermore, the proposed automatic NPF event analysis method can minimize subjectivity compared with human-made analysis, especially when long-term data series are analyzed and statistical comparisons between different sites are needed for event characteristics such as the start and end times, thereby saving time and effort for scientists studying NPF events.


P. Su et al.: New particle formation event detection with Mask R-CNN

Introduction
Atmospheric aerosols have profound impacts on air quality, human health, ecosystems, weather, and climate (Asmi et al., 2011a; Hirsikko et al., 2011; Joutsensaari et al., 2018; Chu et al., 2019; Lee et al., 2019). New particle formation (NPF) is an important source of atmospheric aerosols and has been observed in a variety of locations around the world, including different types of forests, semi-polluted or heavily polluted cities, high-altitude sites, coastal sites, and polar regions (Kulmala et al., 2004; Kuang et al., 2010; Kulmala et al., 2012; Nieminen et al., 2018; Dada et al., 2018; Lee et al., 2019). NPF events have been observed both at long-established sites (Dal Maso et al., 2005; Järvi et al., 2009; Asmi et al., 2011b) and at newly built sites (Chu et al., 2019; Liu et al., 2020; Yan et al., 2021).
To analyze NPF events, the first step is to determine whether an NPF event has occurred or not (Kulmala et al., 2012). Previous studies on detecting NPF types can be roughly divided into three categories: vision-based, rule-based, and data-driven. Vision-based methods visually classify the NPF types day by day according to some criteria based on surface plots of the size distribution time series (Mäkelä et al., 2000; Dal Maso et al., 2005; Hirsikko et al., 2011). The advantage of vision-based methods is that experts can explicitly tell which region in a surface plot is regarded as the evidence of an NPF event; the drawbacks are that they are labor-intensive and time-consuming and that the classification process is subject to human bias. Rule-based methods classify NPF types with several explicit steps where some thresholds on the particle number concentrations are used as prior knowledge (Kulmala et al., 2012; Dada et al., 2018). Rule-based methods can classify NPF types automatically, but their drawback is that the particle number concentrations can vary considerably between environments, meaning that the prior knowledge used at one site may fail at other sites or in complex situations. Data-driven methods utilize the measured particle number size distributions and annotated NPF types (labels) to establish a model which can identify NPF types. For instance, neural networks (NNs) have been used to classify NPF types both with handcrafted features (Nanni et al., 2017), as in Zaidan et al. (2018), and without them (Joutsensaari et al., 2018). The advantages of data-driven or NN-based methods are that they do not need any specific threshold on particle number concentration and that the classification process is automatic.
However, annotated NPF labels are required to train the NNs, and since the label annotation process is subjective, the trained NNs also "learn" the biases in the labels, which impedes the application of NN-based methods to different sites. Considering the increasing number of global observation stations (Kulmala, 2018), an automatic NPF detection method that applies to NPF datasets collected at different sites is necessary.
Although not all NPF events show signs of growth (Dal Maso et al., 2005) or have the commonly known "banana" shape, in this work we focus only on the regional (banana-type) NPF events, which are the most common type of event observed globally and whose formation signature is the continuous formation and subsequent growth of nucleation mode (sub-25 nm) particles. We observe that recognizing NPF events in surface plots is similar to recognizing other objects in digital images. Taking cats as an example, no unique mathematical criterion or definition can be found for either NPF events or cats. However, in most cases humans can easily tell whether an NPF event occurs in a surface plot or whether a cat appears in a digital image. Inspired by this observation, we regard the banana-type NPF events as a special kind of object, and thus the object detection techniques for detecting cats can be used to detect the banana-shaped NPF events. For simplicity, we use NPF images to represent surface plots without axes. Though surface plots have clear physical meanings, we can apply different image transformations to NPF images without any restriction. In this study, we use an instance segmentation method called Mask R-CNN, a deep learning model, to localize the NPF events by predicting a mask that covers the spatial layout (the banana shape) of each NPF event. In other words, we try to answer the NPF classification problem by directly localizing the visual signature of NPF events. Since Mask R-CNN only focuses on the banana shape that has been observed globally, it can be used on datasets collected from different sites automatically. For more information about object detection and instance segmentation, please refer to Appendix A.
To verify the generality of the presented method, we test the Mask R-CNN model on three SMEAR stations (Station for Measuring Ecosystem-Atmosphere Relations I, II, and III) in Finland and one station located in San Pietro Capofiume in the Po Valley basin in Italy (SPC station). The datasets collected at the four stations amount to approximately 73 years of measurements. Besides solving the classification problem, the accurate localization of events makes it easier to determine the growth rates, start times, and end times automatically. We have released our code at https://github.com/cvvsu/maskNPF (last access: 20 January 2022) so that it can be tested on datasets collected at other sites and to facilitate future research. Our aims in this study are (1) to automatically localize the globally observed visual signature (banana shape) of regional NPF events, which makes it possible to identify NPF types (whether events occur or not, especially for the strongest events) and to determine the growth rates, start times, and end times, and (2) to investigate the statistical characteristics of growth rates, start times, and end times for the strongest NPF events at the three SMEAR stations in Finland and the SPC station in Italy.

Measurement sites
We utilized aerosol size distribution data from three observation sites in Finland and one in Italy. All the sites operated similar instrumentation, and the observations followed guidelines set by the Aerosols, Clouds, and Trace gases Research InfraStructure Network (ACTRIS) for in situ aerosol number size distribution measurements (Wiedensohler et al., 2012). The observation sites and instruments are briefly described below.
The SMEAR I station is located at the Värriö Subarctic Research Station of the University of Helsinki (67°46′ N, 29°36′ E; 390 m a.s.l.) in northern Finland. The station is surrounded by 70-year-old Scots pine (Pinus sylvestris) boreal forest at Kotovaara hill, and some small lakes and mires exist in valleys 60 m lower and more than 1 km away. The measurements of particle number size distribution started in 1997 in SMEAR I. For more details about the site and measurements, please refer to Vana et al. (2016), Kyrö et al. (2014), and Hari et al. (1994). The analyzed particle number size distribution dataset collected in Värriö covers 8189 d from 10 December 1997 until 14 January 2021 (8436 d in total, and the days with no data were omitted from this study).
The SMEAR II station is located in the Hyytiälä Forestry Field Station of the University of Helsinki in central Finland (61°51′ N, 24°17′ E; 130 m a.s.l.), within pine-dominated boreal forest with some deciduous birch (Betula pubescens) and aspen (Populus tremuloides) trees. Comprehensive measurements including particle, radiation, gas, meteorological, and complementary data have been made for more than 20 years (Hari and Kulmala, 2005; Dada et al., 2017, 2018). The location is considered a semi-clean boreal forest environment according to the level of anthropogenic pollutants (Nieminen et al., 2015; Dada et al., 2018; Zaidan et al., 2018). A detailed overview of the site and measurements can be found in Hari et al. (2013). The analyzed particle number size distribution dataset collected in Hyytiälä covers 8642 d from 31 January 1996 until 21 January 2020 (8756 d in total).
The SMEAR III station is located in the Kumpula campus of the University of Helsinki in southern Finland (60°12′ N, 24°58′ E; 26 m a.s.l.). The station has accumulated approximately 17 years of measurements of, for example, air pollution, meteorology, and turbulent exchange (Järvi et al., 2009). The location is within an urban environment surrounded both by campus buildings, busy streets, and open bedrock and by parklands of deciduous forest, such as birch, aspen, and maple (Acer pseudoplatanus). For more details about the site and measurements, please refer to Järvi et al. (2009) and Dada et al. (2020b). The analyzed particle number size distribution dataset collected in Kumpula covers 5775 d from 1 January 2005 until 14 January 2021 (5857 d in total).
The San Pietro Capofiume measurement station (SPC station) is located in a rural area (44°39′ N, 11°37′ E; 11 m a.s.l.) in Po Valley, which is the largest industrial, trading, and agricultural area in Italy (Joutsensaari et al., 2018). The particle number size distribution measurements started in March 2002 and were carried out continuously, except for occasional system malfunctions, until 2017. A detailed overview of the site and measurements can be found in Joutsensaari et al. (2018). The analyzed particle number size distribution dataset collected in SPC covers 4177 d from 24 March 2002 until 16 May 2017 (5534 d in total).
The aerosol particle number size distributions were measured by differential mobility particle sizer (DMPS) systems (Aalto et al., 2001) at all four stations (Fig. 1). The particle number size distribution datasets collected from the four stations are termed the Värriö dataset, Hyytiälä dataset, Kumpula dataset, and SPC dataset. The DMPS systems installed in different stations have different detection ranges for particle sizes, and particle sizes ranging from 3 to 1000 nm are considered in this work. Note that the detected particle size does not have to reach 1000 nm for all DMPS systems.

NPF types
According to the guidelines reported in previous studies, the particle number size distributions can be classified into six different types (Dal Maso et al., 2005; Kulmala et al., 2012; Joutsensaari et al., 2018):
- Class Ia events. Ia-type events show clear and strong formation of small particles (especially 3-6 nm), with few or no pre-existing particles in the smallest size ranges (Fig. 2a).
- Class Ib events. Ib-type events show the same behavior as class Ia but with less clarity (Fig. 2b).
- Class II events. II-type events do not show clear evidence for observing the growth. That is, the growth rate cannot be determined without a large uncertainty (Fig. 2c).
- Class Non-Event (NE). NE does not show any evidence for new particle formation in the nucleation particle size range (Fig. 2d).
- Class Undefined (Undef). Undef is a type that is difficult to be classified as events or NEs since some but not all features for events can be seen (Fig. 2e).
- Class Bad-Data (BD). The BD type is caused by instrument malfunction. Generally, too high or too low particle concentrations or missing data can be observed in the surface plots (Fig. 2f).
Figure 2 shows the example surface plots for different NPF types. The banana shape can be seen clearly for Ia-type and Ib-type NPF events because they are so consistent throughout the day and are little influenced by local wind fields. Ia-type and Ib-type NPF events are usually connected with phenomena happening at large (regional) spatial scales. However, for II-type NPF events, interruptions in surface plots are often associated with more local sources of variability. The banana shape is not very clear for II-type NPF events and can be observed even in some Undef types.

Mask R-CNN
In order to fill the research gap mentioned in the Introduction, we used an object instance segmentation technique called Mask R-CNN, which can accurately localize an NPF event's spatial layout. Mask R-CNN extends the object detection method Faster R-CNN by adding a new branch for generating segmentation masks of objects, and Faster R-CNN is an advanced version of Fast R-CNN (Girshick, 2015). The Mask R-CNN, Faster R-CNN, and Fast R-CNN models are derived from the Regions with CNN features (R-CNN) model (Girshick et al., 2014), where CNN means convolutional neural network. The architecture of Mask R-CNN is shown in Fig. 3.
The Mask R-CNN model can be seen as a learnable function $f_\theta$ that is parameterized by the learnable parameters $\theta$. That is,

$$Y = f_\theta(X),$$

where $X$ is an input NPF image and $Y$ contains three outputs: the class labels, bounding boxes, and masks. During training, the parameters are updated by reducing the losses between the output and annotated class labels, bounding boxes, and masks, leading to the best-fitted function $f_{\theta^*}$ (Girshick, 2015; Ren et al., 2016; He et al., 2017). The learned function $f_{\theta^*}$ is then applied to the test set to verify its generality.
Similarly to Joutsensaari et al. (2018), we fine-tuned the Mask R-CNN model which had been pre-trained on the Microsoft COCO dataset (Lin et al., 2014) with only 358 annotated masks. These 358 masks were created with the labeling tool "LabelMe" (Russell et al., 2008) and were from 358 NPF images (78 Ia-type, 202 Ib-type, and 78 II-type). The 358 NPF images were generated from the Hyytiälä dataset, and the period was from 1996 to 2003. During training, 300 NPF images with masks were randomly selected as the training set, and the remaining 58 NPF images with masks were the validation set (Fig. 4). The initial learning rate was $5 \times 10^{-3}$ and was multiplied by 0.10 every 3 epochs. The stochastic gradient descent optimizer was used, with a weight decay of $5 \times 10^{-4}$ and a momentum of 0.90. The Mask R-CNN model was fine-tuned for 10 epochs. All the NPF images and masks were resampled to 256 × 256 pixels, and with an NVIDIA V100 GPU, the training process lasted around 5 min. Data collected after 2003 in Hyytiälä and the datasets collected in Värriö, Kumpula, and SPC are the test sets. Code and more results are available at https://github.com/cvvsu/maskNPF.git (last access: 20 January 2022).
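The learning-rate schedule described above can be written out explicitly. The following is only a minimal sketch in plain Python: the function name `step_lr` is hypothetical, and it assumes that epochs are counted from 0 and that the decay is applied at the start of epochs 3, 6, and 9, which the text does not state.

```python
def step_lr(epoch, base_lr=5e-3, step_size=3, gamma=0.10):
    """Learning rate in effect during `epoch` (0-indexed) under step decay.

    Assumption: the rate is multiplied by `gamma` once every `step_size`
    epochs, starting from `base_lr`.
    """
    return base_lr * gamma ** (epoch // step_size)


# The model was fine-tuned for 10 epochs.
schedule = [step_lr(epoch) for epoch in range(10)]

for epoch, lr in enumerate(schedule):
    print(f"epoch {epoch}: learning rate = {lr:.1e}")
```

Under these assumptions, the rate stays at its initial value for the first three epochs and has been reduced three times by the final epoch.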
For a given day, if the Mask R-CNN model detects no mask, the day is not classified as an event day. If at least one mask is detected, the day is recognized as an event day. Since Mask R-CNN only focuses on the banana shape, some regions in NPF images that are not events can also be localized, so more than one mask may be detected for one NPF image (Fig. 3). Each mask has an objectiveness score in the range [0, 1] that gives the probability of an event occurrence. In addition to the objectiveness score, a bounding box is also obtained.
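The day-level decision rule can be sketched in a few lines of Python. This is an illustration rather than the released implementation: the function name `classify_day`, the label strings, and the use of `>=` when comparing a score with the threshold are assumptions.

```python
def classify_day(scores, threshold=0.0):
    """Classify one day from the objectiveness scores of its detected masks.

    `scores` holds one score in [0, 1] per detected mask (empty if the
    model detected nothing). A day counts as an event day if at least one
    mask survives the threshold.
    """
    kept = [s for s in scores if s >= threshold]
    label = "event" if kept else "not event"
    return label, kept


# Made-up scores for three days.
print(classify_day([]))                          # no mask detected
print(classify_day([0.95, 0.30], threshold=0.90))
print(classify_day([0.30], threshold=0.90))
```

Raising the threshold discards low-scoring masks, so fewer days are labeled as event days.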
Assuming that the time resolution of the DMPS systems is 10 min and that there are 52 sampled particle sizes ranging from 3 to 1000 nm, the recorded particle number size distribution for 1 d is a data matrix with the shape of 52 × 144 (rows run from 3 nm at the bottom to 1000 nm at the top, and columns run from 00:00 to 00:00 the next day, local time). We resampled the predicted masks to the size of 52 × 144 to align them with the shape of the collected data (Fig. 5).
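As an illustration of this alignment step, the sketch below resamples a 256 × 256 mask to 52 × 144 with nearest-neighbor interpolation. The interpolation method is an assumption (the text does not say which one was used), the function name `resample_nearest` is hypothetical, and the row orientation of the mask is left unchanged.

```python
def resample_nearest(mask, out_rows=52, out_cols=144):
    """Resample a 2-D mask (list of rows) to out_rows x out_cols.

    Each output pixel takes the value of the input pixel whose cell
    contains the output pixel's center (nearest-neighbor resampling).
    """
    in_rows, in_cols = len(mask), len(mask[0])
    out = []
    for r in range(out_rows):
        src_r = min(in_rows - 1, int((r + 0.5) * in_rows / out_rows))
        row = []
        for c in range(out_cols):
            src_c = min(in_cols - 1, int((c + 0.5) * in_cols / out_cols))
            row.append(mask[src_r][src_c])
        out.append(row)
    return out


# Toy 256 x 256 mask with a rectangular block of ones.
image_mask = [[0] * 256 for _ in range(256)]
for r in range(64, 128):
    for c in range(128, 192):
        image_mask[r][c] = 1

data_mask = resample_nearest(image_mask)
print(len(data_mask), len(data_mask[0]))
```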
The value of a pixel in a mask represents the probability of the pixel belonging to an event. Each predicted mask was binarized at a threshold of 0.50. The left and right edges of a bounding box determine the start and end times, respectively. The bottom and upper edges of a bounding box automatically provide a size window that covers the related NPF event (Figs. 3 and 5).
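The step from a probability mask to a time window and a size window can be sketched as follows. Several details are assumptions made for illustration only: row 0 is taken to be the smallest size, the diameter grid is a hypothetical log-spaced one, pixels with probability equal to the threshold are kept, and the end time is taken as the end of the last covered 10 min column.

```python
def binarize(mask, threshold=0.5):
    """Turn per-pixel probabilities into 0/1 values."""
    return [[1 if p >= threshold else 0 for p in row] for row in mask]


def bounding_box(binary):
    """Return (row_min, row_max, col_min, col_max) of the 1-pixels, or None."""
    rows = [r for r, row in enumerate(binary) if any(row)]
    if not rows:
        return None
    cols = [c for c in range(len(binary[0])) if any(row[c] for row in binary)]
    return rows[0], rows[-1], cols[0], cols[-1]


def column_to_time(col, minutes_per_col=10):
    """Convert a column index to a local time string HH:MM."""
    total = col * minutes_per_col
    return f"{total // 60:02d}:{total % 60:02d}"


N_ROWS, N_COLS = 52, 144
# Toy probability mask: a confident block plus a weak region that is dropped.
mask = [[0.0] * N_COLS for _ in range(N_ROWS)]
for r in range(2, 21):
    for c in range(60, 101):
        mask[r][c] = 0.9
for r in range(21, 30):
    for c in range(90, 120):
        mask[r][c] = 0.3

# Hypothetical diameter grid, log-spaced from 3 to 1000 nm (row 0 = 3 nm).
diameters_nm = [3.0 * (1000.0 / 3.0) ** (i / (N_ROWS - 1)) for i in range(N_ROWS)]

box = bounding_box(binarize(mask))
row_min, row_max, col_min, col_max = box
start_time = column_to_time(col_min)        # left edge of the box
end_time = column_to_time(col_max + 1)      # right edge of the box
size_window = (diameters_nm[row_min], diameters_nm[row_max])
print(start_time, end_time, size_window)
```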

Growth rate
The particle growth rate (GR) is the rate of change of the diameter of a given particle:

$$\mathrm{GR} = \frac{\mathrm{d}D_p}{\mathrm{d}t} = \frac{\Delta D_p}{\Delta t} = \frac{D_{p2} - D_{p1}}{t_2 - t_1},$$

where $D_{p2}$ and $D_{p1}$ are the particle diameters at times $t_2$ and $t_1$, respectively. The maximum concentration method and the log-normal distribution function (mode fitting) method are two widely used methods to calculate the GR for an NPF event (Kulmala et al., 2012; Dada et al., 2020a). The GRs determined by these two methods are of the same order and show the same seasonal variations (Dal Maso et al., 2005; Hirsikko et al., 2005; Yli-Juuti et al., 2011). Since the NPF events can be localized, we can calculate the GR of an NPF event automatically using the maximum concentration method. We used the random sample consensus (RANSAC) algorithm (Choi et al., 2009) instead of ordinary least squares fitting to determine GRs. Compared to ordinary least squares fitting, the RANSAC algorithm is robust to outliers. In addition to GRs, the predicted masks can also be used to analyze the characteristics of the start times and end times of the strongest NPF events.
[Figure 3: architecture of Mask R-CNN. Surviving caption fragment: "... (Lin et al., 2017). RPN is the region proposal network. RoIAlign (region of interest align) is the layer that properly aligns the features."]
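The growth-rate calculation described in this section can be illustrated with a short, self-contained sketch. It is not the released implementation: the peak time of each size bin is taken as a plain argmax rather than a fitted peak, the RANSAC variant is a basic two-point version written from scratch, and all names, the tolerance `tol`, the iteration count, and the synthetic data are assumptions.

```python
import random


def peak_times(psd, times_h):
    """Maximum concentration method: time of the highest value in each size bin."""
    return [times_h[max(range(len(row)), key=row.__getitem__)] for row in psd]


def least_squares(points):
    """Ordinary least squares fit d = slope * t + intercept."""
    n = len(points)
    mean_t = sum(t for t, _ in points) / n
    mean_d = sum(d for _, d in points) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in points)
    sxy = sum((t - mean_t) * (d - mean_d) for t, d in points)
    slope = sxy / sxx
    return slope, mean_d - slope * mean_t


def ransac_line(points, n_iter=200, tol=1.0, seed=0):
    """Fit a line with RANSAC; `tol` is the inlier residual in nm."""
    rng = random.Random(seed)
    best = []
    for _ in range(n_iter):
        (t1, d1), (t2, d2) = rng.sample(points, 2)
        if t1 == t2:
            continue  # vertical line, skip this sample
        slope = (d2 - d1) / (t2 - t1)
        intercept = d1 - slope * t1
        inliers = [(t, d) for t, d in points
                   if abs(d - (slope * t + intercept)) <= tol]
        if len(inliers) > len(best):
            best = inliers
    if len(best) < 2:
        raise ValueError("RANSAC found no usable line")
    return least_squares(best)  # refit on the consensus set


# Synthetic event: 12 size bins growing at 2.5 nm/h, two bins with wrong peaks.
diameters_nm = [3.0 + 2.0 * j for j in range(12)]
times_h = [9.0 + 0.2 * k for k in range(50)]
peak_cols = [4 * j for j in range(12)]
peak_cols[4], peak_cols[8] = 30, 2  # outliers
psd = [[1.0 if c == pc else 0.1 for c in range(50)] for pc in peak_cols]

points = list(zip(peak_times(psd, times_h), diameters_nm))
gr_ransac, _ = ransac_line(points)
gr_ols, _ = least_squares(points)
print(f"RANSAC GR = {gr_ransac:.2f} nm/h, least-squares GR = {gr_ols:.2f} nm/h")
```

In this synthetic case the two outlying bins pull the ordinary least squares slope away from the true growth rate, while the RANSAC fit ignores them.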

Classification results
According to the classification results on the Hyytiälä dataset (Table 1), changing the threshold of the objectiveness score does not affect the Ia and Ib types. However, on the SPC dataset, different thresholds have a big effect on the classification accuracy of Ia and Ib types (Table 2). Since the Mask R-CNN model was trained on the masks derived from the Hyytiälä dataset, it did not contain any information about the SPC dataset, resulting in unstable classification accuracy when changing the threshold.
According to the classification results shown in Tables 1 and 2, there is a trade-off between the classification accuracy of NPF events and the number of "misclassified" days (NE, Undef, or BD days that are classified as event days by Mask R-CNN), which is controlled by the threshold. Re-training the Mask R-CNN model on masks derived from the SPC dataset may improve the classification accuracy on the SPC dataset and make the classification results stable regardless of the chosen threshold. To demonstrate the generality of our method, we did not re-train the Mask R-CNN model (Table 2). When a small threshold such as 0.20 is selected for the objectiveness score, on the SPC dataset and without annotated masks or class labels, the classification accuracy is 94.80 % for Ia-type NPF events, 87.94 % for Ib-type NPF events, and 90.57 % for the combination of Ia-type and Ib-type NPF events (Table 2). These accuracies are higher than the results reported in Joutsensaari et al. (2018), where an NN-based method was applied. The classification results on the SPC dataset support the idea that regarding the banana shape in NPF images as a special object is reasonable. In Table 1, some Undef, NE, and BD days are classified as event days by the Mask R-CNN model. We visualize these misclassified days in Appendix B to help readers better understand the detection results.
According to the classification results of the four datasets, for scientists who are only concerned with identifying Ia and Ib event types, this method will save plenty of time and effort. Since the II-type events usually do not present a clear banana shape in the NPF images and Undef days are difficult to classify as events or NEs, the Mask R-CNN model fails to distinguish some of these days (Tables 1 and 2). However, the detection results of Mask R-CNN can be used as auxiliary information to help scientists determine the II and Undef types.

Growth rate
In this study, we show that combined with the detected masks, the maximum concentration method can be used to calculate the GRs automatically (Figs. 5 and 6). If not specified, we only focus on determining the GRs, start times, and end times for the strongest NPF events.
Daytime hours between 06:00 and 18:00 (local time) were used for the traditional maximum concentration method to calculate the GRs. However, when this assumption is not satisfied or a particle burst is present in the surface plots, scientists need to select the start and end times manually. With the detected masks, the proposed method can automatically determine the time window (left and right edges of the bounding boxes, Figs. 3 and 5), and there is no need to manually adjust the start and end times. Usually, different size windows were applied to calculate GRs, and we selected 3-25 nm as the size range for GR calculation (Fig. 6). However, other size ranges are also possible, and for more information, please refer to our code at https://github.com/cvvsu/maskNPF (last access: 20 January 2022). To avoid confusion, the maximum concentration and mode fitting methods are termed traditional methods in this work. As shown in Fig. 7, an obvious downward trend of GRs for the SPC station can be seen, and the medians of GRs for the SPC station are the highest compared to those for the other stations in most of the years, which agrees with the GRs determined by the traditional methods. The traditionally determined GRs of the SPC dataset utilized two different methods: from 24 March 2002 to 18 June 2011, the maximum concentration method was used, and from 19 June 2011 to 14 August 2017, the mode fitting method was applied. The median of GRs for the Kumpula station is greater than that for the Värriö and Hyytiälä stations but smaller than that for the SPC station in most of the years, which is consistent with the observation that the GRs are highly related to the local pollution levels (Hamed et al., 2007). The Pearson correlation coefficients between traditionally and automatically determined GRs are 0.59 and 0.53 for the Hyytiälä and SPC stations, respectively.
The traditionally determined GRs of the Hyytiälä station were calculated by the mode fitting method, which further verified that the GRs determined by the maximum concentration and mode fitting methods should have the same order and variations (Yli-Juuti et al., 2011). The statistical results of GRs indicate the potential to utilize the automatic method to calculate GRs. Additionally, determining the GRs automatically leads to consistent results and eliminates human-made errors.
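For reference, the Pearson correlation coefficient used to compare the traditionally and automatically determined GRs is $r = \sum_i (x_i - \bar{x})(y_i - \bar{y}) / \sqrt{\sum_i (x_i - \bar{x})^2 \sum_i (y_i - \bar{y})^2}$. The sketch below computes it in plain Python; the function name `pearson_r` is hypothetical and the GR values are made up purely for illustration, not taken from the datasets.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Made-up growth rates (nm/h) for the same five event days.
gr_traditional = [2.1, 3.4, 1.8, 4.0, 2.9]
gr_automatic = [2.4, 3.0, 2.2, 3.5, 3.3]

r = pearson_r(gr_traditional, gr_automatic)
print(f"r = {r:.2f}")
```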

Start time and end time
In addition to the GR, with the detected mask, the start time and end time of an event can also be determined automatically. Start and end times of events are reported in very few publications (Dada et al., 2018). Figure 8 shows the start and end times for the NPF events for different datasets. For the SPC dataset, the automatic method summarized the start times for events that occurred from 2002 to 2017, and the human-annotated results summarized the start times for events that occurred from 2011 to 2017. Nevertheless, the histograms of the start times and end times determined by the different methods show similar shapes (Fig. 8). Because the end time of an event is difficult to determine in some cases, it cannot be identified as clearly as the start time.
Generally, the histograms of the start times for the four datasets are bell-shaped and may follow normal distributions (Fig. 8). The histograms of end times for the SPC station also show the bell shape, but there is more than one peak in the histograms of end times for the Värriö and Hyytiälä stations (Fig. 8). For NPF events that last for more than 1 d, interactions between particles on the 2 d make the end times much more difficult to determine.
The event durations for the NPF events on the SPC station are generally shorter than those for the Värriö, Hyytiälä, and Kumpula stations ( Fig. 8 and Table 3). The possible reason for this is that the atmospheric environment for the SPC station is much more polluted compared to the three SMEAR stations in Finland, making the events last for shorter times. The events in the Värriö and Hyytiälä stations have similar median durations, followed by the Kumpula and SPC stations, possibly indicating that the atmospheric environment is less polluted in the Värriö and Hyytiälä stations than in the Kumpula station and most polluted in the SPC station. Another possible reason for this is that spring has the most frequent events and all stations other than SPC are higher in latitude and thus have longer sunlight hours during spring.
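Event durations and their medians follow directly from the start and end times. The sketch below is illustrative only: the helper names are hypothetical, the times are made up, and it assumes that an end time earlier than the start time means the event ended after midnight.

```python
from statistics import median


def to_minutes(hhmm):
    """Convert a local time string HH:MM to minutes after midnight."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


def duration_hours(start, end):
    """Event duration in hours between two HH:MM strings."""
    minutes = to_minutes(end) - to_minutes(start)
    if minutes < 0:
        minutes += 24 * 60  # assumption: the event ended on the next day
    return minutes / 60


# Made-up (start, end) pairs for three events at one station.
events = [("09:30", "15:30"), ("10:00", "18:30"), ("08:40", "14:10")]
durations = [duration_hours(start, end) for start, end in events]
median_duration = median(durations)
print(durations, median_duration)
```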
The median start times are almost the same for the Hyytiälä and Kumpula stations (Table 3, in boldface), which is consistent with these two stations being located close together and further verifies that the intensity of solar radiation reaching the Earth's surface seems to be the most important factor affecting whether an NPF event occurs or not.
Figure 7. Comparison of growth rates calculated by different methods. GR-T means that growth rates are determined by the traditional methods (manually selecting the start and end times when necessary), and GR-P means that growth rates are determined by the proposed automatic method. r is the Pearson correlation coefficient between GRs calculated by different methods. The density scatterplots in the bottom row show the ranges that the growth rates are usually located in.

Advantages, limitations, and future studies
There are four major advantages of using the Mask R-CNN model (or other instance segmentation models) to detect NPF events. First, the process is simple, automatic, and straightforward. Second, the Mask R-CNN model can explicitly output masks for banana-shaped events, making the calculation of growth rates convenient together with the determination of start and end times. Third, the Mask R-CNN model can be used for datasets collected at different sites. For instance, the model trained on the masks from the Hyytiälä dataset works well on the SPC dataset. Fourth, the Mask R-CNN model is insensitive to the sizes and aspect ratios of the input NPF images since the model has already "seen" the related image transformations during training. For practical usage, we can plot the NPF images with the size of 256 × 256 or 128 × 128 pixels for NPF event detection. For short-term (1- or 2-year) datasets, it is better to set the threshold for the objectiveness score to 0.00 to detect as many NPF events as possible, while for long-term datasets, a small threshold such as 0.20 or 0.40 will accelerate the detection, and the statistical properties may not change if only a few event days are not included.
As mentioned above, the Undef days are difficult to classify with 100 % certainty. Some manually classified Undef days are recognized as event days by the Mask R-CNN model (Tables 1 and 2). These misclassified Undef days can be used as auxiliary information for scientists when deciding whether to classify a day as the Undef type. On the other hand, for scientists focusing on the comparison between event and non-event types, manual work is still required to filter out the Undef days. In this case, the Mask R-CNN model can only be used as an auxiliary tool.
The key to determining the correct start time, end time, and GR for an event is that the detected mask accurately depicts the spatial layout of the NPF event. Since the Mask R-CNN model used in this study was only trained on 358 annotated masks, it may fail to generalize to some special observation stations. In such cases, scientists need to re-train the Mask R-CNN model on their own datasets. However, there is no need to manually annotate masks again since some masks detected by our pre-trained model can be used as the annotated masks for the re-training.

Conclusions
With an increasing number of global observation stations, automatic NPF detection methods are required to speed up the NPF analysis process and minimize the subjectivity of human-made analysis. To improve the level of automation of NPF detection, we presented a method based on Mask R-CNN for identifying the regional (banana-type) NPF events (especially the strongest events), and the method can also be applied to determine the growth rates, start times, and end times for events automatically. The method generalized well across stations: we tested it on the SMEAR I, II, and III (Värriö, Hyytiälä, and Kumpula, respectively) stations in Finland as well as the SPC station in Italy. Altogether, approximately 73 years of measurements collected at the four stations were processed. The proposed automatic method achieved the best classification results for Ia-type and Ib-type events for the SPC station without any annotated information, showing the potential to apply the new method to other stations. The growth rates automatically determined by the new method are consistent with the manually calculated growth rates. The start times and end times determined by the new method indicate that the start times may follow normal distributions, whereas the end times present more than one peak in their histograms for the Värriö and Hyytiälä stations.
In the future, the proposed method can be applied to datasets collected in different stations and over different time periods to produce comparable results, which will aid scientists in understanding the underlying mechanisms of NPF and assessing the impact of atmospheric aerosol particles on the climate.

Appendix A: Object detection and instance segmentation
Object detection is one of the fundamental and challenging tasks in computer vision. Generally, some object detection techniques focus on detecting different kinds of objects such as cats and cars, while others focus on specific scenarios such as face detection (Zou et al., 2019). With the development of deep learning, object detection has achieved unprecedented improvements. The techniques can be roughly divided into one-stage detection such as single-shot multi-box detectors (Liu et al., 2016) and two-stage detection such as Faster R-CNN. Usually, one-stage detection is much faster, while two-stage detection can achieve better detection accuracy. Instance segmentation, however, tries to delineate each distinct object of interest in a more precise manner. In other words, instance segmentation segments an object according to its spatial layout. Compared with a bounding box, which needs four corner positions to cover an object, an instance segmentation model needs to find all the pixels that belong to the object.

Appendix B: Misclassified days in the Hyytiälä dataset
Example surface plots for the NE, Undef, and BD days misclassified by the Mask R-CNN model are shown (Figs. B1, B2, and B3). Misclassified means that these days were classified as NE, Undef, and BD days by scientists, while the Mask R-CNN model classified these days as event days. If the threshold for the objectiveness score is 0.90, then there are 18, 419, and 20 misclassified days for the NE, Undef, and BD types, respectively. All the NE and BD days are shown in Figs. B1 and B3, but only the first 20 Undef days are shown in Fig. B2. These misclassified days can help readers understand the detection capability of the Mask R-CNN model.
Code and data availability. Code is available at https://github.com/cvvsu/maskNPF.git (Su et al., 2022). Datasets collected in the three SMEAR stations are available at https://smear.avaa.csc.fi/. The dataset collected in the San Pietro Capofiume station is available from Jorma Joutsensaari on request (Joutsensaari et al., 2018).