Characteristics, primary sources and secondary formation of water soluble organic aerosols in downtown Beijing

Water soluble organic compounds (WSOC) account for a large proportion of aerosols and play a critical role in various atmospheric chemical processes. To investigate the primary sources and secondary production of WSOC in downtown Beijing, day and night PM2.5 samples were collected in January (winter), April (spring), July (summer) and October (autumn) of 2017 and analyzed for WSOC and organic tracers. WSOC showed the highest concentration in winter and comparable levels in the other seasons, and was dominated by its hydrophobic fraction (HULIS-C). Some typical organic tracers were chosen to evaluate the emission strength and secondary formation for the major sources of WSOC. According to the diurnal patterns and correlation coefficients with the key influencing factors, most SOA tracers were closely related to gaseous photooxidation in summer, but were mainly generated via aqueous-phase processing in other seasons. These organic tracers were applied to the positive matrix factorization (PMF) model to calculate the source contributions of WSOC as well as its hydrophobic and hydrophilic portions. The secondary sources contributed over 50 % to WSOC, with higher contributions in summer (75.7 %) and winter (67.7 %), and the largest contributor was aromatic SOC. In addition, the source apportionment results under different pollution levels suggested that controlling biomass burning and the aromatic precursors would be effective in reducing WSOC during the haze episodes in cold seasons. The possible formation mechanisms of the total secondary organic carbon (SOC) as well as hydrophobic and hydrophilic SOC were also explored in this study. The aqueous-phase process appeared to dominate SOC formation in winter and spring, while gas-phase photooxidation played a dominant role in summer. Moreover, gaseous photooxidation played a major role in the generation of hydrophobic SOC, whereas aqueous-phase reactions had a vital effect on the formation of hydrophilic SOC.
https://doi.org/10.5194/acp-2020-726 Preprint. Discussion started: 10 August 2020. © Author(s) 2020. CC BY 4.0 License.


S1 Estimation for the sampling artifacts of organic aerosols.
Sampling of organic carbon is accompanied by both positive and negative artifacts. The positive artifact is due to the adsorption of gaseous organics to the sampling filter, and the negative artifact is caused by the evaporation of collected particulate organic carbon. To eliminate the positive artifact, a denuder can be placed upstream of the sample filter to remove the gaseous organics by diffusion to the adsorbent surface (Cheng et al., 2009). The use of a denuder in the sampling system has been reported in previous studies (Eatough et al., 1993, 1999; Mader et al., 2001; Matsumoto et al., 2003; Viana et al., 2006; Cheng et al., 2009, 2010, 2012; Kristensen et al., 2016). The use of a denuder may induce a larger negative artifact, however, as the removal of gaseous organics can enhance the evaporation of particulate OC. Thus a backup filter should also be included in the sampling system (Cheng et al., 2009). In addition, the flow rate passing through the denuder was very low in most studies (Matsumoto et al., 2003; Viana et al., 2006; Cheng et al., 2009, 2010, 2012; Kristensen et al., 2016). This might be due to the significantly decreased removal efficiency of the denuder as the air flow rate increased (Cui et al., 1998; Ding et al., 2002). To collect enough sample for the accurate measurement of trace organic species, a flow rate of 1.05 m³·min⁻¹ was chosen in this study. An air flow rate of about 1.05 m³·min⁻¹ has been frequently used in the field sampling of organic aerosols (Kawamura et al., 2013; Verma et al., 2012, 2015; Li et al., 2018; Ma et al., 2018; Huang et al., 2020). At this flow rate, a denuder with a high removal efficiency is hardly commercially available.
We estimated the sampling artifact of OC based on the literature results. Firstly, different OC fractions, which have distinct volatility, show different adsorption behavior. The adsorption behavior of the same OC fraction also varies with meteorological conditions. Cheng et al. (2015) compared the concentrations of different OC fractions (OC1, OC2, OC3, OC4) on bare quartz filters with those on denuded quartz filters in four seasons in Beijing, and the results are listed in Table S1. The contributions of different OC fractions measured in this study are also shown in Table S1.
In addition, the positive artifact of OC also depends on the sampling procedure. McDow (1986) systematically investigated the effect of sampling procedure on the OC measurement. The adsorption of organic vapors on the bare quartz filters (C_positive artifact) was a function of the sampling duration (t) multiplied by the face velocity (v) as follows: (1) where the face velocity (v, cm·s⁻¹) is the ratio of the flow rate (cm³·s⁻¹) to the sampling area of the filter (cm²), ρi is the concentration of adsorptive vapor i (g·cm⁻³), and εi is a constant which can be defined as: (2) where l is the effective filter thickness. The average thickness of the quartz filter used in this study was 463 μm. The other parameters are all constants.
Therefore, it can be calculated that εi > 1/l > 20 cm⁻¹, and 1 − e^(−εi·v·t) ≈ 1. Hence, the positive artifact (C_positive artifact) is inversely proportional to the product of the sampling duration and the face velocity (v×t). The face velocity of Cheng et al. (2015) was 9.8 cm·s⁻¹, while that in our study was 47.3 cm·s⁻¹. The sampling duration of Cheng et al. (2015) was 24 h, while that in our study was 12 h. The product v×t in our study was therefore about 2.4 times that of Cheng et al. (2015), which means that the positive artifact of Cheng et al. (2015) was about 2.4 times that in our study.
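The factor of about 2.4 can be checked with a short calculation. The sketch below is only an illustration using the face velocities, sampling durations and filter thickness quoted above; the function and variable names are hypothetical and not part of the original analysis.

```python
def vt_product(face_velocity_cm_s: float, duration_h: float) -> float:
    """Return face velocity times sampling duration, in cm (v in cm/s, t in s)."""
    return face_velocity_cm_s * duration_h * 3600.0


# Values quoted in the text.
vt_cheng = vt_product(9.8, 24.0)        # Cheng et al. (2015)
vt_this_study = vt_product(47.3, 12.0)  # this study

# The positive artifact is taken as inversely proportional to v*t, so the
# artifact of Cheng et al. (2015) relative to this study equals the ratio
# of this study's v*t to theirs.
artifact_ratio = vt_this_study / vt_cheng

# Lower bound on the constant: 1/l for a 463 um filter, in cm^-1.
filter_thickness_cm = 463e-4
inverse_thickness = 1.0 / filter_thickness_cm

print(f"artifact ratio (Cheng et al. / this study): {artifact_ratio:.2f}")
print(f"1/l = {inverse_thickness:.1f} cm^-1")
```

With these inputs the ratio comes out slightly above 2.4 and 1/l slightly above 20 cm⁻¹, consistent with the statements in the text.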
Based on the literature results and taking into account the above factors (seasons, OC fractions, sampling procedure), the contribution of positive artifact to the measured OC was estimated to be 2.3 %, 1.4 %, 9.9 %, and 2.2 % during the sampling periods in winter, spring, summer and autumn respectively in this study, which is roughly acceptable.
To further estimate the impact of gas-particle partitioning and potential reactions occurring on filters, we stacked two quartz filters and took samples at a flow rate of 1.05 m³·min⁻¹ for a duration of 12 h. The organic tracers selected in this study were measured. The organic tracers on the backup filters typically originate from three sources: (1) adsorption of the gas-phase organic species; (2) adsorption of the semi-volatile species evaporated from the front filter; (3) secondary formation from the adsorbed organic vapors on the backup filter. Except for cis-pinonic acid, the tracer concentrations on the backup filter were all less than 5 % of those on the front filters, while the concentration of cis-pinonic acid on the backup filter was 21.6 % of that on the front filter. This result suggested that the sampling procedure in this study might introduce some uncertainty into the measurement of cis-pinonic acid, and that the sampling artifact was not significant for the other organic tracers.

S2-1 Chemical analysis of water-soluble ions and water-soluble organic carbon (WSOC)
To analyze the concentrations of water-soluble ions and water-soluble organic carbon (WSOC), a punch of each sampled filter was cut into pieces and extracted with 40 mL ultrapure water (>18.2 MΩ) for 30 min, then passed through a 0.45 μm PTFE filter. Five cations (Na⁺, NH₄⁺, K⁺, Mg²⁺, Ca²⁺) and four anions (Cl⁻, NO₃⁻, SO₄²⁻, C₂O₄²⁻) were measured using ion chromatography (Dionex 600), with a methanesulfonic acid (MSA) solution as the cationic eluent and a potassium hydroxide (KOH) solution as the anionic eluent. The concentration of WSOC was measured by a TOC analyzer (Shimadzu TOC-L CPN). The standard solution of total carbon (TC) was prepared from potassium acid phthalate (C8H5KO4), and that of inorganic carbon (IC) was prepared from sodium carbonate (Na2CO3) and sodium bicarbonate (NaHCO3). Total organic carbon (TOC) was calculated as total carbon minus inorganic carbon.

S2-2 The parameter settings of GC/MS/MS for analyzing organic tracers
The derivatives were immediately analyzed by a Shimadzu TQ8040 gas chromatograph coupled to a triple quadrupole mass spectrometer (GC/MS/MS). A JA-5MS capillary column (30 m × 0.25 mm i.d., film thickness 0.25 μm) was used as the GC column and helium was used as the carrier gas (1.0 mL·min⁻¹). The injector was operated in splitless mode at 290°C. The programmed oven temperature increased from 70°C to 150°C at 2°C·min⁻¹, then to 200°C at 5°C·min⁻¹, then to 300°C at 25°C·min⁻¹, and was held at 300°C for 6 min. The MS was operated in EI mode at 70 eV with a scan range of 50-650 amu.
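As a quick consistency check, the total run time implied by the oven program can be computed from the ramps listed above. This is a minimal sketch that assumes no initial hold at 70°C (none is stated in the text); the names are hypothetical.

```python
# (start_temp_C, end_temp_C, rate_C_per_min) for each ramp, as stated in the text.
RAMPS = [
    (70.0, 150.0, 2.0),
    (150.0, 200.0, 5.0),
    (200.0, 300.0, 25.0),
]
FINAL_HOLD_MIN = 6.0  # hold at 300 C


def run_time_min(ramps, final_hold_min):
    """Sum the duration of each linear ramp and add the final hold time."""
    ramp_time = sum((end - start) / rate for start, end, rate in ramps)
    return ramp_time + final_hold_min


total_min = run_time_min(RAMPS, FINAL_HOLD_MIN)
print(f"total oven program time: {total_min:.0f} min")
```

Under that assumption the three ramps take 40, 10 and 4 min, so each injection occupies the oven for one hour.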

Uncertainties of the input data
According to the User Guide of PMF 5.0 (Norris et al., 2014), the uncertainties of the target species can be calculated as follows:

\[
\mathrm{Unc} =
\begin{cases}
\dfrac{5}{6} \times \mathrm{MDL}, & c \le \mathrm{MDL} \\[6pt]
\sqrt{(P \times c)^{2} + (0.5 \times \mathrm{MDL})^{2}}, & c > \mathrm{MDL}
\end{cases}
\]

where Unc is the data uncertainty, c is the concentration of the target species, MDL is the method detection limit, and P is the error fraction. Since the User Guide does not give a calculation method for the error fraction (P), we estimated the P values with reference to the measured relative standard deviations (RSD) of the target species. The RSD values were calculated by measuring six identical portions of an ambient sample. P was set to 10 % when RSD < 10 %, and to 15 % or 20 % when RSD > 10 %.
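The equation-based uncertainty rule of the EPA PMF 5.0 User Guide (5/6 × MDL at or below the detection limit, otherwise the quadrature sum of P × c and 0.5 × MDL) can be sketched in a few lines of Python. The function name and the example concentrations are hypothetical and only illustrate the rule; the error fraction is passed in explicitly because the text does not specify how 15 % versus 20 % was chosen.

```python
import math


def pmf_uncertainty(conc: float, mdl: float, error_fraction: float) -> float:
    """Uncertainty of one data point following the PMF 5.0 User Guide rule.

    conc and mdl must share the same units; error_fraction is P as a
    fraction (for example 0.10 for 10 %).
    """
    if conc <= mdl:
        return 5.0 / 6.0 * mdl
    return math.sqrt((error_fraction * conc) ** 2 + (0.5 * mdl) ** 2)


# Hypothetical values, for illustration only.
above_mdl = pmf_uncertainty(conc=10.0, mdl=1.0, error_fraction=0.10)
below_mdl = pmf_uncertainty(conc=0.5, mdl=1.0, error_fraction=0.10)
print(f"above MDL: {above_mdl:.3f}, below MDL: {below_mdl:.3f}")
```

Because the MDL term is constant, points with low concentrations receive small absolute uncertainties and therefore weigh more heavily in Q.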

Selection of base solutions
The chemical components input into the PMF model were selected based on our understanding of the possible WSOC sources (Norris et al., 2014). Interpretability was usually considered to be the most important factor for selecting the optimum PMF solution (Shrivastava et al., 2007;Huang et al., 2014). The interpretable solutions are those which group tracers from different sources into distinct factors, while those grouping tracers from multiple sources into the same factor, distributing tracers for one source across multiple factors, or including factors with no distinct grouping of species are judged less interpretable (Shrivastava et al., 2007;Sowlat et al., 2016). In some previous literature, the optimal solution was defined as that with the maximum number of factors which had distinctive groupings of species, and explained at least 90 % of the total variable (Shrivastava et al., 2007). In this study, PMF was run repeatedly by changing the number of factors and the start seed numbers.
The base solution was selected based on: (1) the interpretability of the derived factor profiles and the temporal variations of source contributions; (2) the reconstruction of the total variable and R² of input organic tracers (R² > 0.90); (3) the scaled residuals of the input species.
As presented in Figure S1, the 7-factor solution split cholesterol (the tracer for cooking) across multiple sources. It was difficult to explain why cholesterol appeared in the factor profiles of biomass burning, dust and fresh biogenic SOC. In addition, this solution led to poor fits for cholesterol (R² = 0.28) and cis-pinonic acid (R² = 0.32), which were key tracers selected in this study. Therefore, the 7-factor solution was not selected. As shown in Figure S2, the 8-factor solution also distributed cholesterol into multiple factors and resulted in a poor fit for cholesterol (R² = 0.28). Therefore, the 8-factor solution was not chosen in this study. As shown in Figures S5-7, the solutions with 4 to 6 factors all showed poor interpretability for the derived factor profiles and poor fits for the key organic tracers. The 10-factor solution involved a factor without any high-loading tracer to indicate a specific source, and thus could not be explained. By comparing the results with different factor numbers, the solution with 9 factors (Figure S3) was judged to be the most interpretable one.
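The fit criterion (R² > 0.90 for the input tracers) can be applied mechanically to the values quoted above. The sketch below is only an illustration: it contains just the R² values reported in the text for the 7- and 8-factor solutions, and the dictionary layout and function name are hypothetical.

```python
R2_THRESHOLD = 0.90  # selection criterion (2) stated in the text

# R2 values quoted in the text for key tracers, keyed by number of factors.
reported_r2 = {
    7: {"cholesterol": 0.28, "cis-pinonic acid": 0.32},
    8: {"cholesterol": 0.28},
}


def poorly_fitted(r2_by_species, threshold=R2_THRESHOLD):
    """Return the sorted names of species whose R2 falls below the threshold."""
    return sorted(name for name, r2 in r2_by_species.items() if r2 < threshold)


flagged = {n_factors: poorly_fitted(r2) for n_factors, r2 in reported_r2.items()}
for n_factors, species in flagged.items():
    print(f"{n_factors}-factor solution: poor fit for {', '.join(species)}")
```

Such a screen only covers the quantitative criterion; interpretability of the factor profiles still has to be judged by inspection.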

Diagnostics for the base model run
The selected 9-factor solution converged, and Q(Robust) was similar to Q(True). As shown in Figure S4, most of the input species showed normally distributed residuals between -2.0 and +2.0, indicating that these species were well modeled. The R² values of WSOC and HULIS were 0.94 and 0.93, respectively, and the R² values for all the organic tracers were higher than 0.96, again suggesting that these species were well modeled.
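For readers unfamiliar with the residual diagnostic, a scaled residual is the difference between the observed and modeled value divided by the uncertainty of that data point. The sketch below shows how the share of residuals within ±2.0 could be computed for one species; all numbers are hypothetical and are not data from this study.

```python
def scaled_residuals(observed, modeled, uncertainty):
    """Return (observed - modeled) / uncertainty for each sample."""
    return [(o - m) / u for o, m, u in zip(observed, modeled, uncertainty)]


def fraction_within(residuals, limit=2.0):
    """Return the fraction of residuals with absolute value <= limit."""
    inside = sum(1 for r in residuals if abs(r) <= limit)
    return inside / len(residuals)


# Hypothetical concentrations and uncertainties for one species.
observed = [4.0, 6.5, 3.2, 9.1]
modeled = [4.3, 6.0, 3.1, 7.0]
uncertainty = [0.4, 0.6, 0.3, 0.9]

residuals = scaled_residuals(observed, modeled, uncertainty)
share = fraction_within(residuals)
print(f"share of scaled residuals within +/-2.0: {share:.2f}")
```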

Error estimation
The selected base solution was subjected to displacement (DISP) and bootstrap (BS) tests for error estimation. For the DISP test, the percent change in Q (%dQ) was less than 0.1 %, indicating that this solution was the global minimum (Paatero et al., 2014). No factors swapped for any value of dQmax, indicating little rotational ambiguity in this solution (Paatero et al., 2014). For the BS test, the "cooking" factor was mapped in 79 % of the runs, the "other primary combustion sources" factor was mapped in 69 % of the runs, and the other factors were mapped in more than 91 % of the runs.
The BS results indicated some uncertainties for the factors of cooking and other primary combustion sources, while the other factors were relatively stable. Brown et al. (2015) indicated that an unstable PMF solution might be due to too many factors being involved. To investigate the effect of factor number on the stability of solutions, the BS results for solutions with different factor numbers were compared and are shown in Table S3. As shown in Table S3, reducing the number of factors did not significantly increase the success rates of BS mapping, but decreased the interpretability of the derived factor profiles. As recommended by previous studies (Norris et al., 2014; Paatero et al., 2014), some constraints can be defined based on a priori information about the sources to reduce the variability of the solution.

Constrained model run
Bozzetti et al. (2017) exploited the markers' source specificity to set constraints for the profiles, so as to solve the problem of large mixtures of PMF factors associated with contributions of markers from different sources. They treated the contribution of unrelated source-specific markers as zero for each source, while non-source-specific variables were freely apportioned by the PMF algorithm. In addition, they set constraints for primary markers and combustion-related markers that can be seen as negligible in the secondary factors.
In the constrained model run, we set constraints similar to those of Bozzetti et al. (2017), with the slight difference that we set the constraints by "soft pulling" so as to obtain a stable solution with a minimal change in the Q-value (dQ). The constraints were set as follows: (1) Levoglucosan was pulled up maximally with a limit of 0.25 % dQ for the factor of "primary biomass burning"; (2) Cholesterol was pulled up maximally with a limit of 0.50 % dQ for the factor of "cooking"; (3) Sulfate, cis-pinonic acid and 2-methylerythritol were pulled down maximally with limits of 0.25 % dQ for the factor of "other primary combustion sources"; (4) Phthalic acid was pulled up maximally with a limit of 0.25 % dQ for the factor of "aromatic SOA". The dQ(Robust) for all the constraints was 0.93 % in the final constrained model run, which was acceptable (below 1 %) as recommended by the PMF user guide (Norris et al., 2014). As shown in Table S4, all the factors were mapped in more than 94 % of the runs, suggesting that this solution was stable. Thus the constrained 9-factor solution was chosen as the final solution.
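The constraint set above can be written down as a small data structure, which makes it easy to review at a glance. This is only a bookkeeping sketch, not an interface to the EPA PMF software; the class, field and function names are hypothetical, and each of the three pulled-down species is assumed to carry its own 0.25 % dQ limit, as the plural "limits" in the text suggests.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Constraint:
    species: str
    factor: str
    direction: str       # "up" or "down" (pulled maximally in that direction)
    dq_limit_pct: float  # allowed change in Q for this pull, in percent


CONSTRAINTS = [
    Constraint("levoglucosan", "primary biomass burning", "up", 0.25),
    Constraint("cholesterol", "cooking", "up", 0.50),
    Constraint("sulfate", "other primary combustion sources", "down", 0.25),
    Constraint("cis-pinonic acid", "other primary combustion sources", "down", 0.25),
    Constraint("2-methylerythritol", "other primary combustion sources", "down", 0.25),
    Constraint("phthalic acid", "aromatic SOA", "up", 0.25),
]

DQ_ROBUST_FINAL_PCT = 0.93  # reported for the final constrained run
DQ_ACCEPTABLE_PCT = 1.0     # acceptance limit cited in the text


def is_acceptable(dq_pct: float, limit_pct: float = DQ_ACCEPTABLE_PCT) -> bool:
    """Return True when the overall change in Q stays below the limit."""
    return dq_pct < limit_pct


print(len(CONSTRAINTS), "constraints; acceptable:", is_acceptable(DQ_ROBUST_FINAL_PCT))
```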

Factor identification
The source profiles of the final solution are shown in Figure 4. Factor 1 showed high levels of levoglucosan and EC, and thus was interpreted as the direct emissions from biomass burning. Factor 2 exhibited a high level of cholesterol, and thus was regarded as cooking. Factor 3 showed a large fraction of EC that could not be explained by the direct emissions from biomass burning, suggesting that it represented the direct emissions from other combustion sources, such as coal combustion, traffic emissions and waste burning. Factor 4 featured high loadings of Mg²⁺ and Ca²⁺, and thus was considered as dust. No significant EC but high fractions of 4-methyl-5-nitrocatechol and phthalic acid were found in Factor 5 and Factor 6, respectively, which were regarded as SOC from biomass burning (biomass burning SOC) and aromatic precursors (aromatic SOC), respectively. Factor 7 exhibited a high level of cis-pinonic acid, and thus was explained as fresh biogenic SOC. Factor 8 was characterized by high fractions of 2-methylerythritol and 3-hydroxyglutaric acid, which are the end products from isoprene and monoterpenes respectively, and thus was identified as aged biogenic SOC. Note that cis-pinonic acid and 3-hydroxyglutaric acid were not grouped into the same factor though they are both SOA tracers of monoterpenes, owing to their different oxidation degrees as discussed above. Factor 9 covered the secondary components (such as SO₄²⁻, NO₃⁻, NH₄⁺ and C₂O₄²⁻) that cannot be well explained by the identified sources above, and thus was considered to be SOC from other sources.
It might seem that levoglucosan and 4-methyl-5-nitrocatechol should be distributed into the same factor. Nevertheless, even when we reduced the factor number from nine to five, levoglucosan and 4-methyl-5-nitrocatechol could not be merged into one factor (Figures S1, S2, S3, S5, S6). When the factor number decreased to four, levoglucosan and 4-methyl-5-nitrocatechol were merged into one factor (Figure S8). However, this solution was less interpretable, and resulted in poorer fits for most of the input species (cholesterol: R² = 0.17; cis-pinonic acid: R² = 0.25; Ca²⁺: R² = 0.67; Mg²⁺: R² = 0.73; NO₃⁻: R² = 0.75; etc.). Furthermore, the slope of the fitting equation for the observed and predicted values of 4-methyl-5-nitrocatechol was only 0.31, that is, the high values of 4-methyl-5-nitrocatechol in winter were not reproduced by the 4-factor solution. Hence, the 4-factor solution was also excluded in this study.
It is notable that levoglucosan and 4-methyl-5-nitrocatechol were not distributed in the same factor, though they showed a strong correlation with each other. We attempted to explain this phenomenon as follows. The ratio of 4-methyl-5-nitrocatechol to levoglucosan showed significantly higher values (p<0.01) in winter (0.071±0.029) than in other seasons (0.010±0.009), which implied different types of biomass burning sources (primary and secondary). If they were merged into one factor, the ratio of 4-methyl-5-nitrocatechol to levoglucosan would be treated as constant throughout the year, which was not the case. According to the uncertainty estimation method for the input species (Equation 2), the data with lower concentrations usually have lower uncertainties, and thus may have a larger impact on the Q value. Taking the 4-factor solution (Figure S7) as an example, when these two tracers were merged into one factor, to minimize the Q value, the algorithm in the PMF model tended to assign a low value for the ratio of 4-methyl-5-nitrocatechol to levoglucosan in the factor profile of biomass burning (i.e. 0.024 in Factor 4). As shown in Figure S8, this ratio (orange line) was closer to the regression slope in other seasons (0.017), but much lower than that in winter (0.096). As a consequence, the high concentration of 4-methyl-5-nitrocatechol in winter could not be reproduced by such a PMF solution. In conclusion, a solution which merged these two tracers into the same factor might introduce large uncertainties, and fail to reproduce the peak values of 4-methyl-5-nitrocatechol in winter.
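The size of the mismatch can be illustrated with simple arithmetic on the ratios quoted above. The sketch below makes a deliberately crude assumption, namely that both tracers are apportioned entirely to the merged factor, so that the predicted 4-methyl-5-nitrocatechol equals the profile ratio times levoglucosan; the names are hypothetical and the result is only indicative.

```python
PROFILE_RATIO = 0.024  # 4-methyl-5-nitrocatechol / levoglucosan in the merged factor
WINTER_SLOPE = 0.096   # observed regression slope in winter
OTHER_SLOPE = 0.017    # observed regression slope in the other seasons


def reproduced_fraction(profile_ratio: float, observed_slope: float) -> float:
    """Fraction of observed nitrocatechol a fixed profile ratio would predict."""
    return profile_ratio / observed_slope


winter = reproduced_fraction(PROFILE_RATIO, WINTER_SLOPE)
other = reproduced_fraction(PROFILE_RATIO, OTHER_SLOPE)
print(f"winter: {winter:.2f} of observed; other seasons: {other:.2f} of observed")
```

Under this simplification a single merged factor would capture roughly a quarter of the winter 4-methyl-5-nitrocatechol while over-predicting it in the other seasons, in line with the low fitted slope reported for the 4-factor solution.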
Fresh biomass burning emissions show a high fraction of anhydrosugars, such as levoglucosan. The relative intensity of anhydrosugars decreased due to the degradation or oxidation reactions […] biomass burning and biomass burning SOA using the PMF model (92 samples, which was less than that in our study). In this study, as shown in Figure 4, Factor 1 had high fractions of levoglucosan and EC, but a low fraction of 4-methyl-5-nitrocatechol, and thus was considered as the direct emission from biomass burning. The concentration ratio of levoglucosan to WSOC in this factor was 0.085 μg·μg⁻¹, similar to that measured in the primary combustion of crop straws (0.097 μg·μg⁻¹), wood (0.081 μg·μg⁻¹) and leaves (0.095 μg·μg⁻¹) in North China (Yan et al., 2018). Factor 5 showed a high level of 4-methyl-5-nitrocatechol, but low loadings of EC and levoglucosan, and thus was identified as biomass burning SOA.

6-2 The interpretation for Factor 7, Factor 8, and Factor 9.
As presented in Figures S1-3 and Figures S5-7, even if we reduced the factor number from nine to four, 2-methylerythritol, 3-hydroxyglutaric acid and cis-pinonic acid could not be merged into the same factor. Large fractions of 3-hydroxyglutaric acid and 2-methylerythritol were usually grouped into one factor, since they strongly correlated with each other (r=0.94, p<0.01). Cis-pinonic acid could not be distributed in this factor since it correlated less strongly with 2-methylerythritol (r=0.51, p<0.01) and 3-hydroxyglutaric acid (r=0.58, p<0.01). As stated in Section 3.2, cis-pinonic acid is a lower-generation oxidative product from monoterpenes, while 2-methylerythritol and 3-hydroxyglutaric acid are more aged products from isoprene and monoterpenes, respectively (Kourtchev et al., 2009). Hence, Factor 7 with a high level of cis-pinonic acid was interpreted as the fresh biogenic SOC, and Factor 8 with high loadings of 2-methylerythritol and 3-hydroxyglutaric acid was interpreted as aged biogenic SOC. As shown in Figure 5, the seasonal variation of their source contributions also supported this interpretation.
Since the major fraction of 3-hydroxyglutaric acid was distributed in Factor 8, it was not appropriate to interpret Factor 9 as monoterpene SOC. In fact, as shown in Figures S1-3 and Figures S5-7, a minor fraction of 3-hydroxyglutaric acid was always distributed in factors other than the biogenic SOC. The solution was more interpretable when this minor fraction of 3-hydroxyglutaric acid was distributed in the same factor together with SO₄²⁻, NO₃⁻, NH₄⁺ and C₂O₄²⁻. In this case, Factor 9 of the selected 9-factor solution could be interpreted as a mixed secondary source that explains the secondary species not well fitted by the other identified secondary sources. A similar factor profile has also been resolved in the literature, and was usually interpreted as the "inorganic-rich SOA" (Huang et al.,

6-3 The interpretation for Factor 3 (Other primary combustion sources).
As shown in Figure 4, for the constrained 9-factor solution, Factor 3 showed a significant level of EC that could not be explained by direct emissions of biomass burning, implying that it could be associated with the primary emissions from other combustion sources, such as coal combustion, traffic emissions, and waste burning. Minor fractions of SO₄²⁻ (20.8 %), NH₄⁺ (19.3 %) and phthalic acid (20.0 %) were also distributed in Factor 3. However, previous studies have indicated that SO₄²⁻, NO₃⁻, NH₄⁺ and phthalic acid can also be directly emitted from coal combustion

Table S1 The ratio of the OC concentrations on the bare quartz filters to those on the denuded quartz filters in Cheng et al. (2015), as well as the contribution of different OC fractions measured in this study.
[Table body not recovered. Column heading: The ratio of OC on bare quartz filters to denuded quartz filters (Cheng et al., 2015).]
Table notes: […] (2015), "b" refers to the constructed PM2.5 below 30 μg·m⁻³, "c" between 30 μg·m⁻³ and 90 μg·m⁻³, and "d" above 90 μg·m⁻³. e,f In Cheng et al. (2011), "e" was measured using the denuded quartz filter and "f" was measured using the un-denuded (bare) quartz filter.

[Figure caption, number not recovered] The time series of the measured WSOC and the reconstructed WSOC based on the 9-factor solution.
Figure S5. A 6-factor solution resolved by the PMF model.
Figure S6. A 5-factor solution resolved by the PMF model.
Figure S7. A 4-factor solution resolved by the PMF model.