Calculating the aerosol asymmetry factor based on measurements from the humidified nephelometer system

The aerosol asymmetry factor (g) is one of the most important factors for assessing direct aerosol radiative forcing. However, little attention has been paid to the measurement and parameterization of g. In this study, the characteristics of g are studied based on field measurements over the North China Plain (NCP) using the Mie scattering theory. The results show that calculated g values for dry aerosol can vary over a wide range (between 0.54 and 0.67). Furthermore, when ambient relative humidity (RH) reaches 90 %, g is significantly enhanced by a factor of 1.2 due to aerosol hygroscopic growth. For the first time, a method of calculating g based on measurements from the humidified nephelometer system is proposed. This method can constrain the uncertainty of g to within 2.56 % for dry aerosol populations and 4.02 % for ambient aerosols, provided that aerosol hygroscopic growth is taken into account. Sensitivity studies show that aerosol hygroscopicity plays a vital role in the accuracy of predicting g.


Introduction
In addition to aerosol optical depth and aerosol single-scattering albedo, the aerosol phase function is the most important factor for assessing direct aerosol radiative forcing (DARF) (Andrews et al., 2006; Russell et al., 1997). The Henyey-Greenstein phase function (PF_HG) is widely used to parameterize the phase function (Toublanc, 1996; Boucher, 1998; Pandey and Chakrabarty, 2016) because it uses the aerosol asymmetry factor (g) as its only free parameter. The PF_HG is expressed as

$$ P_{\mathrm{HG}}(\theta) = \frac{1 - g^{2}}{\left(1 + g^{2} - 2g\cos\theta\right)^{3/2}}, \tag{1} $$

where θ is the angle between the incident light direction and the scattered light direction. In this respect, the free parameter g reflects the angular distribution of the aerosol scattering energy. g is defined as follows:

$$ g = \frac{1}{2}\int_{0}^{\pi} \cos\theta \, P(\theta) \, \sin\theta \, \mathrm{d}\theta, \tag{2} $$

where P(θ) is the normalized scattering phase function. As a result, g can be a computationally efficient parameter to replace the phase function in the study of aerosol radiative transfer properties (Toublanc, 1996; Hansen, 1969; Boucher, 1998). This replacement has proven useful and has been widely accepted in previous studies (Hansen, 1969; Wiscombe and Grams, 1976; Sagan and Pollack, 1967; Andrews et al., 2006); however, significant bias may arise in the g-related PF_HG when estimating photo-dissociation rates (Toublanc, 1996) and aerosol radiative forcing effects (Boucher, 1998).
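The relationship between Eq. (1) and Eq. (2) can be checked numerically. The following is a minimal sketch, assuming the normalization in which 0.5 times the integral of P(θ) sin θ over 0 to π equals 1; the function names and the midpoint integration are illustrative choices, not part of the original study.

```python
import math

def hg_phase(theta, g):
    """Henyey-Greenstein phase function of Eq. (1)."""
    return (1.0 - g * g) / (1.0 + g * g - 2.0 * g * math.cos(theta)) ** 1.5

def asymmetry_factor(phase, n=20000):
    """Eq. (2): g = 0.5 * integral of cos(t) P(t) sin(t) dt, midpoint rule."""
    h = math.pi / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * h
        total += math.cos(t) * phase(t) * math.sin(t)
    return 0.5 * total * h

g_in = 0.61  # mean dry value reported in this study, used only as an example
g_out = asymmetry_factor(lambda t: hg_phase(t, g_in))
print(g_in, g_out)
```

Integrating the PF_HG built from a given g should return that same g, which is why g alone suffices to specify the PF_HG.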
In the past, few studies have assessed the deviation that arises when replacing the ambient phase function with the g-related PF_HG (Pandey and Chakrabarty, 2016; Boucher, 1998; Wiscombe and Grams, 1976), and there are no known studies that use field measurements of aerosol optical properties to estimate the bias. Moreover, variations in g can influence the evolution of the atmospheric vertical structure by affecting the atmospheric radiative distribution. Kudo et al. (2016) also found that the vertical profile of the asymmetry factor plays an important role in altering vertical variations in the solar heating rate. Marshall et al. (1995) reported that a 10 % overestimation of g can systematically reduce aerosol climatic forcing by 12 % or more. Furthermore, Andrews et al. (2006) found that a 10 % reduction in g would result in a 19 % overestimation of radiative forcing at the top of the atmosphere (TOA). Therefore, an accurate estimation of g has the potential to greatly improve the assessment of the aerosol radiative effect.

Published by Copernicus Publications on behalf of the European Geosciences Union.
There are several methods available to derive the g of aerosol particles under dry and ambient conditions. Horvath et al. (2016) measured the phase function of aerosols, calculated the g of aerosols, and found that the g-related PF_HG can be used as a good approximation of the measured phase function. Many studies have used the Mie model (Bohren and Huffman, 2007) to calculate the phase function and have proven its reliability (Andrews et al., 2006; Marshall et al., 1995; Bian et al., 2017). Comprehensive attempts have been made to relate g to the hemispheric backscatter fraction (b). The value of b is the ratio of light scattered into the backward hemisphere to the total light scattered in all directions (Wiscombe and Grams, 1976; Andrews et al., 2006; Horvath et al., 2016), and is defined as follows:

$$ b = \frac{\int_{\pi/2}^{\pi} P(\theta)\,\sin\theta\,\mathrm{d}\theta}{\int_{0}^{\pi} P(\theta)\,\sin\theta\,\mathrm{d}\theta}. \tag{3} $$

The main advantage of the backscatter fraction is that it can be measured with an integrating nephelometer equipped with a backscatter shutter (Charlson et al., 1974).
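To illustrate how b and g are linked for one specific phase function, the sketch below evaluates Eq. (3) for the Henyey-Greenstein form and compares it with the closed-form result obtained by integrating that form analytically. This only describes an idealized PF_HG and is not the empirical b-g relationship used for ambient aerosol; all function names are illustrative.

```python
import math

def hg_phase(theta, g):
    """Henyey-Greenstein phase function (Eq. 1)."""
    return (1.0 - g * g) / (1.0 + g * g - 2.0 * g * math.cos(theta)) ** 1.5

def integrate(f, a, b, n=20000):
    """Midpoint-rule integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

def backscatter_fraction(phase):
    """Eq. (3): backward-hemisphere scattering over total scattering."""
    weighted = lambda t: phase(t) * math.sin(t)
    return integrate(weighted, math.pi / 2.0, math.pi) / integrate(weighted, 0.0, math.pi)

def hg_backscatter_fraction(g):
    """Closed form of Eq. (3) for the Henyey-Greenstein function (g != 0)."""
    return (1.0 - g) / (2.0 * g) * ((1.0 + g) / math.sqrt(1.0 + g * g) - 1.0)

g = 0.61  # example value only
b_numeric = backscatter_fraction(lambda t: hg_phase(t, g))
b_closed = hg_backscatter_fraction(g)
print(b_numeric, b_closed)
```

A larger g concentrates scattering in the forward direction, so b decreases monotonically as g increases, tending to 0.5 for isotropic scattering.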
The free parameter g varies significantly for different aerosol types and different seasons. In previous studies, the g values have mainly been examined using the Mie scattering theory and the measured aerosol particle number size distribution (PNSD). D'Almeida et al. (1991) suggested that g ranges from 0.64 to 0.83 at a wavelength of 500 nm depending on the aerosol type and the season; their study also found a mean g value of 0.67 at ambient relative humidity (RH). Furthermore, Hartley and Hobbs (2001) reported a median g value of 0.7 for aerosols along the east coast of the United States. Formenti et al. (2000) measured Saharan dust aerosol and found that the aerosol g values ranged from 0.72 to 0.73. Biomass burning aerosols in Brazil were found to have a low g value of 0.54 (Ross et al., 1998).
Some studies have examined the impacts of aerosol hygroscopic growth on the parameter g (Hartley and Hobbs, 2001; Kuang et al., 2015; Andrews et al., 2006) and found that variations in g with RH can have significant influences on aerosol radiative effects (Kuang et al., 2015, 2016; Andrews et al., 2006). Therefore, a parameterization scheme of g that takes RH and aerosol hygroscopic growth into account is necessary.
When exposed to the ambient atmosphere, aerosols can grow by taking up water, which causes their optical properties to change considerably. The κ-Köhler theory (Petters and Kreidenweis, 2007) is widely used to describe the hygroscopic growth of aerosol particles using a single aerosol hygroscopic growth parameter (κ) and the κ-Köhler equation, which is described as follows:

$$ \frac{\mathrm{RH}}{100} = \frac{gf^{3} - 1}{gf^{3} - (1 - \kappa)} \exp\left(\frac{4\,\sigma_{s/a}\,M_{\mathrm{water}}}{R\,T\,D_{d}\,gf\,\rho_{w}}\right), \tag{4} $$

where D_d is the dry particle diameter; gf(RH) is the aerosol growth factor, defined as the ratio of the aerosol diameter at a given RH to the dry aerosol diameter (D_RH/D_d); T is the temperature; σ_s/a is the surface tension of the solution; M_water is the molecular weight of water; R is the universal gas constant; and ρ_w is the density of water. The aerosol hygroscopic growth parameter κ can be further used to investigate the influence of aerosol hygroscopic growth on aerosol optical properties (Tao et al., 2014; Kuang et al., 2015; Zhao et al., 2017) and aerosol liquid water contents (Bian et al., 2014).
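Because the κ-Köhler equation gives RH as a function of the growth factor, obtaining gf at a given RH requires a numerical inversion. A minimal sketch using bisection is shown below. The constants (surface tension of pure water, water density of 1000 kg m^-3, T = 298.15 K) and the function names are assumptions made for illustration, and the sketch is only intended for subsaturated conditions with κ > 0.

```python
import math

SIGMA_SA = 0.072     # N m^-1, surface tension (assumed: pure water)
M_WATER = 0.018015   # kg mol^-1, molecular weight of water
R_GAS = 8.314        # J mol^-1 K^-1, universal gas constant
RHO_W = 1000.0       # kg m^-3, density of water (assumed)

def rh_percent(gf, kappa, d_dry, temp=298.15):
    """RH (%) in equilibrium with growth factor gf, from the kappa-Koehler equation."""
    activity = (gf ** 3 - 1.0) / (gf ** 3 - (1.0 - kappa))
    kelvin = math.exp(4.0 * SIGMA_SA * M_WATER / (R_GAS * temp * d_dry * gf * RHO_W))
    return 100.0 * activity * kelvin

def growth_factor(rh, kappa, d_dry, temp=298.15):
    """Invert rh_percent for gf by bisection (valid for RH < 100 % and kappa > 0)."""
    lo, hi = 1.0, 10.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if rh_percent(mid, kappa, d_dry, temp) < rh:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Example: a 200 nm dry particle with kappa = 0.3 at 90 % RH (illustrative values).
gf_90 = growth_factor(90.0, 0.3, 200e-9)
print(gf_90)
```

Applying the resulting gf to each dry diameter bin shifts the dry PNSD to the corresponding PNSD at the given RH.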
According to the Mie theory, g is associated with the aerosol particle number size distribution, the particle complex refractive index, the aerosol mixing state, and ambient RH. Aerosol morphology also has a significant influence on g. Datasets from the humidified nephelometer system can partially account for all of these factors. The humidified nephelometer system consists of two parallel nephelometers, one of which measures dry aerosol scattering properties whilst the other measures aerosol scattering properties under well-controlled RH conditions. This system can give the light scattering enhancement factor f(RH), which is defined as f(RH, λ) = σ_sca(RH, λ)/σ_sca(dry, λ), that is, the ratio of the aerosol scattering coefficient under given RH conditions to that under dry conditions. Each nephelometer can provide a scattering coefficient (σ_sca) and a backscattering coefficient (β_sca) at three wavelengths (450, 525, and 635 nm). σ_sca can be used to calculate the aerosol scattering Ångström index, which reflects the aerosol PNSD to some extent. In general, a larger value of the Ångström index corresponds to a smaller predominant aerosol size. Variations in β_sca and σ_sca can be used to deduce the aerosol black carbon (BC) mixing state (Ma et al., 2012). In addition, datasets from the humidified nephelometer system can be used alone to measure the aerosol hygroscopicity and provide an overall hygroscopic parameter κ (Kuang et al., 2017). In summary, measurements from the humidified nephelometer system might be used for estimating g under given RH conditions. However, there is no clear relationship between the measured datasets from the humidified nephelometer and g. Furthermore, the nonlinear influence of the above-listed factors on g also makes it difficult to parameterize g.
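The two derived quantities mentioned above follow directly from the nephelometer readings. The sketch below assumes the usual two-wavelength form of the scattering Ångström index, α = −ln(σ_1/σ_2)/ln(λ_1/λ_2); the function names and the numerical values are illustrative, not measurements from the campaigns.

```python
import math

def scattering_enhancement(sigma_wet, sigma_dry):
    """f(RH): scattering coefficient at a given RH divided by the dry value."""
    return sigma_wet / sigma_dry

def angstrom_exponent(sigma_1, lambda_1, sigma_2, lambda_2):
    """Scattering Angstrom index from coefficients at two wavelengths."""
    return -math.log(sigma_1 / sigma_2) / math.log(lambda_1 / lambda_2)

# Illustrative scattering coefficients in Mm^-1 at 450 and 635 nm.
sigma_450, sigma_635 = 300.0, 180.0
alpha = angstrom_exponent(sigma_450, 450.0, sigma_635, 635.0)

# Illustrative humidified and dry scattering coefficients at one wavelength.
f_rh = scattering_enhancement(420.0, 300.0)
print(alpha, f_rh)
```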
The random forest machine learning model is a powerful technique that can be used for classification and nonlinear regression (Huttunen et al., 2016; Breiman, 2001; Hu et al., 2017). This model is a widely used nonparametric machine learning algorithm that has several strengths. First, it involves fewer assumptions regarding the dependence between observations and outcomes when compared with traditional parametric regression models. Second, strict relationships among variables are not needed before implementing the model. Third, this learning model requires far fewer computing resources than deep learning. Finally, this model has a very low risk of overfitting because it averages over an ensemble of decision trees. Thus, the random forest machine learning model is used in this work to study the calculation of g based on the datasets of the humidified nephelometer system.
In this study, the Mie scattering theory and field measurements over the North China Plain (NCP) are used to study the characteristics of g. Section 2 describes the related datasets used in this study. Details of the study on the characteristics of g and the impacts of aerosol hygroscopic growth on g are shown in Sect. 3.1. A new method, which is based on a random forest machine learning model, is introduced to calculate g in Sect. 3.2. We also discuss the impacts of g variations on the uncertainties of DARF in Sect. 3.3, and the corresponding results are presented in Sect. 4.3. Section 4.1 gives the calculated characteristics of g and Sect. 4.2 demonstrates the feasibility of using the machine learning model to calculate g. In addition, this method is validated by the ambient aerosol phase function measured with a charge-coupled device-laser aerosol detective system (CCD-LADS). Conclusions are given in Sect. 5.

Instruments and datasets
Datasets used in this study come from three field campaigns, which were conducted at three different sites in the NCP. These three field measurements were conducted at Gucheng in Hebei Province (Gucheng, 39  S1 in the Supplement. The PKU station is located in the northwest of Beijing, between the 4th and 5th ring road. It is 11 km from the center of the megacity of Beijing, which is adjacent to Hebei Province and the megacity of Tianjin. In the abovementioned three areas, industrial manufacturing has led to heavy air pollution. Datasets for the PKU station are representative of urban aerosols in the NCP. Gucheng is located between two megacities (120 km from Beijing and 190 km from Shijiazhuang) in the NCP; therefore, the pollution conditions of Gucheng are a good representation of the continental background in the NCP. Details regarding the Gucheng station can be found in a study by Kuang et al. (2017). The UCAS station is 60 km away from the center of Beijing and is at the edge of the NCP, which makes it suitable for measuring the regional pollution properties of the NCP (Ma et al., 2016). More details about the measurement sites are available in Sect. S1 of the Supplement.
Table 1 lists the information for the field campaigns and the datasets used in this study. During the campaigns, aerosols with an aerodynamic diameter of less than 10 µm are selected by an impactor (Mesa Labs, Model SSI2.5) at the inlet. These aerosols are then dried to below 30 % RH with a Nafion drying tube and led to each instrument. Aerosol PNSDs ranging from 3 nm to 10 µm are measured using a scanning mobility particle sizer spectrometer (SMPS, TSI Inc., model 3936) and an aerodynamic particle sizer spectrometer (APS, TSI Inc., model 3321) with a temporal resolution of 5 min. Black carbon (BC) mass concentrations are measured by a multi-angle absorption photometer (MAAP model 5012, Thermo, Inc., Waltham, MA USA) at UCAS and by an Aethalometer (AE33) (Hansen et al., 1984; Drinovec et al., 2015) at PKU and Gucheng. The aerosol σ_sca is measured at wavelengths of 450, 525, and 635 nm by an Aurora 3000 nephelometer and the corresponding values are recorded every minute (Müller et al., 2011).
The f(RH) is measured by a self-constructed humidified nephelometer system. In this system, a humidifier is used to control the RH of the sample aerosol and σ_sca is measured for each of the controlled RH levels. The sample aerosol is humidified through a Gore-Tex tube, which is surrounded by a circulating water layer in a stainless steel tube. The RH is varied by changing the temperature of the circulating water, which is controlled by a water bath and software. For each cycle, the RH points are set to range from about 50 to about 90 % over 45 min. For most of the cases, the aerosol PNSDs are consistent over the cycle. A cycle of f(RH) values is discarded when either the maximum or the minimum measured σ_sca falls outside the range of 0.6 to 1.4 times the mean measured scattering coefficient of that cycle. The humidified nephelometer is described in detail by Kuang et al. (2017).
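The cycle rejection rule can be written as a simple stability check. The sketch below is one possible reading of the criterion; the function name, the treatment of the limits as inclusive, and the sample values are assumptions for illustration.

```python
def cycle_is_stable(sigma_sca, low=0.6, high=1.4):
    """Return True if all scattering coefficients of one humidification cycle
    stay within [low, high] times the cycle mean (limits taken as inclusive)."""
    mean = sum(sigma_sca) / len(sigma_sca)
    return min(sigma_sca) >= low * mean and max(sigma_sca) <= high * mean

# Illustrative scattering coefficients (Mm^-1) recorded during two cycles.
stable_cycle = [210.0, 225.0, 198.0, 240.0]
unstable_cycle = [100.0, 200.0, 60.0]

print(cycle_is_stable(stable_cycle), cycle_is_stable(unstable_cycle))
```

Only cycles passing this check would be retained for deriving f(RH), since a strongly varying scattering coefficient suggests that the aerosol population changed during the 45 min scan.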
An ambient aerosol phase function with a time resolution of 5 min is measured at UCAS using a CCD-LADS. This system consists of a continuous laser, two charge-coupled device cameras, and corresponding fisheye lenses. The wavelength of the laser is 532 nm and a quarter-wave plate is mounted in front of the laser emitter to change the polarization state of the laser from linear to circular. The CCD-LADS can measure the ambient aerosol phase function over a wide angular range of 10-170° with a high resolution of 0.1°. More details of the measurement system can be found in Bian et al. (2017).

The Mie model (Bohren and Huffman, 2007) is applied to calculate the characteristics of g_Mie. Running the Mie model requires the aerosol PNSD, the aerosol complex refractive index, the BC mixing state, and the BC mass concentration. Its results include the aerosol phase function, from which g_Mie can be calculated using Eq. (2). Mixing states of the BC come from field measurements. In the work by Ma et al. (2012), the mixing states of BC in the NCP are presented as both core-shell mixed and externally mixed. Ma et al. (2012) also provide the ratio of the BC mass concentration under an externally mixed state, M_ext_BC, to the total BC mass concentration, M_BC, as follows:

$$ r_{\mathrm{ext\_BC}} = \frac{M_{\mathrm{ext\_BC}}}{M_{\mathrm{BC}}}. $$

The mean value of r_ext_BC = 0.51 (Ma et al., 2012) is used in this study. The size-resolved distribution of the BC mass concentration is the same as that used by Ma et al. (2012). The κ-Köhler theory and the Mie scattering model are employed to calculate g_Mie under different RH conditions. When the aerosol grows by taking up water, the BC is treated as a non-hygroscopic and insoluble core. The real-time value of κ, which is derived from the measurement of f(RH), is used to account for aerosol hygroscopic growth. For each RH value, the growth factor can be calculated based on Eq. (4). The corresponding ambient aerosol PNSD at a given RH can also be determined by applying κ and Eq. (4).
The refractive index (m̃), which accounts for water content in the particle, is derived as a volume mixture between the dry aerosol and water (Wex et al., 2002):

$$ \tilde{m} = f_{v,\mathrm{dry}}\,\tilde{m}_{\mathrm{aero,dry}} + \left(1 - f_{v,\mathrm{dry}}\right)\tilde{m}_{\mathrm{water}}, $$

where f_v,dry is the ratio of the dry aerosol volume to the total aerosol volume under a given RH condition; m̃_aero,dry is the refractive index for dry ambient aerosols; and m̃_water is the refractive index of water.
The refractive indices of BC, non-light-absorbing aerosols, and water used in this study are 1.8 + 0.54i (Kuang et al., 2015), 1.53 + 10^−7 i (Wex et al., 2002), and 1.33 + 10^−7 i, respectively. The corresponding g values under the given RH and PNSD can then be calculated. More details on using the Mie model to calculate the aerosol phase function for different RH conditions can be found in Zhao et al. (2017).
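The volume-mixing rule can be sketched in a few lines. The example below assumes a homogeneous particle without a BC core, so that the dry volume fraction is simply 1/gf^3; this simplification and the function name are illustrative assumptions, while the refractive indices are the values quoted above.

```python
M_AERO_DRY = complex(1.53, 1e-7)  # non-light-absorbing dry aerosol
M_WATER = complex(1.33, 1e-7)     # water

def wet_refractive_index(gf, m_dry=M_AERO_DRY, m_water=M_WATER):
    """Volume-weighted refractive index of a particle with growth factor gf.
    Assumes a homogeneous particle, so the dry volume fraction is 1 / gf**3."""
    f_v_dry = 1.0 / gf ** 3
    return f_v_dry * m_dry + (1.0 - f_v_dry) * m_water

# Example: a particle whose diameter has grown by a factor of 1.5.
m_wet = wet_refractive_index(1.5)
print(m_wet)
```

As the particle takes up water, the real part moves from the dry value of 1.53 towards the value for water, 1.33.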
3.2 Calculating g using the random forest machine learning model (g_ML)
In this study, the random forest machine learning model from the scikit-learn machine learning library (Hu et al., 2017; Pedregosa et al., 2011) was used to calculate g. The random forest model has two parameters: the number of input variables (n_pre) and the number of trees grown (n_tree). In this study, n_pre and n_tree are determined by minimizing the relative difference between g_ML and g_Mie. Details of choosing the values of n_pre and n_tree are shown in Sect. S2. In this study, n_pre and n_tree are set to 8 and 32, respectively. The eight input parameters are the three dry scattering coefficients, the three dry backscattering coefficients, the RH, and κ.
The measured datasets are divided into two parts: the training data for the random forest model and the testing data. All training datasets come from field measurements at Gucheng station, whereas the datasets from PKU are employed to test the accuracy of the model. Using training and testing datasets from different sites allows the applicability of the random forest model across the NCP to be assessed. Before calculating g_Mie, we compare the σ_sca measured by the dry nephelometer with the σ_sca calculated from the Mie scattering model. Only data for which the relative difference between the measured and calculated σ_sca is within 30 % are used in the following analyses; in this way, instrument measurement inaccuracy can be avoided to some extent. More details regarding the data used are shown in Sect. S3.
To further avoid measurement uncertainties when training the random forest machine learning model, both the required input parameters and the target g values come from the calculations of the Mie scattering model. The Mie scattering model uses aerosol PNSD and BC measurements from the field campaign in Gucheng. For each measured PNSD and BC, the corresponding σ_sca and β_sca under dry conditions at 450, 525, and 635 nm are modeled based on the Mie theory. With the concurrently measured κ values from the humidified nephelometer, the g_Mie values under different RH conditions can also be determined. The modeled σ_sca and β_sca under dry conditions, the κ values, and the RH are then used as the input data for the model, and the corresponding g_Mie values are used as the target data.
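The training setup described above can be sketched with scikit-learn's `RandomForestRegressor`. The example below uses purely synthetic data as a stand-in for the Mie-modeled inputs and targets, so its numerical output has no physical meaning. Mapping n_tree to `n_estimators=32` follows directly from the text, whereas leaving `max_features` at its default (all eight inputs) is an assumption, as are the variable names and the made-up target function.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n_samples = 2000

# Eight inputs: three dry sigma_sca, three dry beta_sca, RH, and kappa.
sigma = rng.uniform(50.0, 800.0, size=(n_samples, 3))     # Mm^-1, synthetic
beta = sigma * rng.uniform(0.08, 0.16, size=(n_samples, 3))
rh = rng.uniform(30.0, 90.0, size=(n_samples, 1))         # percent
kappa = rng.uniform(0.05, 0.5, size=(n_samples, 1))
X = np.hstack([sigma, beta, rh, kappa])

# Synthetic stand-in for g_Mie (NOT a physical relationship).
b_mean = beta.mean(axis=1) / sigma.mean(axis=1)
g_target = 0.9 - 2.0 * b_mean + 0.1 * kappa[:, 0] * (rh[:, 0] / 100.0) ** 4

# Train on one subset and test on another, mirroring the split by site.
X_train, y_train = X[:1500], g_target[:1500]
X_test, y_test = X[1500:], g_target[1500:]

model = RandomForestRegressor(n_estimators=32, random_state=0)
model.fit(X_train, y_train)
g_pred = model.predict(X_test)

rel_diff = np.abs(g_pred - y_test) / y_test * 100.0
print("95th percentile of relative difference (%):", np.percentile(rel_diff, 95))
```

In the actual workflow, the rows of `X_train` would be the Mie-modeled optical properties from Gucheng and the rows of `X_test` those from PKU.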

Aerosol DARF estimations
Earth-atmosphere systems can be significantly influenced by aerosols through the scattering and absorption of energy. In this study, the Santa Barbara DISORT (discrete ordinates radiative transfer) Atmospheric Radiative Transfer (SBDART) model (Ricchiazzi et al., 1998) is employed to estimate the DARF. The characteristics of DARF relating to variations in g are studied.
The instantaneous DARF is calculated at the TOA for cloud-free conditions. DARF is defined as the difference between the net radiative flux at the TOA under present aerosol conditions and that under aerosol-free conditions:

$$ \mathrm{DARF} = \left(f_{a}^{\downarrow} - f_{a}^{\uparrow}\right) - \left(f_{m}^{\downarrow} - f_{m}^{\uparrow}\right), $$

where (f_a↓ − f_a↑) is the net downward radiative flux with given aerosol distributions and (f_m↓ − f_m↑) is the net downward radiative flux under aerosol-free conditions. The DARF at 50 km is calculated because almost all of the aerosols are distributed below a height of 50 km in the parameterization scheme (Liu et al., 2009). Irradiance is calculated over the wavelength range of 0.25 to 4 µm in this study.
Input data for the SBDART are as follows: vertical profiles of the aerosol optical properties, which include the aerosol extinction coefficient (σ_ext), aerosol single-scattering albedo (SSA), and g. All data have a vertical resolution of 50 m and come from the results of the Mie scattering model and the parameterized aerosol vertical distributions. Methods for parameterization and calculation of the aerosol optical profiles can be found in Sect. S4 or in Kuang et al. (2016) and Zhao et al. (2017). Atmospheric meteorological parameter profiles come from the results of the intensive radiosonde observations at the Meteorological Bureau of Beijing (39°48′ N, 116°28′ E) at 13:30 LT from July to September in 2008. Kuang et al. (2016) studied these measured profiles and found that the vertical distributions of these parameters, which include profiles for water vapor, pressure, and temperature, can be used as a good representation of the meteorological parameter profiles in the NCP during summer. The corresponding measured mean results during field measurement are used in this study and the details of these profiles are shown in Sect. S4. Surface albedo values are obtained from the Moderate Resolution Imaging Spectroradiometer (MODIS) V005 Climate Modeling Grid (CMG) Albedo Product (MCD43C3). The mean results of the surface albedo of Beijing from July to September in 2008 are used. The remaining input data for the SBDART are set to their default values (Ricchiazzi et al., 1998).

Results and discussion
4.1 Characteristics of g_Mie
4.1.1 Characteristics of g_Mie at different sites
Figure 1 gives the statistical results for the calculated g properties at Gucheng, PKU, and UCAS. The RH values at the three sites show almost the same diurnal variation pattern (Fig. 1a, b, and c). The RH reaches a peak in the morning at approximately 06:00 LT, and then reaches its lowest value at approximately 16:00 LT in the afternoon. However, the mean values of RH differ: 77.7 % ± 20.9 % at Gucheng, 47.8 % ± 20.8 % at PKU, and 33.49 % ± 15.22 % at UCAS. The g_Mie values under dry conditions, which are calculated from the measured PNSD, have almost no diurnal pattern. The g_Mie values at PKU (0.614 ± 0.025) are slightly higher than those at Gucheng (0.601 ± 0.021) and UCAS (0.595 ± 0.023) (Fig. 1d, e, and f). The difference in the g_Mie values results from different aerosol properties at these sites. From Fig. S6, it can be noted that the peak diameter of the mean and median PNSD at Gucheng is located at around 150 nm, whereas that at PKU is located at around 100 nm and that at UCAS at around 60 nm. In addition, there is a large fraction of small particles with diameters below 60 nm at PKU and UCAS. However, particles smaller than 100 nm contribute little to the total aerosol scattering. The aerosol PNSD at PKU is more dispersed than that at the Gucheng and UCAS sites, which corresponds to a larger variation in the g values. Fig. S6g, h, and i show that particles at around 500 nm contribute less to the scattering coefficient at PKU than at Gucheng and UCAS. Thus, particles with a diameter larger than 500 nm contribute more to the aerosol scattering coefficient at PKU. As g_Mie increases with the aerosol diameter, the aerosol g_Mie under dry conditions at PKU tends to be larger than that at Gucheng and UCAS.
However, ambient g_Mie values have different patterns at different sites, as shown in Fig. 1g, h, and i. The g_Mie values have an RH-related diurnal pattern at Gucheng, with a mean value of 0.668 ± 0.073, whereas g_Mie values show no diurnal variation at PKU and UCAS, where the mean values of g_Mie are 0.639 ± 0.049 and 0.618 ± 0.033, respectively. The variations in ambient g_Mie values mainly result from the variation in the aerosol hygroscopic growth under ambient conditions, which is highly related to the ambient RH. The g_Mie value is significantly influenced by RH when the RH is higher than 80 %, which is detailed in Sect. 4.1.2. Ambient g_Mie values at Gucheng, PKU, and UCAS can vary from 0.57 to 0.8, 0.55 to 0.76, and 0.56 to 0.72, respectively; this makes them comparable to g_Mie values from Andrews et al. (2006), which range from 0.59 to 0.72.

Influence of RH on g
To assess the influence of RH on g, the g_Mie values are calculated under different RH conditions for each aerosol PNSD. The statistical results of g_Mie versus RH are shown in Fig. 2. Under dry conditions, the g_Mie value varies widely, ranging between 0.54 and 0.67 with a mean value of 0.61. However, the mean g_Mie value can range from 0.65 to 0.8 when the RH reaches 90 %. The g_Mie enhancement factor, which is defined as the ratio of g_Mie at a given RH to g_Mie under dry conditions, can reach a mean value of 1.2 at an RH of 90 %, which means that the g_Mie value under wet conditions is approximately 20 % higher than that under dry conditions. This finding is consistent with that of Hartley and Hobbs (2001), who found that g is highly related to RH.
In contrast to RH, the aerosol complex refractive index has little influence on g: the uncertainties for g are less than 0.004 based on the Monte Carlo simulation of g at different complex refractive index values. More details regarding the influence of the aerosol complex refractive index on g can be found in Sect. S6.

4.2 Calculating g_ML using the machine learning model

Feasibility of using the random forest model
We establish two independent random forest machine learning models to predict g_ML values under dry conditions and under ambient RH conditions, respectively.
When the random forest machine learning model is run for g values under dry conditions, σ_sca and β_sca at the three wavelengths are used as the independent input variables. The other two input parameters, RH and κ, are set to zero. The target g values come from the results of the Mie scattering model. Figure 3a shows the g_Mie values calculated by the Mie scattering model and the g_ML values predicted by the random forest machine learning model under dry conditions at the PKU site. The results show that the g_Mie values and g_ML values have good consistency, with an R^2 value of 0.98. In 95 % of the cases, the relative difference between g_Mie and g_ML is within 2.56 %.
Figure 3b shows the comparison of the predicted g_ML values under different RH conditions and the g_Mie values calculated by the Mie scattering model. The correlation coefficient between g_Mie and g_ML reaches 0.93, and 95 % of the relative differences are within 4.02 %. The random forest model therefore has the potential to predict g values under different RH conditions with high accuracy; the uncertainty of predicting g values using the random forest machine learning model is estimated to be 4.02 %.
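The statistic quoted above, the value within which 95 % of the relative differences fall, can be computed as follows. This is a minimal sketch using a nearest-rank percentile; the exact percentile definition used in the study is not stated, and the function name and sample values are illustrative.

```python
def rel_diff_95(reference, predicted):
    """Relative difference (%) that is not exceeded in 95 % of the cases
    (nearest-rank percentile of |predicted - reference| / reference)."""
    rel = sorted(abs(p - r) / r * 100.0 for r, p in zip(reference, predicted))
    rank = (95 * len(rel) + 99) // 100   # ceil(0.95 * n) in integer arithmetic
    return rel[rank - 1]

# Illustrative data: 20 reference values and predictions that deviate by 0 to 1.9 %.
g_reference = [0.6] * 20
g_predicted = [0.6 * (1.0 + i / 1000.0) for i in range(20)]

print(rel_diff_95(g_reference, g_predicted))
```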
The fill colors of the dots in Fig. 3 represent the concurrently measured σ_sca. The g values tend to be larger with an increase in σ_sca, which is in accordance with the particle scattering properties. When a particle has a larger diameter, the σ_sca of the particle is higher and a larger fraction of the light tends to be scattered in the forward direction.
The reliability of the previous parameterization of g using b is tested here. Wiscombe and Grams (1976) studied the relationship between b and g and gave an expression relating them (Eq. 6). This equation is widely used to calculate g from b (Andrews et al., 2006; Horvath et al., 2016; Kassianov et al., 2007). We use the field measurement results to test its reliability. The comparison between the g values calculated from the Mie scattering model and the g values parameterized using Eq. (6) is shown in Fig. S9. From Fig. S9, we can see that the parameterized g values are generally larger than the calculated g values by approximately 10 %. When the σ_sca is smaller, the deviations become larger. Some other empirical relationships between b and g (Moosmüller and Ogren, 2017) are also tested. These schemes give almost the same result as that of Wiscombe and Grams (1976), which means that the previously established parameterization schemes are not applicable in the NCP.

Sensitivity of the random forest model
Sensitivity studies are carried out to assess the influence of each input variable on g_ML. Based on the work of Müller et al. (2011), the uncertainties in total scattering are 4 % (450 nm), 2 % (525 nm), and 5 % (635 nm) for experiments with ambient air and laboratory-generated white particles. For backscattering, the differences are higher and amount to 7 % (450 nm), 3 % (525 nm), and 11 % (635 nm). The uncertainty of the RH measured by the RH sensors is 1.7 % for RH in the range of 0 to 90 % (Kuang et al., 2017) and the uncertainty of the derived κ values is 6 % (Kuang et al., 2017). Monte Carlo simulations are conducted to study the sensitivity of g_ML to the input parameters in three steps. First, the mean results of the measured dry σ_sca, dry β_sca, RH, and κ values are used to predict the g value. Second, the dry σ_sca at 450 nm is randomly perturbed, with the relative perturbation having a mean value of 0 and a standard deviation of 4 %, while the other inputs remain unchanged. The corresponding standard deviation of the predicted g value is used as the sensitivity of g_ML to the σ_sca at 450 nm. Lastly, the sensitivity is determined in the same way for each input parameter and the uncertainties of the g_ML values due to the input parameters are estimated. The total uncertainties of predicting g_RH are derived when all of the input parameters are randomly changed with their corresponding uncertainties. For each test, the Monte Carlo simulations are carried out 20 000 times.
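The one-at-a-time Monte Carlo procedure can be sketched as follows. The predictor here is a hypothetical stand-in (`toy_predictor`) rather than the trained random forest, and drawing the relative perturbations from a Gaussian distribution is an assumption, since the text only specifies their mean and standard deviation. The result is reported as two standard deviations relative to the unperturbed prediction, matching the convention of Table 2.

```python
import random
import statistics

def toy_predictor(x):
    """Hypothetical stand-in for the trained model; NOT a physical relationship."""
    b = (x["beta_450"] + x["beta_525"] + x["beta_635"]) / (
        x["sigma_450"] + x["sigma_525"] + x["sigma_635"])
    return 0.9 - 2.0 * b + 0.1 * x["kappa"] * (x["rh"] / 100.0) ** 4

def mc_sensitivity(predict, mean_inputs, name, rel_sd, n=20000, seed=0):
    """Perturb one input with Gaussian relative noise (assumed distribution) and
    return two standard deviations of the prediction, in % of the base value."""
    rng = random.Random(seed)
    base = predict(mean_inputs)
    outputs = []
    for _ in range(n):
        x = dict(mean_inputs)
        x[name] = mean_inputs[name] * (1.0 + rng.gauss(0.0, rel_sd))
        outputs.append(predict(x))
    return 2.0 * statistics.pstdev(outputs) / base * 100.0

# Illustrative mean inputs (not the campaign means).
mean_inputs = {
    "sigma_450": 300.0, "sigma_525": 250.0, "sigma_635": 200.0,
    "beta_450": 30.0, "beta_525": 25.0, "beta_635": 20.0,
    "rh": 80.0, "kappa": 0.3,
}

kappa_sensitivity = mc_sensitivity(toy_predictor, mean_inputs, "kappa", 0.06)
print(kappa_sensitivity)
```

Repeating the call for each of the eight inputs with its own relative uncertainty reproduces the structure of the sensitivity table; perturbing all inputs simultaneously gives the total input-related uncertainty.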
Table 2 gives the errors (two standard deviations) of the g_ML values corresponding to the uncertainties of the input parameters. From Table 2, it can be noted that the uncertainty of the measured σ_sca has little influence on g_ML, with g value uncertainties of 0.487, 0.492, and 0.486 % for 450, 525, and 635 nm, respectively. However, the measurements of the three β_sca have larger uncertainties and have a greater influence on predicting g_ML, with uncertainties of 0.651, 0.486, and 0.710 %. The uncertainty of the RH has little influence on predicting g_ML (0.487 %). However, the uncertainty of the derived κ values (6 %) influences the g values the most, with a g value uncertainty of 1.92 %. The total uncertainty of predicting g due to uncertainties in the measurement parameters is 1.95 %. Overall, the total uncertainty of predicting g_ML is estimated to be 4.47 %, considering the 4.02 % uncertainty of the random forest machine learning model from Sect. 4.2.1.
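The total of 4.47 % is consistent with adding the model uncertainty and the input-parameter uncertainty in quadrature. The combination rule is not stated explicitly in the text, so the following check assumes that the two contributions are independent:

$$ \sqrt{(4.02\,\%)^{2} + (1.95\,\%)^{2}} = \sqrt{16.16 + 3.80}\,\% \approx 4.47\,\%. $$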

Validation of the random forest machine learning model
Datasets from the UCAS campaign are also used to validate the random forest machine learning model. On the one hand, the g ML values are calculated by applying the random forest machine learning model to the measurements of the humidified nephelometer. On the other hand, ambient g values (g CCD) are calculated from the phase function measured by the CCD-LADS, according to the definition of g. The results of the comparison of these two kinds of g values are shown in Fig. 4. As seen in Fig. 4, the values of g ML and g CCD show good consistency. In 95 % of cases the relative differences between g ML and g CCD are within 6.5 %, which is slightly higher than the relative difference of the g values (4.02 %) between the machine learning method and the Mie scattering method. During the study period, σ sca ranged from 30 to 260 Mm⁻¹, which indicates cleaner conditions at UCAS than at Gucheng and PKU. Correspondingly, most of the g Mie values are small and fall in the 0.54 to 0.62 range, which is clearly lower than the range of values from the other campaigns. At the same time, the ambient conditions at UCAS during winter are relatively dry, which results in small g values. These conditions may partially explain the larger difference between g ML and g CCD. With this validation, we conclude that the random forest machine learning model can give a reasonable g value based on the measurements of the humidified nephelometer system.
When the PF HG is used to parameterize the phase function calculated using the Mie theory (PF Mie), there are some deviations, and the influence of these deviations should be estimated. The relative difference between the DARF from the PF Mie and that from the PF HG is used to estimate the uncertainties of using the PF HG. First, the PF Mie profiles are used as inputs to estimate DARFs. The PF Mie is then replaced with the g-related PF HG, which is parameterized by g Mie from the PF Mie, and the DARFs are calculated again. The relative differences between the DARFs from these two steps are recorded and compared. The relative differences at different zenith angles are calculated to comprehensively estimate the influence of the PF HG.
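The link between a phase function and its g-related PF HG can be illustrated with a minimal sketch. It assumes the common normalization in which half the integral of P(θ) sin θ over 0 to π equals 1; the normalization used in the paper may differ by a constant factor. The function names and the value g = 0.60 are illustrative only and are not taken from the study.

```python
import math

def hg_phase_function(theta, g):
    """Henyey-Greenstein phase function at scattering angle theta (radians).

    Assumed normalization: 0.5 * integral of P(theta) * sin(theta) over [0, pi] = 1.
    """
    return (1.0 - g * g) / (1.0 + g * g - 2.0 * g * math.cos(theta)) ** 1.5

def asymmetry_factor(phase_function, n=2000):
    """g = 0.5 * integral of P(theta) * cos(theta) * sin(theta) over [0, pi].

    The integral is evaluated with the midpoint rule on n equal intervals.
    """
    h = math.pi / n
    total = 0.0
    for i in range(n):
        theta = (i + 0.5) * h
        total += phase_function(theta) * math.cos(theta) * math.sin(theta)
    return 0.5 * total * h

# Illustrative value only (hypothetical, chosen inside the reported dry range).
g_in = 0.60
g_out = asymmetry_factor(lambda theta: hg_phase_function(theta, g_in))
```

Integrating the PF HG in this way returns the g it was built from, which is why a single g Mie value is enough to define the PF HG that replaces a given PF Mie. The same integral applied to a measured phase function would yield a value analogous to g CCD.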
Figure 5 shows the estimated DARFs at different zenith angles. In Fig. 5a, DARF at the TOA varies from −2.55 to −4.8 W m⁻². When the PF Mie is replaced by the PF HG, the calculated DARF ranges from −2.6 to −5.1 W m⁻². The relative difference of the DARFs between the two methods ranges from 1.3 to 7.1 %, as shown in Fig. 5b. It is concluded that using the g-related PF HG to replace the PF Mie to estimate aerosol radiative effects is applicable in the NCP, with a deviation of no more than 7.1 %.

Impacts of g variations on DARF estimation
Variations in g can lead to significant changes in the estimated DARF (Kuang et al., 2016; Andrews et al., 2006; McComiskey et al., 2008). In this study, the uncertainty of the g value due to the uncertainty of the input parameters is estimated to be 1.95 %, and the total variation in running the random forest machine learning model is estimated to be 4.47 %. At the same time, g can vary by about 10 % for different aerosol PNSDs and can be enhanced by 20 % when the RH increases from 30 to 90 %. It is therefore important to know the extent of the variation in DARF that corresponds to these uncertainties in g.
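As a quick consistency check on these figures, the sketch below combines the 1.95 % measurement-related uncertainty with the 4.02 % model uncertainty reported for ambient conditions. Combining the two in quadrature, which treats them as independent, is an assumption made here for illustration; the text does not state how the total was obtained, but the result matches the quoted 4.47 %.

```python
import math

def combine_in_quadrature(*relative_uncertainties_percent):
    """Root-sum-square of relative uncertainties (assumes independent sources)."""
    return math.sqrt(sum(u * u for u in relative_uncertainties_percent))

measurement_uncertainty = 1.95  # % in g, from the input parameters
model_uncertainty = 4.02        # % in g, random forest vs. Mie under ambient conditions

total = combine_in_quadrature(measurement_uncertainty, model_uncertainty)
print(f"{total:.2f} %")  # prints "4.47 %"
```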
The variation in DARF due to the uncertainties of g is calculated by increasing or decreasing g by 1.95, 4.47, and 10 % of the original g values, and then comparing the corresponding DARFs with the original values. To study the influence of RH on g and DARF, the DARF is also estimated with the g values calculated from the dry parameterized aerosol population profile.
Figure 6 shows the estimated DARFs with different variations in g and the corresponding variations in the estimated DARF. The results show that when g varies by 1.95 %, the DARF can vary by 4 %. Larger variations of 4.47 and 10 % in g lead to variations of 9.4 and 21 % in the estimated DARF, respectively.
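The three pairs of numbers above imply a nearly constant amplification from g to DARF. The short sketch below computes the ratio of the relative DARF variation to the relative g variation for each pair; the variable names are illustrative, and the ratio is a derived quantity that the study itself does not report.

```python
# (relative variation in g, resulting relative variation in DARF), both in %
g_to_darf_variation = [(1.95, 4.0), (4.47, 9.4), (10.0, 21.0)]

# Amplification factor: % change in DARF per % change in g.
amplification = [darf / g for g, darf in g_to_darf_variation]

for (g, darf), ratio in zip(g_to_darf_variation, amplification):
    print(f"g varies by {g} % -> DARF varies by {darf} % (ratio {ratio:.2f})")
```

Each ratio comes out close to 2.1, so within this range a relative error in g roughly doubles when it propagates to the estimated DARF.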
The DARF estimated using the parameterized aerosol profile, which accounts for aerosol hygroscopic growth, is smaller in magnitude than the DARF estimated using the g profiles from the dry aerosol population. The g values under dry conditions are smaller than those under wet ambient conditions. Thus, under ambient conditions a larger fraction of the energy is scattered forward, which leads to less outgoing backscattered energy and a larger (less negative) value of the estimated DARF.
When DARF is estimated without considering the impacts of aerosol hygroscopic growth on g, the relative difference can be as high as 20 % at all zenith angles. Thus, it is necessary to consider aerosol hygroscopic growth when calculating the g values.

Conclusions
The characteristics of g in the NCP are studied based on the Mie scattering theory and field measurements from the Gucheng and PKU study sites. The results show that g Mie values are 0.604 ± 0.025 at Gucheng and 0.615 ± 0.021 at PKU. The ambient g Mie values at Gucheng show obvious diurnal variations due to variations in RH. When the ambient RH reaches 90 %, g Mie can be enhanced by 20 %, and the g values from different aerosol populations can vary by 10 %. Comparison of the g Mie values calculated from the Mie scattering model with the parameterized g values from the Wiscombe and Grams (1976) method shows that the parameterized g is overestimated by approximately 10 % and that the deviations become larger when the measured σ sca is below 200 Mm⁻¹.
The random forest machine learning model and datasets from the humidified nephelometer are employed to calculate g ML values. The input data of the random forest model contain measured σ sca and β sca at three wavelengths, RH, and the hygroscopic parameter κ. Except for RH, all input data come from measurements of the humidified nephelometer system (Kuang et al., 2017). The random forest model can significantly improve the accuracy of g ML prediction. The uncertainties of the predicted g ML values are constrained within 2.56 % under dry conditions and 4.02 % under ambient conditions. The uncertainties from the measurements of the humidified nephelometer can lead to a variation of 1.95 % in g, which mainly results from the inaccuracy of the derived κ. The total uncertainty of the g calculation using the random forest machine learning model is 4.47 %. This is the first time that a machine learning model and datasets from the humidified nephelometer system have been combined to study g. Additionally, this method can account for the influence of aerosol hygroscopic growth on g.
This new method for calculating g is validated by comparing the g ML values from the random forest machine learning model and the g CCD values from the measured phase function by using the CCD-LADS.The g values from these two methods show good consistency, with 95 % of the data within a relative difference of 6.5 %.
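A statistic of the form "95 % of the data within a relative difference of 6.5 %" can be computed as in the following minimal sketch. The g values below are hypothetical placeholders, not campaign data, and using g CCD as the reference in the relative difference is an assumption made for illustration.

```python
def relative_differences(g_ml, g_ccd):
    """Relative difference (%) of each g ML value with respect to g CCD."""
    return [abs(m - c) / c * 100.0 for m, c in zip(g_ml, g_ccd)]

def fraction_within(rel_diffs, tolerance_percent):
    """Fraction of cases whose relative difference is within the tolerance."""
    return sum(d <= tolerance_percent for d in rel_diffs) / len(rel_diffs)

# Hypothetical placeholder values, for illustration only.
g_ml = [0.56, 0.58, 0.60, 0.61]
g_ccd = [0.55, 0.60, 0.59, 0.66]

diffs = relative_differences(g_ml, g_ccd)
share = fraction_within(diffs, 6.5)
```

With real data, a share of at least 0.95 at a tolerance of 6.5 % would correspond to the consistency reported here.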
The SBDART model is used to study the impacts of g on DARF. We first studied the relative differences between the DARFs estimated using the PF HG and those estimated using the phase function calculated from the Mie theory, based on the measured mean aerosol PNSD and BC mass concentration at the Gucheng and PKU study sites. The results show that the relative differences in DARF remain within 7.1 % when replacing the PF Mie with the g-related PF HG. The PF HG has the potential to be a feasible parameterization scheme to study DARF in the NCP.
The sensitivity study shows that the maximum uncertainties of DARF are 4, 9.4, and 21 %, which correspond to the uncertainties of g from instrument measurements, the machine learning model, and the variation of aerosol PNSD, respectively. Moreover, when DARF is estimated without considering the effects of aerosol hygroscopic growth on g, the relative differences of the DARF are as large as 20 % at all zenith angles. It is therefore necessary to parameterize g in a way that accounts for the effect of aerosol hygroscopic growth.
This work furthers our understanding of the role of g in influencing aerosol radiative effects and can help reduce uncertainties in estimating DARF.

Figure 1. Average diurnal pattern of RH (a, b, c), g values calculated from dry aerosols (d, e, f), and g values from ambient aerosols (g, h, i). Panels (a, d, g) are the results from Gucheng. Panels (b, e, h) are the results from PKU. Panels (c, f, i) are the results from UCAS. The box and whisker plots represent the 5th, 25th, 75th, and 95th percentiles.

Figure 2. Probability distributions of g under different RH conditions. The left y axis shows g values at different RH values and the right y axis shows the g enhancement factor, which is defined as the ratio of g at a given RH to the g value at dry conditions (RH = 30 %). The solid line (cyan) shows the mean result of the g values and the enhancement factor at different RH values.

Figure 3. Comparison of calculated g values (g Mie) from the Mie model and predicted g values (g ML) from the random forest model under (a) dry conditions and (b) ambient conditions at the PKU site. Colored dots represent the concurrently measured σ sca corresponding to the time of g.

Figure 4. Comparison of the g values (g CCD) calculated from the phase function measured by the CCD-LADS and the g values (g ML) calculated using the random forest machine learning model.

Figure 5. (a) Estimated DARFs at different zenith angles using the g-related PF HG (dotted line) and the phase function calculated using the Mie scattering theory (solid line). (b) The relative difference between the DARFs in (a).

4.3 Estimating the impacts of g on DARF
4.3.1 Uncertainties of replacing the calculated phase function with the PF HG

Figure 6. The variation in DARF when g varies by 1.95 % (light red), 4.47 % (light blue), and 10 % (light green). Different line styles represent the corresponding mean relative differences in DARF compared to the original value.

Table 1. Field information, dataset information, and instruments used in this study.

Table 2. The sensitivity of g to the input parameters.
Parameter: σ sca, 450; σ sca, 525; σ sca, 635; β sca, 450; β sca, 525; β sca, 635
a The uncertainties of the measured parameters.
b The uncertainties of g values due to the uncertainties of the measurement parameters.