Lung Deposited Surface Area (LDSA) using multipollutant datasets

Lung deposited surface area (LDSA) has been considered to be a better metric to explain nanoparticle toxicity 17 instead of the commonly used particulate mass concentration. LDSA concentrations can be obtained either by direct 18 measurements or by calculation based on the empirical lung deposition model and measurements of particle size distribution. 19 However, the LDSA or size distribution measurements are neither compulsory nor regulated by the government. As a result, 20 LDSA data are often scarce spatially and temporally. In light of this, we develop a novel statistical model, named input21 adaptive mixed-effects (IAME) model, to estimate LDSA based on other already existing measurements of air pollutant 22 variables and meteorological conditions. During the measurement period in 2017–2018, we retrieved LDSA data measured by 23 Pegasor AQ Urban and other variables at a street canyon (SC, average LDSA = 19.7±11.3 μm cm) site and an urban 24 background (UB, average LDSA = 11.2±7.1 μm cm) site in Helsinki, Finland. For the continuous estimation of LDSA, 25 IAME model is automatised to select the best combination of input variables, including a maximum of three fixed effect 26 variables and three time indictors as random effect variables. Altogether, 696 sub-models were generated and ranked by the 27 coefficient of determination (R), mean absolute error (MAE) and centred root-mean-square differences (cRMSD) in order. At 28 the SC site, the LDSA concentrations were best estimated by mass concentration of particle of diameters smaller than 2.5 μm 29 (PM2.5), total particle number concentration (PNC) and black carbon (BC), all of which are closely connected with the vehicular 30 emissions. At the UB site the LDSA concentrations were found to be correlated with PM2.5, BC and carbon monoxide (CO). 31 The accuracy of the overall model was better at the SC site (R = 0.80, MAE = 3.7 μm cm) than at the UB site (R = 0.77, 32 MAE = 2.3 μm cm) plausibly because the LDSA source was more tightly controlled by the close-by vehicular emission 33 source. The results also demonstrate that the additional adjustment by taking random effects into account improves the 34 sensitivity and the accuracy of the fixed effect model. Due to its adaptive input selection and inclusion of random effects, 35 IAME could fill up missing data or even serve as a network of virtual sensors to complement the measurements at reference 36 stations. 37


Introduction 38
Particulate matter is one of the key components determining urban air pollution. Particulate matter can be described by a 39 combination of varying concentration (number, surface area and mass) and chemical composition. The mass concentrations of 40 particulate matter are dominated by large particles whereas the number concentrations are governed by sub-micron particles 41 (particle diameter (dp) <1 μm), particularly ultrafine particles (UFP, dp< 0.1 μm) (e.g. Petäjä  where is the aerodynamic diameter (µm) of spherical particles with the unit density (1 g cm −3 ). The equation is determined 239 in two parts with respect to the two different peaks in the deposition curve in Figure 1. The peak near the size of 20 nm can be 240 approximated to represent the Brownian deposition, whereas the peak between 1 µm and 2 µm represents the inertial 241 deposition. From the particle number size distribution, we calculated the particle surface area distribution assuming each 242 particle is monodisperse sphere of standard density at standard conditions. By Eq. (1), a deposition factor for each particle size 243 bin (26 size bins at SC and 49 at UB) were calculated. Size-fractionated LDSA was then computed by multiplying the surface 244 area concentration with in the corresponding size class. Total LDSA calculated by the ICRP lung model (LDSAICRP) can 245 be obtained by summing up the all the size-fractionated LDSA values. In this study, the alveolar LDSAICRP was calculated 246 based on DMPS measurements in SC and UB. Thus, while the alveolar LDSA measured by Pegasor (LDSAPegasor) represent 247 the ~10-400 nm size range, the alveolar LDSAICRP represent 6-800 nm and 3-1000 nm size range in SC and UB, respectively. 248 https://doi.org/10.5194/acp-2021-427 Preprint. Discussion started: 5 July 2021 c Author(s) 2021. CC BY 4.0 License.

Novel Input-adaptive mixed-effects (IAME) model 249
Input-adaptive mixed-effects (IAME) model is a combination of input-adaptive proxy (IAP) and linear mixed-effects (LME) 250 model. IAP was first introduced by Fung et al. (2020) and has been demonstrated reliable and flexible to fill up missing values 251 by taking input variables adaptively with robust ordinary least square regression models. IAP has been able to estimate BC 252 concentration by other air quality indicators with a satisfactory performance in two different categorised urban environments, 253 street canyon (adjusted 2 = 0.86-0.94) and urban background (adjusted 2 = 0.74-0.91). Some models outperformed IAP in 254 accuracy performance, but its transparent model structure and ability to impute missing values still make it a preferred option 255 as a virtual sensor (Fung et al., 2021). in responses might depend on the time indicators that are not the primary cause of the concentration variability, but they 267 indirectly alter human-induced activities, such as traffic amounts. To take them into account, we created three time hierarchical 268 sub-groups (12 months of year, 7 days of week and 24 hours of day) as the inputs of random effect variables. 269

270
The regression equation of IAME is similar to the equation of IAP, except that IAME includes additional intercepts term for 271 random effects as below: 272 where is the th estimated LDSA concentration. The first term on the right β 0 indicates the fixed intercept of the equation. 273 The second term represents the total contribution by the direct measurement of variable as fixed effects with a slope β at 274 each data point . A maximum of three inputs from the total 16 fixed variables are selected to from 696 sub-models (Figure 2). 275 The inputs for random effects are indicated by as intercepts of the corresponding three hierarchical sub-groups. A Gaussian 276 error term is indicated by . The explanation of Eq (2), is visualised in Figure 2. 277 278 One of the assumptions of LME models is that the random effects, together with the error term, have the following prior 279 distribution: 280 where is a -by-symmetric and positive semidefinite matrix, parameterized by a variance component vector , is the 281 number of variables in the random-effects term, and 2 is the observation error variance. We use an optimiser, restricted 282 maximum likelihood, commonly known as ReML, with the value 1x10 -6 as the relative tolerance on gradient of objective 283 function and 1x10 -12 as absolute tolerance on step size. The use of ReML over the conventional ML could produce unbiased 284 estimates of variance and covariance parameters (Lindstrom and Bates, 1988 After the sub-model formation, the dataset is randomly divided into five portions. 80% of the data are allocated for 4-fold cross 287 validation to remove variance of accuracy. The results of all the folds are averaged and the sub-models are ranked by several 288 evaluation metrics, which are further demonstrated in Figure 2 and described in Sect. 3.4. Some of the sub-models are subject 289 to rejection under two conditions: (1) strong multi-collinearity among the fixed parameters (variance inflation factor (VIF) 290 exceeding a threshold of 5) and (2) violation of the normality assumption of residuals also known as heteroscedasticity (fail in 291 Kolmogorov-Smirnov (K-S) test, p < 0.05). Based on the situation of missing data, the automatised IAME model will search 292 for the best sub-model option from the ranking chart. Hence, each data point might be estimated differently depending on the 293 available data. The number of data points being estimated by each sub-model is reported to show their frequency of usage. 294

Evaluation metrics 295
In order to evaluate the model performance quantitatively, we use the following metrics: 296 where ̂ and ̂ are th measured data point and estimated variable by the model, respectively. ̅ and ̃ are the expected value 297 of the measured and modelled dataset, respectively. is the number of complete data input to the model. Coefficient of 298 determination ( 2 ) is a measure of how close the data lie to the fitted regression line. It, however, does not consider the biases 299 in the estimation. Therefore, we further validated the models with mean absolute error ( ) and centred root-mean-square 300 differences ( ), where measures the arithmetic mean of the absolute differences between the members of each 301 pair, whilst calculates the square root of the average squared difference between the forecast and the observation pairs. 302 is more sensitive to larger errors than . Furthermore, together with , Pearson correlation coefficient ( ) 303 and normalised standard deviation ( ) of the modelled data set are also studied. describes the correlation between the 304 measured and modelled data whereas measures the relative spread of the data. Due to their unique mathematical 305 relationship, the three metrics can be portrayed on Taylor's diagram, which has been used for sub-model selection purpose. 306 We ranked our sub-models first by 2 , followed by and . and serve as additional evidence when we 307 explain the model performance. 308

Two-sample t-tests 309
We assessed the temporal and spatial impact on the IAME model by comparing the means of absolute differences between the 310 hourly measured and modelled LDSA in different time windows at both stations. Two-sample t-tests were performed on the 311 two populations of absolute differences abovementioned to determine whether the difference between these is statistically 312 significant. A significance level α of 5% is chosen as the probability of rejecting the null hypothesis when it is true, denoted 313 as p. RB (2018) are 19.7±11.3 µm 2 cm -3 , 11.2±7.1 µg m -3 , 11.7±8.6 µm 2 cm -3 and 7.6±5.4 µm 2 cm -3 , respectively ( Table 2). The 318 DH and RB site are included to give more substantial interpretation of data because the LDSA concentrations at RB can be 319 viewed as background measurements and the local LDSA increments in HMA can be represented by the LDSA at the hotspot 320 measurement site subtracted by the LDSA at the RB site. The timeseries of LDSA concentrations at the SC and the UB site 321 are presented in Figure 3 and Fig. S4, where the missing data of LDSA for the whole measurement period is 3% and 30%, 322 respectively. When comparing with the same site type in other cities around the globe, LDSA concentrations detected in HMA 323 are the lowest among the European cities with reported values, and about one-fifth that in Japan (Table 1). Some literatures 324 also report LDSA at tracheobronchial region but most just consider LDSA at alveolar which is considered to bring most harm 325 to human's lungs. 326

327
The diurnal pattern of LDSA at RB is not observable on workdays or over weekends ( Figure 4, upper panel). The relatively 328 low variability can be explained by the scarcity of human activities. We can then regard the LDSA at RB as the background 329 concentrations mainly influenced by the regionally and long-range transported aerosol and meteorological variation. As the 330 concentrations at RB is stable throughout the different hours of day; therefore, the diurnal pattern of LDSA concentration is 331 apparently indistinguishable between the measured concentration and the local increments. At the UB and DH site, the 332 magnitudes and the patterns of the average hourly LDSA concentrations at workdays are comparable, and both show bimodal 333 curves, one peak at 6−9 a.m., the other at 9−11 p.m.. The former has a larger peak during the morning peak hour because of morning are not identifiable and the evening peaks are amplified due to enhanced human activities. Similar diurnal variation 337 at residential area was observed for BC emitted by residential combustion by Helin et al. (2018). At the SC site, the morning 338 peak on weekends is not obvious because of the lack of work-related traffic. It appears that a similar bimodal curve can be 339 seen during workdays, but the evening peak is seen during the evening traffic rush hour around 4−6 p.m.. The reason is that 340 the main contributor of LDSA at the SC site is traffic and combustion processes and the diurnal variability mainly depends on 341 the citizen's movement by vehicles in the city. Over weekends, the average hourly LDSA concentrations are the minimum at 342 5 a.m. and they increase and remain at a high level at 2 p.m. until the late night. The level of LDSA concentrations at DH is 343 comparable with that at UB site. However, the amplitudes of the evening peak is higher than that of the morning peak both on 344 workdays and weekends due to elevated residential combustion. 345 346 However, the monthly variability of background measurements at the RB site is stronger compared to the diurnal pattern and 347 the calculation of local increment is necessary. With no intense point sources, the variations at RB are probably due to 348 horizontal dispersion and advection of aerosol particles and vertical dilution controlled by the boundary layer dynamics. In the 349 summer, when solar radiation is persistently stronger, the boundary layer becomes elevated due to surface heating and 350 associated thermal turbulence. This turbulence would dilute the concentration of pollutants at the surface. Another plausible 351 reason could be the higher regional and long-range transported LDSA in the summer, as demonstrated by Kuula Figure 4 shows the LDSA local increments after subtraction of the LDSA at the 353 RB site. For instance, the local LDSA increments at DH are the highest in the winter probably due to local small-scale wood 354 combustion (and traffic). However, without subtracting the background concentrations, the LDSA concentrations at DH are 355 higher in the summer than in the winter (due to high regional background concentrations in summer), as was observed also by 356 https://doi.org/10.5194/acp-2021-427 Preprint. Discussion started: 5 July 2021 c Author(s) 2021. CC BY 4.0 License. Kuula et al. (2020). This piece of evidence can help in the source apportionment. The variation of diurnal and seasonal LDSA 357 for all sites are visualised in Fig. S5. 358

The connection between LDSA and other parameters 359
Alveolar LDSA concentration, as a single number, comprises particles across the whole particle size spectrum measured (e.g.  Table 3 depicts the descriptive statistics of the overall model evaluation on its testing set. The overall model at the SC site is 395 able to explain 80% of the variability of the testing set of the measured data. The 2 in the winter is 0.86 being the highest 396 while the worst 2 is shown in the summer, i.e., 0.70. The and are the smallest during weekend with 2 not 397 particularly high ( 2 = 0.72) probably because the LDSA concentration itself is relatively low in that period. The overall 398 performance is generally worse in UB in terms of 2 , except during weekends that 2 is 10% higher. 399 400 For individual sub-models, their performance could be seen on the Taylor's diagram in Figure 6 Table 5). Although some 419 show exceptionally good performance, the overall model has a slightly worse performance than that in street canyon. The best 420 sub-model estimates 49% of the total measurement, followed by 17%. The third and fourth most used sub-models, which form 421 up to 30% of the estimates, have rather moderate performance ( 2 = 0.58 and 0.69). Considering all possible outcomes, the 422 overall model is still able to explain 77% of the total variance. CO and PNC dominate in the top five used sub-models. BC, 423 NOx and meteorological parameters, like RH and WD-N are also involved in the final LDSA estimation.  Figure 7). At the SC site, the estimates by both LDSAIAP and LDSAIAME could generally catch up with the diurnal cycle of the measured data. However, the models underestimate the peak if the change of the measured LDSA 437 concentration is sudden and relatively large. Despite the small difference observed in the figure, the blue dotted line 438 representing LDSAIAME often stays closer to the measured LDSA concentration (black line). When we smooth out all the 439 estimates at each hour, the ability for IAME to catch the morning peak on workdays is much better. At the UB site, IAME 440 underestimates the LDSA concentration by almost 50% and 25% in the morning on 15 and 23 February 2017, respectively. 441 The overestimation reaches 100% during the midnight between 26 and 17 February 2017. 442 443 A more generalised diurnal cycle can be found in Figure 8. The error bars of the modelled LDSAIAP and LDSAIAME are 444 consistently smaller than that of LDSAPegasor and LDSAICRP. It might be due to the reason that the model fails to catch the 445 extreme values although it manages to catch the general diurnal cycle. Since outliers are removed in the pre-processing stage 446 and the model penalises the extreme values, the model tends to give a more centralised estimate. It is a trade-off between the 447 option with better coefficients of determination but stronger extreme errors and that with better estimations at tails but 448 derivation of averaged estimation. This circumstance is more apparent on workdays than weekends. Furthermore, LDSAIAME 449 could follow the diurnal cycle of LDSAPegasor much better than LDSAIAP, especially during the start of the peak hours over 450 workdays at the SC site where the LDSA concentrations jump to a high level. LDSAIAME can explain 80% and 77% of the 451 variability of the reference measurements at SC and UB, respectively (Table 6  452 https://doi.org/10.5194/acp-2021-427 Preprint. Discussion started: 5 July 2021 c Author(s) 2021. CC BY 4.0 License. Table 6), and compared to LDSAIAP's 77% and 66%, LDSAIAME perform better in terms of accuracy. In addition, the slightly 453 smaller and the closer to 1 of the LDSAIAME suggest that the mean absolute error is improved and the spread of the 454 estimation distribution is closer to the reference measurement by taking random effects into account. 455 456 Furthermore, we assessed the temporal and spatial impact on the IAME model by comparing the means of absolute differences 457 between the hourly LDSAPegasor and LDSAIAME in different time windows at both stations. A descriptive statistic is presented 458 in Table 7. We used two-sample t-tests to assess whether the distribution of absolute differences were statistically significant. 459 At SC, the p value of the t-tests at all selected windows are below 0.05, which demonstrate that the performance at different 460 seasons, days of week and hours of day of absolute differences between the measured and modelled LDSA were significantly 461 different at the confidential level of 95%. At the UB site, the difference between the two selected hour periods is not statistically 462 significant. The same applies to the difference between winter and spring. There are no statistically sufficient evidence to 463 validate the difference among the rest of the selected time period. In other words, with the use of random effects of time 464 constraint, the overall models still perform differently at different time windows most of the time. This indicates that IAME 465 still needs improvements on minimising temporal differences. 466

Conclusion 467
In this study, we develop a novel input-adaptive mixed-effects (IAME) proxy, to estimate alveolar LDSA by other already 11.7±8.6 µm 2 cm -3 ) and a regional background site (RB, average LDSA = 7.6±5.4 µm 2 cm -3 ) are also included as reference 473 and background source estimation, respectively. At the SC site, LDSA concentrations are closely correlated with traffic 474 emission. The ratio to black carbon (LDSA/BC), to particle number concentration (LDSA/PNC), and to nitrogen oxide 475 (LDSA/NOx) have a higher value before the morning peak and it reaches its minimum during the morning peak since the role 476 of regional background is higher for LDSA compared to those of NOx, BC and PNC. However, the ratio of LDSA to mass 477 concentration of particles of diameter smaller than 2.5 µm (LDSA/PM2.5) perform differently since the freshly vehicular 478 emitted particles are smaller than 50 nm, which do not contribute much to PM2.5 mass concentration. The models alone cannot replace the need for reference measurements. However, the IAME proxy could serve as virtual 496 sensors to complement the measurements at reference stations in case of missing data. The two measurement sites in this study 497 serve as a pilot of the proxy development, and the next step is to extend the work to the existing network of several measurement 498 stations within the Helsinki metropolitan region. With similar configurations, we could fill up the voids with the information 499 from the other stations after conscientious calibration. For example, in this paper, the two measurement sites are characterised 500 as street canyon and urban background. In a different setup, we may assume the similarity of the same type of environment 501 and utilise the measurements as replacement. 502 503 Furthermore, this continuous LDSA estimation could be useful in updating some of the current air quality application, for 504 instance GreenPaths application which searches for the best route to wished destination with the least exposure to air pollution 505 (Poom et al., 2020) and ENFUSER air quality model which provide accurate spatio-temporal estimation for air pollutants in 506 Helsinki (Johansson et al., 2015). 507 508

Data availability 509
The air quality data and meteorological data are available from HSY website (https://www.hsy.fi/avoindata) and through 510 SmartSMEAR online tool (https://smear.avaa.csc.fi/). 511               (Taylor, 2001) at Mäkelänkatu SC site (first column) and at Kumpula UB site (second column). Each diamond marker in the Taylor's diagrams represents each sub-model used in the final estimation by IAME (solid black dot), compared with the reference data (solid red dot). Hues of colours represent how frequent the sub-model was used. The lower panel shows the scatter plots of modelled LDSA against the measured LDSA at Mäkelänkatu SC site (first column) and at Kumpula UB site (second column). Hues of colours represent the density of points on the figure.