Development and intercity transferability of land-use regression models for predicting ambient PM10, PM2.5, NO2, and O3 concentrations in northern Taiwan

Abstract. To provide long-term air pollutant exposure estimates for epidemiological studies, it is essential to test the feasibility of developing land-use regression (LUR) models using only routine air quality measurement data and to evaluate the transferability of LUR models between nearby cities. In this study, we develop and evaluate the intercity transferability of annual average LUR models for ambient respirable suspended particulates (PM10), fine suspended particulates (PM2.5), nitrogen dioxide (NO2), and ozone (O3) in the Taipei–Keelung metropolitan area of northern Taiwan in 2019. Ambient PM10, PM2.5, NO2, and O3 measurements at 30 fixed-site stations were used as the dependent variables, and a total of 156 potential predictor variables in six categories (i.e., population density, road network, land-use type, normalized difference vegetation index, meteorology, and elevation) were extracted using buffer spatial analysis. The LUR models were developed using the supervised forward linear regression approach. The LUR models for ambient PM10, PM2.5, NO2, and O3 achieved relatively high prediction performance, with R² and leave-one-out cross-validation (LOOCV) R² values of > 0.72 and > 0.53, respectively. The intercity transferability of LUR models varied among the air pollutants, with transfer-predictive R² values of > 0.62 for NO2 and < 0.56 for the other three pollutants. The LUR-model-based 500 m × 500 m spatial distribution maps of these air pollutants illustrated pollution hotspots and the heterogeneity of population exposure, which provide valuable information for policymakers in designing effective air pollution control strategies. The LUR-model-based air pollution exposure estimates captured the spatial variability of exposure for participants in a cohort study.
This study highlights that LUR models can be reasonably established upon a routine monitoring network but there exist uncertainties when transferring LUR models between nearby cities. To the best of our knowledge, our study is the first to evaluate the intercity transferability of LUR models in Asia.


Introduction
Air pollution has been reported to be associated with a variety of health endpoints, such as lung function and respiratory-disease-related hospital admission (Çapraz et al., 2017; Zhou et al., 2020). Exposure assessment of air pollution is a critical component of epidemiological studies (Cai et al., 2020; Hoek et al., 2008). Cohort studies focusing on the long-term effects of air pollution exposure on specific diseases require accurate exposure estimates for a large group of participants (e.g., thousands or more) over a defined time period (Brokamp et al., 2019; Morley and Gulliver, 2018; Zhou et al., 2020). Different air quality prediction methods, such as air dispersion models, atmospheric chemical transport models, satellite remote sensing, and various statistical methods, have been developed and applied to estimating population exposure to air pollution (Hao et al., 2016; Michanowicz et al., 2016). Among these exposure assessment methods, land-use regression (LUR) is a standard modeling approach widely used to characterize long-term average air pollutant concentrations at a fine spatial scale, which provides high spatial resolution estimates of exposure for use in epidemiological studies (Bertazzon et al., 2015; Eeftens et al., 2016; Jones et al., 2020).
The LUR method is based on the principle that ambient air pollutant concentrations at fixed-site measurement stations are linearly associated with different environmental features (e.g., land use, population density, road network, and meteorological conditions) surrounding these stations (Anand and Monks, 2017; Lu et al., 2020; Naughton et al., 2018; Wu et al., 2017). Within a city or an even smaller area, the LUR method is comparable to, and sometimes better than, satellite-remote-sensing-based air quality retrievals and air dispersion models in characterizing spatiotemporal variation in air pollution (Marshall et al., 2008; Shi et al., 2020). Following feasible procedures of data processing and analysis, established air pollution LUR models can be applied to predict concentrations of air pollutants at locations without measurements at multiple spatial scales or at residential locations of participants in epidemiological studies (Shi et al., 2020).
In recent years, a large number of air pollution LUR studies have been conducted in different areas around the world (Jones et al., 2020; Lee et al., 2017; Liu et al., 2016; Liu et al., 2019; Lu et al., 2020; Miri et al., 2019; Ross et al., 2007; Wu et al., 2017). However, the development and application of LUR models in the Taiwan region have been limited (Hsu et al., 2019). In addition, most previous Taiwan LUR studies used data from purpose-designed monitoring networks or combined purpose-designed and routine monitoring networks (Ho et al., 2015; Lee et al., 2014; Lee et al., 2015). For example, Lee et al. (2015) established LUR models for ambient particles of aerodynamic diameter less than or equal to 2.5 µm (PM2.5) using a purpose-designed monitoring network of 20 sites in the Taipei metropolis. The purpose-designed monitoring campaign has the advantage of capturing short-term air pollution exposure profiles (Jones et al., 2020), but it typically requires extra human labor and resources (e.g., experimental materials) (Hoek et al., 2008). Moreover, it is almost impossible to conduct long-term measurement (e.g., over years) using purpose-designed monitoring networks (Ho et al., 2015; Lee et al., 2017). As a result, a general limitation of LUR models based on purpose-designed monitoring networks is that the established models are usually only valid during the measurement period (Hoek et al., 2008; Shi et al., 2020). Therefore, the development of long-term average LUR models for specific air pollutants using only routine monitoring networks should be explored, which is especially critical for epidemiological studies.

https://doi.org/10.5194/acp-2020-950 Preprint. Discussion started: 27 October 2020 © Author(s) 2020. CC BY 4.0 License.
The application of established LUR models to areas outside the study area can reduce extra efforts to develop new models (Poplawski et al., 2009). To date, a few studies have evaluated the transferability of air pollution LUR models within a city and between cities or countries (Allen et al., 2011; Patton et al., 2015; Vienneau et al., 2010; Yang et al., 2020). Direct transferability refers to predictor variables and coefficients of LUR models both being transferred (Allen et al., 2011), whereas transferability with calibration means that model coefficients are calibrated using air pollutant measurements from the target areas. Direct transferability is more meaningful because it can be applied in areas without air quality measurements (Allen et al., 2011; Yang et al., 2020). These studies concluded that the predictive performance of LUR models transferred from one area to another was not consistent, ranging from poor (Marcon et al., 2015) to relatively acceptable (Poplawski et al., 2009; Wang et al., 2014). Therefore, more studies should be conducted to assess the transferability of air pollution LUR models.
In this study, annual average LUR models and spatial distribution maps were developed for ambient particles of aerodynamic diameter less than or equal to 10 µm (PM10), PM2.5, nitrogen dioxide (NO2), and ozone (O3) in northern Taiwan in 2019. In addition, the transferability of LUR models between cities in the study area was evaluated. The remainder of this paper is organized as follows: the Materials and methods section describes the study area, data collection and processing, LUR model establishment and validation, and prediction of the air pollution exposure surface; the Results and discussion section presents an overview of measurement data, established LUR models and their comparison with previous LUR models in Taiwan, the transferability of LUR models, the spatial distribution maps of ambient PM10, PM2.5, NO2, and O3 concentrations, and PM2.5 exposure estimates for a cohort study; and the Conclusions section summarizes the main results and demonstrates the implications of the present study.

Materials and methods

Study area
The Taipei-Keelung metropolitan area (TKMA), located in northern Taiwan, includes Taipei City, New Taipei City, and Keelung City. The TKMA is the political, cultural, and socioeconomic center of Taiwan. It covers an area of approximately 2457 km² and has 48 administrative districts (Chiu et al., 2019; Wang et al., 2018). The TKMA had a population of about 7.03 million in 2019 (TWMOI, 2020), accounting for approximately 30% of the total population of Taiwan (Fig. 1(a)). The population densities of Taipei City, New Taipei City, and Keelung City were 10,175 people/km², 2021 people/km², and 2826 people/km², respectively, in 2019 (TWMOI, 2020). The numbers of registered motor vehicles were 1.76 million, 3.21 million, and 0.28 million in Taipei City, New Taipei City, and Keelung City, respectively, by the end of 2018 (TWMOTC).

The TKMA is situated in the subtropical region and on the downwind side of Mainland China. The built-up area of the TKMA is located in the central part of the Tamsui river basin surrounded by mountains, agricultural land, and forests (Fig. 1(b) & (c)). The characteristics of the basin terrain can constrain the diffusion of polluted air masses and thus favor the accumulation of air pollution in urban areas (Yu and Wang, 2010). Local emission sources of air pollutants in the TKMA include vehicular exhaust, industrial emissions, and various sources related to residential activities (e.g., cooking) (Chen et al., 2020; Ho et al., 2018; Wu et al., 2017). In winter, the long-distance transport of dust and polluted air masses from the Asian continent under the northeast monsoon results in a significant increase in concentrations of air pollutants (Chi et al., 2017; Chou et al., 2010).

Data collection and processing
The Taiwan Environmental Protection Administration (TWEPA) operates 20 central air quality monitoring stations in the TKMA, of which 12 are in New Taipei City, 7 are in Taipei City, and 1 is in Keelung City (https://airtw.epa.gov.tw/ENG/default.aspx). In addition, the Taipei Environmental Protection Agency (TPEPA) operates 10 local air quality monitoring stations (https://www.tldep.gov.taipei/EIACEP_EN/Air_NormalStation.aspx). In total, these stations include 21 general stations, 6 traffic stations, 2 background stations, and 1 country park station (Fig. 1(a)). Detailed descriptions of sampling stations, measurement instruments, and quality assurance and control procedures are available in TWEPA (2020). Hourly measurements of ambient PM10, PM2.5, NO2, and O3 concentrations and the meteorological variables of temperature, wind speed, and relative humidity at the central stations from January 01, 2019 to December 31, 2019 were collected from the Environment Resource database of TWEPA (https://erdb.epa.gov.tw/DataRepository/EnvMonitor/AirQualityMonitorDayData.aspx). In addition, hourly concentrations of ambient PM10, PM2.5, NO2, and O3 at the local stations from January 01, 2019 to December 31, 2019 were downloaded from the TPEPA website (https://www.tldep.gov.taipei/Public/DownLoad/AirAutoHour.aspx). We calculated daily average values of air pollutant concentrations and meteorological variables from hourly data, and calculated the annual average values from daily averaged data for the development of LUR models. Daily and annual average estimates for the air pollutants required at least 75% data completeness (Cai et al., 2020); otherwise, no value was estimated for that day or year.
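The hourly-to-daily-to-annual averaging with the 75% completeness rule can be sketched as follows. This is a minimal illustration, not the authors' processing code (the paper reports using R): the function names, the use of `None` for a missing reading, and the dictionary layout are assumptions made for the example.

```python
from statistics import mean

MIN_COMPLETENESS = 0.75  # completeness threshold stated in the text


def mean_if_complete(values, expected_count, min_completeness=MIN_COMPLETENESS):
    """Average the non-missing values, or return None when too few are present."""
    valid = [v for v in values if v is not None]
    if expected_count == 0 or len(valid) / expected_count < min_completeness:
        return None
    return mean(valid)


def annual_mean_from_hourly(hourly_by_day, days_in_year=365):
    """hourly_by_day maps a day label to 24 hourly readings (None = missing).

    Days failing the completeness rule yield None and count as missing
    when the annual average is formed.
    """
    daily = [mean_if_complete(hours, 24) for hours in hourly_by_day.values()]
    return mean_if_complete(daily, days_in_year)
```

With these assumptions, a day with 18 valid hourly readings (exactly 75%) still yields a daily mean, whereas a day with 17 does not.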
As presented in Table S1 and Fig. 1, the potential predictor variables of the road network, land use data, normalized difference vegetation index (NDVI), population density, and digital elevation data, which were frequently used in previous LUR studies, were collected. Land-use information was taken from the Land Use Investigation of Taiwan conducted by the National Land Surveying and Mapping Center (https://www.nlsc.gov.tw/LUI/Home/Content_Home.aspx). The Taiwan land-use status is classified into 9 main categories, 41 subcategories, and 103 detailed items. As shown in Fig. 1(c), the 9 main land-use categories are agriculture, forest, transportation, water bodies, built-up areas, public utilities, recreation, mining or salt production, and others. The road network from the Taiwan Ministry of Transportation and Communications includes three types of road: local roads, major roads, and expressways (Fig. 1(d)). The NDVI and

Model development and validation
The LUR models of ambient PM10, PM2.5, NO2, and O3 for the entire study area (the area-specific LUR models) were established using all 30 air quality monitoring stations. In addition, city-specific LUR models for New Taipei & Keelung City were developed using the 13 air quality monitoring stations located in these two cities, and the established models were directly transferred to Taipei City. Similarly, city-specific LUR models for Taipei City were developed using the 17 air quality monitoring stations located in this city, and the established models were directly transferred to New Taipei & Keelung City. In this study, we did not consider the calibration of model coefficients because we planned to evaluate the direct transferability of city-specific LUR models to another nearby city area when there were no routine air quality measurements.
There is no standard modeling method for developing LUR models (Hoek et al., 2008). In this study, the supervised forward linear regression method (Cai et al., 2020; Eeftens et al., 2016; Xu et al., 2019) was used to develop the LUR models. This modeling method ensures that only predictor variables following the plausible direction of effect are included while the predictive accuracy of the established model is maximized. In brief, all potential predictor variables were included as candidate independent variables and a prior direction was assigned for each category of variable based on the atmospheric mechanism. The model construction started by including the predictor variable with the highest adjusted explained variance (adjusted R²). The remaining predictor variables were entered into the model if they met all of the following criteria: 1) the gain of the adjusted R² was no less than 1%; 2) the direction of effect of the predictor variable was as pre-defined; 3) variables were added into the model when the probability of F was less than 0.05 and removed when the probability of F was greater than 0.10; 4) variables already included in the model retained the same direction of effect; and 5) following previous studies (Marcon et al., 2015; Wang et al., 2014), the predictor variables with variance inflation factor (VIF) values larger than 3 were dropped to make a tradeoff between model interpretation and the predictive accuracy (Eeftens et al., 2016). Multiple buffer sizes of a specific variable (e.g., the length of local roads) could be selected in the final model as long as they followed the selection criteria (Henderson et al., 2007).
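The core loop of the supervised forward procedure can be sketched as below. This is a simplified illustration under stated assumptions, not the authors' implementation: it applies only criteria 1, 2, and 4 (adjusted R² gain of at least 0.01 and coefficient signs matching the pre-assigned directions) and omits the F-probability and VIF checks. All function and variable names are hypothetical, and the small Gaussian-elimination solver stands in for a statistics package.

```python
def solve(a, b):
    """Solve the linear system a·x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_ols(columns, y):
    """Return (coefficients with the intercept first, adjusted R²) for an OLS fit."""
    n, p = len(y), len(columns)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = p + 1
    xtx = [[sum(design[i][a] * design[i][b] for i in range(n)) for b in range(k)]
           for a in range(k)]
    xty = [sum(design[i][a] * y[i] for i in range(n)) for a in range(k)]
    beta = solve(xtx, xty)
    y_bar = sum(y) / n
    sse = sum((y[i] - sum(b * v for b, v in zip(beta, design[i]))) ** 2 for i in range(n))
    sst = sum((v - y_bar) ** 2 for v in y)
    r2 = 1.0 - sse / sst
    return beta, 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


def supervised_forward_selection(candidates, y, expected_sign, min_gain=0.01):
    """candidates: name -> list of values; expected_sign: name -> +1 or -1."""
    selected, best_adj = [], 0.0
    while True:
        best_name, best_trial_adj = None, best_adj
        for name in candidates:
            if name in selected:
                continue
            trial = selected + [name]
            beta, adj = fit_ols([candidates[v] for v in trial], y)
            # Every slope, old and new, must keep its pre-assigned direction.
            signs_ok = all(b * expected_sign[v] > 0 for b, v in zip(beta[1:], trial))
            if signs_ok and adj - best_adj >= min_gain and adj > best_trial_adj:
                best_name, best_trial_adj = name, adj
        if best_name is None:
            return selected
        selected.append(best_name)
        best_adj = best_trial_adj
```

A candidate whose fitted coefficient contradicts its assigned direction is never added, even when it would raise the adjusted R², which is what distinguishes the supervised variant from plain forward selection.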
Standard diagnostic tests were applied to ensure that the LUR models were reasonably established (Li, 2020; Wolf et al., 2017). The Cook's distance value was calculated to detect the outliers of data points (i.e., stations) (Jones et al., 2020). Air pollutant observations with a Cook's distance value greater than 1 would be excluded and the LUR model for this air pollutant would be re-established (Weissert et al., 2018; Wolf et al., 2017). In addition, Moran's I values on the concentration residuals of the final LUR models were calculated using ArcGIS software to evaluate the spatial autocorrelation (Bertazzon et al., 2015; Lee et al., 2017; Liu et al., 2016). The R² and root mean square error (RMSE) were estimated to evaluate the performance of the models. Furthermore, leave-one-out cross validation (LOOCV) was employed to evaluate the predictive capacity of the LUR models (Liu et al., 2019; Shi et al., 2020; Yang et al., 2020).
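The LOOCV procedure can be illustrated with a deliberately reduced example. The sketch below assumes a single predictor for brevity (the paper's models use three to five) and assumes that the LOOCV R² is computed as 1 − SSE/SST on the held-out predictions; some studies instead report the squared correlation between observed and held-out predicted values, and the paper does not state which definition was used. The function names are hypothetical.

```python
from math import sqrt


def fit_line(x, y):
    """Least-squares slope and intercept for a single predictor."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    return slope, y_bar - slope * x_bar


def loocv(x, y):
    """Return (LOOCV R², LOOCV RMSE); x and y are lists of equal length.

    Each station is predicted by a model fitted to all other stations.
    """
    preds = []
    for i in range(len(x)):
        slope, intercept = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        preds.append(intercept + slope * x[i])
    y_bar = sum(y) / len(y)
    sse = sum((yi - pi) ** 2 for yi, pi in zip(y, preds))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - sse / sst, sqrt(sse / len(y))
```

Because every prediction comes from a model that never saw the station being predicted, the LOOCV R² is typically lower than the in-sample R², and the size of the gap indicates how strongly the fit depends on individual stations.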
Spatial analysis and calculations were performed using ArcGIS software, version 10.6 (ESRI Inc., Redlands, CA, USA). The statistical analysis was performed using R software, version 3.5.2 (R Core Team, 2018).

Air pollution surface prediction
The entire study area of the TKMA was divided into 9839 grid cells of 500 m × 500 m. The air pollutant concentrations at the centroids of the grid cells were estimated using the established area-specific LUR models. When the LUR models estimated negative concentration values, the concentration values of the grid cells were set to zero; when air pollutant concentration estimates exceeded the maximum observed concentrations by more than 20%, the concentrations of grid cells were set to 120% of the maximum observed concentrations (Henderson et al., 2007). The area-specific LUR model-based negative and high concentration estimates accounted for only 0%, 4%, 2%, and 0% of PM10, PM2.5, NO2, and O3 estimates, respectively.
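The truncation rule for grid-cell predictions amounts to clamping each estimate to the interval from zero to 120% of the maximum observed concentration. A minimal sketch, with a hypothetical function name and made-up example values:

```python
def clamp_predictions(predictions, max_observed, upper_factor=1.2):
    """Set negative estimates to zero and cap estimates at upper_factor × max_observed."""
    upper = upper_factor * max_observed
    return [min(max(p, 0.0), upper) for p in predictions]


# Hypothetical grid-cell estimates with a maximum observed concentration of 50.
clamped = clamp_predictions([-3.0, 10.0, 55.0, 72.5], max_observed=50.0)
```

In this example the negative estimate becomes 0, the estimates of 10 and 55 are unchanged because they do not exceed 120% of the observed maximum, and 72.5 is capped at that limit.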

Descriptive statistics of the air quality data
In general, the included air quality monitoring stations were situated at different types of land uses across the TKMA (Table 1 and Fig. 1(c)), which suggests that the collected data set has relatively good representativeness. The annual average PM10 concentration of 39.3 µg/m³ at background stations was the highest, followed in descending order by traffic stations with 33.6 µg/m³, general stations with 28.5 µg/m³, and the country park station with 15.7 µg/m³. The traffic stations and country park station had the highest and lowest annual average PM2.5 concentrations, respectively. The annual average PM2.5 concentrations at general stations of 13.7 µg/m³ and background stations of 13.2 µg/m³ were comparable. Except for the country park station, the annual average PM10 and PM2.5 concentrations at other types of stations were higher than the air quality guidelines (AQGs) for PM10 and PM2.5 of 20 µg/m³ and 10 µg/m³, respectively, proposed by the World Health Organization (WHO) (WHO, 2006). The annual average NO2 concentration of 24.6 ppb at the traffic stations was the highest, followed by general stations with 14.3 ppb. The annual average NO2 concentrations at background stations (3.81 ppb) and the country park station (1.89 ppb) were significantly lower than those of general and traffic stations because they were farther away from traffic emissions. The annual average NO2 concentration at traffic stations (24.6 ppb) was slightly higher than the WHO NO2 AQG of 40 µg/m³ (about 21.3 ppb) (WHO, 2006), while other types of stations had annual average NO2 concentrations lower than this AQG. In contrast to NO2, the background stations (41.7 ppb) and the country park station (39.8 ppb) had higher annual average O3 concentrations than those of traffic stations (21.6 ppb) or general stations (29.4 ppb) (Table 1).
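The conversion of the WHO NO2 guideline from 40 µg/m³ to about 21.3 ppb can be reproduced with the ideal-gas relation ppb = (µg/m³) × V_m / M. The sketch below assumes reference conditions of 25 °C and 1 atm (molar volume 24.45 L/mol); the paper does not state which conditions were used, but this choice reproduces the quoted figure.

```python
MOLAR_VOLUME_L_PER_MOL = 24.45    # ideal gas at 25 °C and 1 atm (assumed conditions)
NO2_MOLAR_MASS_G_PER_MOL = 46.01  # molar mass of NO2


def ugm3_to_ppb(conc_ugm3, molar_mass, molar_volume=MOLAR_VOLUME_L_PER_MOL):
    """Convert a gas concentration from µg/m³ to ppb by volume."""
    return conc_ugm3 * molar_volume / molar_mass


who_no2_aqg_ppb = ugm3_to_ppb(40.0, NO2_MOLAR_MASS_G_PER_MOL)  # about 21.3 ppb
```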

The area-specific LUR models
Fig. S1 shows that Cook's distance values were below 1 for all the stations of the area-specific LUR models, suggesting that there were no station outliers in developing these LUR models. For PM10 and PM2.5 LUR models, Cook's distance values ranged from almost 0.00 to around 0.72. The Cook's distance values of the NO2 LUR model were between almost 0.00 and 0.28, whereas the Cook's distance values of the O3 LUR model were between almost 0.00 and 0.38 (Fig. S1). The final area-specific LUR models and their corresponding predictive accuracy are summarized in Table 2 and Fig. 2. The model R² values ranged from 0.72 for PM2.5 to 0.91 for NO2, indicating a good fit for all air pollutants. PM10, NO2, and O3 LUR models performed well, with LOOCV R² values less than 0.10 lower than the model R² values. For PM2.5, the model was not as robust as those of other air pollutants, with the LOOCV R² value being 0.19 lower than the model R² value (Fig. 2). The reason for this is that the PM2.5 concentrations varied less among the stations than those of other air pollutants (Table 1 and Fig. 2). The significance of the predictor variables (p value) and VIF values all met the requirements for LUR model development. Moran's I values were 0.0047, −0.072, 0.023, and −0.055 for the LUR models of ambient PM10, PM2.5, NO2, and O3, respectively. In addition, z-score values were 0.83, −0.79, 1.2, and −0.34 for ambient PM10, PM2.5, NO2, and O3 LUR models, respectively, indicating that the spatial patterns of concentration residuals of the LUR models do not appear to be significantly different from random (Fig. S2).
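For readers unfamiliar with the statistic, global Moran's I on model residuals can be computed as sketched below. The paper used ArcGIS for this step; the inverse-distance weighting here is an assumption (the weighting scheme is not stated in the text), and the function names are hypothetical.

```python
from math import hypot


def inverse_distance_weights(coords):
    """Spatial weights w_ij = 1 / distance for i != j; assumes no coincident points."""
    n = len(coords)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                w[i][j] = 1.0 / hypot(coords[i][0] - coords[j][0],
                                      coords[i][1] - coords[j][1])
    return w


def morans_i(values, weights):
    """Global Moran's I: (n / S0) × Σ_ij w_ij d_i d_j / Σ_i d_i², with d = value − mean."""
    n = len(values)
    mean_v = sum(values) / n
    dev = [v - mean_v for v in values]
    s0 = sum(sum(row) for row in weights)
    cross = sum(weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    return (n / s0) * cross / sum(d * d for d in dev)
```

Values near the expectation under spatial randomness, −1/(n − 1), together with z-scores inside ±1.96, are consistent with residuals that show no significant spatial autocorrelation at the 5% level.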
The final area-specific LUR models consisted of three (for O3), four (for NO2), and five predictor variables (for PM10 and PM2.5) (Table 2) (2020), Weissert et al. (2018) and Wolf et al. (2017), the established LUR models contained at least one traffic-related predictor variable in buffer sizes ranging from 50 m to 3000 m. Traffic emission is a major source of air pollution in urban areas of the TKMA (Lee et al., 2014; Wu et al., 2017). For instance, it was reported that gasoline and diesel vehicle emissions contributed approximately half of PM2.5 concentrations in Taipei City based on source apportionment analysis (Ho et al., 2018). Several previous LUR studies selected the population density variable as the final explanatory variable in their PM2.5 and NO2 LUR models (Ji et al., 2019; Meng et al., 2015; Rahman et al., 2017). However, it was not included in our final LUR models. A possible explanation is that the population density variable is moderately or highly correlated with the variables (e.g., the area of recreational land) included in our final LUR models.
As shown in Table 2 Table 2). The waterbodies can make PM10 absorb moisture and increase sedimentation. In addition, large areas of water provide good conditions for the dispersion of air pollutants (Zhu and Zhou, 2019).
For the NO2 LUR model, the four predictor variables included were the area of transportation land in buffer sizes of 3000 m and 50 m, the area of recreational land in a 1000-m buffer, and the sum of the length of local roads in a 1000-m buffer. The direction of effect for the recreational land was negative, while other predictor variables showed a positive effect (Table 2). The O3 LUR model included predictor variables with relatively small buffer sizes of no more than 700 m. The three predictor variables were the area of transportation land in buffer sizes of 700 m and 50 m, and the area of public utilization land within a 300-m buffer. The directions of effect for these three variables were all negative (Table 2).

A comparison of this study with previous LUR studies in Taiwan is presented in Table S2. The predictive performance of the LUR model for ambient PM10 in this study was slightly worse than that of Lee et al. (2015), who reported an R² value of 0.87. In addition, the R² and LOOCV R² values (0.72 and 0.53, respectively) of the PM2.5 LUR model in this study were lower than the R² value of 0.74 reported by Hsu et al. (2019). Our study established a reasonable LUR model for ambient O3 in the TKMA with an R² value of 0.80 and an LOOCV R² value of 0.70, which is a relatively high predictive performance. Compared with PM10, PM2.5, and NO2, the establishment of O3 LUR models has been limited in these previous Taiwan LUR studies (Table S2) and in most LUR studies in other areas, but it is essential to establish O3 LUR models given that O3 is a toxic photochemical pollutant threatening human health and the ecosystem (Ning et al., 2020; Yim et al., 2019).

Transferability of the city-specific LUR models
The city-specific LUR models for ambient PM10, PM2.5, NO2, and O3 in Taipei City and New Taipei & Keelung City are shown in Tables S3 and S4, respectively. The model R² values of the Taipei City PM10, PM2.5, NO2, and O3 LUR models were 0.91, 0.64, 0.89, and 0.76, respectively (Table S3), while the New Taipei City & Keelung City PM10, PM2.5, NO2, and O3 LUR models had R² values of 0.63, 0.65, 0.95, and 0.93, respectively (Table S4). In general, for each specific air pollutant, the predictive performance of these city-specific LUR models can be slightly higher or lower than those of the area-specific LUR models. Fig. 3 shows the transferability of LUR models between Taipei City and New Taipei & Keelung City. The city-specific LUR models performed worse in another city area than in the city where these models were established. For instance, the transfer-predictive R² values of the Taipei LUR models were 0.31, 0.04, 0.62, and 0.56 for predicting ambient PM10, PM2.5, NO2, and O3 in New Taipei & Keelung City, respectively (Fig. 3). These values were substantially lower than the corresponding R² values of the Taipei LUR models. The NO2 LUR models showed good transferability between the two city areas, with transfer-predictive R² values higher than 0.62. However, the PM10, PM2.5, and O3 LUR models performed poorly when they were transferred between the two city areas, with transfer-predictive R² values of < 0.31, < 0.37, and < 0.56, respectively (Fig. 3). Similar to the previous studies of Marcon et al. (2015) and Yang et al. (2020), these results suggested that there may be large uncertainties in transferring LUR models between cities, even between nearby cities with similar geographic and urban design characteristics.
The use of novel cost-effective methods (e.g., low-cost air quality sensors or satellite remote sensing approach) is therefore recommended to assess air pollution and associated population exposure in cities with limited fixed-site measurement stations.
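Direct transfer, as evaluated above, means applying the source city's fitted intercept and coefficients unchanged to predictor values extracted at the target city's stations and then scoring the predictions against the target observations. The sketch below is illustrative only: the function names are hypothetical, and computing the transfer-predictive R² as 1 − SSE/SST is an assumption, since the exact formula is not given in this text (the squared correlation between observed and predicted values is another common choice).

```python
def predict(intercept, coefs, rows):
    """Apply a fixed linear model to rows of predictor values (no recalibration)."""
    return [intercept + sum(c * v for c, v in zip(coefs, row)) for row in rows]


def r_squared(observed, predicted):
    """Coefficient of determination, 1 − SSE/SST, of predictions against observations."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / sst
```

Under this definition the score can fall to zero or below when the transferred model predicts no better than the target city's mean, which is one way a poorly transferring model shows up.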

Spatial maps
LUR-model-derived air pollution spatial distribution maps provide useful air pollutant concentration surfaces in the TKMA. In general, there was a good agreement between LUR-model-based concentration estimates and observations for PM10, PM2.5, NO2, and O3 (Fig. 4). For PM10 and PM2.5, there were certain differences between LUR-model-based concentration estimates and observations at the country park station (Fig. 4). A possible reason for this difference is that the kriging interpolation method removed low-concentration estimates in this small area when the concentration estimates in nearby areas were higher.
High concentrations of ambient PM10, PM2.5, and NO2 were predicted in the urban areas of Taipei City, New Taipei City, and Keelung City, and along the road network. The estimated PM10 and PM2.5 concentrations in urban areas were around 35.0 to 40.9 µg/m³ and around 12.0 to 17.0 µg/m³, respectively, whereas the urban areas had NO2 concentrations of around 12.0 to 31.7 ppb (Fig. 4). This spatial distribution pattern is understandable given that the traffic-related predictor variables were included in the final PM10, PM2.5, and NO2 LUR models. A similar spatial pattern of PM2.5 concentrations was reported by Wu et al. (2017), which documented that high PM2.5 concentrations were distributed mainly in the urban areas of the TKMA and there were also scattered points of high PM2.5 concentrations in its outer ring. However, the estimated 2019 annual average PM2.5 concentrations in this study were significantly lower than those for 2006-2012 estimated by Wu et al. (2017).
There was a clear decreasing trend in PM2.5 concentrations in the whole of Taiwan over the past decade (Ho et al., 2020; Jung et al., 2018). For example, Jung et al. (2018) reported that the estimated PM2.5 concentrations declined by 1.7 µg/m³ and 1.6 µg/m³ in the morning and afternoon, respectively, per year over the whole of Taiwan during the period 2005-2015.
O3 showed a generally opposite spatial variability pattern compared with the other three air pollutants, with lower concentrations (below about 32.0 ppb) in urban areas than in rural areas (Fig. 4). A possible explanation for this finding is that high concentrations of NO and NO2 in urban areas react with O3, resulting in a decrease in O3 concentration (Hsu et al., 2019; Vardoulakis et al., 2011).
Correlations of estimated concentrations of PM10, PM2.5, NO2, and O3 in the TKMA are shown in Table 3.

Air pollutant exposure estimates for a cohort study
Air pollutant concentrations measured at nearby fixed-site stations are often used to represent exposures in epidemiological studies (Lin et al., 2016; Shi et al., 2020), but the spatial resolution of these estimates is relatively coarse due to the limited number of sampling stations (Bertazzon et al., 2015). In recent years, LUR modeling has become a more widely applied method to estimate air pollution exposures at a fine spatial scale (Lee et al., 2014; Wolf et al., 2017). Fig. S3 shows that there are differences between LUR-model-based air pollution exposure estimates and nearby-station measurements at residential locations of participants in a cohort study conducted in the TKMA. The average values of the LUR-estimated PM10, PM2.5, NO2, and O3 exposure concentrations were 36.0 µg/m³, 14.2 µg/m³, 18.0 ppb, and 29.2 ppb, respectively, whereas the corresponding nearby-station measurements were 27.7 µg/m³, 13.8 µg/m³, 16.3 ppb, and 28.6 ppb, respectively (Table S5). Compared with LUR-model-based estimates, the nearby-station measurements underestimated PM10, PM2.5, NO2, and O3 exposures of cohort participants by 8.23 µg/m³, 0.41 µg/m³, 1.73 ppb, and 0.60 ppb, respectively (Table S5). In addition, the nearby-station measurements were weakly correlated with the LUR-model-based estimates, with linear regression R² values ranging from 0.05 for PM10 to 0.19 for NO2 (Fig. 6). The obtained results highlight that air pollution LUR models may provide more accurate exposure estimates than nearby-station measurements.

Limitations
This study is subject to several limitations. First, apart from the variables used in this study, more predictor variables (e.g., localized emission data and urban building morphology data) should be included and tested to develop LUR models. For example, Wu et al. (2017) and Chen et al. (2020) assessed the roles of two culturally specific emission sources, Chinese restaurants and temples, on the development of ambient PM2.5 and NO2 LUR models in Taiwan. More studies should be conducted to test the influence of different potential predictor variables on the development of LUR models (Hoek et al., 2008). Second, like most linear regression techniques, the supervised forward linear regression method is not proficient in modeling extreme values (Jones et al., 2020). In addition, there may be complex and non-linear relationships between the explanatory variables and air pollutant concentrations. Other types of linear regression methods (Hoek et al., 2018; Shi et al., 2020) and the novel machine learning algorithms

Conclusions
Following standard development procedures, the annual average LUR models of ambient PM10, PM2.5, NO2, and O3 were established in the TKMA of northern Taiwan using only data from the routine monitoring network. These LUR models were reasonable, based on the evaluation metrics of Cook's distance, VIF, Moran's I, and p values. The R² values of the LUR models for ambient PM10, PM2.5, NO2, and O3 were 0.80, 0.72, 0.91, and 0.80, respectively. The traffic-related predictor variables were the major explanatory factors in the LUR models for all the studied air pollutants.
The predictive performance varied greatly among air pollutants in examining the transferability of city-specific LUR models between New Taipei & Keelung City and Taipei City, with relatively high transfer-predictive R² values for NO2. Therefore, this study highlights that the established LUR models in a city area can result in a large estimation bias when applied to another nearby city area with similar geographic and urbanization conditions. It is necessary to conduct more studies to evaluate and improve the intercity transferability of LUR models.
The spatial distribution maps of the four air pollutants showed that the developed LUR models are reasonable in modeling the spatial variabilities of air pollution. Ambient PM10, PM2.5, and NO2 shared similar spatial variations, with relatively high concentrations in urban areas and along the road network. Ambient O3 presented a generally opposite spatial variability compared with PM10, PM2.5, or NO2. These estimated air pollution concentration surfaces provide information for the management of air pollution and exposure estimates for epidemiological studies. Compared with nearby-station measurements, the LUR-model-based concentration estimates captured a wider range of exposure to PM10, PM2.5, NO2, and O3 for participants in a cohort study in the TKMA. Further studies should pay more attention to utilizing other data sources (e.g., satellite remote sensing data) with comprehensive spatiotemporal coverage to validate the LUR-model-based estimations of air pollutant concentrations.
Data availability. The model data presented in this article are available from the authors upon request (steveyim@cuhk.edu.hk).

Author contributions. SHLY planned, supervised and sought funding for this study. ZYL performed the data analysis and prepared the paper with contributions from all co-authors.
Competing interests. The authors declare that they have no conflict of interest.

Note: ** Correlation is significant at the 0.01 level (2-tailed).