A model for simultaneous evaluation of NO2, O3 and PM10 pollution in urban and rural areas: handling incomplete data sets with multivariate curve resolution analysis

A powerful methodology, based on multivariate curve resolution alternating least squares (MCR-ALS) with quadrilinearity constraints, is proposed to handle complex and incomplete four-way atmospheric data sets, providing concise and easily interpretable results. Changes in air quality due to nitrogen dioxide (NO2), ozone (O3) and particulate matter (PM10) at eight sampling stations located in the Barcelona metropolitan area and other parts of Catalonia during the COVID-19 lockdown (2020) with respect to previous years (2018 and 2019) are investigated using this methodology. MCR-ALS simultaneous analysis of the 3 contaminants across the 8 stations and the 3 years allows the evaluation of potential correlations among the pollutants even when data blocks are missing. NO2 and PM10 show correlated profiles due to similar pollution sources (traffic and industry), evidencing a decrease in 2019 and 2020 due to traffic restriction policies and the COVID-19 lockdown, especially noticeable in the most transited urban areas (i.e., Vall d’Hebron, Granollers and Gràcia). Ozone shows the opposite inter-annual trend, with higher amounts in 2019 and 2020 with respect to 2018 due to the decreased titration effect, more significant in rural areas (Begur) and at the control site (Observatori Fabra). the individual models of the contaminants together with their simultaneous analysis

in Fig. 1a. In the data set of the entire three years (Fig. 1b), for NO2 and O3, the months of January and February 2018, respectively, were missing at the Observatori Fabra station. For PM10, data from three air quality monitoring stations were missing: Gràcia (September and October 2018), Begur (January to October 2018 and January to July 2020) and Observatori Fabra (January to September 2018).

In this study, the two data multisets were separately arranged to further undergo MCR-ALS individual analyses of the complete experimental data sets (Fig. 1).

To conduct the analysis of the month of April, data matrices for NO2, O3 and PM10 were separately arranged in a first step. For each contaminant, a total of 24 data matrices, one per year (three years) and per monitoring station (eight stations), of size 30 x 24 (days of the month x hourly measurements), were obtained.

As observed in Fig. 1a these 24 data matrices were labeled as Dstation-year; with the name of the corresponding  Fig. 1a. The resulting dimensions of these two column-wise augmented data matrices for further MCR-ALS analysis were (720 x 24). However, as previously stated, for PM10, data of three months were missing and thus, the final column-wise augmented matrix was built only with the six stations containing no missing data (30 x 3 x 6), resulting in a (540 x 24) matrix (yellow-shaded area in Fig. 1a).
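The column-wise augmentation described above can be illustrated with a minimal Python sketch. It only reproduces the stated dimensions: the station names, the random values and the stacking order (station first, then year) are placeholders assumed for illustration and are not taken from the study.

```python
import numpy as np

N_DAYS, N_HOURS = 30, 24  # April: days of the month x hourly measurements

# Hypothetical labels; the real station names are those shown in Fig. 1a.
stations = [f"station{i}" for i in range(1, 9)]
years = [2018, 2019, 2020]

# Placeholder data: one (30 x 24) matrix per station and year.
rng = np.random.default_rng(0)
blocks = {(s, y): rng.random((N_DAYS, N_HOURS)) for s in stations for y in years}


def augment_columnwise(blocks, stations, years):
    """Stack the individual matrices on top of each other.

    All blocks share the same 24 hourly columns, so the augmented
    matrix keeps 24 columns and grows in the row direction.
    The station-then-year ordering is an assumption of this sketch.
    """
    return np.vstack([blocks[(s, y)] for s in stations for y in years])


# Eight stations x three years x 30 days -> 720 rows.
D_full = augment_columnwise(blocks, stations, years)

# Only six stations retained (as for PM10 in April) -> 540 rows.
D_six = augment_columnwise(blocks, stations[:6], years)

print(D_full.shape, D_six.shape)
```

Keeping the hour mode as the shared column dimension means that every block contributes to the same set of hourly profiles, which is what allows a single bilinear decomposition of the augmented matrix.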

To conduct the analysis of the entire three years, data matrices for NO2, O3 and PM10 were also separately

Data arrangement for the simultaneous study of the three pollutants considering the whole incomplete multiblock experimental data sets is further described in Sect. 2.7.

Missing data were estimated when station failures or malfunctions caused the absence of measurements for a few hours or a few days. To estimate such missing data, the nearest-neighbor method (Peterson, 2009) (i.e., knn imputation) was used. In this study, the function mdcheck (i.e., missing data checker and infiller) of PLS Toolbox version 8.9.1 (Eigenvector Inc., WA) was utilized to perform the imputation. This function checks for missing data and infills them using a PCA model imputation from distinct algorithms. Three algorithms were tested: 'svd' (Singular Value Decomposition), 'NIPALS' (Nonlinear Iterative Partial Least Squares) and 'knn'. The latter provided the best estimation results and was therefore the one finally used in this study.
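The idea behind nearest-neighbor imputation can be sketched in a few lines of Python. This is not the mdcheck function of PLS Toolbox, whose internals are not described here; it is a simplified illustration that assumes rows are days, columns are hours, Euclidean distance on the observed hours, and the mean of the k closest complete days as the infilled value. The function name knn_impute and the parameter k are hypothetical.

```python
import numpy as np


def knn_impute(X, k=3):
    """Fill NaN entries of each incomplete row from its k nearest complete rows.

    Distances are computed only over the columns observed in the
    incomplete row. Rows with no observed value are left untouched.
    """
    X = np.asarray(X, dtype=float)
    filled = X.copy()
    complete = ~np.isnan(X).any(axis=1)
    donors = X[complete]
    if donors.shape[0] == 0:
        raise ValueError("at least one complete row is needed")
    for i in np.where(~complete)[0]:
        missing = np.isnan(X[i])
        observed = ~missing
        if not observed.any():
            continue  # nothing to compare against; keep the gap
        diff = donors[:, observed] - X[i, observed]
        dist = np.sqrt((diff ** 2).sum(axis=1))
        nearest = np.argsort(dist)[:k]
        filled[i, missing] = donors[nearest][:, missing].mean(axis=0)
    return filled


# Toy example: the last day lacks its third hourly value.
demo = np.array([
    [1.0, 2.0, 3.0],
    [1.0, 2.0, 4.0],
    [10.0, 20.0, 30.0],
    [1.0, 2.0, np.nan],
])
print(knn_impute(demo, k=2)[3, 2])  # 3.5
```

In the toy example the two closest complete days have values 3 and 4 in the missing hour, so the gap is filled with their mean. A gap spanning a whole block has no observed values to match on, so this kind of estimate is only meaningful for short gaps.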

It is important to mention that estimation of missing data was not performed in cases where the entire month was missing. For those cases, the station was not included in the MCR-ALS analysis of the complete data set. For the analysis of incomplete multiblock data sets, a special arrangement was performed using a particular data fusion strategy, as further explained in Sect. 2.7.

The simplest application of the MCR-ALS method is based on a bilinear model that performs the factor decomposition of a two-way data set (i.e. a data table or a data matrix). Eq. (1) summarizes this bilinear

Therefore, the proposed trilinear and quadrilinear models take advantage of the natural structure of the analyzed data sets, especially in relation to their different temporal modes (i.e. hourly, daily, yearly) and to

four, three and three components were considered (Table 1). These values indicate the higher complexity of the NO2 data compared to the O3 data, as will also be shown below. When the quadrilinear constraint was applied these values decreased to 78.4%, 92.9% and 78.4% respectively, confirming again the less complex

variance overlap (also given in Table 1) in every case can be obtained as the difference between the sum of the individual variances and the variance obtained with all the components simultaneously. This difference is again larger in the case of NO2. Table 1 also gives the variances obtained when the trilinearity constraint was applied instead of the quadrilinearity constraint, with similar results obtained by both multilinear models. In the case of MCR-ALS of all-year data of the Dcaug-allyear-NO2, Dcaug-allyear-O3 and Dcaug-allyear-PM10 data matrices, rather similar results to those from April were obtained in terms of explained variances for all three types of models (see Table 1), reflecting again the higher complexity of the NO2 data

can be better explored in these plots. Profiles of components 1 (C1) and 2 (C2) mostly described the O3 pollution: the C1 hour profile showed an ozone day-time profile with a wide maximum between 12:00 and 22:00 h, and C2 described the ozone night-time profile, again with a wide maximum between 00:00 and 10:00 h. Component 3 described both PM10 and NO2 correlated pollution sources, with particulate matter having the highest contribution. The correlation between NO2 and PM10 can be due to the common sources

As occurred with NO2 and O3, the profiles of the components in the all-year PM10 MCR-ALS analysis were similar to those obtained in the analysis of the month of April (Fig. S6). The C1 and C3 hour profiles in April's model were equivalent to C3 and C1 in the all-year model, respectively, and C2 described the same PM10 profile in both models. Also, the decrease observed in the month of April in 2019, and to a greater extent in 2020, was also found when analyzing the whole year, but to a lesser extent, as stated for NO2. Moreover,

The simultaneous analysis of the incomplete multiblock data sets allowed the exploration of the potential correlations among the three contaminants, which were easily interpreted from the overlaid NO2, O3 and PM10 hourly profiles. Interestingly, both in the study of the month of April and in the study of the entire years, the simultaneous analysis of the three contaminants evidenced a correlation between NO2 and PM10, due to their common pollution sources (i.e., traffic and industry). Moreover, the profiles of these two contaminants showed an inter-year decrease, due to the introduction of LEZs (LEZ -