A powerful methodology, based on the multivariate curve resolution
alternating least squares (MCR-ALS) method with quadrilinearity constraints, is
proposed to handle complex and incomplete four-way atmospheric data sets,
providing concise results that are easy to interpret. Changes in air quality by
nitrogen dioxide (NO

Monitoring studies of air quality have always been indispensable for assessing
the impact of air pollutants on human health and the environment. The most
frequently evaluated air pollutants are those linked to industrial and traffic
emissions, such as tropospheric ozone (O

The chemistry of nitrogen oxides (NO

Conversely, the chemistry of PM

Different approaches exist to assess air quality by evaluating concentration changes of these chemical pollutants. In classical air quality monitoring studies, the data treatment strategy generally involves data arrangement and analysis using traditional statistics. However, these methods require extensive computation, and their results are often limited in scope. Chemometric methods, in contrast, are powerful data analysis tools for investigating the sources of variance in large, experimentally measured environmental monitoring data sets, such as air quality data sets, which often contain missing blocks. These methods can extract and summarize the information often hidden in such data sets. Among them, the Multivariate Curve Resolution Alternating Least Squares (MCR-ALS) method (Tauler, 1995), originally used in the spectrochemical analysis of chemical mixtures, has also proven to be a competitive method in air pollution studies (Malik and Tauler, 2013; Alier et al., 2011). MCR-ALS is a flexible, soft-modeling factor analysis method that allows for the introduction of natural constraints, such as non-negativity of the factor solutions. Although it only requires the fulfillment of a bilinear model for the factor decomposition, it can be easily adapted to the analysis of more complex multiway data structures and multilinear models, such as three-way and four-way environmental data sets (Tauler, 2021), which can be analyzed using trilinear and quadrilinear MCR-ALS models, as shown in this study. The results of MCR-ALS can be used to identify the main driving factors (latent variables) responsible for the observed data variance, in this case, the observed changes in the measured chemical pollutants.

The present study is focused on promoting and extending the use of
the MCR-ALS method, including
trilinear and quadrilinear constraints, for the investigation of NO

The experimental data used in this work consisted of O

In this study, two experimental data multisets were analyzed (see Fig. 1).
Both of them contained hourly concentrations of NO

As observed in Fig. 1, both data multisets contained some missing data blocks, which were excluded from the MCR-ALS analyses of individual contaminants. Isolated missing values, in contrast, were estimated before the chemometric analysis.

In the data set of the month of April, no missing data existed for NO

In this study, the two data multisets were arranged separately and then subjected to individual MCR-ALS analyses of the complete experimental data sets (Fig. 1).

To conduct the analysis of the month of April, data matrices for NO

To conduct the analysis of the entire 3 years, data matrices for
NO

Data arrangement for individual analysis of completed data sets.
Concentrations of NO

These 24 data matrices were then arranged using a column-wise arrangement,
obtaining three augmented data matrices:

Data arrangement for the simultaneous study of the three pollutants considering the whole incomplete multiblock experimental data sets is further described in Sect. 2.7.
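The column-wise augmentation used for the individual analyses can be illustrated with a minimal Python sketch. It assumes each individual matrix has days as rows and the 24 hours as columns, so that stacking the matrices vertically keeps the hour mode common to all blocks. The function name `augment_column_wise` and the block sizes (30 days, 3 years, 8 stations) are hypothetical and chosen only for illustration.

```python
import numpy as np

def augment_column_wise(blocks):
    """Stack (days x hours) matrices vertically so they share the hour mode."""
    n_hours = blocks[0].shape[1]
    if any(b.shape[1] != n_hours for b in blocks):
        raise ValueError("all blocks must have the same number of columns")
    return np.vstack(blocks)

# Hypothetical sizes: 30 days x 24 hours per block, 3 years x 8 stations.
rng = np.random.default_rng(0)
blocks = [rng.random((30, 24)) for _ in range(3 * 8)]
D_aug = augment_column_wise(blocks)
print(D_aug.shape)  # (720, 24)
```

Each block keeps its position in the augmented matrix, so the rows belonging to a given year and station can be recovered by slicing.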

Missing data were estimated in cases where station failures or
malfunctions caused the absence of measurements for a few hours
or a few days. To estimate such missing data, the nearest-neighbor
method (Peterson, 2009), i.e., kNN imputation, was used. In this
study, the function

It is important to note that missing data were not estimated when an entire month was missing. In those cases, the station was not included in the MCR-ALS analysis of the complete data set. For the analysis of incomplete multiblock data sets, a special arrangement was built using a particular data fusion strategy, as further explained in Sect. 2.7.
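The idea behind nearest-neighbor imputation of isolated missing values can be sketched in a few lines of Python. This is not the function used in the study; it is a simplified illustration that assumes rows are days, columns are hours, missing values are coded as NaN, and neighbors are ranked by a root-mean-square distance over the hours both rows have in common. The name `knn_impute` and the default `k=3` are hypothetical.

```python
import numpy as np

def knn_impute(D, k=3):
    """Fill NaNs in each row with the mean of the k most similar rows."""
    D = np.asarray(D, dtype=float)
    filled = D.copy()
    missing = np.isnan(D)
    for i in np.flatnonzero(missing.any(axis=1)):
        # Compare row i with every row over the columns both have observed.
        shared = ~missing[i] & ~missing
        diff = np.where(shared, D - D[i], 0.0)
        n_shared = shared.sum(axis=1)
        with np.errstate(invalid="ignore", divide="ignore"):
            dist = np.sqrt((diff ** 2).sum(axis=1) / n_shared)
        dist[i] = np.inf              # a row is not its own neighbor
        dist[n_shared == 0] = np.inf  # no common columns, no comparison
        order = np.argsort(dist)
        for j in np.flatnonzero(missing[i]):
            # Nearest rows that actually observed column j.
            donors = [r for r in order
                      if np.isfinite(dist[r]) and not missing[r, j]][:k]
            if donors:
                filled[i, j] = D[donors, j].mean()
    return filled

# Tiny made-up example: the second row lacks its last value.
D_demo = np.array([[1.0, 2.0, 3.0],
                   [1.0, 2.0, np.nan],
                   [1.1, 2.1, 3.2],
                   [10.0, 20.0, 30.0]])
D_filled = knn_impute(D_demo, k=2)
```

Observed values are left untouched, and a value stays missing if no neighbor observed that column, which is consistent with not imputing blocks where a whole month is absent.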

Different chemometric methods have been proposed in the literature for the analysis of environmental monitoring data. The MCR-ALS method is frequently used in spectrochemical mixture data analysis and can also be easily extended to the analysis of environmental source apportionment data sets (Alier et al., 2011). MCR-ALS is a flexible, soft-modeling factor analysis tool that allows for the application of natural constraints (see below), and it can be easily adapted to the analysis of complex multiway (multimode) data structures, such as three- and four-way environmental data sets, using trilinear and quadrilinear model constraints (De Juan et al., 1998; Smilde et al., 2004; Malik and Tauler, 2013).

The simplest application of the MCR-ALS method is based on a bilinear model
that performs the factor decomposition of a two-way data set (i.e., a data table or a data matrix). Equation (1) expresses this bilinear model in
element-wise form, while Eq. (2) presents the same model in matrix
notation:
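For orientation, a bilinear factor model is commonly written as shown in the block below. The symbols are assumptions made for this illustration and may differ from the notation of Eqs. (1) and (2): D is the I × J data matrix, X (I × N) and Y (J × N) hold the profiles of the N components in the two modes, and E (I × J) collects the residuals.

```latex
d_{ij} = \sum_{n=1}^{N} x_{in}\, y_{jn} + e_{ij}
\qquad \text{and} \qquad
\mathbf{D} = \mathbf{X}\,\mathbf{Y}^{\mathrm{T}} + \mathbf{E}
```

The two expressions are equivalent: the element (i, j) of the product of X and the transpose of Y is exactly the sum over the N components in the element-wise form.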

In this work the MCR-ALS method has been applied, either to the individual
data matrices

Solving Eq. (3) with bilinear MCR-ALS does not take into account the temporal and spatial structure of the data in the vertically concatenated mode, which carries the day, year and station information. This structure can be exploited in the trilinear and, especially, the quadrilinear extensions of the bilinear models described in Eqs. (1)–(3).
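A bilinear alternating least squares fit with non-negativity can be sketched in Python as follows. This is a simplified illustration, not the MCR-ALS toolbox code: non-negativity is imposed by clipping each unconstrained least-squares solution at zero, which is a crude stand-in for a proper non-negative least-squares solver, and the function name `mcr_als_bilinear`, the initial estimate `Y0` and the synthetic data are all hypothetical.

```python
import numpy as np

def mcr_als_bilinear(D, Y0, n_iter=200, tol=1e-10):
    """Fit D ~ X @ Y.T with non-negative X and Y by alternating least squares."""
    Y = np.clip(np.asarray(Y0, dtype=float), 0.0, None)
    prev = np.inf
    for _ in range(n_iter):
        # X step: least-squares solution of Y @ X.T = D.T, clipped at zero.
        X = np.clip(np.linalg.lstsq(Y, D.T, rcond=None)[0].T, 0.0, None)
        # Y step: least-squares solution of X @ Y.T = D, clipped at zero.
        Y = np.clip(np.linalg.lstsq(X, D, rcond=None)[0].T, 0.0, None)
        resid = np.linalg.norm(D - X @ Y.T)
        if abs(prev - resid) < tol * max(resid, 1.0):
            break
        prev = resid
    return X, Y

# Synthetic two-component example with strictly positive profiles.
rng = np.random.default_rng(0)
X_true = rng.random((60, 2)) + 0.5   # augmented mode (e.g. days)
Y_true = rng.random((24, 2)) + 0.5   # hour mode
D_demo = X_true @ Y_true.T
Y0 = Y_true + 0.001 * rng.random(Y_true.shape)  # rough initial estimate
X_fit, Y_fit = mcr_als_bilinear(D_demo, Y0)
rel_resid = np.linalg.norm(D_demo - X_fit @ Y_fit.T) / np.linalg.norm(D_demo)
```

Nothing in this loop uses the day, year or station labels of the rows of `D`, which is why the bilinear model alone cannot exploit that structure.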

The factor decomposition model given before can be extended to a three-way
dataset,

MCR-ALS with the quadrilinearity constraint. Graphical description of the implementation of the quadrilinearity constraint during the Alternating Least Squares (ALS) optimization. See Eqs. (4)–(7) and their explanation in the paper.
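The refolding step behind the quadrilinearity constraint can be sketched in Python for a single component. The sketch assumes that the augmented profile of the component stacks the day profiles station by station and, within each station, year by year, and that the profile is non-negative. It refolds the profile into a station × year × day array, fits a rank-one model by alternating updates (a simple technique chosen here for illustration, not necessarily the decomposition used in the toolbox), and rebuilds the constrained profile. The function name and argument names are hypothetical; x, z and w follow the day, year and station mode labels used in the figures.

```python
import numpy as np

def quadrilinear_constraint(x_aug, n_stations, n_years, n_days, n_iter=50):
    """Replace one augmented profile by a rank-one three-way fit of it."""
    T = np.asarray(x_aug, dtype=float).reshape(n_stations, n_years, n_days)
    if not T.any():  # all-zero profile: nothing to decompose
        zeros = (np.zeros(n_days), np.zeros(n_years), np.zeros(n_stations))
        return T.reshape(-1), zeros
    # Start from the mean profiles of the station and year modes.
    w = T.mean(axis=(1, 2))
    z = T.mean(axis=(0, 2))
    for _ in range(n_iter):
        # Update each mode in turn, keeping the other two fixed.
        x = np.einsum("skd,s,k->d", T, w, z) / ((w @ w) * (z @ z))
        z = np.einsum("skd,s,d->k", T, w, x) / ((w @ w) * (x @ x))
        w = np.einsum("skd,k,d->s", T, z, x) / ((z @ z) * (x @ x))
    # Rebuild the augmented profile from the three single-mode profiles.
    rebuilt = np.einsum("s,k,d->skd", w, z, x).reshape(-1)
    return rebuilt, (x, z, w)

# Made-up example: 4 stations, 3 years, 30 days.
rng = np.random.default_rng(1)
w_true = rng.random(4) + 0.1
z_true = rng.random(3) + 0.1
x_true = rng.random(30) + 0.1
exact = np.einsum("s,k,d->skd", w_true, z_true, x_true).reshape(-1)
rebuilt, (x, z, w) = quadrilinear_constraint(exact, 4, 3, 30)
noisy = exact + 0.05 * rng.random(exact.size)
rebuilt_noisy, _ = quadrilinear_constraint(noisy, 4, 3, 30)
```

In an ALS loop this function would be applied to every component of the augmented mode at each iteration, so that each component ends up described by one day, one year and one station profile in addition to its hour profile.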

Figure 2 shows the practical implementation of the quadrilinearity constraint
in the MCR-ALS analysis of the four-way data set obtained in the two types of
data, when the April data of the three parameters (O

The individual data sets with the concentrations of the three parameters
(one per year and station), were arranged in the column-wise augmented data
matrix

The simultaneous analysis of the NO

However, as previously described, April and the whole-year individual data
sets were not obtained for all stations, years, and pollutants. Therefore,
they could not be fitted together in a rectangular super-augmented data
matrix containing all the data for all the years and stations as shown in
Eq. (8) for

MCR-ALS with the quadrilinearity constraint for the simultaneous analysis of the three contaminants in the incomplete multiblock data set for the month of April. See Eq. (9) and its explanation in the paper.

This new incomplete data set

Analogous equations can be described for the NO

In the proposed approach, missing data blocks were not included in the least
squares estimations of the factor solutions
(

The final evaluation of the MCR-ALS fitting results is performed by calculating
the explained data variances (

The development platform MATLAB 9.10.0 R2021a (The MathWorks, Inc., Natick, MA, USA) was used for data analysis and visualization. The new graphical
interface MCR-ALS GUI 2.0 (Malik and Tauler,
2013), freely available as a toolbox at the web address

Results of MCR-ALS are shown separately for the analysis of the month of
April and for the analysis of the entire years. In the study of the month of
April, the three contaminants are first analyzed individually,
using only data from stations with no missing blocks
(i.e., data matrices

MCR-ALS decomposition and explained variances for the different models.
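A common way of computing an explained variance for a fitted model is sketched below in Python. The expression used here, 100 × (1 − sum of squared residuals / sum of squared data values), is an assumption made for this illustration and may differ in detail from the definition used in the paper. Entries coded as NaN are treated as missing and left out of both sums, in line with excluding missing blocks from the fit; the function name `explained_variance` is hypothetical.

```python
import numpy as np

def explained_variance(D, D_hat):
    """Percentage of data variance explained, ignoring missing (NaN) entries."""
    D = np.asarray(D, dtype=float)
    D_hat = np.asarray(D_hat, dtype=float)
    observed = ~np.isnan(D)
    resid = (D - D_hat)[observed]
    return 100.0 * (1.0 - (resid ** 2).sum() / (D[observed] ** 2).sum())

# Made-up example: one missing entry, perfect fit on the observed ones.
D_obs = np.array([[1.0, 2.0], [3.0, np.nan]])
D_model = np.array([[1.0, 2.0], [3.0, 100.0]])
r2_perfect = explained_variance(D_obs, D_model)
r2_null = explained_variance(D_obs, np.zeros((2, 2)))
```

Because the missing entry is masked, the arbitrary model value at that position has no effect on the result.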

The MCR-ALS bilinear analysis of April data in the

Possible correlations between NO

Figure 4 shows the component profiles in the four modes obtained by MCR-ALS when using non-negativity and quadrilinearity constraints. From left to right, the profiles of the four components are shown for each mode: X – day (blue), Z – year (black), W – station (green), and Y – hours (red).

MCR-ALS analysis of NO

The NO

Profiles obtained by MCR-ALS for the three components using non-negativity
and quadrilinearity constraints are shown in Fig. 5. The MCR-ALS hourly (Y-mode)
resolved profile of the first component (C1) showed a maximum between 14:00 and 22:00 LT, due to the cumulative solar radiation. There was practically no difference in this component among stations, years or the days
of the month. The MCR-ALS hourly resolved profile of the second component (C2) showed a different O

MCR-ALS analysis of O

Profiles obtained by MCR-ALS for these three components using non-negativity
and quadrilinearity constraints are shown in Fig. 6. The MCR-ALS hourly resolved profiles in the Y-mode for the three resolved components indicated a wide maximum between 00:00 and 15:00 LT (C1), between 15:00 and 22:00 LT (C2), and
between 10:00 and 20:00 LT (C3). As observed in the year profile (Z-mode), the PM

MCR-ALS analysis of PM

The MCR-ALS resolved profiles of the

MCR-ALS analysis of NO

Profiles obtained by MCR-ALS using non-negativity and quadrilinearity
constraints are shown in Fig. S4. The hour profiles of the four resolved
components in the analysis of the entire years were similar to those obtained
in the analysis of the month of April: the C1 hour profile in the April model was
equivalent to the C3 hour profile in the all-years model, and the C2 and C4 hour
profiles were equivalent in both models. The decrease observed in the
Z-mode profile in 2019 and, to a greater extent, in 2020 for the month of April also appeared when analyzing all the years, but to a lesser extent. This might be because the traffic restriction policies were mostly implemented during the strictest confinement (from 14 March to 4 May in Catalonia) and were gradually removed in the de-escalation
phases (Gorrochategui et al., 2021). In addition, the extraordinarily rainy conditions registered in April 2020 (Gorrochategui et al., 2021) did not occur in the rest of the months, making the
NO

MCR-ALS results of the simultaneous analysis of NO

Only a few differences between the analysis of the entire years and that of
the month of April were observed in the inter-year Z-mode (Fig. S5).
Component 2 in the all-years model, corresponding to a late evening peak of
O

As with NO

The MCR-ALS resolved profiles of

The MCR-ALS method with quadrilinearity constraints has proven to be a powerful tool for resolving the principal contamination profiles of four-way environmental data sets, even when they contain missing data blocks. The main advantage of the quadrilinearity constraints is the easier interpretation of the profiles, which are more condensed and concise.

In this study, resolved MCR profiles using quadrilinearity constraints have
been shown to adequately describe the different patterns and evolution of
NO

The simultaneous analysis of the incomplete multiblock data sets allowed the
exploration of the potential correlations among the three contaminants,
which was easily interpretable with the representation of overlapped hour profiles of NO

On the other hand, MCR-ALS O

Overall, this work contributes to the better knowledge of the evolution of
NO

The MCR-ALS code is under continuous development and can be found on the public web page:

The original data set is available under

The supplement related to this article is available online at:

EG performed data curation, formal analysis and writing. IH provided air quality data. RT contributed to data curation and global supervision.

The contact author has declared that none of the authors has any competing interests.

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

We thank the Spanish Ministry of Science and Innovation and Generalitat de Catalunya for providing the data and funding our research.

This research has been supported by the Ministerio de Ciencia e Innovación (grant no. PID2019-105732GB-C21). We acknowledge support of the publication fee by the CSIC Open Access Publication Support Initiative through its Unit of Information Resources for Research (URICI).

This paper was edited by Leiming Zhang and reviewed by Vasil Simeonov and two anonymous referees.