Why is the city's responsibility for its air pollution often underestimated? A focus on PM 2.5

While the burden caused by air pollution in urban areas is well documented, the origin of this pollution and therefore the responsibility of the urban areas in generating this pollution are still a subject of scientific discussion. Source apportionment represents a useful technique to quantify the city’s responsibility, but the approaches and applications are not harmonized and therefore not comparable, resulting in confusing and sometimes contradicting interpretations. In this work, we analyse how different source apportionment approaches apply to the urban scale and how their building elements and parameters are defined and set. We discuss in particular the options available in terms of indicator, receptor, source, and methodology. We show that different choices for these options lead to very large differences in terms of outcome. For the 150 large EU cities selected in our study, different choices made for the indicator, the receptor, and the source each lead to an average difference of a factor of 2 in terms of city contribution. We also show that temporal- and spatial-averaging processes applied to the air quality indicator, especially when diverging source apportionments are aggregated into a single number, lead to the favouring of strategies that target background sources while obscuring actions that would be efficient in the city centre. We stress that methodological choices and assumptions most often lead to a systematic and important underestimation of the city’s responsibility, with important implications. Indeed, if cities are seen as a minor actor, plans will target the background as a priority at the expense of potentially effective local actions.

for NO 2 . The highest mortality burden for PM 2.5 occurs in northern Italy, southern Poland, and the eastern Czech Republic. De Bruyn and de Vries (2020) showed that for all 432 cities in their sample (total population: 130 million inhabitants), the social costs due to air pollution (e.g. hospital admissions, premature mortality) exceeded EUR 166 billion in 2018 for Europe (EU-27 plus the UK, Norway, and Switzerland). City size was shown to be a key factor contributing to the total social costs: all cities with a population over 1 million feature in the top 25 cities with the highest social costs due to air pollution.
Given the health and economic burden caused by air pollution in urban areas, it is important to identify the origin of this pollution in order to reduce and control its impact. Identifying the sources of urban pollution and then assigning responsibilities enables a process to implement measures and control air pollution. Assessing the responsibility or share of cities for their pollution has important implications. To be effective, pollution reduction plans must be designed and applied to target the most polluting sectors at the relevant spatial scale (national, regional, and/or local) and with the appropriate temporal scales. In this context, quantifying the share of a city's pollution caused by its own emissions becomes a crucial element to determine whether actions need to be applied locally or at the regional, national, or continental scale. This has important governance consequences for the effective control of air pollution.
For pollutants like NO 2 that mostly originate from traffic sources and have a relatively short lifetime in the atmosphere, there is general agreement that cities are the main contributor to the concentration levels of these pollutants and that acting locally on traffic emissions is the most efficient way of improving NO 2 concentration levels in a particular city (Tobías et al., 2020). European-wide information is available, for example in Degraeuwe et al. (2019), which provides overviews of the potential impact of traffic emission reductions per vehicle type in different European cities. There is also agreement that O 3, a secondary pollutant, is most effectively reduced by implementing reduction measures at larger spatial scales, involving actions driven at the regional and even continental scales (e.g. Luo et al., 2020). For other pollutants, like PM 2.5, complex physical and chemical atmospheric processes with different timescales drive their formation, involving numerous precursors themselves emitted by several sources. The sources of PM 2.5 pollution range from local traffic, domestic fuel burning, and industrial activities to regional sources such as agriculture in rural areas. Even though the latter emissions do not originate from cities, Thunis et al. (2018) showed that their impact on urban pollution could be important, reaching up to 30 % in several European cities. Because of this complexity, there is less consensus regarding a city's responsibility for or share of its pollution when addressing PM 2.5. Because of this lack of consensus and the major burden of PM 2.5 on health, we focus our analysis on this pollutant.
The usual approach to assess the city's share of pollution levels (in other words the city's responsibility) is source apportionment (SA). However, many SA approaches exist. The most widely used SA methods are the "potential impact" (or brute force), "increment", and "tagging" approaches. An overview of these methods and an evaluation of their limitations and capabilities can be found in Thunis et al. (2019). Moreover, many ways to parameterize them exist as well, leading to a variety of results and interpretations. For the 18 million inhabitants of New Delhi, Amann et al. (2017) concluded that only 40 % of the PM 2.5 pollution originated from local city sources, based on potential impact SA and expressed in terms of city-averaged population exposure, averaged yearly. In the context of the Copernicus programme, CAMS (Copernicus Atmosphere Monitoring Service) performs SA calculations daily with two different approaches, namely tagging and potential impacts, for a series of European cities. Results show important differences on a day-by-day basis, although these differences smooth out when considering longer-term averages (Pommier et al., 2020). Based on the increment approach, Kiesewetter and Amann (2014) derived SA estimates for a series of European cities and aggregated these detailed results at the country level, leading to relatively low city responsibilities (e.g. about 25 % for French, German, and Italian cities). Based on a potential impact approach, Thunis et al. (2018) estimated city shares for 150 cities in Europe. They highlighted their large variability across Europe and stressed the importance of the definition of the city with regard to the results by testing the sensitivity to different city extensions.
The choice of the SA method, but also the way this method is configured, can lead to very different outcomes for the city's share of pollution, ranging from cities being a major contributor to their pollution to cities having a limited responsibility. This explains why the actual city's responsibility for its pollution remains a topic of discussion and why some authors stress the importance of local actions (Wu et al., 2011; Raifman et al., 2020) when others stress the need for regional, national, or even continental actions (Huszar et al., 2016; ApSimon et al., 2021; Liu et al., 2013). This diversity of conclusions has serious consequences in terms of policy decisions. Blaming external (i.e. outside the city) pollution sources as being primarily responsible for urban pollution is sometimes an easy argument for decision makers to justify local inaction.
This work aims at explaining the main causes of discrepancies between different assessments of the impact of city emissions on the city's pollution levels and at showing that these discrepancies generally lead to underestimation of the city's responsibility. It proposes a specific harmonized nomenclature for source apportionment approaches, and it shows how important it is to document the choices to enable correct interpretation of the results. We begin with a conceptual overview of the parameters structuring any SA approach (Sect. 2). This includes the definition of the key parameters of any SA study: indicator, source, receptor, and methodology to relate them. Then (Sect. 3) we assess the sensitivity of the urban SA results to the choices of these four parameters. In Sect. 4, we analyse implications in terms of air quality planning and suggested strategies. We finally provide conclusions in Sect. 5.
2 Assessing the city's responsibility for air pollution: main concepts

In this section, we detail the steps required to quantify the responsibility of a city for its air pollution through source apportionment (SA). SA is a methodology that serves to estimate the contribution of a given source at a specific receptor for a given indicator (for example the concentration of a given pollutant like PM or NO 2). It involves the following steps (Fig. 1):

1. defining a relevant indicator (I) to characterize air pollution;
2. defining the receptor (R) through its spatio-temporal characteristics, i.e. the area (x r) and time period (t r) over which the indicator is averaged;
3. defining the source (S), in our case the city, and its spatio-temporal characteristics, i.e. the city area (x s) and time period for which the city's responsibility is assessed (t s);
4. selecting the source apportionment (SA) methodology to capture the processes that relate the source to the receptor.

Figure 1 summarizes these steps as well as the nomenclature and symbols used in this work. We use this new nomenclature to attach contextual information (i.e. metadata) to the source apportionment. Further explanations of the symbols are given in the subsections below.
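As an illustration of how the four building elements above could be attached as metadata to a source apportionment result, the following minimal Python sketch may help. The class names, field names, string codes, and numerical shares are all hypothetical and chosen for illustration only; they are not part of the nomenclature of Fig. 1 or of any existing tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Receptor:
    x_r: str  # spatial scale of the average, e.g. "x_max", "x_center", "city", "FUA"
    t_r: str  # temporal scale of the average, e.g. "D" (daily) or "Y" (yearly)


@dataclass(frozen=True)
class Source:
    x_s: str  # spatial extension of the city, e.g. "core_city" or "FUA"
    t_s: str  # period over which the source activity is assessed


@dataclass(frozen=True)
class SAResult:
    city: str
    indicator: str      # I, e.g. "PM2.5" or "population_exposure"
    receptor: Receptor  # R
    source: Source      # S
    method: str         # SA methodology, e.g. "PI50", "INC", "TAG"
    city_share: float   # fraction of the indicator attributed to the city

    def settings(self):
        """Return the metadata that should match before two results are compared."""
        return (self.indicator, self.receptor, self.source, self.method)

    def comparable_with(self, other: "SAResult") -> bool:
        return self.settings() == other.settings()


# Hypothetical shares, for illustration only (not values from this study).
at_maximum = SAResult("Paris", "PM2.5", Receptor("x_max", "Y"), Source("FUA", "Y"), "PI50", 0.55)
fua_average = SAResult("Paris", "PM2.5", Receptor("FUA", "Y"), Source("FUA", "Y"), "PI50", 0.30)

# The receptor differs between the two results, so they are flagged as not comparable.
print(at_maximum.comparable_with(fua_average))
```

Keeping the settings in one immutable record makes it straightforward to refuse a comparison between two city shares that were obtained with different indicator, receptor, source, or methodology choices.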

Definition of the air pollution indicator (I)
The first step required to assess the role and responsibility of city emissions with respect to its air pollution is to define an indicator that identifies the pollution aspect we are interested in. The indicator can be defined in many ways, for example as the total concentration of a given compound (e.g. PM), as a specific constituent of that total concentration (e.g. PM 2.5 or its primary fraction, PPM), as a composite based on a mix of different pollutants (e.g. maximum among O 3, PM 2.5, and NO 2 concentrations, as in some air quality indexes; ATMO2003, 2003), or as population exposure (i.e. product of population and concentration).

Definition of the receptor (R)
Estimating the indicator, from either a measuring instrument or a model simulation, implies an averaging process in both space and time. For model data, averages correspond to the spatial and temporal resolutions (e.g. the time step and grid cell size), whereas for measurement, the space-time average will depend on the instrument acquisition time and on the atmospheric dispersion characteristics at the measuring site. Regardless of these intrinsic time and space averages, indicators are generally averaged over longer spatial and temporal scales for convenience. The receptor is defined as the spatiotemporal entity over which the indicator is averaged. Both a spatial and a temporal scale (denoted by x r and t r , respectively) must be associated with the receptor to define it.
For the temporal dimension, typical examples for PM 2.5 are days (t r = D) or years (t r = Y ). Spatially, the indicator can be estimated at a specific location, e.g. the city centre (x r = x center ) or at the location where the maximum concentration occurs (x r = x max ), or it can be averaged over the city (x r = city). For convenience, we use interchangeably the following notations to refer to the receptor: (1)

Definition of the source (S)
The source is defined as the spatio-temporal entity (e.g. city, emission macro-sector, etc.) for which we assess the contribution to the indicator. For the purpose of this work, the source is defined as the city and more precisely as the emissions that originate from it. The source emissions (denoted by E) are indeed responsible for the pollution fraction that can be associated with the source or city at the receptor (R). These emissions are characterized by a spatial (x s = extension of the city) and a temporal scale (t s = period of time over which the source activity is assessed). For convenience, we use interchangeably the following notations to refer to the source:

In this work, we analyse in particular the impact of the city extension (x s) on the apportionment outcome. For this purpose, we define cities in two ways:

1. as core cities, i.e. the local administrative units with a population density above 1500 inhabitants km −2 and a population above 50 000, where the majority of the population lives in an urban centre, and
2. as functional urban areas ("FUAs"; OECD, 2012), composed of core cities plus their wider commuting zone, consisting of the surrounding travel-to-work areas where at least 15 % of the employed residents work in the city.

Details on the FUA and core city areas are available for 150 EU cities in the urban PM 2.5 atlas. Note that other city definitions exist. In the context of the CAMS source allocation analysis, cities are defined as an arbitrary number of grid cells in the modelling domain (Pommier et al., 2020). Finally, we define the city background as the sum of all contributions from sources that are not covered by the spatial (x s) and temporal (t s) scales of the city source.
One main difference between sources and receptors is that for the latter, spatio-temporal characteristics are averaged. Apart from this, temporal and spatial characteristics can also differ in terms of value. For example, the source can be defined as the FUA (x s = FUA), while the receptor is a specific location (x r = x max ). Temporally, interest can be in assessing the contribution of the city weekly activity (t s = 1 week) for a given day (t r = D) at the receptor. In the results presented here, the source and receptor temporal scales are however chosen to be identical for convenience.

Selection of the SA methodology
When the air pollution indicator and the spatio-temporal characteristics of both the receptor and the source have been selected, the next step consists of distinguishing and quantifying the fractions of the indicator related to the city source ($I_{\mathrm{city}}(R)$) and to the background ($I_{\mathrm{bg}}(R)$) at the receptor R, respectively. This decomposition is summarized by the following equation:

$$I(R) = I_{\mathrm{city}}(R) + I_{\mathrm{bg}}(R)$$

Different SA methodologies exist to perform this operation. In this section, we describe three main approaches, but only in brief, as details about each of these are discussed in other works (Thunis et al., 2019, 2018; Mertens et al., 2018). As mentioned previously, we use the indicator's superscript to refer to its calculation method ($I^{M}_{\mathrm{city}}(R)$). Methods are summarized in Table 1.
-Potential impacts (PI). The city contribution in this method is denoted as $I^{\mathrm{PI100}}_{\mathrm{city}}(R)$ and is calculated as the difference between two simulations: a base case that includes the city ($I(R)$) and a scenario in which the city emissions are switched off ($I^{\mathrm{city}^{100}}(R)$). In this notation, the source superscript (here, 100) indicates the percentage by which the source emissions are reduced. Reductions are intended as percentage variations from the base case situation. The same approach can be used with reduction percentages that are lower than 100 %. In this case the resulting difference is divided by the reduction percentage to obtain the potential impact $I^{\mathrm{PI}\alpha}_{\mathrm{city}}(R)$. A similar approach is used to calculate the background contribution, i.e. by removing or reducing partially the background emission sources. Potential impact methods for source apportionment are widely used (Osada et al., 2009; Huszar et al., 2016; Huang et al., 2018; Wang et al., 2014, 2015; Van Dingenen et al., 2018; Thunis et al., 2016; Clappier et al., 2015; Pisoni et al., 2017).
-Increment (INC). With this methodology, the background contribution is estimated as the concentration observed or modelled at a given location "y": $I^{\mathrm{INC}}_{\mathrm{bg}}(R) = I(y, t_r)$. This location must be far enough from the source to not feel its influence but close enough to the source to avoid influences from other sources external to the city. These assumptions are further described and discussed in Thunis et al. (2017).
-Tagging (TAG). With this approach, species emitted by the city are numerically tagged and followed through the modelled transport, dispersion, and chemical transformation processes. When chemical transformations take place, preserved atoms are used as tracers. For example, the nitrogen atom (N) will be used to follow the NO source emissions through its successive transformations into NO 2 and HNO 3 to reach its final product NO 3, which will then be attributed to that source. Examples of tagging applications include Kranenburg et al. (2013), Yarwood et al. (2004), Wagstrom et al. (2008), Kwok et al. (2013), Bhave et al. (2007), and Wang et al. (2009). Some of these approaches are implemented operationally to estimate daily city contributions to air pollution (https://topas.tno.nl/documentation/, last access: 24 November 2021).
The formulations corresponding to these three main approaches are summarized in Table 1.
A few key points are worth noting. While tagging and potential impact approaches explicitly consider city emissions in their calculations, this is not the case for increments that only refer to them implicitly. By construction, both the increment and tagging approaches are additive (i.e. $I(R) = I_{\mathrm{city}}(R) + I_{\mathrm{bg}}(R)$), whereas this is not the case for potential impacts when pollutants behave non-linearly because of air transport, deposition, or chemical processes.
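For reference, the three formulations described in this section can be written compactly as below. This is a sketch reconstructed from the text only: the scenario symbol $I^{\mathrm{city}^{\alpha}}(R)$ (indicator at the receptor when city emissions are reduced by α) and the treatment of α as a fraction between 0 and 1 rather than a percentage are notational assumptions, and the increment city contribution is obtained here from the additivity property stated above.

```latex
\begin{align}
I^{\mathrm{PI}\alpha}_{\mathrm{city}}(R) &= \frac{I(R) - I^{\mathrm{city}^{\alpha}}(R)}{\alpha},
  \qquad 0 < \alpha \le 1, \\
I^{\mathrm{INC}}_{\mathrm{bg}}(R) &= I(y, t_r),
  \qquad
  I^{\mathrm{INC}}_{\mathrm{city}}(R) = I(R) - I(y, t_r), \\
I(R) &= I^{\mathrm{TAG}}_{\mathrm{city}}(R) + I^{\mathrm{TAG}}_{\mathrm{bg}}(R).
\end{align}
```

With α = 1 the first expression reduces to the full potential impact (PI100), i.e. the difference between the base case and the scenario in which city emissions are switched off.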

Results
Recognizing the impossibility of assessing the sensitivity of the results for all combinations of indicators, source, receptor, and methodology, we focus our analysis on comparisons in which only one parameter is changed at a time to highlight major sensitivities. For this purpose, we use the following two main sources of data and results.
-SHERPA. SHERPA is a modelling tool based on source-receptor relationships that represent a simplified version of a chemistry transport model, used to simulate the contribution to PM 2.5 concentration levels by all precursor emissions (NO x, non-methane volatile organic compounds (NMVOCs), PPM, SO 2, and NH 3) from different cities in Europe (Clappier et al., 2015; Thunis et al., 2016, 2018). In its current configuration, SHERPA is based on the CHIMERE model (Menut et al., 2013) covering the whole of Europe at roughly 7 km spatial resolution. In this work, we use the source apportionment results over 150 cities as reported in the PM 2.5 urban atlas as well as additional SHERPA data to provide further analysis.
-EMEP simulations. The EMEP model is an offline regional transport chemistry model (Simpson et al., 2012; https://github.com/metno/emep-ctm, last access: 24 November 2021). The model has 20 vertical levels, with the first level around 50 m. The model uses meteorological initial conditions and lateral boundary conditions from the European Centre for Medium-Range Weather Forecasts (ECMWF) Integrated Forecasting System (IFS). The meteorological year is 2015. Detailed information on the meteorological driver, land cover, model physics, and chemistry is described in Simpson et al. (2012) and in the EMEP 2017 status report (EMEP2017, 2017). In this work, we use specific simulations where emissions have been removed partially or fully in a series of European cities. Additional details regarding these simulations are provided together with the discussion of the results.
Based on these sources of information and data, we discuss hereafter the sensitivity of the SA results to the choice of the indicator (Sect. 3.1), to the choice of the methodology (Sect. 3.2), to the source (Sect. 3.3), and finally to the receptor (Sect. 3.4).

Sensitivity to the indicator
The implications resulting from the choice of the indicator are illustrated in Fig. 2 for four indicators based on SHERPA results for 150 cities in Europe. The four indicators selected to characterize air pollution are (a) the PM 2.5 concentration (top left; from Thunis et al., 2017), (b) the anthropogenic fraction of PM 2.5 ("PM 2.5 ant"; top right), (c) the primary anthropogenic fraction of PM 2.5 ("PPM 2.5 ant"; bottom left), and (d) the primary fraction of PM 2.5 originating from the transport and residential sectors ("PPM 2.5 oxy"; bottom left). The reference (PM 2.5 total mass; top left) corresponds to the indicator currently used in legislation (e.g. European Ambient Air Quality Directive, AAQD2008, 2008) against which health impacts are correlated (WHO2005, 2005). In the second case, the indicator is limited to its anthropogenic fraction ("PM 2.5 ant"), therefore excluding natural contributions (dust, marine salt, etc.). This is motivated by the fact that policies have no impact on this component. According to this indicator, city contributions increase significantly (by about 20 % on average), and in some cities where natural dust pollution is important (e.g. in Sicily), the city's responsibility shifts from minor to major.

Table 1. Formulation of the three main methods to estimate the contribution, potential impact, and increment of a city. The letters I, S, and R refer to the indicator, source, and receptor, respectively. The indicator superscript refers to the SA method (PI for potential impacts, INC for increments, and TAG for tagging), while its subscript indicates the source (city or background (bg)); α represents the percentage reduction factor applied for the source emissions in the potential impacts method. See text for additional details.

If we further restrict the indicator to its primary anthropogenic fraction ("PPM 2.5 ant"; bottom right) because of its suggested higher health burden (Park et al., 2018; Viana et al., 2008), the city contribution then increases significantly in most cities. This becomes even more striking if we limit the indicator to the PPM 2.5 fraction originating from the transport and residential sectors (bottom right). These two sectors have recently been shown to generate the largest burden on human health given the high oxidative potential of their emissions (Daellenbach et al., 2020; Li et al., 2016). With this indicator, the majority of EU cities become main contributors to their pollution. Regarding the latter indicator, it is important to note that although the increasing adoption of electric vehicles shows rather positive impacts on health (Choma et al., 2020), the remaining PM emissions from road traffic, such as tyre, brake, and road wear emissions (Kole et al., 2017; Grigoratos and Martini, 2014; Ntziachristos and Boulter, 2019), will remain an issue. The calculation of various geochemical indices (enrichment factor, geo-accumulation index, pollution index, and potential ecological risk) also shows that road dust is extremely enriched and contaminated by elements from tyre and brake wear (e.g. Sb, Sn, Cu, Bi, and Zn).
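The effect of narrowing the indicator, discussed above, can be illustrated with a small arithmetic sketch. All component masses below are hypothetical (they are not SHERPA output), and the sketch assumes that contributions simply add up, which ignores the non-linear behaviour of secondary PM 2.5; the component labels are also invented for the example.

```python
# Hypothetical PM2.5 components at a city receptor, in µg/m3 (illustration only).
# Keys are (origin, fraction); values are assumed additive contributions.
components = {
    ("city", "primary_oxy"): 3.0,    # primary PM from transport and residential sectors
    ("city", "primary_other"): 1.0,  # other primary anthropogenic PM
    ("city", "secondary"): 1.0,      # secondary anthropogenic PM
    ("bg", "primary_oxy"): 1.0,
    ("bg", "primary_other"): 1.0,
    ("bg", "secondary"): 5.0,
    ("bg", "natural"): 3.0,          # dust, marine salt, etc.
}

# Fractions retained by each of the four indicators discussed in the text.
indicators = {
    "PM2.5": {"primary_oxy", "primary_other", "secondary", "natural"},
    "PM2.5_ant": {"primary_oxy", "primary_other", "secondary"},
    "PPM2.5_ant": {"primary_oxy", "primary_other"},
    "PPM2.5_oxy": {"primary_oxy"},
}


def city_share(fractions):
    """City share of the indicator restricted to the given fractions."""
    city = sum(v for (origin, frac), v in components.items()
               if origin == "city" and frac in fractions)
    total = sum(v for (origin, frac), v in components.items() if frac in fractions)
    return city / total


shares = {name: city_share(fractions) for name, fractions in indicators.items()}
for name, share in shares.items():
    print(f"{name:11s} city share = {share:.0%}")
```

In this toy setting the city share rises each time the indicator is restricted, simply because components dominated by the background (natural and secondary PM) leave the denominator.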

Sensitivity to the SA methodology
A comparison of SA methodologies is proposed in Thunis et al. (2019), where the potential impact, increment, and tagging approaches are compared on both simple theoretical examples and real data to highlight differences among methods and stress their limitations. In this section, we summarize the main findings of this work and complement it with comparisons that focus on the apportionment of the city vs. background contributions. We also provide in Appendix A a comparison of all SA methods discussed in this section, applied on a theoretical example tuned to the city scale.

Increment vs. potential impacts
Thunis (2018) compared increments and potential impacts with the SHERPA model for a series of European cities. He showed that increment approaches lead to important underestimations (30 % to 50 %) of the city's responsibility for PM 2.5 and NO 2 with respect to potential impacts. This underestimation is explained by the non-fulfilment of the two underlying increment assumptions related to the external location (i.e. y in $I^{\mathrm{INC}}_{\mathrm{bg}}(R) = I(y, t_r)$) being (1) far enough from the city to not feel its influence but (2) close enough to the city to avoid influences from sources external to the city. The author shows that these two assumptions are seldom fulfilled in reality. Clappier et al. (2017) discussed the concepts underlying these two SA methods and showed that important differences in terms of results arise as soon as non-linear processes are present. These large differences were also highlighted and quantified in a real-case inter-comparison exercise. Finally, Thunis et al. (2019) reviewed in their work many inter-comparisons between tagging and potential impact SA results. In their application over the Po basin (Italy), they showed that differences are large for the agriculture sector (dominated by NH 3 emissions) but are also important for other sectors when dealing with high temporal resolution (e.g. daily) at the receptor. Unfortunately, these examples did not address the particular case of a city-scale apportionment.

Full vs. partial potential impacts
To analyse differences between full and partial impacts, we use a series of EMEP simulations in which we remove totally (PI100) or partly (PI20) the London FUA emissions (source) during an entire year. Figure 3 shows the differences between city contributions obtained with the two PI methods. Differences can be important (up to 25 percentage points for specific days). Although the number of high-difference days is limited (leading to a yearly average difference of a few per cent), these days might represent high-pollution episodes for which assessing the city's responsibility is important to act. In general, the higher the resolution applied to the temporal and/or spatial averages at the receptor, the larger the differences among methods. It is also interesting to note that partial potential impacts systematically underestimate full potential impacts (no negative values).

Figure 2. SHERPA results for 150 major cities in Europe for the overall PM 2.5 concentration (a), for its anthropogenic fraction ("PM25_ant"; b), for its anthropogenic primary fraction ("PPM25_ant"; d), and for its primary fraction originating from the transport and residential sectors ("PPM25_oxy"; c). For all cities, the source is defined spatially as the FUA over which emissions are reduced over a year (Y). The receptor is defined as the city location where the concentration is maximum (x max), and the indicator is averaged yearly at the receptor (Y). All calculations are made with the same SA methodology, namely potential impacts (PIs) with city emissions reduced by 50 % (PI50).
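To make the comparison between full and partial potential impacts concrete, the sketch below applies the PI definition to two toy concentration responses. Both response functions and all numbers are assumptions made for illustration; they are not the EMEP model and are not fitted to the London simulations. The concave response is chosen only because it reproduces the qualitative behaviour reported above, in which the partial impact (PI20) is lower than the full one (PI100).

```python
import math


def potential_impact(concentration, city_emissions, alpha):
    """City potential impact for a fractional emission reduction alpha (0 < alpha <= 1).

    `concentration` is any function mapping city emissions to the indicator at the
    receptor; the base-minus-scenario difference is scaled by the reduction fraction.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    base = concentration(city_emissions)
    scenario = concentration((1.0 - alpha) * city_emissions)
    return (base - scenario) / alpha


def linear_response(emissions):
    # Hypothetical: fixed background plus a city term proportional to emissions.
    return 8.0 + 0.1 * emissions


def concave_response(emissions):
    # Hypothetical non-linear response: the marginal effect of emissions decreases.
    return 8.0 + math.sqrt(emissions)


E_CITY = 100.0  # arbitrary emission units
for name, response in [("linear", linear_response), ("concave", concave_response)]:
    pi100 = potential_impact(response, E_CITY, 1.0)
    pi20 = potential_impact(response, E_CITY, 0.2)
    print(f"{name:8s} PI100 = {pi100:.2f}  PI20 = {pi20:.2f}  difference = {pi100 - pi20:.2f}")
```

For the linear response the two estimates coincide, whereas for the concave response the 20 % reduction only samples the flatter part of the curve and therefore yields a smaller city contribution than switching emissions off entirely.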

Figure 3. Histogram of daily city contribution differences to London PM 2.5 levels between two potential impacts methods, PI100 and PI20, calculated with the EMEP model. The source is defined spatially as the FUA where emissions are reduced yearly (Y subscript). The receptor is defined spatially as the city location where the maximum yearly averaged concentration is modelled (x max) and temporally as daily average (D). Each column represents the number of days with a specific PI difference (PI100 − PI20). The blue line provides the yearly average difference.

Sensitivity to the source

Figure 4 shows the comparison between SA obtained with sources defined as core cities (left) and as FUAs (right). The city contribution or responsibility is multiplied by a factor of 2 on average (see also Fig. 8) when FUAs are considered. The larger spatial extension of the FUA and its implied additional emissions explain the differences that lead some cities to become a major actor, i.e. where the city contribution dominates the background one (e.g. Athens, Warsaw, Milan, Turin, and Rome).

Sensitivity to the receptor
In this section, we discuss the spatial and temporal averages applied at the receptor. Spatially, different averaging options exist, ranging from a single location (i.e. one model grid cell) to more or less extended areas covering part of the source or even larger. To illustrate the sensitivity of SA to that choice, we use the case of Paris (Fig. 5), where emissions have been reduced over the FUA (source) over a full year.
SA varies largely from one location to another within Paris. We highlight this with bars that distinguish the city vs. background contributions for locations at different distances from the city centre. We note opposite trends, dominated by the city source (around 60 %) at the city centre and dominated by the background source towards the periphery (around 80 %). While the SA at the city centre is representative of a single cell within the city, this is not the case for SA close to the periphery. This is highlighted by the city rings (below the x axis) that indicate the area of representativeness of a given SA. When we spatially average an indicator (PM 2.5 or population exposure) over a receptor that covers the entire FUA (all six rings), these areas of representativeness enter into play. The brown curve indicates the weight (in the spatial average) attached to each city ring relative to the city total (i.e. all rings). Weights increase fast when moving towards the periphery because of the larger ring areas. The spatial-averaging process leads to over-representation of the periphery, which outweighs the city centre SA by almost a factor of 40.

Figure 5. City rings' source apportionment for Paris PM 2.5 and associated population exposure. The city and background apportionment (bars) is represented for rings (i) progressively more distant from the city centre (x axis). The ring average concentration ($C_i$) and population density ($P_i$) relative to the city centre values are represented in blue and green, respectively. The relative (to the FUA total, i.e. all rings) weight of each ring (i) in the city average concentration (brown) is calculated as $C_i S_i / \sum_i (C_i S_i)$, where $S_i$ is the ring area. A similar expression ($C_i S_i P_i / \sum_i (C_i S_i P_i)$) is used to determine the weight of each ring in the calculation of the average population exposure (red curve).
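The ring weights defined in the Fig. 5 caption can be sketched numerically as follows. The ring radii, relative concentrations, relative population densities, and city shares are all hypothetical values chosen to mimic the qualitative picture described above (city share around 60 % in the centre and around 20 % at the periphery); they are not the Paris data.

```python
import math

# Hypothetical concentric rings around the city centre (illustration only).
outer_radii_km = [3.5, 7.0, 14.0, 21.0, 28.0, 35.0]
rel_concentration = [1.00, 0.90, 0.80, 0.70, 0.65, 0.60]  # C_i relative to the centre
rel_pop_density = [1.00, 0.60, 0.30, 0.10, 0.04, 0.02]    # P_i relative to the centre
city_share = [0.60, 0.50, 0.40, 0.30, 0.25, 0.20]         # city fraction of C_i per ring


def ring_areas(outer_radii):
    """Area S_i of each ring, from the outer radii of successive rings."""
    areas, inner = [], 0.0
    for outer in outer_radii:
        areas.append(math.pi * (outer ** 2 - inner ** 2))
        inner = outer
    return areas


def normalised(values):
    total = sum(values)
    return [v / total for v in values]


areas = ring_areas(outer_radii_km)

# Weight of each ring in the FUA-average concentration: C_i S_i / sum_i(C_i S_i).
w_conc = normalised([c * s for c, s in zip(rel_concentration, areas)])
# Weight of each ring in the average population exposure: C_i S_i P_i / sum_i(C_i S_i P_i).
w_expo = normalised([c * s * p for c, s, p in zip(rel_concentration, areas, rel_pop_density)])

# City share of the spatially averaged indicator is the weighted mean of the ring shares.
share_conc = sum(w * f for w, f in zip(w_conc, city_share))
share_expo = sum(w * f for w, f in zip(w_expo, city_share))

print(f"outer-ring / centre weight (concentration): {w_conc[-1] / w_conc[0]:.1f}")
print(f"city share at the centre:                  {city_share[0]:.0%}")
print(f"city share, FUA-average concentration:     {share_conc:.0%}")
print(f"city share, FUA-average exposure:          {share_expo:.0%}")
```

Because ring areas grow with distance, the area-weighted average is dominated by the outer rings where the background prevails, while weighting by population partly restores the influence of the inner rings.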
It is interesting and counterintuitive to note that with this averaging process, the city's responsibility decreases when the city area increases. With population exposure as an indicator (weights shown by the red curve), the rapid population density decrease balances the ring area increase when moving outward, leading to weights that dominate for middle rings. With average population exposure, the city centre weight is still similar to the weight obtained 28 km away. Figure 6 compares SA for 150 cities obtained for receptors defined (1) as the location where the maximum concentration is reached within the FUA (x max) and (2) as the FUA spatial average (FUA). On average, city impacts for a spatially averaged receptor are about 55 % lower. Depending on the spatial characteristic of the receptor, some cities will be considered to be minor or major actors with respect to their pollution. We discuss this point further in Sect. 4.

Figure 6. Comparison of potential impacts for 150 cities in Europe obtained for a receptor spatially defined as the location where the concentration is maximum in the city (x max; x axis) and defined as the spatially averaged FUA (FUA). For these calculations, the source is defined as the FUA over which emissions are switched off during the whole year. The indicator is the total PM 2.5 mass. All results are based on the SHERPA-CHIMERE model using a potential impact SA method for a reduction strength of 50 % (PI50) and are based on yearly averages at the receptor (Y).
As seen from these results, spatial averages at the receptor significantly reduce the city's responsibility, potentially leading to underestimation of the city's ability to reduce pollution levels via local controls. The large differences resulting from the choice of the receptor settings prevent meaningful comparisons. It is for example challenging to compare CAMS city contributions that are averaged spatially over the city area with the urban results obtained in the context of the Thematic Strategy on Air Pollution (Kiesewetter and Amann, 2014) that are aggregated at the country level or with SHERPA estimates based on a single grid cell receptor. It is therefore crucial to associate all SA settings (metadata) to the results in order to inform about the meaningfulness of a comparison. We further discuss this issue in the context of air quality planning in Sect. 4.
Similar considerations apply to temporal averages. Figure 7 compares SA obtained when the indicator at the receptor is averaged yearly and seasonally with daily single values. For a yearly average, the contribution of the city of Madrid is 54 %, but the spectrum of daily contributions ranges from 10 % to beyond 90 %. Even seasonal averages differ by a factor of 2 between summer and winter. Similarly to spatial averages, temporal averages encompass a large spectrum of SA outcomes. Indicators averaged yearly at the receptor have been used for example in SHERPA and GAINS (Kiesewetter and Amann, 2014), whereas daily indicators are used in CAMS (Pommier et al., 2020). Correlating low and high city contributions to meteorological factors (cold vs. warm days, windy vs. calm situations, etc.) is beyond the scope of this work. This point is however addressed in Pisoni et al. (2021).
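A small sketch can show how a single yearly figure hides the spread of daily contributions. The daily series below is synthetic (a seasonal cosine on top of a constant background) and is an assumption made for illustration only; it does not reproduce the Madrid results.

```python
import math

# Hypothetical daily series for one year (not model output).
DAYS = 365
BACKGROUND = 8.0  # assumed constant background PM2.5 (ug/m3)


def city_part(day):
    """Assumed city-made PM2.5 (ug/m3): high in winter, low in summer."""
    return 8.0 + 6.0 * math.cos(2.0 * math.pi * day / DAYS)


city = [city_part(d) for d in range(DAYS)]
total = [c + BACKGROUND for c in city]

# Contribution derived from yearly averaged concentrations.
yearly_share = (sum(city) / DAYS) / (sum(total) / DAYS)

# Contributions derived day by day.
daily_shares = [c / t for c, t in zip(city, total)]

print(f"yearly average: {yearly_share:.0%}")
print(f"daily range: {min(daily_shares):.0%} to {max(daily_shares):.0%}")
```

The yearly value sits between the daily extremes and says nothing about the days on which local action would be most or least effective.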
Note that spatial averages have a larger smoothing effect than temporal ones because they are bidimensional.

Methodological assumptions and uncertainties
In addition to referring to the SA method itself (Sect. 2.4), other modelling parameters need to be documented as well. We list the main ones hereafter.
One of the main assumptions attached to models is the spatial resolution and its potential impact on the calculation of the city contribution. While a coarse resolution might capture the background relatively well (as it is characterized by smoother fields), this will not be the case for peak concentrations within the city. The coarser the model spatial resolution, the larger the underestimation of the city's responsibility will be (De Meij et al., 2007).
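The resolution effect can be sketched by averaging an idealized city signal over grid cells of increasing width. The one-dimensional Gaussian profile, its width, and the concentration values are hypothetical; a real model grid is two-dimensional and the dilution would be stronger.

```python
import math

SIGMA_KM = 3.0     # assumed width of the city-made Gaussian profile
CITY_PEAK = 12.0   # assumed city-made PM2.5 at the centre (ug/m3)
BACKGROUND = 8.0   # assumed smooth background (ug/m3)


def cell_mean_city_part(cell_km):
    """Mean of a 1-D Gaussian profile over a cell of width cell_km centred on the city."""
    half = cell_km / 2.0
    # Integral of exp(-x^2 / (2 sigma^2)) between -half and +half.
    integral = SIGMA_KM * math.sqrt(2.0 * math.pi) * math.erf(half / (SIGMA_KM * math.sqrt(2.0)))
    return CITY_PEAK * integral / cell_km


def city_share(cell_km):
    """City share of the cell-mean concentration for a given cell width."""
    c = cell_mean_city_part(cell_km)
    return c / (c + BACKGROUND)


for cell_km in (1.0, 7.0, 25.0, 50.0):
    print(f"{cell_km:5.0f} km cell: city share {city_share(cell_km):.0%}")
```

The smooth background is unaffected by the cell size, whereas the city peak is progressively diluted, so the computed city share drops as the grid coarsens.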
Uncertainties may result from our incomplete knowledge of some model input parameters, in particular chemical processes and emission sources. Some urban emission sources are not well documented and are probably underestimated. This is the case for residential emissions, for which the inclusion of condensable organic species remains an open question (Simpson et al., 2020), or for the resuspension of particles generated by vehicles (Amato et al., 2014). In addition, the spatial allocation of emissions can be uncertain for some sectors. These missing or incomplete emission sources will lead to a potential misestimate of the city's responsibility.
On the meteorological side, the estimation of wind speed, planetary boundary layer (PBL) height, and/or turbulence intensity largely influences the dispersion of city emissions; uncertainties in these parameters will therefore impact the calculation of city contributions. While the impact of meteorological parameterization on air quality has been extensively assessed from regional to urban cases (De Meij et al., 2009, 2015, 2018; Jiang et al., 2020), only a few studies have assessed its importance for city contributions. One of these (Huszar et al., 2021) shows, for example, that the inclusion of an urban canopy meteorological forcing in multi-year simulations largely impacts the estimation of the city's responsibility. In the next section, we discuss the consequences of these results on policy, in particular when SA information is used to design air quality plans.

Implications for air quality strategies
Estimating a city's pollution contribution has important consequences in terms of air quality management. Indeed, an important city contribution will be a logical argument to support substantial control measures at the local level to abate pollution. The effectiveness of the control measures then relies on the relevance and accuracy of this city contribution; over- or underestimated city contributions potentially lead to inefficient measures. In previous sections, we saw that the city contribution varies widely depending on the choices made for the SA setting parameters (definition of the indicator, source, receptor, and methodology), hence the challenge of obtaining a relevant and accurate estimate to support local action.
Given the range of possible SA options and their impact on results, the first recommendation is obviously to report these SA setting choices together with the results to provide policymakers with the full picture and allow them to take informed decisions. This advocates for the use of the proposed nomenclature or a similar one that reports and details the choices in the SA approach, providing accountability of the method and enabling correct interpretation of the results. The proposed nomenclature can be understood as a documentation of the SA metadata information. Apart from this point on the importance of documenting SA approach choices, we show below that some of the SA settings are fixed by the purpose of the study. We provide suggestions for the remaining free choices.
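One way to attach such metadata to SA results is a small structured record. The sketch below is hypothetical: the class name, field names, and example values are illustrative assumptions and do not reproduce the nomenclature proposed in this work.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class SASettings:
    """Hypothetical record of the SA settings discussed in the text."""
    indicator: str        # e.g. "total PM2.5 mass"
    receptor_space: str   # e.g. "max concentration cell", "FUA average"
    receptor_time: str    # e.g. "yearly average", "daily"
    source_space: str     # e.g. "FUA", "city core"
    source_time: str      # e.g. "whole year"
    method: str           # e.g. "PI50", "tagging", "increment"


def differing_settings(a, b):
    """Return the names of the settings that differ between two SA studies."""
    da, db = asdict(a), asdict(b)
    return [name for name in da if da[name] != db[name]]


study_a = SASettings("total PM2.5 mass", "max concentration cell",
                     "yearly average", "FUA", "whole year", "PI50")
study_b = SASettings("total PM2.5 mass", "FUA average",
                     "yearly average", "FUA", "whole year", "PI50")

# Lists the settings that must be reconciled before comparing the two studies.
print(differing_settings(study_a, study_b))
```

An empty list would indicate that two studies share all recorded settings; any non-empty list flags the choices that make a direct comparison of city contributions questionable.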

METHOD: an approach based on potential impacts is recommended for SA
It is important to recall that not all SA methodologies are equally suited to support air quality planning. As mentioned by several authors (Burr and Zhang, 2011; Qiao et al., 2018; Mertens et al., 2018; Clappier et al., 2017; Grewe et al., 2010, 2012; Thunis et al., 2019), potential impacts are recommended when non-linear species are involved (which is the case for PM 2.5 and PM 10 but also for other species like NO 2 or O 3). It is worth recalling that tagging or incremental approaches are still used erroneously and believed to be suited for air quality planning purposes (Qiao et al., 2018; Guo et al., 2017; Itahashi et al., 2017; Timmermans et al., 2017; Wang et al., 2015; Hendriks et al., 2013). Although challenging practical issues are attached to potential impacts and may be seen as a burden (e.g. lack of additivity; see Appendix), they only reflect the complexity of the real processes that must be accounted for. It is true that uncertainties associated with the PI approach (e.g. imperfect emission inventory) may lead other SA methods to perform better in some instances because methodological biases compensate for uncertainties; this is however coincidental. While uncertainties can be tackled and reduced to improve the approach, this is not the case for methodological biases. These points are extensively discussed in Thunis et al. (2019).
For the remainder of this section, which focuses on policy aspects, only potential impact results are discussed. Fixing the methodology, however, still leaves free options in terms of indicator, receptor, and source. Figure 8 summarizes the sensitivity of the SA results presented in the previous sections (i.e. Figs. 2, 4, and 6) to these free options. Differences in terms of the city's responsibility reach a factor of 2 on average for each of these remaining parameters, with much larger values for some cities.

INDICATOR: the indicator choice is driven by health and environmental objectives
The choice of the indicator is generally motivated by health or environmental considerations. Currently, the WHO guidelines (WHO, 2005) refer to the total PM 2.5 mass as the indicator correlating best with health impacts. The indicator used in these guidelines (or in the Ambient Air Quality Directive, AAQD, limit values) is then the logical and most relevant choice among the options presented in Sect. 3.1 and shown in Fig. 2. As illustrated by Fig. 8, evolving knowledge of health-related pollution impacts (i.e. the increased toxicity of some PM 2.5 constituents like those related to traffic and residential activities) might, however, drive the choice towards more detailed indicators (e.g. PPM 2.5), leading to an increased responsibility for the cities.
4.3 SOURCE: importance of matching sources with governance levels
Figure 8 shows that plans limited to city cores would be significantly less efficient than if applied at the FUA scale. On average over all cities, the efficiency decreases by a factor of 2, but larger differences occur in many cities. The source does not, however, represent a free choice in the context of policy practice. Indeed, authorities in charge of air quality (AQ) plans only have power to act on the area under their responsibility, which sets where measures apply. The same applies to the temporal characteristic of the source, which is fixed as the period of time during which measures apply. A good match between the SA settings and the temporal and spatial characteristics of the source is therefore important to provide meaningful support to policymakers.
4.4 RECEPTOR: drawbacks associated with spatial- and temporal-averaging processes at the receptor
As clearly shown in Fig. 5, spatial-averaging processes lead to a loss of information. In our example, a city-average-based SA would entirely mask the city centre SA. It would lead to a strategy that mostly targets the background at the expense of the city centre, where the high concentration issues would not be solved. This is well illustrated by Amann et al. (2017), who analyse the responsibility of the city of New Delhi for its air pollution, both at a city centre hot spot receptor and in terms of city-average population exposure. In the first case, SA suggests acting on local sources, while in the second, SA suggests acting on regional sources. Spatial averaging drives the balance towards regional actions that will be less effective in solving the pollution issue at the city centre. The larger the city, the more important this shift will be. As illustrated by Fig. 8, there is a difference of more than a factor of 2 between city-averaged and hot spot indicators.
Similar considerations apply to temporal averages. Figure 7 clearly shows that yearly average values hide the potential for effective local actions during wintertime and even more on specific days. Averaging implies merging, into one single number, locations and time instants that are characterized by different and sometimes opposite SA. This may lead to strategies that will not be efficient everywhere all the time.
Whenever the final objective is to reduce a temporally and/or spatially averaged indicator (e.g. average population exposure), strategies would gain in efficiency with the following process: (1) perform SA and classify the raw (not averaged) SA results into homogeneous spatio-temporal clusters; (2) design strategies on the basis of these clusters; (3) assess the strategy efficiency against the averaged indicator.
The key here is to design strategies for raw or clustered results rather than averaged ones to prevent information loss. Note that designing a unique strategy based on multiple SA results (point 2 above) does not necessarily complicate the analysis as these different SAs will likely suggest action for different sectors of activity that can be combined in the final strategy.
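The three-step process can be sketched on a toy data set. Everything below is hypothetical: the four records, the 50 % threshold used to form two clusters, and the uniform 30 % reduction applied to the targeted component. The sketch also treats each record independently, which is a simplification, since real measures act on emissions rather than on individual receptors.

```python
from collections import defaultdict

# Hypothetical raw (non-averaged) SA results: city and background parts in ug/m3.
RAW_SA = [
    {"where": "centre", "when": "winter", "city": 14.0, "background": 8.0},
    {"where": "centre", "when": "summer", "city": 5.0, "background": 6.0},
    {"where": "suburb", "when": "winter", "city": 4.0, "background": 8.0},
    {"where": "suburb", "when": "summer", "city": 1.0, "background": 6.0},
]
THRESHOLD = 0.5   # assumed share above which a record counts as locally dominated
REDUCTION = 0.3   # assumed reduction of the targeted component


def city_share(rec):
    return rec["city"] / (rec["city"] + rec["background"])


def cluster_of(rec):
    return "local" if city_share(rec) >= THRESHOLD else "background"


# (1) Group the raw SA results into clusters.
clusters = defaultdict(list)
for rec in RAW_SA:
    clusters[cluster_of(rec)].append(rec)


# (2) Design one action per cluster: act on the dominant component.
def after_measures(rec):
    target = "city" if cluster_of(rec) == "local" else "background"
    reduced = dict(rec)
    reduced[target] = rec[target] * (1.0 - REDUCTION)
    return reduced


# (3) Assess the strategy against the averaged indicator.
def mean_concentration(records):
    return sum(r["city"] + r["background"] for r in records) / len(records)


before = mean_concentration(RAW_SA)
after = mean_concentration([after_measures(r) for r in RAW_SA])

# For comparison: the share obtained when averaging first.
averaged_share = sum(r["city"] for r in RAW_SA) / sum(
    r["city"] + r["background"] for r in RAW_SA)

print(dict((name, len(recs)) for name, recs in clusters.items()))
print(f"averaged city share: {averaged_share:.0%}")
print(f"mean concentration: {before:.2f} -> {after:.2f} ug/m3")
```

In this toy case the averaged share falls below the threshold and would point to the background only, whereas the clustered view keeps the winter city centre record as a case for local action.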

Conclusions
Although air quality has improved in Europe over the last decades, in great part thanks to effective measures and consistent EU-wide legislation, pollution hot spots still remain in many European cities. The extent to which city emissions are causing these elevated urban pollution levels is however still a subject of scientific discussion. This can be explained by the complex processes driving the formation of some pollutants like PM 2.5, for which there is not a simple relationship between emissions and concentrations (in other words, local emissions do not always imply local responsibilities). Source apportionment represents a useful technique to quantify the city's responsibility, but the approaches and applications are not harmonized and therefore not comparable, resulting in confusing and sometimes contradictory interpretations.
In this work, we analysed how different SA approaches apply to the urban scale and how their building elements and parameters are defined and set. We identified the possible settings associated with four key steps in SA: indicator, receptor, source, and methodology. We showed that different choices for these settings lead to very large differences in terms of results. On average over the 150 large European cities selected as examples, the choices made for the indicator, the receptor, and the source each lead to a difference of a factor of 2 in terms of the city's responsibility. These various options and the large differences that result highlight the difficulty of comparing results from different studies and stress the need to document the SA approach with the metadata associated with the four key steps.
This work advocates for the use of a harmonized nomenclature to support the comparability of SA approaches. We propose the use of indexes and sub-indexes attached to the four key steps in any SA approach in a harmonized way to uniquely document the approach and enable correct interpretation of the results. We believe that the adoption of this nomenclature will provide clarity to the scientific discussion on different results and enable the correct interpretation of the results for policy applications. Even though this is applied to the specific case of PM 2.5 , the concepts presented here can easily be generalized to other pollutants.
In the context of supporting urban air quality plans, the SA configuration and most setting parameters are driven by the purpose of the AQ plan itself and by its associated constraints. While environment- and/or health-related considerations guide the choice of the indicator, the spatio-temporal characteristics of the source are strongly correlated to governance aspects. In other words, the source characteristics should reflect the governance levels to facilitate interpretation. Finally, the recommended SA method should be based on "potential impacts" to prevent misleading interpretations in terms of expected AQ plan outcome.
At the receptor level, temporal- and spatial-averaging processes lead to a loss of information, especially when diverging SA results are aggregated into a single number. Averaging processes, in particular spatial ones, often lead to the favouring of strategies that target background sources while neglecting actions that would be efficient at the city centre. In our 150-city example, the impact of spatial averaging leads to an average difference of a factor of 2 in terms of the city's responsibility. Results differ not only from one city to the other and from one location to another in a given city; they also differ through time. To cope with this variability, we recommend using non-averaged SA results for the design of AQ strategies. Once clustered in homogeneous spatio-temporal classes, these can serve to understand where and when actions are most efficient. When implemented, the efficiency of abatement measures can then be assessed via spatially and temporally averaged indicators (e.g. city-average population exposure).
The responsibility of a city for its pollution is obviously city-dependent. But even for a given city, SA studies using different approaches and parameter settings will deliver very different outcomes. It is important to note that, on top of departures from the methodological recommendations listed above, additional uncertainties and assumptions will most often lead to a systematic and important underestimation of the city's responsibility. We showed that on average over 150 European cities, departures in terms of source, receptor, and indicator may each lead to an underestimation by a factor of 2. This comes with important implications: if cities are seen as a minor actor, plans will target the background as a priority at the expense of potentially effective local actions.
Future work will consist of comparing spatially or temporally averaged SA results with SA results that are clustered in homogeneous spatio-temporal classes and assessing the implications in terms of AQ strategy.

Appendix A
To illustrate the differences among SA methods, we use here the theoretical example schematically represented in Fig. A1. A city source (in red) emits with a Gaussian dispersion profile both primary PM (PPM) and a gas-phase precursor (NO x). The background pollution (in blue) is composed of a mix of NO x, NH 3, and PPM compounds. The various chemical reactions that take place are simplified here for convenience into a single reaction: 1 mol of NH 3 reacts with 1 mol of NO x to create 1 mol of ammonium nitrate ($\mathrm{NH_4^+ NO_3^-}$), i.e. secondary PM ($\mathrm{NO}_x + \mathrm{NH}_3 + X \rightarrow \mathrm{NH_4^+ NO_3^-}$). We assume here that the external compounds involved in the reaction (X) are abundant and do not have a limiting effect on the formation of PM. While the city emissions (source) remain unchanged, we modify the relative importance of the three background compounds so that the background becomes in turn PPM-, NO x-, and NH 3-dominated. The PM concentration at a given location "x" is given by
Based on the formulations provided in Table 1 and Eq. (4), the expressions to calculate the city and background components for the theoretical example presented above are detailed in Table A1. While these formulations are relatively straightforward for potential impacts and increments, they are more complex for the tagging method. The city tagging component is the sum of all PM species that are directly related to the city emissions. This includes PPM and NO 3 that are related to the PPM and NO x city emissions, respectively. For the background component, it includes PPM, NO x, and also NH 4 that is related to the NH 3 emissions. Tagging allows the NO x and NH 3 emitted compounds to be followed through their chemical processes and transformations until they create NO 3 and NH 4, respectively, that can be attributed to their respective sources. As NO x is emitted by both sources, the total NO 3 must be fractionated and attributed to each single source.
In our example, the NO 3 fraction attributed to the city depends on the ratio of the available NO x precursor at the location of interest, $\beta = \dfrac{\mathrm{NO}_{x,\mathrm{city}}(\mathrm{cc})}{\mathrm{NO}_{x}(\mathrm{cc})}$. A similar process is used to calculate the background component.
This example is used to compare the increment (INC), tagging (TAG), and potential impact (PI) SA approaches.
Table A1. Formulations for the potential impacts, increments, and tagging approach for the example presented in Fig. A1. The indicator for all methods and components is the total particulate matter mass (PM). The SA method is indicated as superscript (PIα, INC, or TAG), whereas the source (city or bg) is in subscript. The receptor is the city centre (cc), while the rural location selected for the increment approach is denoted by "bg". For the tagging, the source subscript is also expressed directly as emissions (E) distinguishing each compound (within brackets).
Figure A1. Schematic representation of the theoretical example used to compare the three SA approaches. The city source (in red) emits NO x and PPM. The background (in blue, including other cities as well as rural sources) is composed of NO x, PPM, and NH 3 in different relative proportions (indicated by the arrow). The "cc" and "bg" symbols represent the city centre receptor and the background location used for the increment approach, respectively.
Figure A2 shows the city and background contributions obtained with the three SA methods, differentiating two options for the PI one: 100 % (PI100) and 20 % reduction in the sources (PI20). The figure also distinguishes four situations characterized by different background compositions.
- No background. When no background is present (top left), the city NO x emissions do not form PM, only PPM emissions do. In such cases, all methods deliver the same response.
- PPM background. When the background is composed of PPM only (top right), no secondary species are formed. All methods agree, with the exception of the increment approach. This is due to the non-fulfilment of one of its underlying assumptions, i.e. the lack of spatial homogeneity of the background, which affects the city and rural locations differently (indicated by "cc" and "bg" in Fig. A2, respectively).
- SEC background with NH 3 > NO x. When secondary background precursors (NO x and NH 3) reach the city (bottom row), SA methods deliver different results because they manage non-linear processes differently. When NH 3 is more abundant than NO x (bottom left), the PI100 method does not preserve additivity (discussed in Sect. 2.4); i.e. the sum of the two components exceeds the total PM concentration. As seen from the results and also from Table A1, this is not the case for the increment and tagging approaches that are constructed to be additive.
- SEC background with NH 3 < NO x. When NH 3 is less abundant than NO x (bottom right), differences remain important between the tagging, potential impact, and increment approaches, but additivity is preserved for both PI100 and PI20, which provide identical responses.
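The additivity behaviour of potential impacts in these situations can be reproduced with a minimal sketch. It rests on two assumptions that are not taken from the text: a toy closure in which PM equals PPM plus secondary PM limited by the scarcer precursor, with all quantities in the same arbitrary molar-equivalent units, and potential impacts scaled by the reduction strength. The numerical values are hypothetical.

```python
def pm(ppm_city, nox_city, ppm_bg, nox_bg, nh3_bg):
    """Assumed toy closure: PM = PPM + secondary PM limited by the scarcer precursor."""
    secondary = min(nox_city + nox_bg, nh3_bg)
    return ppm_city + ppm_bg + secondary


def potential_impacts(city, bg, alpha):
    """City and background potential impacts for a reduction strength alpha in (0, 1].

    The concentration change is divided by alpha (assumed scaling), so that
    PI100 and PI20 can be compared directly. Returns (pi_city, pi_bg, base).
    """
    ppm_c, nox_c = city
    ppm_b, nox_b, nh3_b = bg
    base = pm(ppm_c, nox_c, ppm_b, nox_b, nh3_b)
    k = 1.0 - alpha
    city_reduced = pm(k * ppm_c, k * nox_c, ppm_b, nox_b, nh3_b)
    bg_reduced = pm(ppm_c, nox_c, k * ppm_b, k * nox_b, k * nh3_b)
    return (base - city_reduced) / alpha, (base - bg_reduced) / alpha, base


CITY = (2.0, 3.0)               # hypothetical city PPM and NOx at the city centre
BG_NH3_RICH = (1.0, 2.0, 10.0)  # hypothetical background PPM, NOx, NH3 (NH3 > NOx)
BG_NOX_RICH = (1.0, 10.0, 4.0)  # hypothetical background PPM, NOx, NH3 (NH3 < NOx)

rich_city, rich_bg, rich_total = potential_impacts(CITY, BG_NH3_RICH, 1.0)
nox_city100, nox_bg100, nox_total = potential_impacts(CITY, BG_NOX_RICH, 1.0)
nox_city20, nox_bg20, _ = potential_impacts(CITY, BG_NOX_RICH, 0.2)

print("NH3 > NOx, PI100:", rich_city + rich_bg, "vs total", rich_total)
print("NH3 < NOx, PI100:", nox_city100 + nox_bg100, "vs total", nox_total)
print("NH3 < NOx, PI20 :", nox_city20 + nox_bg20, "vs total", nox_total)
```

Under these assumptions, the PI100 components sum to more than the total PM when NH 3 is abundant, while PI100 and PI20 coincide and remain additive when NH 3 is the limiting precursor.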