Estimating lockdown-induced European NO2 changes

This study provides a comprehensive assessment of NO2 changes across the main European urban areas induced by the COVID-19 lockdown using satellite retrievals from the Tropospheric Monitoring Instrument (TROPOMI), weather-normalized with a machine learning method (gradient boosting), along with an assessment of the biases that can be expected from methods that omit the influence of weather. We also compare the weather-normalized satellite NO2 column changes with both weather-normalized surface NO2 concentration changes and changes simulated by the CAMS regional ensemble, composed of 11 models, using recently published emission reductions induced by the lockdown. All estimates show the same tendency in NO2 reductions: locations where the lockdown was stricter show stronger reductions and, conversely, locations where softer measures were implemented show milder reductions in NO2 pollution levels. Regarding average reductions, estimates based on satellite observations (-23%), surface stations (-43%) or models (-32%) are presented, showing the importance of vertical sampling but also of horizontal representativeness. Surface station estimates change significantly when sampled at the TROPOMI overpasses (-37%), pointing out the importance of the variability in time of such estimates. Observation-based machine learning estimates show a stronger temporal variability than the model-based estimates.

The lockdowns are expected to have large effects on urban NO2 air pollution levels. A number of studies used surface measurement sites. For example, Wang et al. (2020a) showed that lower emissions from motor vehicles and secondary industries were most likely responsible for the observed decreases of NO2 concentrations in China during January-March 2020. Collivignarelli et al. (2020) showed that major NO2 reductions occurred in Milan, a city that showed a rapid increase of cases early in the European COVID-19 crisis (February 2020) and was one of the first cities to be put into lockdown in Europe.
Accounting for the effect of meteorological variability, Petetin et al. (2020) highlighted a strong reduction of surface NO2 concentrations across most Spanish urban areas during the first weeks of lockdown.
The first quarter of 2020 had specific and highly variable meteorological conditions. Storm Ciara crossed Europe in the second week of February, followed a week later by storm Dennis. Both extratropical storms generated strong winds over the northern half of Europe (above 45°N) from 9 February 2020 until 18 February 2020. Strong, yet milder, winds over the Iberian Peninsula, the southern part of France and the northern part of Italy were also generated by storms Karine and Myriam in the first week of March. Moreover, February and March 2020 displayed stronger positive temperature anomalies over Europe than February and March 2019 (https://surfobs.climate.copernicus.eu/stateoftheclimate). Such weather anomalies, however, did not persist during the second quarter of 2020. Air quality modelling prediction systems represent the evolution of pollutants in the atmosphere, accounting for changes in weather using numerical weather prediction data. The Copernicus Atmosphere Monitoring Service (CAMS) produces European air quality forecasts and analyses daily using an ensemble of 11 models and European Centre for Medium-range Weather Forecasts (ECMWF) data as input, ensuring unique reliability and quality (Marécal et al., 2015).
Using scaling emission factors to account for lockdown measures, such a system can be used to estimate lockdown reductions while accounting for weather variability (Colette et al., 2020).
Several studies used the recently launched (October 2017) Tropospheric Monitoring Instrument (TROPOMI; Veefkind et al., 2012) on board the Copernicus Sentinel-5 Precursor satellite to showcase the NO2 reductions due to the COVID-19 lockdown. Because the instrument is so recent, it is impossible to work with a climatological baseline, which would require at least several years of data, to assess the lockdown reductions. Often satellite data from 2020 are compared with data from 2019, sometimes over short time periods. For example, Muhammad et al. (2020) compared full March 2019 averages with the 14-25 March 2020 average for Europe, amongst other estimates for other regions. Bauwens et al. (2020) provided a more in-depth assessment of NO2 column reduction estimates by using a similar year-to-year methodology, i.e. comparing 2019 to 2020.
A number of TROPOMI NO2 studies on COVID-19 lockdown reductions give little weight to the synoptic meteorological conditions and how they could potentially flaw the estimates. Zambrano-Monserrate et al. (2020) and Nakada et al. (2020) showed maps of TROPOMI for short time periods comparing 2020 with 2019 for Europe, Asia and South America with no clear, quantitative and robust assessment of the underlying weather conditions. Wang et al. (2020b) used differences of TROPOMI NO2 images over China before and during the lockdown to illustrate the impact of the lockdown on air pollution, but did not emphasise how those differences might be affected by differences in weather conditions. In contrast, Schiermeier (2020) mentioned early on in the COVID-19 crisis the 'weather factor', which can strongly affect pollution levels. Other studies showed 2019 and 2020 TROPOMI NO2 comparisons but acknowledged the impact of weather anomalies on pollution levels. Only very recently has a weather-normalization technique been applied to estimate NO2 changes across cities in the US based on TROPOMI (Goldberg et al., 2020). Also, insufficient importance and clarity are given to the fact that satellite data used in such analyses are conditioned by cloud coverage, revisit frequency and quality flag.
Ignoring or not acknowledging such information can also lead to flawed satellite-based estimates and provide misleading information (https://atmosphere.copernicus.eu/flawed-estimates-effects-lockdown-measures-air-quality-derived-satelliteobservations).
In this paper, we first aim to illustrate how misleading it is to consider non-weather-normalized TROPOMI estimates for assessing changes in NO2 induced by lockdown measures. We focus on Europe and provide a method that accounts for weather variability or, more broadly speaking, estimates what TROPOMI would have measured in spring 2020 under "business as usual" (BAU) emission forcing (i.e. without any lockdown measures) in section 2. We then aim to provide a comprehensive assessment of the European lockdown-induced NO2 changes. We compare the satellite estimates against what can be inferred from surface observations that also account for weather variability in section 3. We also compare with model-based scenarios using ad hoc bottom-up emission inventories reflecting lockdown restrictive measures in section 4. We summarize and compare all the presented estimates in section 5.
https://doi.org/10.5194/acp-2020-995. Preprint. Discussion started: 7 October 2020. © Author(s) 2020. CC BY 4.0 License.

Dataset and analysis periods
We use the operational Copernicus Sentinel-5 Precursor (S5P) TROPOMI NO2 level 2 product, for which data have been available since 28 June 2018. These observations are tropospheric columns (from the surface to the top of the troposphere) with a pixel resolution of 5.5km by 3.5km since 6 August 2019 and 7km by 3.5km before. The instrument can revisit a location up to daily under clear-sky conditions. In this study, we use the quality flag provided with the retrieval, the so-called "qa" flag, and select only good-quality data, i.e. qa > 0.75. This removes cloud-covered scenes, errors and problematic retrievals (Eskes et al., 2019). We have binned the data on a regular 0.1°x0.1° grid in order to perform statistical analyses and to facilitate the processing of time series for locations of interest, i.e. large European cities in this study (see section 2.2), as well as the comparison with other datasets such as the 0.1°x0.1° CAMS regional air quality models (Marécal et al., 2015) and the 9km European Centre for Medium-range Weather Forecasts (ECMWF) weather forecasts.
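The quality screening and gridding described above can be sketched as follows. This is a minimal illustration, not the processing chain used in the study: the tuple layout of a retrieval, the function names and the simple per-cell arithmetic mean are assumptions made here, and reading the actual level 2 files is left out.

```python
from collections import defaultdict

QA_THRESHOLD = 0.75  # from the text: keep only retrievals with qa > 0.75
GRID_STEP = 0.1      # regular 0.1 x 0.1 degree grid, as in the text


def grid_cell(lat, lon, step=GRID_STEP):
    """Return integer (row, col) indices of the grid cell containing a point."""
    # Offsetting by 90/180 keeps indices non-negative for any valid coordinate.
    return (int((lat + 90.0) // step), int((lon + 180.0) // step))


def bin_retrievals(retrievals, qa_threshold=QA_THRESHOLD):
    """Average good-quality NO2 columns per grid cell.

    `retrievals` is an iterable of (lat, lon, no2_column, qa) tuples;
    this layout is a hypothetical stand-in for the level 2 product.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for lat, lon, no2, qa in retrievals:
        if qa <= qa_threshold:
            continue  # drop cloud-covered or otherwise problematic pixels
        cell = grid_cell(lat, lon)
        sums[cell] += no2
        counts[cell] += 1
    return {cell: sums[cell] / counts[cell] for cell in sums}
```

Cells that receive no retrieval passing the filter are simply absent from the result, which mirrors the fact that valid satellite samples depend on cloud cover.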
In this study we consider February, March and April of 2020 and 2019 to assess the changes seen in TROPOMI NO2 columns due to COVID-19 restrictions over Europe. Even though the lockdown conditions and dates vary between countries, the period after 15 March can be considered as the European lockdown, since this date lies in the middle of the approximately 2-week transition period (e.g. 9 March 2020 in Italy and 23 March 2020 in the UK). We choose to limit our comparisons to the period up to the end of April as a large portion of countries eased their lockdown restrictions from the beginning of May onwards. To have an equivalent pre-lockdown period, we include 1 February until 15 March.
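As a small illustration of the analysis windows defined above, the following sketch labels a date as belonging to the pre-lockdown window (1 February to 15 March) or the lockdown window (16 March to 30 April) of its own year, so the same function applies to 2019 and 2020. The function name and return labels are hypothetical.

```python
from datetime import date


def period_label(day):
    """Classify a date into the pre-lockdown or lockdown window of its year.

    Returns "pre-lockdown", "lockdown", or None for dates outside both windows.
    """
    year = day.year
    if date(year, 2, 1) <= day <= date(year, 3, 15):
        return "pre-lockdown"
    if date(year, 3, 16) <= day <= date(year, 4, 30):
        return "lockdown"
    return None
```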

Non-weather-normalized TROPOMI NO2 column change estimates
Changes of NO2 tropospheric columns associated with the lockdown measures can be estimated by comparing NO2 levels observed during the lockdown period in 2020 with a given baseline. In this section, we compare the results obtained with three different baselines: (1) the NO2 levels observed during the pre-lockdown period in 2020 (hereafter referred to as the "before-during" approach), (2) the NO2 levels observed during the same period of the year in 2019 (hereafter referred to as the "year-to-year" approach), and (3) a machine learning-based estimate of the business-as-usual NO2 levels that would have been observed during the lockdown period in 2020 in the absence of lockdown measures. We focus our study on the largest European urban areas, those exceeding 0.5 million inhabitants, making a total of 100 locations. Assessing the changes of NO2 tropospheric columns from satellite observations is more challenging over rural areas as the NO2 levels are much lower than over urban areas. The signal-to-noise ratio is significantly lower in rural areas, so estimates are very sensitive to small changes in the tropospheric columns. We use the TROPOMI NO2 re-gridded 0.1°x0.1° averages filtered according to relevant quality flags (see section 2.1) and choose the pixels closest to the European city centres that have more than 5 data points per period defined in Table 1. We first show the reduction estimates over Europe as calculated by examples of non-weather-normalized estimates. The "before-during" estimate corresponds to the difference between the pre-lockdown and the lockdown periods. Figure 3 shows changes calculated in that way for 2020 (Fig. 3b) and the equivalent in 2019 (Fig. 3a) for comparison. This method shows drastic NO2 reductions of more than 75% in 2020 for most of the large urban areas of Southern Europe. Reductions are not obvious over parts of Northern Europe and show strong variations from one city to another.
For example, over the UK and Germany some urban areas show increases well above 30% while other urban areas show reductions, even though the same lockdown measures were applied nationwide. Applying the same method to data from 2019 shows strong decreases of NO2 levels in many major European urban areas between the corresponding pre-lockdown and lockdown periods. This illustrates that such "before-during" satellite comparisons are misleading and unfit for assessing the effects of the COVID-19 lockdown because they are very sensitive to seasonal variations of weather regimes and emissions.
The "year-to-year" approach, which has been widely used in scientific publications and web press releases, is based on comparing 2020 to 2019 data over the period of interest. Figure 4 shows such "year-to-year" estimates for the pre-lockdown (Fig. 4a) and lockdown (Fig. 4b) periods. During the lockdown an overall reduction is seen all over Europe, with more moderate reductions over Southern Europe compared to the "before-during" estimates (see Figure 3). Changes over Northern Europe do not show strong city-dependent variations, and the overall decrease is not as strong as in Southern Europe. However, looking at the pre-lockdown estimates, Northern Europe shows drastic negative changes that are actually larger than during the lockdown period, at a time when such deviations from the BAU levels should not be expected. The "year-to-year" method is strongly dependent on the interannual NO2 variability even in the BAU situation, where meteorology plays a crucial role. Therefore, this method can lead to large errors when assessing differences in NO2 levels and, more generally, the pollution level reductions due to the COVID-19 lockdown.
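The two non-weather-normalized estimates discussed above differ only in their baseline, which the sketch below makes explicit. It assumes, for illustration, that each period is summarised by the arithmetic mean of its valid column values; the text does not state which statistic was used, and the function names are hypothetical.

```python
from statistics import mean


def relative_change(period_values, baseline_values):
    """Percent change of the period mean relative to the baseline mean."""
    baseline = mean(baseline_values)
    return 100.0 * (mean(period_values) - baseline) / baseline


def before_during(pre_lockdown_values, lockdown_values):
    """Baseline = pre-lockdown period of the same year."""
    return relative_change(lockdown_values, pre_lockdown_values)


def year_to_year(values_2020, values_2019):
    """Baseline = same period of the previous year."""
    return relative_change(values_2020, values_2019)
```

Neither function knows anything about the weather, so applying `before_during` to 2019 data, or `year_to_year` to the pre-lockdown period, yields non-zero changes whenever meteorology or seasonal emissions differ between the two samples.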

The weather-normalization method accounts for weather variability to estimate the net changes of NO2 induced by the lockdown in urban areas. We have simulated the NO2 tropospheric columns that TROPOMI would have measured under BAU conditions in 2020, i.e. in the absence of lockdown restrictions. Meteorological and air pollution predictors have been used in previous studies to build simplified models that simulate or predict satellite tropospheric observations of atmospheric composition (e.g. Worden et al., 2013; Barré et al., 2015). In this study, we use a novel approach for satellite observation simulation based on the Gradient Boosting Machine (GBM; Friedman, 2001) regressor technique. GBM is a popular decision tree-based ensemble method belonging to the boosting family. We use weather and air quality variables as predictors from the ECMWF and CAMS operational forecasts at 9km and 0.1° resolutions, respectively: 10m wind speed and direction, planetary boundary layer height, 2m temperature, surface relative humidity, geopotential at 500hPa and NO2 concentrations from the CAMS regional ensemble forecasts (no assimilation), but also latitude, longitude, population, Julian date (number of days since January [...] details) was performed using a grid search method with 5-fold cross-validation and using the ranges indicated by Petetin et al.
(2020), who set up a similar method using surface stations. In contrast to Petetin et al. (2020), who trained one ML model per surface air quality monitoring station, a single ML model is trained here for all cities, due to the small dataset available (about 10,000 data points, see Table 2). After the hyperparameter tuning and evaluation of the model, the BAU observation predictions were generated using 100% of the January-May 2019 dataset in order to use the maximum number of data points possible. Detailed scores of the performance of the gradient boosting regressor with respect to the real observations, such as mean bias (MB), normalized mean bias (nMB), root-mean-square error (RMSE), normalized root-mean-square error (nRMSE) and the Pearson correlation coefficient (PCC), can be found in Table 2. The statistics on both the training set and the test set show similar results: low bias, good correlation but significant RMSE. These results indicate that there is no sign of overfitting in the predictions. Since TROPOMI data are available only from mid-2018, the training set is relatively small. For this reason, the predictions feature significant RMSE values and will have a large random error. Such RMSE values, however, stay in the range of those of surface site air quality machine learning predictions, as shown in Section 3 and Table 3. The low mean bias and high correlation values indicate that the main BAU NO2 tropospheric column variability is represented without large systematic errors. Subtracting the simulated BAU NO2 columns from the NO2 columns actually observed during the lockdown period (from 16 March 2020 to 30 April 2020) gives us an estimate of the reductions in NO2 background levels over the major European urban areas. Figure 5 shows the ML-based BAU equivalent estimates for the pre-lockdown and lockdown periods. The estimates are based on the median value of the distribution of real observations minus simulated BAU observations.
We choose to display the median and not the mean as the ML estimates generate strong outliers due to the small training set used. The pre-lockdown ML estimates do not show overall reductions as strong as in the "year-to-year" (Fig. 4) or "before-during" (Fig. 3) estimates. A summary of the results is provided in Table 3, which displays the average and the standard deviation of NO2 changes across all European urban areas considered; the standard deviation is a metric of the inter-urban-area spread. The "before-during" and the "year-to-year" approaches also show stronger reduction estimates on average during 2019 and the pre-lockdown period, respectively. Such methods also display a larger standard deviation across cities than the weather-normalization method, which suggests substantial biases in the former due to the omission of meteorological variability.
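The median-based estimate described above can be sketched in a few lines. The text specifies the median of the observed-minus-BAU distribution; expressing it as a percentage of the median BAU column is an assumption made here for illustration, as is the function name.

```python
from statistics import median


def weather_normalized_change(observed, bau_predicted):
    """Median of (observed - BAU), as a percentage of the median BAU value.

    `observed` and `bau_predicted` are paired sequences, one entry per
    valid overpass. The normalisation by the median BAU is an assumption.
    """
    if len(observed) != len(bau_predicted):
        raise ValueError("observed and bau_predicted must be paired")
    differences = [o - b for o, b in zip(observed, bau_predicted)]
    return 100.0 * median(differences) / median(bau_predicted)
```

With observations `[6, 8, 30]` against a BAU of `[10, 10, 10]`, the single outlier would push a mean-based estimate to a large increase, while the median still reports a reduction, which is the reason given in the text for preferring the median.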

The weather-normalization method is not devoid of uncertainties. While a value close to zero would be expected during the pre-lockdown period, the method estimates a slight overall reduction of around -8%, partly due to the shortage of training data. It does, however, perform much better than the year-to-year and before-during methods, which estimate a -24% and -30% reduction, respectively, during the pre-lockdown period. In the case of the lockdown period, the weather parameter distributions are much more similar between 2019 and 2020 (Figure 2) and, on average across Europe, the "year-to-year" and weather-normalized estimates show results in the same range.
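For reference, the evaluation scores named in this section (MB, nMB, RMSE, nRMSE, PCC) can be computed as in the sketch below. The normalisation of nMB and nRMSE by the mean of the observations is a common convention assumed here; the text does not spell out the exact definitions used for Table 2.

```python
import math


def scores(predicted, observed):
    """Return MB, nMB (%), RMSE, nRMSE (%) and PCC for paired sequences."""
    n = len(observed)
    mean_obs = sum(observed) / n
    mean_pred = sum(predicted) / n
    mb = mean_pred - mean_obs
    rmse = math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / n)
    # Pearson correlation from centred sums of products
    cov = sum((p - mean_pred) * (o - mean_obs) for p, o in zip(predicted, observed))
    var_pred = sum((p - mean_pred) ** 2 for p in predicted)
    var_obs = sum((o - mean_obs) ** 2 for o in observed)
    pcc = cov / math.sqrt(var_pred * var_obs)
    return {
        "MB": mb,
        "nMB": 100.0 * mb / mean_obs,      # assumed normalisation: mean observation
        "RMSE": rmse,
        "nRMSE": 100.0 * rmse / mean_obs,  # assumed normalisation: mean observation
        "PCC": pcc,
    }
```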

Surface station estimates
We estimated the impact of the COVID-19 lockdown on surface NO2 pollution in European areas using the methodology introduced by Petetin et al. (2020), applied to up-to-date (i.e. unvalidated real-time) hourly NO2 data from the European Environment Agency (EEA) AQ e-Reporting (EEA, 2020). We first selected the urban/suburban background stations located within 0.1° of the city centres and applied the quality assurance and data availability screening described in Petetin et al. (2020), using the GHOST metadata (Globally Harmonised Observational Surface Treatment, Bowdalo et al., 2020, in preparation). A total of 164 stations in 77 urban areas were selected. At each station (independently), we estimated the business-as-usual NO2 mixing ratios that would have been observed during the lockdown period under an unchanged emission forcing. This was done using GBM models fed by meteorological inputs (2-m temperature, minimum and maximum 2-m temperature, surface wind speed, normalized 10-m zonal and meridional wind speed components, surface pressure, total cloud cover, surface net solar radiation, surface solar radiation downwards, downward UV radiation at the surface and boundary layer height) taken from the 31km horizontal resolution ERA5 reanalysis dataset (Hersbach et al., 2020) in addition to other time features (date index, Julian date, weekday, hour of the day). The ERA5 reanalysis dataset has the advantage of a consistent model version over time, but a lower resolution (31km) than the ECMWF high-resolution 9km operational forecasts used in the TROPOMI estimates. All GBM models were trained and tuned on the past 3 years (2017-2019) and tested in 2020 before the lockdown. Three years is long enough to capture weather variability at each site, but not so long that the long-term reduction of NO2 resulting from policy measures across Europe becomes an issue. Contrary to Petetin et al.
(2020), who predicted BAU NO2 at the daily scale, the ML models developed here predict NO2 at the hourly scale (in order to get results collocated in time with TROPOMI overpasses, see below). We then deduced the weather-normalized NO2 changes due to the lockdown by comparing observed and ML-based BAU NO2 mixing ratios. To account for the potential error due to the satellite sampling, we provide two estimates: one with complete hourly sampling and one filtered according to the S5P satellite overpass time (13:30 local solar time) and qa filtering (clear sky only), for a stricter comparison with the results discussed in Section 2. Figure 6 displays relative change estimates, showing the median of the distributions for each European city above 0.5 million inhabitants. Overall the estimates for both samplings are broadly consistent, with mean NO2 changes of around -37% and -43% for the hourly sampling and the S5P overpass sampling, respectively (Table 4). Northern Europe (particularly Germany, Poland and the UK) displays larger reductions with the estimates at satellite overpass time. In general, the NO2 relative changes based on surface in-situ concentrations are larger than the ones based on NO2 tropospheric columns. This is expected as NO2 surface site measurements do not directly translate to the TROPOMI NO2 tropospheric column, which is the integrated NO2 content from the surface to about the 200hPa altitude.
Due to the short lifetime of NO2, only marginal lockdown-induced changes of free tropospheric NO2 content are expected.
Changes are mainly expected near the surface and within the planetary boundary layer (PBL). Also, even if the stations are grouped within a 0.1° range from the city centres, the representativeness of the surface observations used might not be similar to that of a 0.1°x0.1° pixel, depending on the surface station coverage in each city. This can also exacerbate the difference between surface and tropospheric column reduction estimates.
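The S5P-like sampling of hourly surface data described in this section can be sketched as below. This is an illustration under stated assumptions: local solar time is approximated as UTC plus longitude/15 hours, a half-hour tolerance around 13:30 is chosen arbitrarily, and the set of days with a valid (qa-filtered) overpass is supplied by the caller. None of these details are given in the text.

```python
from datetime import datetime, timedelta

OVERPASS_LOCAL_SOLAR_HOUR = 13.5  # 13:30 local solar time, as stated in the text


def local_solar_hour(utc_time, lon_deg):
    """Approximate local solar time in decimal hours (UTC + longitude / 15)."""
    shifted = utc_time + timedelta(hours=lon_deg / 15.0)
    return shifted.hour + shifted.minute / 60.0


def sample_at_overpass(hourly, lon_deg, valid_days, tolerance_hours=0.5):
    """Keep hourly values near the overpass time on days with a valid retrieval.

    `hourly` is a list of (UTC datetime, value) pairs and `valid_days` a set
    of dates with a qa-filtered overpass. Days are matched on the UTC date,
    which is adequate for European longitudes around midday.
    """
    kept = []
    for utc_time, value in hourly:
        if utc_time.date() not in valid_days:
            continue
        offset = abs(local_solar_hour(utc_time, lon_deg) - OVERPASS_LOCAL_SOLAR_HOUR)
        if offset <= tolerance_hours:
            kept.append(value)
    return kept
```

The reduction estimate for the overpass sampling is then computed from the kept values only, exactly as for the complete hourly series.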

CAMS regional ensemble model estimates
Model estimates have been calculated using the CAMS European regional air quality forecasting framework, which is an ensemble of 11 models (CHIMERE, DEHM, EMEP MSC-W, EURAD-IM, GEM-AQ, Lotos-Euros, MATCH, MINNI, MONARCH, MOCAGE, SILAM; Marécal et al., 2015). Using such a multi-model approach is useful to minimize the imperfections in each model's formulation. Two sets of model hindcasts have been conducted using two different emission scenarios: BAU emissions and reduced COVID-19 lockdown emissions. The emission inventory used for the BAU reference simulation is the same as that used in the daily Regional Air Quality Forecasts of CAMS for Europe, i.e. CAMS-REG-AP (v3.1 for the reference year 2016). It is compiled by TNO under the CAMS emission service, based on official emissions reported by the countries to the EU (NEC Directive) and UNECE (LRTAP Convention/EMEP) (Kuenen et al., 2014; Granier et al., 2019). The spatial resolution of the emissions is 0.1°x0.05°, later re-gridded to 0.1°x0.1° to match the models' grid. The alternative emission scenario, corresponding to the lockdown period, was derived by combining the original CAMS-REG-AP inventory with a set of country- and sector-resolved reduction factors. For the present work, time-invariant emission reduction factors were proposed by country and for three activity sectors: manufacturing industry, road transport, and aviation (landing and take-off cycles), which are reduced on average by 15.5%, 54% and 94%, respectively. These sectors were considered to be the most affected by changes in activity during lockdown (Le Quéré et al., 2020). The reduction factors were computed from collections of near-real-time activity data, such as Google Community Mobility Reports (Google LLC, 2020) for road transport, airport statistics from Flightradar24 (2020) for aviation and electricity load information from ENTSO-E (2020) for industry.
More details about the emission scaling procedure can be found in Colette et al. (2020), where the resulting country- and activity-sector-dependent reduction factors are provided for the EU28 countries plus Norway and Switzerland. The largest reductions are observed in those countries where lockdown restrictions were more stringent, such as Italy, Spain and France. All the models were operated with exactly the same setup as the CAMS regional operational production. The modelling domain covers Europe at 0.1°x0.1° resolution. The meteorological and chemical boundary conditions are obtained from the Integrated Forecasting System (IFS) of ECMWF, which is the same system that provides part of the dataset for the ML-based estimations (see sections 3.1 and 2.2). The reference simulation used the BAU anthropogenic emissions as described above and the lockdown scenario used the same inventory, modulated by country and activity sector.
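The construction of the lockdown emission scenario amounts to scaling sector emissions by time-invariant reduction factors. The sketch below uses the Europe-wide average reductions quoted above as placeholder values; the study applies country-specific factors, and the sector keys and function name here are hypothetical.

```python
# Average reductions quoted in the text; the study itself uses per-country values.
AVERAGE_REDUCTION = {
    "manufacturing_industry": 0.155,
    "road_transport": 0.54,
    "aviation_lto": 0.94,  # landing and take-off cycles
}


def lockdown_emissions(bau_emissions, reductions=AVERAGE_REDUCTION):
    """Scale BAU sector emissions by (1 - reduction factor).

    Sectors without a reduction factor are left unchanged.
    """
    return {
        sector: value * (1.0 - reductions.get(sector, 0.0))
        for sector, value in bau_emissions.items()
    }
```

Because the factors are constant in time, any temporal variability in the resulting model estimates comes from the atmospheric conditions rather than from the emissions.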
From the two sets of 11 model simulations, the median at each grid point is calculated to form an ensemble simulation (as is routinely done for the operational CAMS predictions, Marécal et al., 2015). Differences between the BAU ensemble and the lockdown scenario ensemble are then used to calculate model reduction estimates. Figure 7 displays the relative change estimates for each European city with more than half a million inhabitants, calculating the medians of the full hourly distribution (Fig. 7a) and of the distribution at qa-filtered S5P overpass times and dates only (Fig. 7b). As expected, urban areas in stricter lockdown countries (i.e. Spain, Italy, France) show the largest reductions (e.g. reductions of up to 60% in Madrid, see Figure 8) whereas urban areas with softer lockdown measures (i.e. Germany, Poland, Sweden) show milder reductions (e.g. around 16% in Stockholm, see Figure 8). The time sampling difference (hourly versus S5P overpass) does not affect the model estimates much: differences of only a few percent are seen for most of the European urban areas. In comparison, the surface station estimates show more sensitivity to the time sampling. Table 4 summarises the overall European reduction estimates, with the standard deviation as a metric of the inter-urban-area spread. On average, the S5P overpass sampling changes the estimates by around -6% for surface station estimates and -1.5% for model estimates. This could suggest a dependence between the time of day and the reduction level (e.g. traffic emissions peak during daytime, hence more reduction should be expected during the day). This topic needs to be further investigated.
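The ensemble-median step and the resulting model-based change estimate can be sketched for a single grid point as follows. The sketch assumes each model provides an equally long time series at that grid point; the function names are hypothetical, and taking the median of the per-time-step relative changes as the city estimate follows the description of Figure 7.

```python
from statistics import median


def ensemble_median_series(members):
    """Per-time-step median across model members.

    `members` is a list of equal-length series, one per model.
    """
    return [median(step_values) for step_values in zip(*members)]


def relative_change_series(bau_members, lockdown_members):
    """Percent change of the lockdown ensemble relative to the BAU ensemble."""
    bau = ensemble_median_series(bau_members)
    lockdown = ensemble_median_series(lockdown_members)
    return [100.0 * (l - b) / b for l, b in zip(lockdown, bau)]


def city_estimate(bau_members, lockdown_members):
    """Median of the relative-change distribution over all time steps."""
    return median(relative_change_series(bau_members, lockdown_members))
```

Restricting the time steps to those matching valid S5P overpasses before calling `city_estimate` gives the overpass-sampled counterpart of the full hourly estimate.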

Summary and Discussions
In this paper, we first show the importance of accounting for weather variability in satellite-based estimates of NO2 changes due to the COVID-19 lockdown. Focusing on Europe and using the TROPOMI instrument, we show that satellite estimates based on direct comparisons between different time periods without accounting for weather variability can be flawed and should not be used for this kind of assessment. To account for weather variability in satellite estimates, we use a recently developed methodology based on the gradient boosting machine learning technique. This methodology has proven to be efficient with surface sites to estimate lockdown-induced changes over Spain (Petetin et al., 2020). We extended those surface estimates over Europe to compare with the satellite estimates. Finally, we included NO2 change estimates predicted by the 11-model CAMS regional ensemble, using emission reduction factors representative of the lockdown period. By providing and comparing the three different methodologies, we provided a comprehensive and complementary assessment of NO2 pollution level changes during the COVID-19 European lockdown. Providing such an assessment is crucial to accurately quantify the lockdown pollution changes for air quality policy but also for the impact on the COVID-19 pandemic itself.
Several studies have investigated the correlation between the high level of COVID-19 mortality and atmospheric pollution (e.g. Contincini et al., 2020; Ogen et al., 2020; Achebak et al., 2020). Feedbacks are then to be expected between the effects of short-term air pollution exposure on COVID-19 mortality and lockdown measures.
In Figure 8 and Table 4 we summarize the results of this study. While Table 4 shows the average reduction with the inter-urban-area variability over Europe, Figure 8 shows the difference between the estimates per urban area. For clarity and relevance, we choose to display only urban areas that are above 1 million inhabitants. The three weather-normalized estimates agree on identifying stronger reductions where more severe lockdown measures were implemented. As shown in Section 2, satellite estimates show a relationship between NO2 tropospheric column reductions and the extent and generalization of restrictive measures in each country. A similar relationship is observed for surface sites and model estimates (Sections 3 and 4). The largest NO2 reduction estimates, of around 50% to 60% for both surface concentrations and tropospheric columns, are found for Spanish, Italian and French urban areas. In countries that adopted softer lockdown measures, e.g. Germany, the Netherlands, Poland or Sweden, urban areas show lower reductions. Although significant discrepancies exist between the satellite, surface and model estimates in urban areas such as Naples (Italy), Sofia (Bulgaria) and Katowice (Poland), the three methods provide an overall consistent broad picture. This is particularly remarkable where satellite data are concerned, and this result helps establish their usefulness for urban air quality and not only for atmospheric pollution in general.
Machine learning observation-based estimates display more spread, reflecting a stronger variability, than model estimates. In Figure 8, satellite and surface observation ML estimates show large interquartile ranges, with larger ranges for the satellite estimates in certain urban areas. Such large ranges show that there is a strong spread in the ML-based estimates that is not seen in the model-based estimates. Model estimates are driven by country-dependent emission reduction/scaling factors that are constant over time. In that case, variability is induced by the changes in atmospheric conditions but not by changes in the emissions. The estimates from the ML approach can represent the transition into the lockdown, during which emissions gradually decreased. This contributes to the increased spread seen in the ML estimates. Scores from ML estimates (see Tables 2 and 4) also show significant RMSE, which can add noise to the time series and thus add to the resulting spread of the distributions.
The stronger spread in TROPOMI estimates is likely due to the small training set used. As time goes on, more TROPOMI data will become available to strengthen the reliability of the method. Disentangling the noise from the actual variability will need to be done carefully in further work.
In all the different estimates presented above we tried to be consistent in scale, using 0.1°x0.1° averaged TROPOMI pixels that match the CAMS forecasts and background stations within a 0.1° range from the city centre. Some urban areas considered in this study likely display a background footprint that is finer than 0.1°. The differences seen between surface station estimates and gridded estimates (models and satellites) point to such possible representativeness issues. Resolution representativeness is a difficult and important topic that deserves further research, as it would require higher-resolution modelling forecasts and an observation network at a resolution finer than 10km.
Satellite overpass local times and the presence of clouds in the measurement pixel can potentially influence the reduction estimates using TROPOMI data. We considered 1.5 months to compute the satellite reduction estimates. Overall the sample size (S5P valid overpasses) in Figure 8 ranges between 14 (Sevilla) and 37 (The Hague). In the same Figure 8, surface site and model estimates are displayed for hourly and S5P-sampled estimates. Smaller or larger samples cannot explain the discrepancies between all the different estimates. Results can, however, be affected when the sample size becomes statistically very small and if shorter time periods (e.g. 1 or 2 weeks) are considered for satellite reduction estimates. Very small samples were not considered in this study. Sampling also produces greater changes in the surface station estimates than in the model estimates. This suggests that the lockdown-induced reduction estimates could also depend upon the time of day. Further and more detailed research is needed on this topic.
Finally, tropospheric column NO2 reduction estimates are mostly smaller than the NO2 surface estimates (sites and model). The different nature of the vertical sampling (tropospheric columns versus surface concentrations) affects the relative reduction estimates. Some exceptions can be seen in certain Spanish and Italian urban areas where column estimates are close to the surface estimates, but overall column reductions are weaker. Further work will be needed to quantitatively link variations in tropospheric columns and surface levels, including sampling the model estimates using an observation operator of the kind commonly used in data assimilation and inverse modelling systems. This important work will be carried out in a further study.

Annex A. Gradient Boosting Regressor Tuning
We have used TROPOMI data from 2019-01-01 to 2019-05-31 to train our machine learning simulator. We used the gradient boosting regressor function included in the scikit-learn Python library. For validation purposes, the dataset has been split between a training set (90% of the total dataset) and a test set (10% of the total dataset) using the train_test_split function. The hyperparameter tuning then uses the training set to generate the simulators and the test set to find the best fit. Similarly to Petetin et al. (2020), the learning rate was fixed to 0.05 and the number of features (max_features) was set to "sqrt". The following hyperparameters were tuned using the grid search method: the subsample (subsample: from 0.3 to 1.0 by 0.1, with a best value of 0.9), the number of trees (n_estimators: from 50 to 1000 by 50, with a best value of 400) and the minimum number of samples in terminal leaves (min_samples_leaf: from 1 to 30, with a best value of 22). We use the default 5-fold cross-validation. We then test the final results on the test set in order to ensure that there is no overfitting.
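The tuning procedure described in this annex can be sketched with scikit-learn as follows. This is not the authors' script: the synthetic predictors and target stand in for the real TROPOMI training data, the random seeds are arbitrary, and the grid is deliberately reduced to a few values so that the example runs quickly (the full ranges are those quoted above).

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the predictor matrix and the observed NO2 columns.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 6))
y = 2.0 * X[:, 0] - X[:, 1] + rng.normal(scale=0.1, size=300)

# 90% training set, 10% test set, as in the annex.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.1, random_state=0
)

# Learning rate and max_features are fixed, following the annex.
base = GradientBoostingRegressor(
    learning_rate=0.05, max_features="sqrt", random_state=0
)

# Reduced grid for illustration; the annex scans subsample 0.3-1.0,
# n_estimators 50-1000 and min_samples_leaf 1-30.
param_grid = {
    "subsample": [0.5, 0.9],
    "n_estimators": [50, 100],
    "min_samples_leaf": [1, 22],
}

search = GridSearchCV(base, param_grid, cv=5)  # 5-fold cross-validation
search.fit(X_train, y_train)

# Held-out check: a test score close to the cross-validated score
# suggests the tuned model is not overfitting.
test_r2 = search.score(X_test, y_test)
```

After fitting, `search.best_params_` holds the selected combination and `search.best_estimator_` is refitted on the whole training set, which is the model one would then use to predict the BAU columns.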