Referee comment on "Quantification of CH4 coal mining emissions in Upper Silesia by passive airborne remote sensing observations with the MAMAP instrument during CoMet"

The authors should better describe the key findings of the study. The article provides a lot of technical detail and, as a result, the reader cannot see the “forest” behind the “trees.” The authors provide too many peculiarities, so the article could benefit from adding general conclusions about methane emissions in the basin. The article is quite long, so part of the technical material can be moved to a supplement. It would be great to have some information about coal production in the Upper Silesian Coal Basin. What is the annual coal production? What is the rank of the coal? What is the methane content of the coal? How do the mines report emissions? The key novelty of the article is the comparison of measured data with emission inventories. What can be done to improve the accuracy of emission inventories? How can the results of the study help?

supporting the flight planning and the interpretation of the observations. Consequently, CH4 emissions originating from ~54 coal mine ventilation shafts distributed over an area of around 60 × 40 km² could be investigated on different scales, ranging from single shafts over smaller clusters up to the entire basin.
In this study, we will focus on CH4 column anomalies retrieved from spectral radiance observations, which were acquired by the 1D nadir-looking passive remote sensing Methane Airborne MAPper (MAMAP) instrument, using the Weighting Func-

The release of greenhouse gases from anthropogenic activity significantly influences the atmospheric surface temperature and the Earth's climate (Stocker et al., 2013). Consequently, there is a well recognized need to reduce these emissions (Fesenfeld et al., 2018; UNFCCC, 2015, 1998). The largest impact on the surface temperature results from the increase in carbon dioxide (CO2), which exerts a radiative forcing (RF) of ~1.8 W m⁻² (Etminan et al., 2016). The second most important man-made increase in radiative forcing results from the increase in methane (CH4) with ~0.6 W m⁻² (Etminan et al., 2016). However, on a per mass basis, CH4 is 34 times more efficient in trapping heat in the Earth's atmosphere over 100 years than CO2 (Myhre et al., 2013, including climate-carbon feedbacks). Moving to shorter time scales (e.g., 20 years), the effectiveness (or the global warming potential, GWP) of CH4 rises to 86 times that of CO2 (Myhre et al., 2013, including climate-carbon feedbacks). The relatively high GWP of CH4 in combination with a relatively short atmospheric lifetime of around 9 years (Prather et al., 2012) makes CH4 an attractive target for short term emission and, thus, climate mitigation strategies (Saunois et al., 2016; Shindell et al., 2012).
To reduce methane emissions, their emission strengths and also locations need to be known. However, current knowledge is inadequate as evidenced by the discussion about the origin of increasing atmospheric CH4 concentrations observed since 2007 (Dlugokencky et al., 2011). Depending on the applied methodology (e.g., measuring ethane-to-methane ratio or isotopic analysis), authors either conclude that CH4 emissions from fossil fuels (Hausmann et al., 2016; Helmig et al., 2016; Turner et al., 2016) or from wetlands and agriculture (Nisbet et al., 2016; Schaefer et al., 2016; Schwietzke et al., 2016) have increased or that the increase in atmospheric CH4 is even related to a decline in atmospheric OH, which removes the CH4 (Rigby et al., 2017; Turner et al., 2017). Interestingly, even though Schwietzke et al. (2016) concluded that the increase is mostly related to wetlands and agriculture, they further stated that global emissions from the fossil fuel industry could be ~40 % higher than previously expected by Saunois et al. (2016). A study by Petrenko et al. (2017) supports this hypothesis and finds indications that even this revised number might be too low by at least 25 %. A recent study from Jackson et al. (2020) also concluded that the global increase in atmospheric CH4 has been mostly driven by anthropogenic emissions and natural CH4 emissions remained almost unaltered between the period 2000-2006 and 2017. However, not only globally, but also on smaller scales our knowledge and characterization of fossil fuel CH4 emissions is inadequate (e.g., Buchwitz et al., 2017; Maasakkers et al., 2016; Alexe et al., 2015; Turner et al., 2015).
A large source of anthropogenically emitted CH4 originates from coal mining. It globally accounts for around one-tenth of the anthropogenic CH4 emissions of about 350 Mt CH4 yr⁻¹ (Saunois et al., 2016, 2020). China, the largest emitter of CH4 from coal mining, is responsible for ~50 % of the global total (EPA, 2012). The share of the European Union is around 4 %, with the largest contribution originating from Poland. This country is also home to the largest contemporary hard coal mining area in Europe, located in the Upper Silesian Coal Basin (USCB), occupying around 7400 km² (Gzyl et al., 2017) in total, and extending into the Czech Republic (compare to Fig. 1, area in Poland is around 5400 km²).

CoMet measurement campaign and instrumentation
The CoMet research campaign took place in early Summer 2018 with one of its goals being the investigation of coal mining emissions from the largest European CH4 emission hot spot, the USCB (between ~18.3°-19.2° E and ~49.9°-50.3° N) in Poland. CH4 is emitted by over 50 coal mine ventilation shafts in that area occupying around 60 × 40 km². However, common inventories (Crippa et al., 2020; Janssens-Maenhout et al., 2019; Scarpelli et al., 2020) provide CH4 emissions only at a coarse spatial resolution of 0.1° × 0.1° (translating to ~7 × 11 km² in the discussed area). Consequently, for optimal flight planning and also subsequent assignment of observed CH4 enhancements to specific CH4 sources, the CoMet team generated a more detailed inventory. This inventory, hereafter referred to as CoMetv3 and described in further detail in Sect. 2.4, comprises annually reported CH4 emissions of about 530 kt CH4 yr⁻¹, which are assigned to 54 exactly geolocated active ventilation shafts found in the region. Figure 2 shows the area and the ventilation shafts under consideration.

To investigate these CH4 emissions on different scales ranging from single shafts over smaller clusters up to the entire basin, a variety of observation platforms and instruments were deployed in the USCB during May and June in 2018. The two key instruments were the airborne passive remote sensing instrument MAMAP (operated by the University of Bremen, Gerilowski et al., 2011) and the airborne active remote sensing instrument CHARM-F (CO2 and CH4 Remote Monitoring - Flugzeug, Amediek et al., 2017; Fix et al., 2015; Quatrevalet et al., 2010, operated by DLR, Deutsches Zentrum für Luft- und Raumfahrt) installed aboard a Cessna aircraft operated by the FUB (Freie Universitaet Berlin) and a Gulfstream G550 (HALO, High Altitude and Long Range Research Aircraft) operated by DLR, respectively. Due to its long range capabilities, the HALO aircraft operated out of Oberpfaffenhofen (EDMO), Germany, whereas the FUB Cessna was deployed at the Katowice airport (EPKT), Poland, located at the northern edge of the mining area (compare to Fig. 2). Additional observations comprised airborne in situ concentrations of CH4 and CO2 by the FUB Cessna, by the HALO aircraft, and by a second Cessna Caravan (also operated by the DLR, Fiehn et al., 2020; Kostinek et al., 2019). The airborne observations were complemented by on ground measurements of in situ concentrations of CH4 and CO2 by mobile vans (operated by AGH Krakow, IUP Heidelberg, and Utrecht University as part of the MEMO2 activities), stationary and mobile CH4 column observations by FTS (operated by DLR, Luther et al., 2019), and wind field observations by three stationary wind lidars in that region specifically deployed for CoMet (operated by DLR, Wildmann et al., 2020). For adequate flight planning and also interpretation of the collected data sets, various model support and weather forecast systems were provided (Nickl et al., 2020, Galkowski et al., in prep).
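The ~7 × 11 km² footprint quoted above for a 0.1° × 0.1° inventory cell can be checked with a short calculation. The following is a minimal sketch that assumes a spherical Earth with a mean radius of 6371 km and a representative latitude of 50.1° N for the basin; the function name and defaults are illustrative and not taken from the study.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumption: spherical Earth with mean radius


def grid_cell_size_km(lat_deg, dlat_deg=0.1, dlon_deg=0.1):
    """Return (east-west, north-south) extent in km of a lat/lon grid cell."""
    km_per_deg = math.pi * EARTH_RADIUS_KM / 180.0  # length of 1 deg of latitude
    north_south = dlat_deg * km_per_deg
    # Meridians converge towards the poles, so the east-west extent
    # shrinks with the cosine of the latitude.
    east_west = dlon_deg * km_per_deg * math.cos(math.radians(lat_deg))
    return east_west, north_south


ew, ns = grid_cell_size_km(50.1)  # 50.1 deg N: assumed mid-latitude of the USCB
print(f"{ew:.1f} km x {ns:.1f} km")
```

At this latitude the cell is markedly narrower in the east-west direction, which is why a single inventory cell can contain several ventilation shafts.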
While those results are subject to other papers either published or in preparation, the focus of the study at hand is to estimate the small scale CH4 emissions of clusters of ventilation shafts by combining CH4 column observations from the passive remote sensing MAMAP instrument and wind observations from the three wind lidar stations. MAMAP is a grating spectrometer, which records back scattered solar radiation from the ground while flying above the planetary boundary layer (PBL) in which the emission plumes are located. Spectra are recorded in the shortwave infrared (SWIR) region between 1590 and 1690 nm with a spectral resolution (full width at half maximum, FWHM) of around 0.9 nm. Column information of CH4 is then extracted using absorption spectroscopy. The retrieved CH4 column anomalies have a single-measurement precision of better than 0.4 % relative to the background column. They have, for instance, been used to estimate CH4 emissions from two coal mine ventilation shafts near Ibbenbueren, Germany (Krings et al., 2013) and from landfills in Los Angeles, USA (Krautwurst et al., 2017). The precision of the instrument is therefore sufficient to investigate CH4 emissions in the more complex region of the USCB. This was also investigated by means of Observation System Simulation Experiments (OSSEs; for details, see e.g., Krautwurst et al., 2017; Gerilowski et al., 2015) before the campaign, which simulate observed CH4 column anomalies based on expected emissions under various wind conditions and also consider instrumental characteristics such as the measurement precision.
The wind information required for the flux estimates is derived from the three wind lidar systems (Leosphere Windcube 200S), which were deployed at three different locations in the USCB as shown in Fig. 2. They measure the vertically resolved wind field at the location of the wind lidar. Data are available as 30 minute averages in 50 m altitude bins. Additionally, the eddy dissipation rate is computed, from which the boundary layer height is estimated. The uncertainty of the wind speed is given as 0.2 m s⁻¹ (Luther et al., 2019). Further details on the measurement principle and analysis are found in Luther et al.
noon. Usual flight duration over the mining area was two to three hours each. Wind lidar observations were continuously collected throughout the entire campaign period.

Retrieval of column anomalies and inversion to emissions
The following sections introduce the applied algorithm to extract the desired CH4 column information from the measured MAMAP spectra, how these CH4 columns are inverted to cross-sectional fluxes, and how the wind, which is needed for any flux estimate, is computed. Additionally, potential error sources of the column anomalies, the wind, and the final cross-sectional fluxes are presented.

CH 4 column anomalies
During a measurement flight, the MAMAP instrument typically probes the air column below the aircraft while flying above the PBL downwind of potential emission sources. The spectra collected in this way contain the absorption features of CH4 (and also CO2), whose strengths depend on the amount of those gases in the atmosphere. From these features, the CH4 column anomalies are retrieved using the Weighting Function Modified Differential Optical Absorption Spectroscopy (WFM-DOAS) algorithm and the CH4 over CO2 proxy method, which are described in detail in Krings et al. (2011) and in Sect. A1.1.
On average, the accuracy and precision of the retrieved CH4 column anomalies are estimated to be around 0.10 % and 0.22 %, respectively, relative to the CH4 background column for this investigated data set. The single-measurement precision is directly computed from the scatter of the measured data after applying the retrieval described in Sect. A1.1 and analyzing only observations which are not influenced by a CH4 plume. The accuracy considers the influence of the terrain, such as surface elevation and surface spectral reflectance, on the retrieved CH4 column anomalies, which might not be entirely accounted for during the retrieval process. A more detailed discussion of the error budget is given in Sect. A1.2.

To describe the mass flow through a cross-section of column measurements, not only trace gas anomalies, but also wind information, are required. Ideally, the wind field is measured inside or near the emission plume simultaneously to the trace gas observations. This can be achieved for airborne in situ measurements, if the aircraft is equipped with the adequate instrumentation. In this case, trace gas and wind information are directly measured along the flight track while crossing the plume (e.g., Fiehn et al., 2020; Pitt et al., 2019; Ren et al., 2019; Peischl et al., 2018; Gordon et al., 2015). Since the MAMAP instrument needs to fly above the PBL, alternative sampling strategies for the wind field have been investigated in the past. This included, for instance, utilizing 3D wind fields from numerical weather prediction models, or splitting the measurement flight into two parts, where during the first part the trace gas column observations are collected by the remote sensing instrumentation flying above the PBL, and during the second part the wind information is collected within the PBL inside and outside of the plume (Krautwurst et al., 2017).
For the current study, observations from the three wind lidar stations are used to estimate the prevailing winds at the location and time of the MAMAP measurement, because they are available during all six flights. As an example, Fig. 3 shows the temporal evolution of the wind speed at all three stations on 7 June. The locations of the lidar stations are included in Fig. 2.

approach has also been chosen by Luther et al. (2019). This wind speed and direction are then used in the cross-sectional flux calculation described in the next section. As a measure for the wind error, the 1-σ standard deviation over all values used in the average is utilized to also take into account the uncertainty caused by the variability over the basin and in time. Furthermore, this approach also covers vertical variations due to a possible wind shear or vertically unevenly distributed emissions. This leads in general to errors of ~1 m s⁻¹ and ~10° for wind speed and direction, respectively, which significantly exceed the measurement uncertainty of the observations (0.2 m s⁻¹, Sect. 2.1). Additionally, a comparison between one of the wind lidar instruments and ultrasonic anemometers indicates biases of smaller than 0.5 m s⁻¹ and of around 10° for wind speed and direction, respectively. We assume that these errors are covered by our uncertainty computation because it is estimated from the standard deviation of observations from all three wind lidars, in most cases.
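The idea of averaging several lidar wind samples and using their scatter as the wind error can be sketched as follows. The exact averaging procedure is not fully reproduced in this text, so this sketch makes its own assumptions: samples are vector-averaged, directions follow the meteorological convention (direction the wind blows from, clockwise from north), and the direction spread is computed from deviations wrapped to ±180°. The function name `mean_wind` is hypothetical.

```python
import math
import statistics


def mean_wind(speeds, directions_deg):
    """Vector-average wind samples (e.g., from several stations and time bins).

    Returns (mean speed, mean direction, 1-sigma speed spread, 1-sigma
    direction spread). Assumes at least two samples.
    """
    # Decompose into eastward (u) and northward (v) components; the minus
    # signs account for the "blowing from" convention.
    u = [-s * math.sin(math.radians(d)) for s, d in zip(speeds, directions_deg)]
    v = [-s * math.cos(math.radians(d)) for s, d in zip(speeds, directions_deg)]
    u_mean, v_mean = statistics.fmean(u), statistics.fmean(v)

    speed = math.hypot(u_mean, v_mean)
    direction = math.degrees(math.atan2(-u_mean, -v_mean)) % 360.0

    speed_std = statistics.stdev(speeds)
    # Wrap deviations to [-180, 180) so that 350 deg and 10 deg count as
    # 20 deg apart instead of 340 deg.
    dev = [((d - direction + 180.0) % 360.0) - 180.0 for d in directions_deg]
    dir_std = math.sqrt(sum(x * x for x in dev) / (len(dev) - 1))
    return speed, direction, speed_std, dir_std


# Illustrative values only, not campaign data.
print(mean_wind([5.2, 4.6, 6.1], [85.0, 95.0, 100.0]))
```

Vector averaging is chosen here because a plain arithmetic mean of directions fails near the 0°/360° wrap; other choices (for example scalar-averaged speeds) are equally plausible.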
To also get a better overview of the large scale wind situation in the entire basin on each day, 2D wind fields are extracted from 3D WRF v3.9.1.1 reanalysis data simulations (detailed model description will be given in a separate study in the current special issue, see Galkowski et al., in prep). These fields are provided at a spatial resolution of 2 × 2 km² with 15 vertical levels below 3 km altitude, and high temporal resolution with instantaneous values every minute. They are used to identify unfavourable wind conditions, which would prohibit a reliable flux estimate, not obvious in the wind lidar measurements alone. To allow for a better comparison between model and wind lidars, the WRF data are averaged within the boundary layer, as calculated by the modelled PBL parametrization scheme.

Additionally, both data sets are averaged over the entire time of a measurement flight, which is on the order of two to three hours. The results are presented in Sect. 3.1.

Flux inversion
The method to derive cross-sectional fluxes has been used widely to quantify trace gas emissions not only from airborne in situ measurements (e.g., Krautwurst et al., 2017; Peischl et al., 2016; Lavoie et al., 2015; Cambaliza et al., 2015; Turnbull et al., 2011; White et al., 1976) but also remote sensing column observations (e.g., Krings et al., 2018; Amediek et al., 2017; Krautwurst et al., 2017; Frankenberg et al., 2016; Krings et al., 2013) and column observations by satellite instruments (e.g., Reuter et al., 2019). The mass flow through a flight track of trace gas column observations driven by the local wind field is given by Eq. (1).

The dominant error sources of the computed flux F_track arise from uncertainties or errors in the estimated wind speed (~1 m s⁻¹) and wind direction (~10°), which can increase to up to 2 m s⁻¹ and 40° for specific days, the choice of the background observations, and the retrieved CH4 column anomalies expressed as column anomaly precision and accuracy (~0.22 % and ~0.10 %, respectively, as discussed in Sect. A1.2). The error δF_track of the flux F_track of one track is computed by root sum squaring these error sources:

$$\delta F_{\mathrm{track}} = \sqrt{\delta F_{u}^{2} + \delta F_{\alpha}^{2} + \delta F_{\mathrm{bg}}^{2} + \delta F_{\text{col-pr}}^{2} + \delta F_{\text{col-ac}}^{2}}$$

where δF_u, δF_α, δF_bg, δF_col-pr, and δF_col-ac are the errors arising from the wind speed, from the wind direction, from the choice of the background observations, and from the column anomaly precision and accuracy, all in t CH4 hr⁻¹. δF_u and δF_col-ac are computed by Gaussian error propagation of Eq. 1. δF_col-pr(n) is also calculated by Gaussian error propagation, taking into account its random nature by dividing the value for the estimated precision by √n, where n is the number of observations within the plume.
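The flux calculation and the root-sum-square error budget described above can be illustrated with a short sketch. The flux function uses a generic mass-balance form (column anomaly × segment length × wind component normal to the track, summed along the track), which is an assumption made here for illustration and not a reproduction of the paper's Eq. (1). All function names and the numbers in the usage lines are hypothetical.

```python
import math


def cross_sectional_flux(col_anomalies, segment_lengths, wind_speed, angles_deg):
    """Generic mass-balance flux through a flight track (assumed form).

    col_anomalies: column enhancements above background in kg m^-2
    segment_lengths: along-track length of each observation in m
    wind_speed: mean wind speed in m s^-1
    angles_deg: angle between wind vector and track normal for each segment
    Returns the flux in kg s^-1.
    """
    return sum(dv * dx * wind_speed * math.cos(math.radians(a))
               for dv, dx, a in zip(col_anomalies, segment_lengths, angles_deg))


def wind_speed_error(flux, wind_speed, wind_speed_err):
    """Flux error from wind speed: the flux is linear in u, so dF/F = du/u."""
    return abs(flux) * wind_speed_err / wind_speed


def precision_error(single_obs_flux_error, n_plume_obs):
    """Random (precision) error shrinks with sqrt(n) in-plume observations."""
    return single_obs_flux_error / math.sqrt(n_plume_obs)


def track_flux_error(dF_u, dF_alpha, dF_bg, dF_col_pr, dF_col_ac):
    """Root sum square of the five error terms named in the text."""
    return math.sqrt(dF_u**2 + dF_alpha**2 + dF_bg**2 + dF_col_pr**2 + dF_col_ac**2)


# Illustrative numbers only: a 9 t/hr flux, u = 5 m/s with 1 m/s error.
dF_u = wind_speed_error(9.0, 5.0, 1.0)  # 20 % of the flux
print(track_flux_error(dF_u, 0.5, 0.6, precision_error(1.0, 25), 0.7))
```

Because the flux scales linearly with wind speed, a wind speed error of 0.5 to 1.2 m s⁻¹ at ~5 m s⁻¹ translates directly into a relative flux error of roughly 10 % to 24 %.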

The wind direction modifies the flux via a cosine term and its error can thus not easily be calculated by error propagation. Consequently, we estimate δF_α by varying the prevailing wind direction by its estimated error on a specific day and use the difference to the 'true' flux F_track as error estimate. The choice of the background observations is investigated by randomly selecting two-thirds of the observations from either side of the plume and computing a new background for one flight track, which is used to calculate a new flux estimate. This is done for up to 500 combinations for each side. The 1-σ standard deviation of those fluxes is then used to estimate the error δF_bg.
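The two perturbation-based error estimates described above can be sketched as follows. Several details are assumptions made for illustration: a single wind angle is used for the whole track, the larger of the two perturbed deviations is taken as δF_α, the background is a plain mean of the selected observations, and both sides of the plume are resampled jointly in each draw. The callable `flux_from_background` is a hypothetical stand-in for the full flux calculation.

```python
import math
import random
import statistics


def direction_error(flux, angle_deg, angle_err_deg):
    """Perturb the wind angle by +/- its error and return the larger flux change.

    Assumes the flux was computed with one cos(angle) factor for the track.
    """
    base = math.cos(math.radians(angle_deg))
    changes = [
        abs(flux * math.cos(math.radians(angle_deg + sign * angle_err_deg)) / base - flux)
        for sign in (-1.0, 1.0)
    ]
    return max(changes)


def background_error(left_bg, right_bg, flux_from_background, n_draws=500, seed=0):
    """1-sigma spread of fluxes over random two-thirds background subsets."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    n_left = max(1, round(2 * len(left_bg) / 3))
    n_right = max(1, round(2 * len(right_bg) / 3))
    fluxes = []
    for _ in range(n_draws):
        subset = rng.sample(left_bg, n_left) + rng.sample(right_bg, n_right)
        fluxes.append(flux_from_background(statistics.fmean(subset)))
    return statistics.stdev(fluxes)


# Illustrative use: the flux responds linearly to the chosen background level.
left = [1.00, 1.02, 0.99, 1.01, 0.98, 1.03]
right = [1.01, 0.97, 1.00, 1.02, 0.99, 1.00]
print(background_error(left, right, lambda bg: 10.0 * (1.05 - bg)))
print(direction_error(10.0, 20.0, 10.0))
```

The cosine dependence makes the direction error asymmetric: for a track not perpendicular to the wind, perturbing the angle in one direction changes the flux more than in the other.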
An additional uncertainty source originates from variability in the atmospheric transport caused by turbulence, which leads to varying cross-sectional fluxes if estimated from multiple overflights of the same source and cannot be explained by source variability alone (e.g., Krautwurst et al., 2017; Matheou and Bowman, 2016; Wolff et al., in prep.). This variability, expressed as δF_atm, is estimated as the 1-σ standard deviation (STD) from the overflights themselves and is then combined with the error δF_tracks, resulting from the errors of the single tracks, to estimate the standard error (1-σ) of the averaged flux if multiple overflights of the same source(s) are available. In the corresponding expressions, m is the number of flight tracks.

Investigated mines and shafts
To reliably measure emissions and assign them to small clusters of coal mine ventilation shafts, MAMAP observations need to be collected in relatively close vicinity to the respective shafts. An adequate maximum distance depends, for example, on the complexity of the investigated area, the density of the occurring sources, and the position of the flight tracks on the different flight days. In general, the further away observations are acquired, the more complicated it is to disentangle observed fluxes from individual or groups of shafts due to potential mixing of the different plumes along their way. However, setting the focus to small clusters and primarily analyzing tracks closest to the shafts also limits the number of available observations.

Consequently, as a compromise and for the purpose of this study, we only analyze flight tracks which are within ~15 km of ventilation shafts. This also reduces the probability of interference of large CO2 sources, which would have, depending on position, an adverse effect on the retrieved CH4 column anomalies (compare to Sect. A1.1). The drawback of this approach is that most clusters of shafts releasing CH4 were only observed once during each flight. However, as observed in other studies and as discussed in Sect. 2.2.3, fluxes estimated from multiple overflights can vary significantly as a result of turbulence, which leads to CH4 column maxima and minima. To address this issue, we only try to separate and estimate emissions from clusters of ventilation shafts when at least 2 overflights are available. Additionally, the plume and background regions must be clearly distinguishable as they are selected by visual inspection. This is not the case, for example, if the flight track passes over lakes, which have a very low reflectivity in the SWIR spectral region and thus a poor signal to noise ratio. Consequently, observations acquired over water bodies are not considered in this study.
Four clusters of ventilation shafts, illustrated in Fig. 2, were identified based on the above mentioned boundary conditions.
The clusters are labelled as 'cluster a' to 'cluster d' starting in the north and counting counter-clockwise. They comprise ~40 % of all CH4 mining emissions in the region according to the CoMetv3 inventory. The annual CH4 emissions, the name of the mines, and the number of shafts are listed in Table 1. Depending on the position of the actual flight track, which depends on the prevailing wind direction on a specific day and cloud cover and the Air Traffic Control (ATC) restrictions in that region, not all shafts of a cluster could be investigated during each flight. This led to the investigation of smaller sub-clusters, as discussed further below (Sect. 3.2).

CoMetv3 emission inventory
The core of the CoMetv3 inventory comprises annual CH 4 emissions, primarily based on data from the European Pollutant Release and Transfer Register (E-PRTR) and the Polish Wyższy Urząd Górniczy (WUG, Higher Mining Administration).

As both E-PRTR and the WUG report the data at the facility level, these had to be disaggregated to individual ventilation shafts for this study. Thus, we divided the annual emissions equally among the shafts of the reporting mine, as more detailed data is not readily available. Such disaggregation can lead to large uncertainties, as emissions vary due to changes in excavation activities over the year, connected to changes in mining fronts, variations in airflow driven by safety considerations (including methane concentration below ground), etc. The resulting CH4 emissions for 2018 are displayed for the different shafts in Fig. 2 and listed for the investigated clusters in Table 1 for the years 2016 and 2018.
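The equal disaggregation described above, together with the conversion from annual totals to the hourly units used for the flux comparison, amounts to simple arithmetic. The sketch below assumes a non-leap year of 8760 hours; the function name and the facility values in the usage lines are hypothetical illustrations and not entries from the inventory.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 h, assuming a non-leap year such as 2018


def per_shaft_hourly(annual_kt, n_shafts):
    """Split a facility-level annual CH4 emission equally among its shafts.

    annual_kt: reported emission in kt CH4 per year
    n_shafts: number of active ventilation shafts of the facility
    Returns the emission per shaft in t CH4 per hour.
    """
    tonnes_per_hour = annual_kt * 1000.0 / HOURS_PER_YEAR
    return tonnes_per_hour / n_shafts


# Hypothetical facility: 47.3 kt/yr spread over two shafts gives ~2.7 t/hr each.
print(round(per_shaft_hourly(47.3, 2), 2))

# Basin-wide total of about 530 kt/yr expressed as an hourly rate (~60.5 t/hr).
print(round(per_shaft_hourly(530.0, 1), 1))
```

The equal split ignores how the ventilation load is actually distributed among shafts, which is one reason why directly measured hourly data are preferable where available.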
However, minutely or hourly resolved emissions measured directly at the investigated shaft during the time of investigation should ideally be used for comparison to high-resolution measurements like those analyzed here. Therefore, for a subset of coal mines that agreed to provide such information, we derived hourly emissions for each shaft within the CoMet project. This data is based on concentrations and airflows measured directly upstream of the outlet of the ventilation shaft. The uncertainty of these hourly emissions is estimated to be 20 % of the reported value due to lacking information about the calibration procedures and instrument precision levels. A detailed comparison between the measured hourly resolved emissions, the reported annual emissions, and the observed fluxes derived from MAMAP data is given in Sect. 4.

Results
This section presents the results based on the methods and data described in Sect. 2. First, more and less favourable flight days are identified using PBL-averaged wind fields from WRF and the wind lidar data. Second, the cross-sectional fluxes for one cluster ('cluster b') are presented in detail and then summarized for the remaining clusters.

Wind situation over the basin
For all five measurement days (28, 29 May and 1, 6, 7 June 2018), observations from the wind lidar stations are available.  in the northern part and to the south in the southern part of the field (black arrows). Additionally, the winds estimated from the three wind lidars (white arrows) agree well in speed as well as in direction with the prediction of the model simulation. Similar situations occur for 28 May and 6 June, which also exhibited easterly flows (see Fig. B1).
The situation changed on 29 May (c). According to the WRF simulations, the wind speed is significantly lower in some parts of the basin and more variable than on 7 June, changing from an easterly flow in the middle of the basin to a south-easterly flow in the western and eastern basin. The low wind speed is also confirmed by the wind lidars observing winds of around 2 m s⁻¹.
Whereas winds from the western lidar (DLR85) appear to agree with the WRF simulations, those from the lidar in the east of the region (DLR86) observe significantly lower wind speeds than predicted by the model (no observations are available for the southern lidar, DLR89, on that day). On 1 June, the wind lidars observe a strong gradient in wind speed from west to east with winds blowing from the south-south-east. This is also well captured by the WRF simulations.
Overall, the WRF model simulations support the observations by the wind lidars. Exceptions might occur during low wind conditions.
During low and variable wind conditions as occurring on 1 June in the south-western basin and also on 29 May, an accumulation or recirculation of the emitted CH4 is not entirely excluded. If clusters having a small number of shafts are investigated and observations are acquired in close vicinity to the shafts, this may be less problematic. Another limitation results from the cross-sectional flux method introduced in Sect. 2.2.3. The transport through the cross-section described by Eq. (1) must be dominated by advection and not diffusion. For wind speeds less than 2 m s⁻¹, however, diffusion becomes more prominent (Sharan et al., 1996).

Estimated cross-sectional fluxes
The following sections present the estimated cross-sectional fluxes and their corresponding errors. 'Cluster b' was investigated during all flights and, consequently, this cluster of shafts has the most comprehensive collection of measurements. It is discussed in more detail below, followed by shorter discussions concerning the three other clusters.

Cluster b
'Cluster b' comprises 7 ventilation shafts from the three mines Pniówek (3) (Fig. 4, b) and 1 June (Fig. 4, c), as already identified in Sect. 3.1. The wind speeds at 'cluster b' as derived from the lidar stations were generally around 5 to 6 m s⁻¹ and dropped to around 2 m s⁻¹ on 29 May and 1 June.

The dominant error source (Table 2) of the single fluxes is the wind speed (and for some tracks the wind direction) followed by the accuracy of the retrieval and the choice of the background observations. The single-measurement precision of the MAMAP instrument is mostly negligible. The error on the wind speed is usually between 0.5 and 1.2 m s⁻¹, leading to errors on the estimated flux of around 10 % to 25 % assuming a wind speed of ~5 m s⁻¹. However, for example, on 1 June the magnitude of the wind was small and variable and its error is larger than the absolute value of 1.8 m s⁻¹ used for the flux estimate. This leads to an error of over 100 % on the single flux estimate and explains the large standard error of over 50 % on the averaged flux for the Pniówek shafts alone (sub-cluster 'P').

Clusters a, c, and d
For the remaining clusters, the retrieved CH 4 anomalies are shown in Figs. C1, C2, and C3, and the computed cross-sectional fluxes are listed in Table C1. with those from inventories is given in the next section.
An example, in which the investigation of all ventilation shafts of one cluster on specific days is not feasible, is given for 'cluster c'. The flight track is located downwind of four shafts belonging to Brzeszcze towards the west on 6 and 7 June (Fig. C2). However, the plume of the northernmost shaft could not be quantitatively investigated because it was always located directly over an area covered by lakes, which do not allow for passive remote sensing observations. During the flight on 1 June all four shafts were covered. However, in addition to the low wind speeds, only one overflight is available and, therefore, the flux is only listed as a matter of completeness.

Comparison to inventories
As the MAMAP measurements represent a "snapshot" of the emissions of small clusters of ventilation shafts for a short time interval, comparisons to annually resolved and/or coarsely gridded inventories should be performed carefully, and even then

Detailed hourly emissions were not only collected for the Pniówek but also for the Zofiówka shafts of 'cluster b' (see Table 3). For the observations on 29 May and 1 June, where only the Pniówek shafts were investigated and low winds prevailed, the measured averaged hourly emissions for the time of the overflights are 4.5 t CH4 hr⁻¹ (~34 % lower than the reported annual emissions). The observed averaged flux derived from MAMAP data is (7.0 ± 4.4) t CH4 hr⁻¹. This flux is larger than the measured hourly emissions; however, it was recorded under low wind conditions and is only based on two overflights, both of which call for caution in its interpretation.
The measured averaged hourly emissions for the Pniówek and Zofiówka shafts, which were investigated on 6 and 7 June, are 6.2 t CH4 hr⁻¹, which is ~36 % lower than the annually reported emissions. Although reasonable winds prevailed and 7 tracks were acquired in total, the averaged observed flux based on MAMAP observations is (9.2 ± 1.4) t CH4 hr⁻¹ and thus ~49 % larger than the measured hourly emissions. Additionally, the share of emissions between the three Pniówek shafts is at a ratio of about 5:2:1 on average during the time of the MAMAP observations as indicated by the measured hourly data. The mismatch between the observed fluxes and hourly emissions might be related to missing CH4 sources which are not explicitly accounted for in the hourly data. CH4 is, for example, not only ventilated through the ventilation shafts, but also drained from excavations and transported to drainage stations in the area. Consequently, CH4 is also released from the drainage system. Those emissions are included in the annually reported emissions but not in the measured hourly data. Additionally, some tracks might also be affected by the two Jastrzebie shafts which are faintly visible in Fig. 5

For 'cluster c', which consists of four shafts, the CoMetv3 inventory only provides a monthly mean value for a one month period between 14 May and 13 June in 2018 for the two high emitting shafts of Brzeszcze-a but no hourly resolved data.
The emissions of these shafts are given as 1.9 and 1.7 t CH4 hr⁻¹, which are ~35 % lower than their reported annual value of 2 × 2.7 t CH4 hr⁻¹ (Table 1). For the two remaining lower emitting shafts, only the annual emissions of 2 × 0.5 t CH4 hr⁻¹ are available. The investigated sub-cluster 'B2' of 'cluster c' covers one Brzeszcze-a and the two Brzeszcze-b shafts, resulting in hourly emissions of 2.8 t CH4 hr⁻¹, which agrees very well with the observed averaged flux of (2.9 ± 0.5) t CH4 hr⁻¹ (Table C1).
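The comparisons above rest on two simple quantities: the relative difference between an observed flux and a reference emission, and whether the reference lies within the 1-σ uncertainty of the observation. The following sketch, with hypothetical helper names, applies both to the rounded values quoted in the text; because the inputs are rounded, the percentages can differ by about one point from those stated.

```python
def percent_difference(value, reference):
    """Relative difference of value with respect to reference, in percent."""
    return 100.0 * (value - reference) / reference


def within_uncertainty(observed, sigma, reference):
    """True if the reference lies within observed +/- sigma (1-sigma check)."""
    return abs(observed - reference) <= sigma


# 'cluster b' on 6 and 7 June: observed (9.2 +/- 1.4) vs. 6.2 t CH4 per hour.
print(round(percent_difference(9.2, 6.2)), within_uncertainty(9.2, 1.4, 6.2))

# Sub-cluster 'B2': observed (2.9 +/- 0.5) vs. 2.8 t CH4 per hour.
print(round(percent_difference(2.9, 2.8)), within_uncertainty(2.9, 0.5, 2.8))
```

A 1-σ overlap check is only a coarse indicator of agreement; it does not account for the 20 % uncertainty assigned to the hourly emission data themselves.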
For the two remaining 'clusters a' and 'd', only the annual emissions are available. For 'cluster a', there is good agreement for the sub-cluster 'H', only observing two Halemba shafts (1.0 vs. 0.9 t CH4 hr⁻¹, Table C1). However, for the sub-cluster 'HS', which includes two Śląsk shafts, the observed averaged flux is larger than the reported annual value by a factor of three.

The calculation of the cross-sectional flux (Eq. (1)) implies that a good wind estimate is as important as precise CH4 column anomalies. In the presented study, winds were derived from three wind lidar stations deployed in the USCB. Although the prevailing wind at a specific cluster was interpolated from these stations, the wind direction agrees well with the observed location of CH4 enhancements. Larger discrepancies occur only on days with low and variable winds. On the one hand, this might be attributed to missing wind observations at the southern lidar station on those days. On the other hand, a comparison to WRF v3.9.1.1 model simulations revealed that on those days the wind speed and direction have the largest variability across the basin. We infer that measurements by only three stationary wind lidars do not reveal the full complexity of mixing and plume propagation in these conditions. However, modelled wind fields match the wind lidar observations for the remaining days with higher wind speeds. To reflect the effect of a variable wind field across the basin in the final result, the error of the wind was estimated as the 1-σ standard deviation of the observed winds at the three lidar stations. This additionally captures wind shear and the lack of knowledge of the exact vertical distribution of the emissions within the boundary layer.
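
The wind error estimate described above can be illustrated with a minimal sketch. The station wind speeds are hypothetical placeholders, not values from the campaign. Using the sample standard deviation (n − 1 in the denominator) is also an assumption, since the text only specifies a 1-σ standard deviation across the three stations.

```python
import statistics

# Hypothetical wind speeds (m/s) at the three lidar stations for one flight;
# these are NOT values from the study.
station_wind_speeds = [5.2, 5.9, 6.4]

mean_wind = statistics.mean(station_wind_speeds)
# 1-sigma spread across the stations; the choice of the sample standard
# deviation (n - 1 in the denominator) is an assumption.
wind_sigma = statistics.stdev(station_wind_speeds)

# A cross-sectional flux scales linearly with wind speed, so the relative
# wind error carries over one-to-one to the flux (other error terms ignored).
relative_flux_error = wind_sigma / mean_wind

print(f"wind = {mean_wind:.2f} +/- {wind_sigma:.2f} m/s")
print(f"wind-driven relative flux error = {100 * relative_flux_error:.0f} %")
```

For low and variable winds, the spread between stations grows while the mean shrinks, so this relative error rises quickly. This matches the larger discrepancies reported for such days.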
An important result of this study is the accurate attribution of observed fluxes to specific ventilation shafts or clusters of ventilation shafts. As the MAMAP instrument observes the total atmospheric air column, measurements can also be acquired when the emission plume is not entirely vertically well-mixed within the PBL. This allows for observations closer to the emission source than would be possible with airborne in situ instruments. To derive reliable fluxes, in situ instruments generally need to acquire concentration measurements further downwind of a source, where the emissions are well-mixed. This comes at the expense of an increasing probability that plumes from different sources overlap, which complicates separation. To adequately capture vertical inhomogeneities of emissions near the source with airborne in situ observations, dense and time-consuming flight patterns need to be flown, as described, e.g., in Conley et al. (2017). However, similar issues are also encountered for the single nadir measurements of MAMAP when moving to larger scales due to the large number of shafts in that region.

Additionally, on larger scales, emissions of unknown origin could potentially occur and complicate interpretation. To unambiguously assign measured enhancements to sources, imaging instruments are needed that observe multiple pixels across the flight track in one time step and thus create a gapless two-dimensional map of the anomalies below the aircraft. Examples are the AVIRIS-NG and Mako (Tratt et al., 2014) airborne instruments, or the MAMAP 2D instrument, which is currently being developed at the Institute of Environmental Physics (IUP), Bremen, and will combine MAMAP's high spectral sampling, sensitivity and specificity with imaging capability.
When evaluating MAMAP observations on the scale of clusters of shafts, one also needs to consider light path errors, which would lead to changes in the retrieved CH4 column without any real change in its atmospheric concentration (compare to Sects. 2.2.1 and A1). To reduce the light path errors, the CH4 over CO2 proxy method was applied. This method is only valid if the atmospheric CO2 background concentration remains constant during the flight, i.e. there are no significant CO2 sources in the area. On small scales, CO2 sources can be excluded more reliably than on larger scales. Moving to larger scales, CO2 emissions, for example from power plants, could distort the desired CH4 anomalies. One solution is to investigate the influence of CO2 inhomogeneities by means of other types of measurements, such as in situ data, as done in Krautwurst et al. (2017). The preferred option is, however, to normalize by another gas with constant atmospheric concentration, such as O2 (Schneising et al., 2009; Frankenberg et al., 2006), and thereby become independent of a homogeneous CO2 background.
For respectively. In extreme cases, when wind speeds were low or wind directions variable, the magnitude of the error was similar to the magnitude of the retrieved emission. However, wind speeds were usually around 5 to 6 m s⁻¹, which appears to be a favourable magnitude for estimating reliable fluxes larger than 1 t CH4 hr⁻¹. It is recommended that these conditions are targeted during flight planning for future campaigns if remote sensing instruments with a similar sensitivity as
Although the 1D MAMAP remote sensing instrument succeeded in estimating emissions of multiple clusters of ventilation shafts, a further breakdown into individual shafts requires a substantial increase in observations. Imaging instruments measuring multiple ground scenes simultaneously during one time step will resolve this issue in the future.

Special issue statement.
This article is part of the special issue "CoMet: a mission to improve our understanding and to better quantify the carbon dioxide and methane cycles". It is not associated with a conference.

Appendix A

A1 The WFM-DOAS retrieval

A1.1 Algorithm description
For the retrieval of the desired CH4 column anomalies, the WFM-DOAS algorithm is applied as introduced in Sect. 2.2.1. It compares the measured spectra to simulated radiances, which are representative of the real atmosphere at the time and location of the observation. Deviations between the two, which may occur due to enhanced methane emitted by a ventilation shaft, are then captured by scaling weighting functions. A weighting function describes the change of radiance due to a change of a selected atmospheric parameter (e.g., changing atmospheric concentrations of CH4 and CO2).
To simulate a reliable background model, i.e. a spectrum which is representative of the real atmosphere, and the corresponding weighting functions, the model needs to be provided with several parameters that influence the simulated spectrum. In the case of the MAMAP instrument working between 1590 and 1690 nm, these are primarily vertical concentration profiles of CH4, CO2 and also water vapour (H2O), complemented by pressure and temperature profiles. As backscattered solar radiation from the surface is measured, the spectrum is also influenced by the surface spectral reflectance and by scattering effects from aerosols in the atmosphere. Geometrical parameters like flight altitude, surface elevation and solar zenith angle are also taken into account.
As these parameters change from flight to flight, they are adapted to the prevailing conditions and radiative transfer model (RTM) simulations are performed for each flight. Furthermore, a 2D look-up table approach is used to account for strong variations in the light path due to changing surface elevation and solar zenith angle along the flight track. The relevant input parameters are listed in Table A1. The radiances as well as the weighting functions, which are then used as input for the WFM-DOAS retrieval, are calculated by the radiative transfer model SCIATRAN (Rozanov et al., 2014).
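
To illustrate what a 2D look-up table over surface elevation and solar zenith angle can look like in practice, the sketch below interpolates a tabulated quantity bilinearly between grid nodes. This is a generic illustration only. The text does not state how the look-up table is interpolated, so bilinear interpolation is an assumption, and the grid nodes, table values and the function name `bilinear_lookup` are hypothetical.

```python
from bisect import bisect_right


def bilinear_lookup(x_grid, y_grid, table, x, y):
    """Bilinearly interpolate table[i][j], tabulated at (x_grid[i], y_grid[j])."""
    # Find the grid cell containing (x, y). Indices are clamped to valid
    # cells, so points outside the grid are linearly extrapolated.
    i = min(max(bisect_right(x_grid, x) - 1, 0), len(x_grid) - 2)
    j = min(max(bisect_right(y_grid, y) - 1, 0), len(y_grid) - 2)
    tx = (x - x_grid[i]) / (x_grid[i + 1] - x_grid[i])
    ty = (y - y_grid[j]) / (y_grid[j + 1] - y_grid[j])
    lower = table[i][j] * (1 - ty) + table[i][j + 1] * ty
    upper = table[i + 1][j] * (1 - ty) + table[i + 1][j + 1] * ty
    return lower * (1 - tx) + upper * tx


# Hypothetical look-up table nodes (NOT the values of Table A1).
surface_elevation_km = [0.2, 0.3, 0.4]
solar_zenith_deg = [30.0, 40.0, 50.0]
# Placeholder quantity per node, e.g. a simulated radiance at one wavelength.
simulated = [
    [1.00, 0.95, 0.88],
    [1.01, 0.96, 0.89],
    [1.02, 0.97, 0.90],
]

value = bilinear_lookup(
    surface_elevation_km, solar_zenith_deg, simulated, 0.25, 35.0
)
print(round(value, 4))
```

In a retrieval, the same lookup would be evaluated for every observation along the track, so that each measured spectrum is compared with a simulation matching its own surface elevation and solar zenith angle.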
The retrieval yields profile scaling factors (PSFs) for the desired trace gas concentrations of CH4 and CO2, from which the CH4 column anomalies are then computed as follows:
where ΔV_CH4 is the CH4 column anomaly in molec cm⁻² used in the cross-sectional flux method (Eq. (1)), k is a unitless conversion factor derived from averaging kernels, which takes into account that the sensitivity below the aircraft is around twice as high as above, CH4^(abs col) is the assumed background column of CH4 in molec cm⁻², PSF_CH4 and PSF_CO2 are the retrieved unitless profile scaling factors, and PSF ratio denotes a normalization process with observations from the local background.
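
Read together, the definitions above suggest the following form for the two relations referred to as Eqs. (A1) and (A2). This is a reconstruction from the stated definitions rather than a verbatim copy. In particular, the overline as notation for the normalization by the local background, and the subtraction of 1 (so that background observations yield a zero anomaly), are assumptions.

```latex
\begin{align}
\Delta V_{\mathrm{CH_4}} &= \left(\overline{\mathrm{PSF}^{\mathrm{ratio}}} - 1\right)
  \cdot \mathrm{CH_4^{abs\,col}} \cdot k \tag{A1} \\
\mathrm{PSF}^{\mathrm{ratio}} &= \frac{\mathrm{PSF}_{\mathrm{CH_4}}}{\mathrm{PSF}_{\mathrm{CO_2}}} \tag{A2}
\end{align}
```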

The formulas including the different quantities are further discussed below.
The retrieved PSFs of CH4 and CO2 describe the relative change in CH4 and CO2 in the measured spectra compared to the simulated one. If the observation was acquired over a CH4 emission plume, PSF_CH4 is >1 and PSF_CO2 remains 1. However, the PSFs are not only influenced by the respective trace gas concentrations in the atmosphere but also by light path changes resulting from, e.g., variations in flight altitude, surface elevation or enhanced scattering that are not perfectly covered by the RTM simulations. These light path errors affect the absorption behaviour of both gases in a similar way due to their spectral proximity and can, therefore, be significantly reduced by applying the CH4 over CO2 proxy method (Krings et al., 2013, 2011) denoted by Eq. (A2). The drawback of this method is, however, that strong CO2 sources must not be located in the measurement area and that the CO2 concentration must remain constant during one flight, which is true on smaller scales like single shafts or small clusters of shafts, but might be invalid if the entire USCB is investigated. Finally, the PSF ratios are normalized by the local background (denoted by PSF ratio in Eq. (A1)) and corrected by the conversion factor k to get the desired CH4 column anomalies needed for the cross-sectional flux method. As in other publications (e.g., Krings et al., 2018; Krautwurst et al., 2017; Frankenberg et al., 2016), the local background is defined as observations outside of a plume, in its flanks, and is determined by visual inspection of each single track downwind of a potential source (cluster).

f) The surface is assumed to be a Lambertian reflector with a constant and wavelength-independent surface spectral reflectance in nadir direction of 0.18, which is a common value for mid-latitude vegetation and was also used in previous studies (e.g., Krings et al., 2011).
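
The chain from profile scaling factors to column anomalies described in this section can be sketched as follows. This is a minimal illustration under stated assumptions, not the retrieval code of the study. The function name `column_anomalies`, the input values and the background column are hypothetical; normalizing by the mean of the background ratios and subtracting 1 are assumptions about the form of Eq. (A1). Only k ≈ 0.69 is taken from the text.

```python
import statistics


def column_anomalies(psf_ch4, psf_co2, background_mask, k, ch4_background_column):
    """Convert profile scaling factors into CH4 column anomalies (molec cm^-2)."""
    # Proxy step: dividing by the CO2 scaling factor cancels light path
    # errors that affect both gases in a similar way.
    ratios = [ch4 / co2 for ch4, co2 in zip(psf_ch4, psf_co2)]
    # Normalize by the mean ratio of the observations flagged as local
    # background (using the mean is an assumption).
    background = statistics.mean(
        r for r, is_bg in zip(ratios, background_mask) if is_bg
    )
    normalized = [r / background for r in ratios]
    # Assumed form: relative enhancement times background column times k.
    return [(n - 1.0) * ch4_background_column * k for n in normalized]


# Hypothetical scaling factors along one track (NOT data from the study).
# A common light path error shifts both gases; only the third observation
# carries a real CH4 enhancement.
psf_ch4 = [1.010, 1.012, 1.025, 1.011]
psf_co2 = [1.010, 1.012, 1.012, 1.011]
background_mask = [True, True, False, True]

anomalies = column_anomalies(
    psf_ch4,
    psf_co2,
    background_mask,
    k=0.69,                        # conversion factor quoted in the text
    ch4_background_column=3.6e19,  # hypothetical background column
)
for a in anomalies:
    print(f"{a:.3e} molec cm^-2")
```

In this example the shared offset in both scaling factors drops out of the ratio, so only the third observation yields a non-zero anomaly. This is the behaviour the proxy method is meant to achieve as long as CO2 itself is homogeneous.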

A1.2 Errors
Errors in the retrieval of the CH4 column anomalies originate from the measurement noise of the instrument or from the input parameters for the RTM simulations. The measurement noise is computed as the single measurement precision relative to the background column, directly from the scatter of the measured data. For this, the retrieval described above is applied and only observations that are not influenced by a CH4 plume are used. For the currently investigated data set, this has been estimated to be 0.22 % relative to the background column on average.
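
A minimal sketch of how such a single measurement precision could be derived from the scatter of plume-free observations is given below. The ratios and the plume flags are hypothetical, and using the sample standard deviation relative to the background mean is an assumption about the exact estimator.

```python
import statistics

# Hypothetical normalized PSF ratios along one track (NOT data from the study).
normalized_ratios = [1.002, 0.998, 1.001, 1.013, 1.016, 0.999, 1.003, 0.997]
# Plume membership, e.g. from visual inspection of the track.
in_plume = [False, False, False, True, True, False, False, False]

background = [r for r, flagged in zip(normalized_ratios, in_plume) if not flagged]
# Scatter of the plume-free observations relative to their mean, in percent.
precision_percent = (
    100.0 * statistics.stdev(background) / statistics.mean(background)
)
print(f"single measurement precision: {precision_percent:.2f} %")
```

Excluding the plume observations matters here: including the two enhanced values would inflate the scatter and mix real signal into the noise estimate.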

The sensitivity of the final CH4 column anomaly to the input parameters is estimated by using synthetic spectra while varying the input parameters according to their typical variation during a flight as given in Table A2. As expected and already shown in earlier studies (e.g., Krings et al., 2011), the deviations in the fitted profile scaling factors easily reach a few percent and, therefore, are on the same order of magnitude as those caused by actual emissions. As most of the deviations are related to light path errors, the applied proxy method reduces these deviations significantly. Most of the remaining effects are systematic and constant along a flight track (e.g., a constant offset caused by wrongly assumed CO2 or CH4 background concentration, background temperature or background aerosol profiles), and are corrected by the normalization using observations outside of a plume. Parameters which may not be covered by the normalization process, but also do not fluctuate randomly along a flight track and therefore may not be entirely covered by the computed single measurement precision, are surface elevation and surface spectral reflectance. In a worst case scenario, part of the flight track is located over an especially bright surface or over relatively high terrain (forest vs. rangeland) compared to the remaining track. In this study, the uncertainties originating from these two factors are therefore assumed to be uncorrelated and, after accounting for the conversion factor k (~0.69), they potentially lead to a systematic offset of the retrieved CH4 column anomaly of around 0.10 %.
In combination with the single measurement precision, they are considered in the column anomaly computation by Eq. (1).
Although the values in Table A2 are computed for the flight on 7 June, they are assumed to be valid also for the other days.

high-performance cluster Mistral, for data storage and analysis.
We are also grateful to Jeremy Gordon, who safely piloted the FUB Cessna during the different flights, and to the administration of Katowice airport, which not only provided us with a parking space for the aircraft and gave us easy access to the hangar to service our measuring instruments during the campaign, but also took care of our physical well-being before and after the flights.