Quantification and assessment of methane emissions from offshore oil and gas facilities on the Norwegian Continental Shelf

The oil and gas (O&G) sector is a significant source of methane (CH4) emissions. Quantifying these emissions remains challenging, with many studies highlighting discrepancies between measurements and inventory-based estimates. In this study, we present CH4 emission fluxes from 21 offshore O&G facilities collected in 10 O&G fields over two regions of the Norwegian Continental Shelf in 2019. Emissions of CH4 derived from measurements during 13 aircraft surveys were found to range from 2.6 to 1200 t year⁻¹ (with a mean of 211 t year⁻¹ across all 21 facilities). Comparing this with aggregated operator-reported facility emissions for 2019, we found excellent agreement (within 1σ uncertainty), with mean aircraft-measured fluxes only 16% lower than those reported by operators. We also compared aircraft-derived fluxes with facility fluxes extracted from a global gridded fossil fuel CH4 emission inventory compiled for 2016. We found that the measured emissions were 42% larger than the inventory for the area covered by this study, for the 21 facilities surveyed (in aggregate). We interpret this large discrepancy not to reflect a systematic error in the operator-reported emissions, which agree with measurements, but rather the representivity of the global inventory due to the methodology used to construct it and the fact that the inventory was compiled for 2016 (and thus not representative of emissions in 2019). This highlights the need for timely and up-to-date inventories for use in research and policy. The variable nature of CH4 emissions from individual facilities requires knowledge of facility operational status during measurements for data to be useful in prioritizing targeted emission mitigation solutions. Future surveys of individual facilities would benefit from knowledge of facility operational status over time.
Field-specific aggregated emissions (and uncertainty statistics), as presented here for the Norwegian Sea, can be meaningfully estimated from intensive aircraft surveys. However, field-specific estimates cannot be reliably extrapolated to other production fields without their own tailored surveys, which would need to capture a range of facility designs, oil and gas production volumes, and facility ages. For year-on-year comparison to annually updated inventories and regulatory emissions reporting, analogous annual surveys would be needed for meaningful top-down validation. Our results show that an accurate estimate of total field-level emissions simply requires a sufficiently large and representative sample of facilities to yield meaningful comparisons and flux statistics.

Using a scale factor, derived as a ratio between 2016 and 2019 total reported emissions data from the offshore fields (Norwegian Oil and Gas Association, 2021), we can proportionally scale 2016 inventory estimates to better represent 2019 when comparing measured emissions to the Scarpelli inventory. Repeating the analysis above, using the scaled Scarpelli inventory, we find that total measured emissions were 52% higher than the inventory for 2019.
This further highlights the limitation of comparisons with global inventories and their Tier 1 approach, and shows that a better agreement can be observed when comparing with a more specific inventory (e.g. facility-level reported emissions). Therefore, the poorer agreement between the measured fluxes and the Scarpelli inventory can be interpreted to reflect the representivity of the inventory, due to its construction methodology and the fact that it was compiled for 2016 (and thus, is not representative of emissions in 2019), rather than a systematic error in the operator-reported emissions, which agree with the measured fluxes.
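The proportional scaling of the 2016 inventory to 2019 can be illustrated with a minimal Python sketch. The function names are hypothetical, and the direction of the ratio (2019 total divided by 2016 total) is an assumption inferred from the stated aim of making the 2016 inventory better represent 2019. The numbers in the usage note are placeholders, not values from this study.

```python
def scale_inventory(inventory_2016, reported_2016_total, reported_2019_total):
    """Scale a 2016 inventory emission to 2019.

    Assumption: the scale factor is the 2019 total of operator-reported
    emissions divided by the 2016 total.
    """
    scale_factor = reported_2019_total / reported_2016_total
    return inventory_2016 * scale_factor


def percent_difference(measured, inventory):
    """Percentage by which the measured total exceeds the inventory total."""
    return 100.0 * (measured - inventory) / inventory
```

For example, with placeholder totals, `scale_inventory(100.0, 80.0, 60.0)` gives a scaled inventory of 75.0, and `percent_difference(150.0, 100.0)` gives 50.0.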

This study demonstrates the importance and accuracy of detailed, facility-level emission accounting and reporting by operators and the use of airborne measurement approaches to validate bottom-up accounting.

Introduction
Concentrations of atmospheric methane (CH4) have been increasing since 1850, with particularly rapid annual growth rates of over 5 ppb yr⁻¹ observed from 2014 to 2017 (Nisbet et al., 2019). With a radiative forcing of approximately 0.5 W m⁻² (Prather et al., 2001) and a global warming potential 84 times that of CO2 over a 20-year period (Myhre et al., 2013), CH4 is the second most important greenhouse gas. CH4 emissions reduction and mitigation strategies could aid the attainment of climate targets set in the UNFCCC Paris Agreement (Nisbet et al., 2020). In order to inform and direct such efforts, an accurate understanding of the nature and magnitude of anthropogenic and natural sources of CH4 is essential.
Emissions from the oil and gas (O&G) sector are estimated to account for approximately 22% of global anthropogenic CH4 emissions (80 Tg year⁻¹), though this remains highly uncertain, with estimates ranging from 68 to 92 Tg year⁻¹ (Saunois et al., 2020). This can be partly attributed to the fact that O&G emissions are associated with a wide range of variable and episodic activities such as minor failures in engineering (Zavala-Araiza et al., 2017), flaring (combustion of the gas), controlled cold venting (discharge of unburned gases into the atmosphere) and other fugitive processes. Large but rare, unexpected leaks can also result in significant releases to the atmosphere (Ryerson et al., 2012; Lee et al., 2018).

Relatively few studies have focussed on emissions from offshore O&G production, compared with onshore facilities (EIA, 2016a). The current quantification of emissions from offshore facilities therefore often relies on bottom-up approaches that use activity data and emission factors to derive emissions from a sub-set of sources, and extrapolation to estimate a total emission. However, emission factor calculations rely on representative knowledge of all emission sources, with the potential for systematic error. The International Energy Agency (IEA) Methane Tracker bottom-up estimate of the offshore share of global O&G related CH4 emissions is 20% (IEA, 2021). Top-down emission estimates, such as direct measurements of atmospheric mixing ratios downwind of a source or group of sources, can help to improve bottom-up inventory estimates, which in turn can more meaningfully inform emission mitigation and climate policy. However, the relatively small number of studies on offshore emissions means that there has been little independent data to validate reported emissions. The studies that have taken place (for both onshore and offshore facilities) have consistently reported inventory underestimates of CH4 and non-methane volatile organic compounds (NMVOCs) from O&G extraction (Xiao et al., 2008; Pétron et al., 2012; Gorchov Negron et al., 2020).
Comparisons with the USA Environmental Protection Agency greenhouse gas inventory showed that measured CH4 emissions were consistent for deep water but were a factor of two higher for shallow water facilities. Gorchov Negron et al. (2020) attributed this discrepancy to incomplete platform counts and discrepancies in the emission factors used in the inventory. In contrast, Zavala-Araiza et al. (2021) reported airborne measurements of CH4 emissions from offshore facilities in the Sureste Basin, Mexico, which were found to be an order of magnitude lower than the Mexican greenhouse gas inventory.
As part of the United Nations Climate and Clean Air Coalition (CCAC) objective to quantify global CH4 emissions from oil and gas facilities, this study quantifies CH4 emissions from active O&G facilities on the Norwegian Continental Shelf using a Lagrangian mass balancing approach, as outlined in France et al. (2020). We report measurements of CH4 mixing ratios and fluxes sampled by two research aircraft downwind of 21 emitting facilities (out of 25 facilities surveyed) during 13 flights in July and August 2019. The FLEXPART dispersion model was used to confirm the facility origin of sampled CH4 plumes.
Comparisons are made with operator-supplied annualised emissions and daily activity data from individual facilities in order to identify agreements or discrepancies, as well as to evaluate the efficacy of emissions reporting procedures within the areas of the Norwegian Continental Shelf covered by this study. In particular, comparison with daily reported activity data is key when variable or episodic sources are present. Emission estimates from an annualised global inventory (Scarpelli et al., 2020) are also compared against measured data, to provide insight into the relative accuracy of a hierarchy of emissions accounting approaches.
In Sect. 2, we outline the details of the research aircraft, instrumentation and sampling strategies employed to survey emissions from O&G facilities on the Norwegian Continental Shelf. In Sect. 3, we describe the methods used to derive CH4 fluxes from individual facilities and the uncertainties implicit to the mass balance method. In Sect. 4, we discuss the calculated facility-level flux results and compare them to estimates from both a global inventory and operator-reported emissions and activity data. In Sect. 4, we also discuss the relevance of platform operational data and CH4 loss rate calculations and provide an outlook for continued research in this field.

Methods
In this section, we describe the flight surveys, the two aircraft platforms and instrumentation used to record measurements discussed in Sect. 4 and also describe the use of dispersion modelling for source attribution. In Sect. 2.1, we describe a larger Bae-146 aircraft, which is a 4-engine passenger jet, modified as a flying laboratory. In Sect. 2.2, we describe the smaller, single engine Scientific Aviation Mooney aircraft.

FAAM Bae-146 research aircraft
Three flights (labelled C191, C193 and C197) were conducted by the UK's Facility for Airborne Atmospheric Measurement (FAAM) Bae-146 atmospheric research aircraft. Information regarding the full aircraft scientific payload can be found in Palmer et al. (2018). Here, we summarise the details of the measurements relevant to this study.
Dry mole fractions of CO2 and CH4 were measured using a cavity-enhanced absorption spectrometer (Fast Greenhouse Gas Analyzer (FGGA); Los Gatos Research, USA), sampling air through a window-mounted rear-facing chemistry inlet. A full description of the operation of the FGGA, along with its modification for measurements onboard the FAAM aircraft, is reported by O'Shea et al. (2013). Raw data measured by the FGGA were corrected for small effects associated with water vapour dilution and spectroscopic error and calibrated using a three-point reference gas approach (high, low and target concentrations).
Calibrations were performed approximately hourly in-flight using calibration gas cylinders traceable to the WMO-X2007 scale (Tans et al., 2009) and WMO-X2004A scale (Dlugokencky et al., 2005) for CO2 and CH4, respectively. A target reference gas cylinder containing CH4 with a mole fraction approximately half-way between that of the hourly high and low calibrations (equal to 1879.58 ppb) was also sampled hourly to quantify small sources of instrumental temporal drift and non-linearity and thereby to define measurement error. For a full description of the water vapour correction, calibration regime and measurement validation, see O'Shea et al. (2013). The representative one standard deviation calibration measurement uncertainties were 3.62 ppb for CH4 and 0.84 ppm for CO2 at a sample rate of 10 Hz. The limit of detection of high precision optical cavity instruments such as those used on all platforms in this study is well below the atmospheric background concentrations of CH4 and CO2. Therefore, flux calculations are not limited by the precision of such instruments, but rather by the environmental conditions at the time of the survey (see Sect. 3.1 and France et al. (2020) for a full discussion). Using the methods, platforms and instruments described in this paper, we estimate that a flux of the order of 2 kg h⁻¹ represents a typical flux limit of detection for the range of conditions experienced in the fieldwork presented in this paper. However, as discussed, the true limit of detection will depend on the environmental conditions at the time of each survey.

Thermodynamic measurements were used to diagnose boundary layer mixing processes (Sect. 3). Ambient temperature was measured using a Rosemount 102AL sensor, which has an overall measurement uncertainty of ±0.3 K at 95% confidence.
Measurements of static air pressure were recorded from pitot tubes along the aircraft, with an accuracy of ±0.5 hPa.
Measurements of 3-dimensional wind were made using a nose-mounted five-hole probe system described by Brown et al. (1983), with a horizontal wind measurement uncertainty of < ±0.5 m s⁻¹. A full description of the meteorological and thermodynamic instrumentation on board the FAAM aircraft can be found in Petersen and Renfrew (2009).

Scientific Aviation Mooney aircraft
The Scientific Aviation airborne measurement platform consists of a single engine propeller Mooney aircraft, outfitted with trace gas instrumentation. Air was continuously drawn through rearward-facing inlets installed on the aircraft wing and delivered to instruments in the aircraft cabin through stainless steel or Teflon tubing. CH4, CO2, and water vapour (H2O) were measured by wavelength-scanned cavity ring-down spectroscopy in a Picarro model G2301-f detector. Precision of the G2301-f CH4 measurement was < 1 ppb at 0.5 Hz. Ambient temperature and relative humidity were measured by a wing-mounted Vaisala HMP60 probe. Aircraft position was measured using a Hemisphere high-precision differential GPS system and wind speed and direction were calculated according to Conley et al. (2014).

Over the course of this campaign, 21 offshore O&G facilities were surveyed by the two aircraft, with repeat surveys at some facilities (34 surveys in total; see details below).

FAAM flights
The FAAM research aircraft conducted three regional flight surveys of two regions on the Norwegian Continental Shelf in July and August 2019, as part of the "Methane Observations and Yearly Assessments" (MOYA) project, funded jointly by the Natural Environment Research Council (NERC) and the United Nations Environment Programme: Climate and Clean Air Coalition (UNEP CCAC). Figure 1 illustrates the two regions surveyed by the flights, along with O&G facilities in the area.
During each of the FAAM survey flights, emissions from between two and four facilities were detected. These facilities were identified as the sources of the observed CH4 plumes, using on-board wind direction and CH4 measurements, alongside the GPS coordinates of the facilities. The atmospheric dispersion model, FLEXPART, was also used to aid source identification (Sect. 2.4).

The two regions were selected due to the large amount of oil and gas produced by facilities in each region, as seen in Figure 1. Flight C191, in region 2, sampled between 134 and 370 m above sea level (masl) with straight-and-level transects at 150 masl upwind of the facilities to provide a representative background measurement. Repeated reciprocal runs at varying altitudes within the boundary layer were carried out downwind of sources to detect and characterise emission plumes. Flights C193 and C197 were conducted in regions 1 and 2, respectively. These flights involved two sets of vertically stacked transects at various altitudes. In flight C193, these transects ranged from 124 to 606 masl, with altitudes in flight C197 ranging from 103 to 308 masl. Across the three flights, the number of stacked transects ranged from 7 to 14, at between 50 and 100 m spacing. See Appendix Fig. B2 for an example altitude-longitude projection of the stacked legs flown in flight C193. All three FAAM flights were conducted when the cloud base exceeded 300 masl, to ensure good visibility and allow for low altitude sampling. There was no contact with the operators prior to or during the flights. Operators were aware of our study, but not of the timing or the sampling pattern of the flights.

Figure 1. (a) Location of offshore fields on the Norwegian Continental Shelf and FAAM aircraft survey patterns (as coloured tracks). Each symbol represents an offshore field, coloured by extraction product type (oil, gas, condensate, or mixed). (b) Map of the FAAM flight tracks and locations of active O&G facilities in the two target regions. Each circle represents a distinct facility, sized and coloured according to the reported annual O&G production (Norwegian Petroleum Directorate, 2021), with the bold, opaque circles denoting the facilities surveyed in this study. For ease of illustration, the size of each point is also scaled for the relative oil and gas production. However, the colour legend reflects the annual oil and gas production.

Scientific Aviation flights
Concentric closed flight laps were flown around each target site (individual facility), beginning at the lowest safe flight altitude (20 to 190 masl) to an altitude exceeding the observed maximum emission plume height (typically 100 to 800 masl), creating a virtual sampling cylinder incorporating both upwind background and downwind plume measurement. The number of laps varied for each facility surveyed, typically ranging between 5 and 25. See Appendix Fig. B3 for an example plot of one of these surveys. The highest altitude flown for each site was determined by the absence of significant upwind/downwind variability in the trace gas signal measured onboard the aircraft (i.e., no downwind CH4 enhancements were observed). The downwind lateral distance at which the plume was intercepted by the aircraft was typically 1-2 km.
The measurement sites were selected based on proximity to Bergen Airport, Norway, with facilities within approximately 200 km being investigated. Operators of target sites were informed of measurements on the common frequency for the local area during the flight itself. All airborne measurements were conducted under visual flight rules (VFR) flight conditions, meaning the aircraft was not flying in clouds, fog, or low-visibility areas. This was done to ensure that a safe flying distance was maintained between the measured facilities and the sea surface.

Annualised CH4 emission & activity data from platform operators
In Norway, facility-level reporting of offshore O&G CH4 emissions is based on calculations at the source-level using recommended guidelines (Norwegian Oil and Gas Association, 2018a), the results of which are then published (Norwegian Oil and Gas Association, 2021). In this study, an inventory of existing O&G related facilities in the study area as well as activity data including O&G production statistics and facility functions were obtained from public data sources (www.norskoljeoggass.no and www.norskeutslipp.no). Additional data related to temporary facility activities such as flaring status or compressor ramp up for the days of the aerial surveys were provided via direct communication with the respective operators. Operators of facilities on the Norwegian Continental Shelf are required to submit CH4 emissions data to the Norwegian Environment Agency every year. The CH4 emissions are reported for individual sources and sub-sources (e.g. primary vent seals for centrifugal compressors and incomplete combustion in flares). The basis for the reporting is a project led by the Norwegian Environment Agency between 2014 and 2016, which focused on direct CH4 emissions from O&G production activities on the Norwegian Continental Shelf (Husdal et al., 2019). All installations were subject to a detailed mapping of all potential sources of direct CH4 emissions, and updated methodologies for quantifying emissions at the source/sub-source level were established based on best available techniques. The industry was an active participant in the project, and detailed recommended guidelines for emission and discharge reporting were established (Norwegian Oil and Gas Association, 2019). This was followed by a handbook for quantifying direct CH4 and NMVOC emissions (Norwegian Oil and Gas Association, 2018b) and a guideline for the quantification of small leaks and fugitive emissions (Norwegian Oil and Gas Association, 2021b).
The CH4 reporting methodology on the Norwegian Continental Shelf is amongst the most advanced in the O&G industry globally, as each individual CH4 emission source/sub-source is configured at each installation (i.e. if gas is recycled, flared or cold vented). The detailed reporting associated with each facility is publicly available (Norwegian Oil and Gas Association, 2021c). This level of reporting is similar to Tier 3 IPCC guidelines (see Sect. 2.5.2). However, it should be noted that different countries and/or operators are likely to use different reporting procedures.

Measured CH4 emissions from individual facilities were compared with a regional sample of a global, gridded inventory of CH4 emissions from oil, gas and coal exploitation with a resolution of 0.1° by 0.1° for the year 2016 (Scarpelli et al., 2020).
The gridded inventory resolves contributions from individual subsectors (exploration, production, transport, refining) and from specific processes (flaring, venting, leakage). National emissions for each of these subsectors and processes were routinely compiled from UNFCCC national reported emissions using IPCC Tier 1 methods (IPCC, 2006). Such methods apply default emission factors (not country-specific) and activity data which are limited to national O&G activity statistics. These national emissions are then spatially allocated on the inventory's 0.1° by 0.1° grid across specific O&G infrastructure, in order to derive spatially aggregated emission estimates for infrastructure in each grid cell. The inventory therefore acts as a spatially downscaled representation of these UNFCCC reports. Higher tier IPCC approaches are assumed to be much more rigorous and detailed. For example, Tier 2 approaches use country-specific emission factors. Tier 3 approaches apply a rigorous bottom-up assessment of emissions by primary source type (venting, flaring) using data reported by individual facilities (IPCC, 2006). This is a much more detailed and extensive process for compiling emissions. However, not all nations or facilities collect or report such data, meaning that it would not be an effective or consistent way to derive emissions for a global inventory. As discussed in Sect. 2.5.1, facility-level reporting of offshore O&G CH4 emissions in Norway is based on calculations at the source-level using recommended guidelines (Norwegian Oil and Gas Association, 2018). In this context, the comparisons made in this study represent a comparison with a spatially downscaled estimation approach (the Scarpelli et al. (2020) inventory) and the more detailed quantification approach used by O&G facility operators (facility-level reports).
Annualised gridded emission fields for O&G platforms for the year 2016 were downloaded from the Harvard Dataverse (Scarpelli et al., 2019). Equivalent inventory data for 2019 were not available at the time of the study. This is often a problem for inventory comparisons, as some inventories are not updated in real time, which can impact the accuracy of comparisons if changes in infrastructure may be expected in the intervening time. We include the comparison here as an illustration of this challenge. CH4 emissions associated with the platforms of interest were extracted, using their geographical coordinates to identify the corresponding grid cell and CH4 emission in the inventory.
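The grid-cell extraction described above can be sketched in Python as follows. This is a minimal illustration only: the function names, the dictionary-based grid, and the assumption that cell edges are aligned to multiples of 0.1° starting at (-90°, -180°) are hypothetical and do not describe the actual file layout of the inventory.

```python
import math

GRID_RES_DEG = 0.1  # inventory resolution stated in the text


def grid_cell_index(lat, lon, res=GRID_RES_DEG):
    """Return the (row, col) index of the grid cell containing (lat, lon).

    Assumption: cell edges are aligned to multiples of `res`, with the
    grid origin at latitude -90 and longitude -180.
    """
    eps = 1e-9  # guards against floating-point error at cell edges
    row = int(math.floor((lat + 90.0) / res + eps))
    col = int(math.floor((lon + 180.0) / res + eps))
    return row, col


def facility_inventory_emission(lat, lon, gridded_emissions):
    """Look up the inventory emission for the cell containing a facility.

    `gridded_emissions` is a hypothetical dict mapping (row, col) to an
    emission value; cells without O&G infrastructure return 0.0.
    """
    return gridded_emissions.get(grid_cell_index(lat, lon), 0.0)
```

Note that under this kind of lookup, two facilities that fall within the same 0.1° by 0.1° cell are assigned the same cell total, which is one reason why facility-by-facility comparisons with a downscaled inventory need care.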

Flux analysis methodology
In this section, we describe the flux quantification method applied to sampling from the FAAM and Mooney aircraft surveys and describe the quantification of flux uncertainty.

Aircraft mass balance
Fluxes can be quantified using mass balance approaches. For such approaches to be feasible, observations are typically made upwind of the source region, to establish concentrations in a background location. Downwind observations are then conducted, allowing the determination of the net enhancement attributed to the source region. Lagrangian mass balance flux quantification typically requires meteorological conditions where the wind field can be assumed (and measured) to be relatively invariant over the spatial scales of plume sampling for a target emitter (Cambaliza et al., 2014; Pitt et al., 2019; Fiehn et al., 2020).
Often, it is assumed that the plume is vertically well-mixed within some layer (usually the planetary boundary layer). The vertical mixing assumption also requires that measurements are taken sufficiently downwind of the emission source so as to have had time to fully mix. The aircraft mass balance approach used in this study has been used to derive fluxes of trace gases from large area sources, such as agriculture, oil and gas fields and cities (e.g. White et al., 1976; Wratt et al., 2001; O'Shea et al., 2014; Peischl et al., 2016; Pitt et al., 2019), but has also been used for individual O&G facilities (e.g. Lee et al., 2018; Guha et al., 2020).

FAAM flights
The emission fluxes presented in Sect. 4 were calculated from the FAAM survey flight data using Eq. (1):

$$F = \int_{0}^{z_{max}} \int_{A}^{B} \left(C_{ij} - C_{0}\right) \, n_{air} \, U_{\perp} \, dx \, dz \qquad (1)$$

where F (g s⁻¹) is the flux for the emission source, A and B are the horizontal boundaries of the plume, z_max is the maximum plume height, C_ij is the dry mole fraction of CH4 at each point in the plume, C_0 is the representative background dry mole fraction of CH4, n_air is the molar air density and U_⊥ is the wind speed perpendicular to the reference measurement sampling plane. For the flux calculations in this study, the atmosphere was divided into discrete vertical layers, based on the mean altitudes of aircraft transects for each facility survey. The mean concentrations within each observed CH4 plume were used to calculate flux individually for each layer and summed across all layers to obtain total flux.
Representative background CH4 mole fractions were determined for each layer using the 50 neighbouring 10 Hz measurements either side of the observed plume. The average CH4 enhancement above this background was calculated for each observed plume. The perpendicular wind speed was calculated as the average wind vector component perpendicular to each flight transect. Plume mixing altitude was calculated as the distance between the sea surface and either the point at which a plume was no longer observed in measured data, or the height of the mixed layer as diagnosed from FLEXPART forward modelling or the nearest available potential temperature profile measured by the aircraft. In the absence of a direct measurement of plume mixing height, where the boundary layer height or FLEXPART model mixing was used to define the plume mixing height, the difference between the nearest altitude where a plume was measured, and the assumed mixing height, was used to define a quantifiable vertical mixing uncertainty used in flux error propagation (see Sect. 3.2). In summary, for surveys where the plume top could not be directly constrained by measurement, any assumed vertical mixing was conservatively accounted for within the quoted flux uncertainty reported in Sect. 4.
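The layer-by-layer calculation described above can be sketched in Python as below. This is an illustrative sketch under stated assumptions, not the processing code used in this study: all function names are hypothetical, the background is taken as the mean of the neighbouring samples, the molar air density comes from the ideal gas law, and the molar mass of CH4 is introduced only to express the result in g s⁻¹.

```python
R_GAS = 8.314462618  # J mol-1 K-1, universal gas constant
M_CH4 = 16.04        # g mol-1, molar mass of CH4 (assumed conversion to g s-1)


def background_mole_fraction(series_ppb, plume_start, plume_end, n=50):
    """Mean of up to n samples either side of a plume.

    The plume occupies series_ppb[plume_start:plume_end]. Using the mean
    of the neighbouring samples is an assumption of this sketch.
    """
    before = series_ppb[max(0, plume_start - n):plume_start]
    after = series_ppb[plume_end:plume_end + n]
    neighbours = list(before) + list(after)
    return sum(neighbours) / len(neighbours)


def molar_air_density(pressure_pa, temperature_k):
    """Ideal-gas molar density of air in mol m-3."""
    return pressure_pa / (R_GAS * temperature_k)


def layer_flux(enhancement_ppb, plume_width_m, layer_depth_m,
               wind_perp_ms, pressure_pa, temperature_k):
    """Flux through one altitude layer, in g s-1."""
    n_air = molar_air_density(pressure_pa, temperature_k)
    return (enhancement_ppb * 1e-9 * n_air * wind_perp_ms
            * plume_width_m * layer_depth_m * M_CH4)


def total_flux(layers):
    """Sum per-layer fluxes; each layer is a dict of layer_flux arguments."""
    return sum(layer_flux(**layer) for layer in layers)
```

As a rough sense of scale, a mean enhancement of 10 ppb across a 1 km wide, 100 m deep layer with a 5 m s⁻¹ perpendicular wind near sea-level conditions corresponds to a few grams of CH4 per second in this sketch.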

Scientific Aviation flights
A variant of the Lagrangian mass balance method, utilising Gauss' Theorem and suited to the orbital sampling conducted by the Mooney aircraft, was used to derive CH4 fluxes from the Scientific Aviation flight surveys. Gauss' theorem was used to estimate CH4 flux through the virtual cylinder created by flying concentric circles around an individual platform. This theorem equates the volume integral of the source (e.g. platform) to a surface integral of the trace mass flux which is normal to the surface of a cylinder. The volume integral was converted to a surface integral, which was used to calculate the horizontal mass flow of CH4 across the cylinder's surface plane. All other flux parameters in Eq. (1) were calculated in the same way as for the FAAM flight surveys. A full description of this emission quantification method can be found in Conley et al. (2017).
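The surface-integral idea can be illustrated with a simplified Python sketch: for each closed lap, the CH4 enhancement is multiplied by the wind component normal to the flight path and integrated around the lap, and the lap integrals are then weighted by layer depth. This is a deliberately reduced illustration with hypothetical function names, an idealised circular lap and a mass-density enhancement as input; the full method is the one described by Conley et al. (2017).

```python
import math


def circular_lap(radius_m, n_points):
    """Outward unit normals and segment lengths for an idealised circular lap."""
    thetas = [2.0 * math.pi * k / n_points for k in range(n_points)]
    normal_x = [math.cos(t) for t in thetas]
    normal_y = [math.sin(t) for t in thetas]
    segment_m = [2.0 * math.pi * radius_m / n_points] * n_points
    return normal_x, normal_y, segment_m


def lap_line_integral(enhancement, wind_u, wind_v, normal_x, normal_y, segment_m):
    """Discrete line integral of enhancement * (wind . outward normal) around a lap.

    With enhancement in g m-3 and wind in m s-1, the result is in g m-1 s-1.
    """
    total = 0.0
    for c, u, v, nx, ny, dl in zip(enhancement, wind_u, wind_v,
                                   normal_x, normal_y, segment_m):
        total += c * (u * nx + v * ny) * dl
    return total


def cylinder_flux(lap_integrals, layer_depths_m):
    """Net outward flux (g s-1): lap integrals weighted by layer depth."""
    return sum(f * dz for f, dz in zip(lap_integrals, layer_depths_m))
```

A useful property of this formulation is that a spatially uniform enhancement in a uniform wind integrates to approximately zero around a closed lap, so only the excess carried out of the cylinder on the downwind side contributes to the net flux.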

The uncertainty in the measured flux was determined using a similar method to that used by O'Shea et al. (2014). This involves propagating the measured uncertainties associated with the individual terms in Eq. (1), including the uncertainty in the observed CH4 enhancement, the natural (measured) variability of the wind field, and any uncertainty in the plume mixing height. Instrumental uncertainties associated with the FGGA were calculated to be negligible in comparison to those associated with the wind field and plume mixing height but are implicitly accounted for within the measured variability (and hence uncertainty) in the background concentration.
Non-correlated, random uncertainties (wind and background variability) were summed in quadrature and calculated as an uncertainty for each altitude layer. These were then summed for all altitude layers to derive an overall random uncertainty in the corresponding total flux. The systematic uncertainty in the plume mixing height (described in Sect. 3.1.1) was then added to the random error to obtain the total uncertainty in the flux reported for each facility.
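The error propagation described above can be sketched as follows. The function names are hypothetical, and two readings of the text are assumed here: the per-layer random uncertainties are combined across layers by a simple (linear) sum, and the systematic mixing-height term is added linearly to that total.

```python
import math


def layer_random_uncertainty(layer_flux, rel_wind, rel_background):
    """Random uncertainty for one layer: relative wind and background
    terms summed in quadrature, scaled by the layer flux."""
    return layer_flux * math.sqrt(rel_wind ** 2 + rel_background ** 2)


def total_flux_uncertainty(layers, mixing_height_uncertainty):
    """Total uncertainty for a facility survey.

    `layers` is a list of (layer_flux, rel_wind, rel_background) tuples.
    Assumption: layer uncertainties are summed linearly, and the
    systematic mixing-height uncertainty is then added to the result.
    """
    random_total = sum(layer_random_uncertainty(*layer) for layer in layers)
    return random_total + mixing_height_uncertainty
```

For instance, two layers each with a flux of 10 (arbitrary units) and relative wind and background uncertainties of 3% and 4% contribute 0.5 each; adding a mixing-height term of 2.0 gives a total uncertainty of 3.0 in this sketch.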

Scientific Aviation flights
The uncertainties in emission flux (reported as a one standard deviation uncertainty) were calculated as follows, analogously to those calculated for the FAAM survey data. Firstly, the statistical (random) uncertainty in the wind field and the CH4 measurement from the Picarro instrument were summed in quadrature, in order to obtain uncertainty in the horizontal flux for each concentric lap. The horizontal fluxes were then binned in altitude layers, and the uncertainties of the horizontal fluxes in that bin were summed in quadrature along with the standard deviation of the flux estimates for each layer. The uncertainties in each bin were added in quadrature to obtain the final error estimate for the total flux measurement for each individual survey.
Where multiple surveys were conducted over several days, this was taken into consideration when calculating the overall uncertainty for each facility. The relative error for each survey was calculated. These were then averaged to give a mean uncertainty over all surveys for each facility. The mean relative uncertainty was then multiplied by the average CH4 flux, to obtain a mean-weighted uncertainty in the CH4 flux for each facility.
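The combination of repeat surveys described above can be written as a short Python sketch. The function names are hypothetical and the inputs in the example are placeholders rather than survey results.

```python
import math


def quadrature_sum(uncertainties):
    """Combine independent uncertainties (e.g. altitude bins) in quadrature."""
    return math.sqrt(sum(u ** 2 for u in uncertainties))


def mean_weighted_uncertainty(fluxes, uncertainties):
    """Mean flux and mean-weighted uncertainty over repeat surveys.

    The relative uncertainty of each survey is averaged, then multiplied
    by the mean flux, following the description in the text.
    """
    relative = [u / f for f, u in zip(fluxes, uncertainties)]
    mean_relative = sum(relative) / len(relative)
    mean_flux = sum(fluxes) / len(fluxes)
    return mean_flux, mean_relative * mean_flux
```

For two hypothetical surveys with fluxes of 10 and 20 and uncertainties of 2 each, the relative uncertainties are 20% and 10%, the mean flux is 15, and the mean-weighted uncertainty is 0.15 × 15 = 2.25 (all in the same flux units).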

Results & discussion
In this section, we report the measured fluxes for each facility and compare with inventory and facility-level activity data.
Details about the observational data from the FAAM and Scientific Aviation flight surveys, and the application of the mass balance approach can be found in Appendix B.

Measured flux uncertainties
Uncertainties in flux are a function of sampling density, background variability, and wind conditions, as well as the instrumental uncertainty (France et al., 2020). Combined uncertainties associated with background and wind variability were observed to be less than 10% in the FAAM flight surveys of this case study. The largest source of flux uncertainty in the FAAM flight surveys was found to be the plume mixing height (typically accounting for more than 90%). As discussed in Sect. 3.1.1, this was calculated either as the height at which a plume (CH4 enhancement) was no longer observed downwind or, in the absence of a vertical measurement constraint, as the nearest available measured thermodynamic boundary layer height, used as a proxy for maximum possible mixing. The vertical extent of the plume was better constrained by the Scientific Aviation flight patterns, owing to the dense vertical sampling made possible by the more agile, smaller Mooney aircraft; this is reflected in the smaller flux uncertainties in the Scientific Aviation surveys (see Table 1). However, there is also some additional uncertainty if the bottom of the plume cannot be sampled. This is captured in the uncertainties reported for all flights and represents an inherent limitation of all aircraft surveys. By way of forward guidance, an optimal sampling design (to minimize flux uncertainty) therefore involves repeated sampling at many altitudes around a target of interest, ensuring that the top of any plume is directly measured.

Flux comparisons with a global inventory and facility-level reported data
This study involves direct comparisons of the measured CH4 fluxes with those reported by facility operators and global emission inventory estimates. This requires temporal unit conversions of the measured data from g s⁻¹ and kg h⁻¹ to t year⁻¹.
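A minimal sketch of this unit conversion is given below. The function names are hypothetical, and a 365-day year and metric tonnes are assumed.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # assumption: 365-day year
HOURS_PER_YEAR = 365 * 24


def g_per_s_to_t_per_year(flux_g_per_s):
    """Convert a flux in g/s to t/year (1 t = 1e6 g)."""
    return flux_g_per_s * SECONDS_PER_YEAR / 1e6


def kg_per_h_to_t_per_year(flux_kg_per_h):
    """Convert a flux in kg/h to t/year (1 t = 1e3 kg)."""
    return flux_kg_per_h * HOURS_PER_YEAR / 1e3
```

Under these assumptions, 1 g s⁻¹ corresponds to 31.536 t year⁻¹ and 1 kg h⁻¹ to 8.76 t year⁻¹. The conversion treats a snapshot measurement as if it persisted all year.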
Scaling in this way is likely not a robust comparison, as it cannot account for any variability in day-to-day facility operations throughout the year. Such day-to-day variability has also been observed and discussed in Tullos et al. (2021) Figure 4a with an inventory flux ~ 60 t yr⁻¹), with measured CH4 emission fluxes over a factor 20 higher, whilst also noting that the measured flux uncertainty was high. However, considering the low R² value (0.02) in Figure 4a, we emphasise that the intercepts and gradients calculated in this regression analysis are not meaningful, due to the high variability of agreements amongst the individual facilities. We include the result here to make this valuable point, which is to say that comparisons of surveys with spatially downscaled inventories may have limited value and require careful thought before drawing conclusions.
Downscaling of inventories can lead to significant discrepancies at the scale of oil and gas facilities, such as those studied here. The regression in Figure 4b does not include reported emissions of zero, as shown in Figure 4a, as the two regression lines were found to be essentially identical. These results show that in aggregate, with a sufficient number of surveys, measurements are able to replicate the facility-level reported emissions, whilst also confirming that facility-level reporting procedures can provide accurate emission estimates for incorporation into inventories. Facility 2 (the outlier seen in Figure 4a, discussed above) shows good agreement between operator-reported emissions and measured data, suggesting that the facility-level reported flux for facility 2 is much more accurate than that represented by the inventory, and therefore that the observed difference

Facilities 6 and 7 were surveyed separately by the aircraft. However, the reported emissions were grouped for the two facilities, which impedes our ability to directly compare to the reported flux for each facility separately. Nevertheless, our observations show that facility 6 dominated emissions (400 t year⁻¹), relative to facility 7 (9.6 t year⁻¹). For individual facilities, there are notable large differences between the inventory estimates and the extrapolated measured emission fluxes, ranging from -41% (-88 t year⁻¹) to 2200% (1200 t year⁻¹) for facilities 5 and 2, respectively. This is expected to be associated with both the compilation with measured CH4 emissions found to be consistent for deep water but a factor of two higher for shallow water facilities.

Considering all facilities collectively, the measured fluxes were found to be 42% greater than the Scarpelli et al. (2020) emission inventory using Tier 1 methods. However, there is a much-improved agreement when comparing with the facility-level reported flux, where measured fluxes are 16% lower than those reported. This aggregated comparison with facility-level reported data suggests that measurements and reported data agree within uncertainty, given a large enough sample size, and therefore we recommend that facility-level reporting is adopted more widely and used to compile more robust inventories of CH4 emissions. As discussed earlier, the Scarpelli et al. (2020) inventory was compiled for 2016 as equivalent 2019 data were

The relatively low absolute mean flux with a negative sign is an artifact of minor upwind CH4 contamination overwhelming the downwind CH4 enhancement. It is acknowledged that physical CH4 emissions from this facility cannot be negative.
c Facility was surveyed twice. Only one measured flux is reported, as upwind contamination invalidated the second measurement.
d Both facilities were measured separately, but operator reports a combined estimate.
e Facility is a subsea manifold station. A drilling vessel was drilling at the same location at the time of surveying. The operator reported that drilling was the main CH4 source of >99% of CH4 emissions and that CH4 emissions will only occur during drilling.
f Operator does not report emissions as this facility was reported inactive during 2019.

Note that at the five facilities with repeat surveys on different days under normal primary operations (blue dots at facility IDs 4, 5, 6, 10, and 19 in Figure 5a), the average day-to-day variability in measured emissions for the same facility is 33% (even after accounting for measurement uncertainties). That is, emissions at the same facility vary substantially over time, even on days when the operational status suggests continuous emissions.
This implies that intermittency exists beyond the granularity (or the categories) of the level of reporting above. Nevertheless, as shown in Figure 5b, at the aggregate level of 19 facilities (34 surveys including repeats; number is slightly different from column 5 in Table 1 be key for prioritizing emission mitigation solutions, our results suggest that randomised but intensive field-specific surveys ensuring a sufficiently large and representative sample size remain the key driver of sampling required to deliver unbiased estimates of total emissions at the facility level (repeat surveys) or the regional level (multi-facility surveys), irrespective of operational status. While the cost of the surveys and the monetary and environmental benefits play a role in designing routine surveys, frequent surveys could ensure the most robust validation.

The relevance of platform operational data and CH4 loss rate calculations

We further calculated CH4 loss rates, i.e., the measured, annualized CH4 emissions as a fraction of the marketed CH4 over the same period. This was conducted at the field level (for which gas production data were available; Norwegian Petroleum Directorate, 2021), where each field includes between one and six individual facilities. NOGA (2018) guidelines were used to approximate gas composition to convert total gas production (Norwegian Petroleum Directorate, 2021) to CH4 production. Measured CH4 loss rates range between 0.003% and 1.3%, thus spanning three orders of magnitude. This wide range in loss rates is largely driven by the equally wide range in gas production across the 10 fields, spanning four orders of magnitude. While there is no apparent correlation between absolute emission rates and loss rates, all four fields with loss rates >0.1% each produce <0.15 billion Sm³, and all six fields with loss rates <0.1% each produce >0.5 billion Sm³. Thus, the very small loss rates <0.1% are largely explained by the large denominator (gas production volume). The gas production-weighted average loss rate for the 10 measured fields is 0.012%, but this value should not be considered representative of Norwegian offshore production, and it is very likely a conservative estimate for the full population of Norwegian sites. This is because the 10 measured fields in this study account for 48% of Norwegian gas production, but for only 12% of the total number of producing fields (Norwegian Petroleum Directorate, 2021). In other words, measured fields in this study are strongly biased towards high gas production fields, which in turn explains the relatively small weighted average loss rate.
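The loss rate calculation can be sketched as below. This is an illustration under stated assumptions, not the study's code: the names are hypothetical, the CH4 density at standard conditions (about 0.678 kg per Sm³ at 15 °C and 1 atm) is an approximate value chosen here, and the CH4 volume fraction must be supplied by the user (the study derived gas composition from NOGA (2018) guidelines).

```python
from typing import NamedTuple

CH4_DENSITY_KG_PER_SM3 = 0.678  # assumption: approximate density at 15 degC, 1 atm


class Field(NamedTuple):
    emission_t_per_year: float  # measured, annualized CH4 emission (t/year)
    gas_sm3_per_year: float     # total gas production (Sm3/year)
    ch4_fraction: float         # assumed CH4 volume fraction of produced gas


def ch4_production_t(field):
    """CH4 production in t/year from gas volume, CH4 fraction and density."""
    kg = field.gas_sm3_per_year * field.ch4_fraction * CH4_DENSITY_KG_PER_SM3
    return kg / 1000.0


def loss_rate_percent(field):
    """Emitted CH4 as a percentage of produced CH4 for one field."""
    return 100.0 * field.emission_t_per_year / ch4_production_t(field)


def production_weighted_loss_rate(fields):
    """Average loss rate (%) weighted by each field's gas production volume."""
    total_gas = sum(f.gas_sm3_per_year for f in fields)
    weighted = sum(f.gas_sm3_per_year * loss_rate_percent(f) for f in fields)
    return weighted / total_gas
```

Because the weights are production volumes, high-producing fields with small loss rates dominate the weighted average, which is the effect described in the text for the 10 measured fields.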

Outlook
In summary, these results act as
The first column includes only facilities 4, 5, 6, and 7 (representing "other" operations). The second column represents all other facilities (representing "normal primary" operations). The third column represents all facilities collectively (magenta data point). Error bars represent 1σ uncertainties. * Facilities 6/7 and 13/14 were measured separately (and fluxes were added), but operator reported a combined estimate. Facility 9 is not included here because it was reported inactive during 2019 and measured CH4 emissions were negligible (see Table 1).
generic emission factors (used in Scarpelli et al., 2020). As outlined in Sect. 2.5.2, Tier 3 approaches are more rigorous and detailed, applying facility-level emission and activity data to calculate emissions. Results in this study show that there is better agreement between measured data and facility-level reported emissions than more generalized spatially downscaled inventory estimates, as expected. This result emphasizes the importance of facility-level emissions reporting in order to compile accurate national greenhouse gas inventories. This study exclusively considers offshore O&G facilities, adding to the findings from previous work which found that spatially downscaled inventories may be significantly underestimating CH4 emissions (Gorchov Negron et al., 2020). However, other studies have also observed discrepancies in inventory estimates for onshore facilities (Zavala-Araiza et al., 2021), thus highlighting that Tier 1 inventories can be subject to very high inaccuracy across the O&G sector as a whole.
In this context, it is important that the availability of Tier 3 reported data is increased and more routinely required by regulators and policymakers, and that such data are used to more meaningfully inform overall IPCC emissions scenarios, which may currently contain large underestimates for the offshore O&G sector where only spatially downscaled estimates are available. This represents both a global and field-specific challenge, as individual basins typically comprise multiple operators with potentially different performance standards and reporting frameworks. There is an urgent need for consistent, internationally agreed standards of best practice, if reported fluxes are to be of value in accurately understanding global emissions from the O&G sector.
Additional measurements are needed to further test and validate global emission inventories. However, collecting such data is labour-intensive and, thus, expensive when using manned aircraft. Slow-moving, lightweight airborne measurement platforms, such as the Mooney aircraft, are well-suited to this application, as they allow for much more focussed sampling, with the ability to densely sample in close proximity to individual O&G facilities. Future improvements and advances in satellite remote sensing could provide routine datasets to assess facility-level and area-emissions reporting, providing greater spatial and temporal coverage. However, flux measurements in the offshore environment via satellite remote sensing are challenging due to the use of less frequent glint mode observations (for passive near-infrared sensors). Other survey platforms, such as unmanned aerial vehicles (UAVs), also offer potential for CH4 flux quantification from numerous sources (e.g. Nathan et al., 2015; Yang et al., 2018; Allen et al., 2019; Shah et al., 2020; Shaw et al., 2021). For an overview of CH4 detection technologies for offshore environments, see Carbon Limits (2020). Frequent surveys could lead to measurement-based inventories, similar to that compiled by Gorchov Negron et al. (2020), as efforts continue to quantify emissions and seek to combat global climate change.

Conclusions
This study reports CH4 fluxes derived from airborne sampling campaigns on the Norwegian Continental Shelf. We conducted 13 flights using the FAAM and the Scientific Aviation research aircraft in July, August, and September 2019.
Measured CH4 emissions were found to range from 2.6 to 1200 t year⁻¹ (with a mean of 211 t year⁻¹ across all 21 facilities).
Mean measured fluxes (as an aggregate of the 21 facilities studied) were 16% lower than equivalent operator-reported data but agreed within 1σ uncertainty. Operator-reported emissions data contain an increased level of granularity concerning operational emissions and sources, better representing the reported facilities, relative to IPCC Tier 1 data used in the global inventory, making them more closely analogous to IPCC Tier 3 methods. Measured CH4 loss rates (as a percentage of CH4 production) ranged from 0.003% to 1.3% across fields, with the wide range largely driven by field-level production volumes, with high-producing fields displaying proportionately lower emission rates. The aggregated comparison with facility-level reported data suggests that measurements and reported data agree within uncertainty, given a large enough sample size to aid statistical representation. With this in mind, we recommend that similar facility-level reporting is adopted more widely by industry and that reported data are used to more accurately compile national emissions inventories of CH4 relevant to IPCC emissions scenarios. This reporting approach is consistent with the voluntary commitment required for membership in the Oil and Gas Methane Partnership 2.0.
We also compared aircraft-derived fluxes with facility fluxes extracted from a global gridded fossil fuel CH4 emission inventory compiled for 2016, finding that the measured emissions were 42% larger than the inventory for the 21 facilities surveyed (in aggregate). We interpret this large discrepancy not to reflect a systematic error in the operator-reported emissions, which agree with measurements, but rather the representivity of the global inventory due to the methodology used to construct it and the fact that the inventory was compiled for 2016 (and thus not representative of emissions in 2019). This highlights the need for timely and up-to-date inventories for use in research and policy.
This study also demonstrates the use of airborne sampling to obtain flux snapshots for comparison with inventories and reported data. We found that measurement sampling density, especially in the vertical plane, can dominate sources of uncertainty in aircraft-based flux methods. To reduce uncertainty in flux calculations further using measurement-based approaches, we recommend the use of measurement platforms with a high degree of manoeuvrability.

Figure A2. Example snapshot of the calculated FLEXPART footprint for FAAM flight C193, used to aid source identification for measured CH4 enhancements. The flight track is coloured by measured CH4 mixing ratio (right colour bar; ppbv). The shaded area denotes the vertically integrated retroplume (left colour bar; s m² kg⁻¹). The red triangles represent the locations of the nearby offshore O&G facilities.

Figure B1 shows the flight track of FAAM flight C193, which took place on 30 July 2019, along with nearby offshore O&G facilities. Figure B1a shows the measured wind speed and direction (shown as arrows) and Figure B1b shows the measured CH4 mole fraction. The FAAM data showed CH4 enhancements above background which typically lay between approximately 2 and 13 ppb. However, much larger enhancements were seen in region 2 overall, with a maximum of 99.3 ppb above background. A maximum of 8.9 ppb was observed in region 1. This was as expected, as the facilities in region 2 were known to produce substantially more oil and gas compared to region 1, as seen in Figure 1 in Sect. 2.3 of the main paper.
