A dedicated flask sampling strategy developed for ICOS stations based on CO2 and CO measurements and STILT footprint modelling

In situ CO2 and CO measurements from five atmospheric ICOS (Integrated Carbon Observation System) stations have been analysed together with footprint model runs from the regional transport model STILT, to develop a dedicated strategy for flask sampling with an automated sampler. Flask sampling in ICOS has three different purposes: 1) provide an independent quality control for in situ observations, 2) provide representative information on atmospheric components currently not monitored in situ at the stations, 3) collect samples for 14CO2 analysis that are significantly influenced by fossil fuel CO2 (ffCO2) emission areas. Based on the existing data and experimental results obtained at the Heidelberg pilot station with a prototype flask sampler, we suggest that single flask samples should be collected regularly every third day around noon/afternoon from the highest level of a tower station. Air samples shall be collected over one hour with equal temporal weighting to obtain a true hourly mean. At all stations studied, more than 50 % of flasks to be collected around mid-day will likely be sampled during low ambient variability (<0.5 ppm standard deviation of one-minute values). Based on a first application at the Hohenpeißenberg ICOS site, such flask data are in principle suitable for detecting CO2 concentration biases larger than 0.1 ppm with a one-sigma confidence level between flask and in situ observations from only 5 flask comparisons. In order to have a maximum chance to also sample ffCO2 emission areas, additional flasks need to be collected on all other days in the afternoon. Using the continuous in situ CO observations, the CO deviation from an estimated background value must be determined the day after each flask sampling and, depending on this offset, an automated decision must be made whether a flask shall be retained for 14CO2 analysis.
It turned out that, based on existing data, ffCO2 events of more than 4-5 ppm will be very rare at all stations studied, particularly in summer. During the other seasons, events could be collected more frequently. The strategy developed in this project is currently being implemented at the ICOS stations.


Introduction
Since the pioneering work of Charles David Keeling, who already in the 1950s started continuous in situ monitoring of the atmospheric carbon dioxide concentration at the South Pole and Mauna Loa (Brown and Keeling, 1965), global coverage of continuous greenhouse gas (GHG) observations has considerably improved (https://ds.data.jma.go.jp/gmd/wdcgg/). However, there still exist large observational gaps in remote regions of the globe, which have partly been filled by regular flask sampling with subsequent GHG analysis in central laboratories. In the marine realm, if frequently conducted under certain conditions, data from flask sampling are often representative for monitoring the large-scale distribution of GHGs in the atmosphere and, respectively, for estimating large-scale flux distributions by inverse modelling.
In the last decades, observational networks have been extended to the continents in order to closely monitor GHG concentrations and quantify terrestrial GHG fluxes. These, however, are more heterogeneous, temporally variable and often less well represented by models than is the case for modelled ocean fluxes (Friedlingstein et al., 2019). Terrestrial biospheric fluxes are prone to (regional) climatic variability and changes, and only continental observations provide the gateway for process understanding. Besides monitoring the terrestrial biosphere, measurements over continents are also conducted to observe man-made emissions, in particular from fossil fuel burning and agriculture. Due to their proximity to these highly variable sources and sinks, measurements over continents are best conducted continuously with in situ instrumentation at high temporal resolution, in order to cover the variability and to fully represent the entire footprint of a station (e.g. Andrews et al., 2014). However, not all atmospheric trace components to be included in continental top-down GHG budgeting can yet be precisely measured in situ at remote stations. The most prominent example is radiocarbon (14C) in atmospheric CO2, a quantitative tracer to separate the fossil from the biospheric component in recently emitted CO2 from continental sources (e.g. Levin et al., 2003). Note that in the industrialised and highly populated areas of the mid latitudes of the Northern Hemisphere, i.e. in North America, Eastern Asia or Europe, atmospheric signals from the biosphere and from fossil fuel sources are of the same order (see Sect. 4.3.1). To correctly interpret absolute CO2 concentration variations in terms of source/sink attribution, separation of the fossil from the biogenic CO2 signal is, therefore, mandatory. Precise 14CO2 measurements are, however, currently only possible in dedicated laboratories and on discrete samples.

In Europe the Integrated Carbon Observation System Research Infrastructure (ICOS RI) (https://www.icos-ri.eu/icos-researchinfrastructure) has been established to monitor GHG concentrations and fluxes in the atmosphere, in various ecosystems and over the neighbouring ocean basins. ICOS atmosphere has set up a pan-European network of preferentially tall tower stations located at least 50 km away from industrialised and highly populated areas. The primary purpose is to monitor biogenic sources and sinks in Europe and their behaviour under changing climatic conditions. In addition to continuous CO2, CH4 and CO observations, a subset of stations (Class 1 stations) perform two-week integrated sampling of CO2 for 14C analysis. Class 1 stations are additionally equipped with an automated flask sampler, dedicated to three major objectives. Firstly, the collected flasks shall provide an independent quality control (QC) for the continuous in situ measurements of CO2, CH4, CO and further species mole fractions. Secondly, flasks shall be collected for analysis of additional trace components not measured in situ at the stations, and finally flasks with a potentially elevated fossil fuel CO2 component originating from anthropogenic sources in the footprint of the stations shall be analysed for 14CO2.
Dedicated sampling strategies had to be developed for ICOS, which best meet these three objectives, and which can be accomplished in the framework of the infrastructure and its available capabilities and resources. This includes technical constraints at the stations but also analysis capacity at the ICOS Central Analytical Laboratories, which analyse all flask samples in ICOS. The ICOS flask sampling strategy might change in future, e.g. when real-time GHG or footprint prediction tools become available.
In the current paper, we first give an introduction to the current ICOS atmospheric station network, and then present a strategy for collecting the flask samples for ICOS in a simple and cost-effective way. The sampling strategies have been developed based on footprint model simulations with the regional transport model STILT (Lin et al., 2003), which was implemented at the ICOS Carbon Portal (https://www.icos-cp.eu/about-stilt) for ICOS station PIs and data users. First tests to develop a strategy for the quality control objective were performed at the ICOS pilot station in Heidelberg, where ICOS instrumentation and a prototype of the ICOS flask sampler have been installed, as well as at the Hohenpeißenberg station. The strategy was further tested for its feasibility based on the first years of continuous ICOS CO2 and CO observations available at the ICOS Carbon Portal (ICOS RI, 2019).

The atmospheric station network and its Central Facilities
The ICOS atmospheric station network currently consists of 23 official stations (with 14 stations more to come), located in 12 countries and covering Europe from Scandinavia to Italy and from Great Britain to the Czech Republic (see Fig. 1). Tall tower sites are the preferred station type, allowing vertical profile sampling at a minimum of three height levels up to at least 100 m a.g.l.
Tall tower stations cover footprints of several 10 to 100 km distance from the sites (Gloor et al., 2001; Gerbig et al., 2006).
A number of mountain and coastal stations are also part of the ICOS network owing to their often long history of GHG measurements, although their representation in state-of-the-art regional atmospheric transport models is more difficult than in the case of tower observations. However, the flask sampling strategy developed here was designed specifically for the standard ICOS tall tower stations.
https://doi.org/10.5194/acp-2020-185 Preprint. Discussion started: 17 March 2020 © Author(s) 2020. CC BY 4.0 License.
All ICOS atmosphere stations are equipped with commercially available instruments measuring CO2, CH4 and CO continuously at high temporal resolution. Instruments are tested at the Atmospheric Thematic Centre (ATC), an ICOS Central Facility hosted by LSCE in Gif-sur-Yvette, France, before they are installed at the sites (Yver Kwok et al., 2015). The calibration gases for the in situ measurements are prepared and calibrated at the Flask and Calibration Laboratory (FCL), which has been established at the Max Planck Institute for Biogeochemistry in Jena, Germany, as part of the ICOS Central Analytical Laboratories (CAL). This procedure guarantees the best possible compatibility of observations within the ICOS atmospheric network and maintains the link to the internationally accepted WMO calibration scales. In addition, FCL analyses the flasks with the focus on QC and additional species. Precise 14CO2 analysis of integrated samples and selected flasks is conducted in the second part of ICOS CAL at Heidelberg University, Institute of Environmental Physics, at the Karl Otto Münnich Central Radiocarbon Laboratory (CRL).
All raw data (level-0) are automatically transferred, on a daily basis, from the measurement sites to the ATC where they are converted to calibrated (level-1) concentration values (Hazan et al., 2016), based on regular on-site calibrations and FCL-assigned calibration values. For ongoing automatic data quality assurance of all measurements, the ATC has developed automatic procedures. Further software tools are made available by the ATC for mandatory validation of all raw data by the station PIs. These quality-assessed data form the basis of the hourly mean concentrations, which are finally released as level-2 data and made available to the user community at the ICOS Carbon Portal, hosted by Lund University, Sweden. For the latest data release see ICOS RI (2019).
Two station types are currently implemented in the ICOS atmospheric station network, Class 1 and Class 2. Class 1 stations are equipped with the complete instrumentation including integrated 14CO2 and flask sampling. Class 2 stations perform only in situ continuous measurements of CO2, CH4 and CO (currently not mandatory), but with the same instrumentation and demand on data quality as for Class 1 stations. A detailed description of the specifications of instrumentation is given in the ICOS Atmospheric Station Specification document (https://icos-atc.lsce.ipsl.fr/filebrowser/download/69422), which is regularly updated. To become an official part of the ICOS atmospheric station network, stations have to undergo a two-step labelling process, which shall warrant their conformance with the ICOS station specifications, including smooth data transfer to the ATC as well as meeting ICOS data quality requirements.

Description of selected ICOS stations
For developing and testing our flask sampling strategy we selected five ICOS Class 1 tall tower stations in four different countries. A short description of these stations is given in the following.
Hyltemossa (HTM) is located a few kilometres south of Perstorp, in north western Skåne, Sweden (56.098° N, 13.418° E, 115 m a.s.l.). It hosts a combined atmospheric and ecosystem station, labelled respectively as Class 1 and Class 2 site in their respective networks. The site was established in 2014 in a 30 year old managed Norway spruce forest. Further than 600 m from the tower there is a mosaic consisting of forests, clear cuts and farm fields. Within the radius of 100 km, the elevation changes between 0-200 m a.s.l., while in the near vicinity of the tower elevation gently changes only by 35 m. In the larger footprint, the site is surrounded by cities, i.e. to the north Halmstad (70 km, 58 000 inhabitants), to the east Kristianstad (45 km, 36 000 inhabitants), to the south-west Lund (45 km, 111 000 inhabitants), Malmö (60 km, 318 000 inhabitants) and Copenhagen, DK (70 km, 1 990 000 inhabitants) and to the west Helsingborg (45 km, 124 000 inhabitants) and Helsingør, DK
are installed in a container next to the tower. Air is being sampled for 5 min from each level where data for the first minute after switching are discarded. All inlet lines are continuously flushed with approx. 5 L min-1. Meteorological sensors for air temperature, relative humidity, wind speed and direction are installed at every sampling height. For historical reasons, Gartow modelling was conducted for 344 m a.g.l. (not for the highest sampling level, at 341 m); this difference between measured and modelled level is, however, not relevant for the comparisons presented in the context of this study.
Hydrometeorological Institute with 30 years of practice in meteorology as well as air quality monitoring. Today, these two stations form the National Atmospheric Observatory. Since the site is designed as a background station, the area is not significantly influenced by human activity.
The tower is surrounded by fields and, at a greater distance, by forests and small villages (the closest in 1 km distance). There is a highway running north-east of the tower at approx. 6 km distance; however, the wind frequencies from north and east are 9 and 5 %, respectively. The closest towns, Pelhřimov, Vlašim and Humpolec, with 10 to 17 thousand inhabitants, are located approx. 20 km away from the station. As for industrial activity, a small wood-processing company is located 20 km to the west (which is the prevailing wind direction). Town Havlíčkův Brod with ca.
Meteorological sensors (air temperature, relative humidity, wind speed and direction) are installed at every sampling height.

Atmospheric transport modelling for ICOS stations
A footprint simulation tool based on the regional atmospheric transport model STILT (Stochastic Time Inverted Lagrangian Transport; Lin et al., 2003; Gerbig et al., 2006) was implemented at the ICOS Carbon Portal (https://www.icos-cp.eu/about-stilt) as a service for ICOS station PIs and data users. The STILT model simulates atmospheric transport by following a particle ensemble released at the measurement site backward in time and calculates footprints that represent the sensitivity of tracer concentrations at this site to surface fluxes upstream. The footprints are mapped on a 1/12° latitude × 1/8° longitude grid and coupled to the EDGAR v4.3 emission inventory (Janssens-Maenhout et al., 2019) and the biosphere model VPRM (Mahadevan et al., 2008) to simulate atmospheric CO2 and CO concentrations. These regional concentration components represent the influence from surface fluxes inside the model domain (covering the greater part of Europe). For CO2 the contributions from global fluxes are accounted for by using initial and lateral boundary conditions from the Jena CarboScope global analysed CO2 concentration fields (http://www.bgc-jena.mpg.de/CarboScope/s/s04oc_v4.3.3D.html), while for CO only regional contributions are evaluated in our study.
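The coupling of footprints to flux fields described above can be illustrated with a minimal Python sketch. It is not the STILT or Carbon Portal code: the function name `regional_signal_ppm`, the 2 × 2 grid and all numbers are hypothetical, and the footprint is assumed to be already time-integrated and expressed in ppm per (µmol m-2 s-1).

```python
def regional_signal_ppm(footprint, flux):
    """Sum footprint x flux over all grid cells.

    footprint: sensitivity per cell, assumed in ppm per (umol m-2 s-1)
    flux:      surface flux per cell, in umol m-2 s-1
    """
    return sum(
        f * q
        for f_row, q_row in zip(footprint, flux)
        for f, q in zip(f_row, q_row)
    )


# Hypothetical 2 x 2 example grid (illustrative values only).
footprint = [[0.000, 0.002],
             [0.010, 0.004]]
fossil_flux = [[0.5, 2.0],
               [1.0, 8.0]]      # e.g. taken from an emission inventory
biosphere_flux = [[-3.0, -1.0],
                  [-4.0, 0.5]]  # e.g. taken from a biosphere model
background_ppm = 410.0          # stands in for the boundary-condition term

ff_ppm = regional_signal_ppm(footprint, fossil_flux)
bio_ppm = regional_signal_ppm(footprint, biosphere_flux)
co2_ppm = background_ppm + ff_ppm + bio_ppm
print(f"ffCO2 = {ff_ppm:.3f} ppm, bio = {bio_ppm:.3f} ppm, total = {co2_ppm:.3f} ppm")
```

The same sum applied to a CO emission field would give the regional CO component; in that case no background term is added, in line with the regional-only treatment of CO stated above.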

The automated ICOS flask sampler
The automated ICOS flask sampler was designed and constructed at the Max Planck Institute for Biogeochemistry (MPI-BGC), Jena, Germany, by the Flask and Calibration Laboratory (FCL) of the CAL to allow automated air sampling under highly standardized conditions. The sampler can hold up to 24 individual glass flasks (four drawers with six flasks each) for separate air sampling events (Fig. 2, upper panel). The glass flasks can be individually replaced and sent to the CAL for analysis. The glass flasks used within ICOS (three litre volume, product code ICOS3000 by Pfaudler Normag Systems GmbH, Germany) were developed according to ICOS' specific requirements based on well-proven designs (Sturm et al., 2004). Each flask has two valves, one at each end, that allow air exchange by flushing sample air through the flask. The flasks are attached with ½" clamp-ring connectors to the flask sampler. The flask valves with PCTFE sealed end-caps can be opened and closed by a motor drive.
A sample is taken by flushing air through a flask at a constant over-pressure of 1.6 bar (absolute). Sampling at over-pressure increases the amount of available sample air for analysis and allows detecting flasks with leak problems. Flasks are pre-filled with 1.6 bar of dry ambient air with a well-known composition at the CAL to avoid concentration changes due to wall adsorption effects. The schematic sampler layout is depicted in Fig. 2 (lower panel). Incoming air is dried to a dew-point of approx. -40 °C by passing a cooled glass vessel where the excess humidity is frozen out. The glass vessel is placed in a silicone oil heat bath that is cooled for drying and heated for out-flushing the collected water to regenerate the trap.
The drying unit is automated and consists of two independent inter-switchable drying branches that complement each other and allow a near interruption-free drying. The dryer design is inspired by an already existing system from Neubert et al. (2004). The incoming sample air is compressed with a pump (Air Dimensions J161-AF-HJ0). A mass-flow controller (MFC, Bronkhorst F-201CV) between compressor and flasks allows sampling at pre-set flow rates, i.e. with a decreasing flow rate over time so that the sample represents a real average e.g. over one hour (Turnbull et al., 2012). The flask pressure during sampling (1.6 bar) is kept constant through a pressure regulator at the outlet of the flasks. An over-pressure valve set at 2.0 bar behind the pump assures a constant flow rate through the intake line, independent of the flow rate through the mass flow controller.

In the case of ICOS we strive to sample real one-hour mean concentrations in 3-litre flasks. The 1/t filling approach requires for this specific case a theoretical dynamic flow rate between 80 mL min-1 and infinity. In reality, the maximum flow rate of the selected flow controller is limited to 2 L min-1. Therefore, the in situ measurement is averaged with the weighting function resulting from the real flow through the sampled flask. To overcome the flow limitations in the first minutes, the flask is purged for 30 minutes prior to sampling, assuring a complete air exchange. Average concentrations with the aimed uncertainty can only be reached under sufficiently stable concentration conditions during sampling. For a hypothetical ambient CO2 variability of 1 ppm the upper limit of the associated CO2 flask sample concentration uncertainty was estimated to be in the order of 0.1 ppm.
With the current design of the flask sampler, technical restrictions do not allow parallel sampling of flask duplicates or triplicates as a means for quality control e.g. based on flask pair agreement. The technical effort to allow exact parallel hourly averaged sampling is very high. Therefore, the ICOS Atmosphere Monitoring Station Assembly (MSA) decided to sample only single flasks. This seems appropriate because in the ICOS network the flask sampler is always collecting flasks in parallel to continuous measurements, and erroneously collected flasks or errors due to flask leakages can be detected when comparing results with the continuous data. Therefore, in contrast to general practice of duplicate flask sampling, in our network single flask sampling seems to be sufficient to meet ICOS objectives. This has the additional advantage that single flask sampling allows more frequent sampling and thus a more representative coverage of the footprint of the stations. If true duplicate samples are required in the future, the flask sampler is designed to accommodate an additional mass flow controller to fulfil this task.
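The flow-rate range and the resulting weighting function can be reproduced with a short Python sketch. It is a simplified illustration, not the sampler firmware: it assumes an ideally mixed flask, flow rates referenced to about 1 bar (so that a 3-litre flask at 1.6 bar holds 4.8 L of gas, which gives the quoted 80 mL min-1 after 60 minutes), and a one-second time step. The names `flow_ml_min` and `sampling_weights` are hypothetical.

```python
import math

FLASK_VOLUME_ML = 3000.0   # 3-litre flask (from the text)
PRESSURE_BAR = 1.6         # absolute sampling pressure (from the text)
MAX_FLOW_ML_MIN = 2000.0   # flow controller limit (from the text)
DURATION_MIN = 60.0        # one-hour integration (from the text)

# Assumption: flows are referenced to about 1 bar.
GAS_CONTENT_ML = FLASK_VOLUME_ML * PRESSURE_BAR


def flow_ml_min(t_min):
    """Capped 1/t flow rate at t_min minutes after the start of sampling."""
    if t_min <= 0.0:
        return MAX_FLOW_ML_MIN
    return min(MAX_FLOW_ML_MIN, GAS_CONTENT_ML / t_min)


def sampling_weights(step_min=1.0 / 60.0):
    """Weight of each time step in the final flask content.

    Returns (times, weights, residual), where residual is the share of
    purge air that was in the flask at t = 0 and is still there at the end.
    Assumes the flask is perfectly mixed at all times.
    """
    n = int(round(DURATION_MIN / step_min))
    times = [(i + 0.5) * step_min for i in range(n)]
    weights = [0.0] * n
    surviving = 1.0  # fraction of air that survives all later steps
    for i in reversed(range(n)):
        exchange_rate = flow_ml_min(times[i]) / GAS_CONTENT_ML  # per minute
        replaced = 1.0 - math.exp(-exchange_rate * step_min)
        weights[i] = replaced * surviving
        surviving *= 1.0 - replaced
    return times, weights, surviving


times, weights, residual = sampling_weights()
print(f"flow after 60 min: {flow_ml_min(60.0):.1f} mL/min")
print(f"weight at 30 min / ideal uniform weight: {weights[1800] * 3600:.3f}")
print(f"share of purge air left in the flask: {residual:.4f}")
```

Under these assumptions the weights are uniform once the flow is no longer capped (after 2.4 minutes), while the first minutes are under-weighted and a small share of purge air remains. Multiplying the in situ record by these weights gives a flask-equivalent mean.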

Aims and technical constraints of ICOS flask sampling
As briefly outlined above, there are three main aims for regular flask sampling at ICOS stations:
1. Flask results are used for comparison with in situ observations (i.e. CO2, CH4, CO, (N2O)). This comparison provides an ongoing quality control of the in situ measurement system, including the intake lines.
2. Flasks are analysed for components not measured continuously at the station, such as SF6 or H2, but also stable isotopes of CO2 or O2/N2 ratio. The aim here is to monitor large-scale representative concentration levels of these components, which allow estimating their continental fluxes with the help of inverse modelling.
3. A subset of flasks is analysed for 14C in CO2 to allow determining the atmospheric fossil fuel CO2 component (ffCO2) and, with the help of these data and inverse modelling, to estimate the continental fossil fuel CO2 source strength of the sampled areas.
To meet aims 1 and 2, flask sampling during well-mixed meteorological conditions is required and the sampled footprints should not be dominated by particular hot spot source areas. Particularly for aim 2, we further strive to cover the entire daytime footprint of the station. In contrast, aim 3, due to the generally small fossil fuel signals at ICOS stations, requires targeted sampling of "hot spot emission areas" in the footprint to maximize the fossil fuel CO2 signal in the samples. Note that the detection limit (or measurement uncertainty) of the fossil fuel CO2 (ffCO2) component with 14CO2 measurements is of order 1-1.5 ppm (e.g. Levin et al., 2011).
There are a number of technical/logistic constraints concerning flask sampling, shipment and analysis in ICOS, which need to be taken into account when designing an operational sampling strategy that best meets the three aims listed above. The most important limitations are listed in the following:

Timing: In order that all flask sample results are useful for flux estimates with current regional inversion models, flasks should be collected during mid-day or early afternoon at the standard ICOS tall tower stations. During this time of the day, atmospheric mixing is strong and model transport errors are smaller than during night (Geels et al., 2007). For all samplings, wind speeds should be larger than about 2 m s-1, so that the sampled footprint is well defined. The strategy outlined below has been developed for tall tower sites that are located not directly at the coast, i.e. that are of predominantly continental character.
Intake height: There is only one intake line from the highest level of the tower running to the flask sampler; therefore, only the continuous observations from this height can be quality-controlled with parallel sampled flasks (aim 1). As modellers prefer using data (aim 2) from the highest level of the tower (largest footprint, most representative, etc.), all flasks will be sampled from that highest level (as specified in the ICOS Atmospheric Station Specification Document, https://icos-atc.lsce.ipsl.fr/filebrowser/download/69422).
Integration period: Flasks should be sampled as integrals, i.e. the collected sample should represent a real mean of ambient air (e.g., a 1-hour mean, comparable to current model resolution). Also, synchronizing in situ continuous observations and integrated flask sampling is important for the quality control aim (aim 1). This latter requirement is easier to achieve with longer integration times in flask sampling. This means, however, that for comparison reasons, the continuous in situ observations must be kept at the flask sampling height during the entire flask sampling period (i.e. no calibration gas measurement, no switching of in situ intake heights during flask sampling, no profile information available). This also means that flow rates, delay volumes and residence times in the tubing, as well as the timing of both the flask and the in situ sampling systems, must be properly monitored.
Flask handling: Flasks need to be installed and removed manually from the sampler. Remote stations are regularly visited about once per month by a technician. The flasks sampled to meet aim 1 should be shipped to the FCL within one month after sampling, so that a potential bias between in situ and flask analyses is detected without major delay. 14CO2 analysis of flasks in the CRL is less urgent, therefore a delay of a few months in the shipment of flasks collected for aim 3 is acceptable. Consequently, all flasks will be shipped from the station to the FCL and after analysis a subset will be shipped for further analysis to the CRL. After all analyses have been finished, all flasks, including those which were analysed at the CRL, are leak-tested and conditioned at the FCL before dispatch to the stations.
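The synchronization requirement in the integration-period constraint comes down to knowing how long air spends in each intake line. The Python sketch below estimates this residence time from tube geometry and flow rate. It assumes plug flow at near-ambient pressure with no mixing volume in the line; the function name `residence_time_s` and the tube dimensions are hypothetical examples, not ICOS specifications.

```python
import math


def residence_time_s(length_m, inner_diameter_mm, flow_l_min):
    """Plug-flow residence time (seconds) of air in a tube.

    Assumes near-ambient pressure in the line and no mixing volume.
    """
    radius_m = inner_diameter_mm / 2000.0              # mm diameter -> m radius
    volume_l = math.pi * radius_m ** 2 * length_m * 1000.0
    return volume_l / flow_l_min * 60.0


# Hypothetical example: two lines of equal length but different bore and flow.
in_situ_lag = residence_time_s(length_m=100.0, inner_diameter_mm=10.0, flow_l_min=5.0)
flask_lag = residence_time_s(length_m=100.0, inner_diameter_mm=6.0, flow_l_min=2.0)

# Shift to apply to the in situ time axis before averaging over the flask period.
offset_s = flask_lag - in_situ_lag
print(f"in situ lag {in_situ_lag:.0f} s, flask lag {flask_lag:.0f} s, offset {offset_s:.0f} s")
```

A mixing volume in one of the lines would add its own time constant on top of this simple delay and would have to be treated separately.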

Solutions and testing to meet aim 1: Ongoing quality control
The ICOS atmospheric station network supported by ICOS Central Facilities (ATC and CAL) has been designed and implemented to achieve the highest possible accuracy, precision and compatibility of atmospheric GHG measurements. ICOS aims to meet the compatibility goals agreed on by the international WMO/GAW measurements community (WMO, 2018) for all its measured components. These compatibility goals were chosen by the community to detect small inter-station gradients and to be used to estimate flux distributions by means of inverse models. For ICOS CO2 observations, a compatibility goal of 0.1 ppm or better is compulsory. Similarly, ICOS needs to meet the WMO compatibility goals for CH4 and CO, which are 2 ppb for both gases (WMO, 2018). First evaluations of ICOS CO2 measurements indeed yield monthly mean afternoon differences between stations in the free troposphere above 100 m of typically very few ppm (Ramonet et al., 2020), underlining the importance of excellent precision and compatibility of these observations.
With a regular and frequent comparison of flask and in situ measurements, ICOS aims at independently monitoring their compatibility and providing respective alerts if e.g. the average difference of CO2 exceeds 0.1 ppm over a few weeks of comparisons. Using flasks sampled from a dedicated intake line to cross-check the in situ measurements is an important part of the ICOS quality management. It allows an independent end-to-end QC of the entire in situ measurement system consisting of inlet system, drier, analyser and calibration. As mentioned above, for logistical reasons, about once per month or every five weeks a box with 12 flasks is scheduled to be shipped from a remote station to the FCL. After analysis, the flask results covering about one month of time will be compared with the corresponding in situ data.
In the following we determine the minimum number of comparison flasks and the corresponding time delay needed to detect a significant CO2 bias larger than 0.1 ppm between flask and in situ measurements. To this end, we tested the envisaged flask sampling procedure experimentally at the ICOS pilot station in Heidelberg, and present here its first application at an ICOS field station.

Flask-in situ CO2 comparisons in Heidelberg
Similar to the official ICOS atmosphere stations, Heidelberg is equipped with an ICOS-conform CRDS instrument continuously measuring CO2, CH4 and CO in ambient air. The Heidelberg instrument is also calibrated with standard gases provided by the FCL and its continuous data are automatically evaluated at the ATC. All flasks have been analysed at the FCL.
However, since the site does not have a high tower and is located in an urbanized environment, the variability of the signal can complicate the flask-in situ comparison.
In order to collect a real hourly integrated air sample in the flask, the flow rate through the flask has to be adjusted during the filling process (Turnbull et al., 2012, see Sect. 2.4). First tests with a decreasing (1/t) flow rate through the flasks were conducted in Heidelberg during the period of September 2018 to February 2019, and with a better suited flow controller for the 1/t decreasing flow rate from May to October 2019. Ambient air for continuous measurements as well as for flask sampling was collected via by-pass from a permanently flushed intake line from the roof of the institute's building about 30 m above local ground. These flasks have been collected not only at low ambient air variability during afternoon hours, but also during other times of the day, when within-hour concentration variations for CO2 at this urban site were higher than 10 ppm. The results of the concentration differences between in situ and flask measurements for CO2 are displayed in Fig. 3 (left panel).
During the first experimental period we obtained three outliers, where flask CO2 results have been up to more than 3 ppm higher than the in situ measurements. CH4 and CO in the flasks (not shown) did, however, compare very well within a few ppb with the continuous in situ data. Although one of the mass flow controllers had some problems to exactly regulate the flow over the large range of flow rates, we did not find obvious reasons for malfunction of the sampling system. The only explanation for the outliers may, thus, be contamination of these flasks with room air, which is elevated in CO2 but not in CH4 or CO compared to outside air.
If we disregard the three outliers in the first testing period (one at a low variability situation, see Fig. 3, right panel) and consider only observations with ambient air CO2 variability < 0.5 ppm, the limited results from the (polluted) Heidelberg site give confidence that flask samples collected over one hour at low ambient CO2 variability are well suited to meet our aim 1 of ongoing quality control at Class 1 stations. It is important though that the different air residence times in the intake systems of flask sampler and in situ instrument are properly adjusted; they may significantly differ, e.g. if a mixing volume system is installed in the intake lines (as at Hyltemossa). The mean difference between in situ and flask measurements for CO2 in Heidelberg was 0.02 ppm at an ambient CO2 variability of less than 0.5 ppm, with a standard deviation of ±0.06 ppm (n=18) (see also Fig. 3 right panel, which shows that only one out of the 18 low-variability comparisons lies outside the ±0.1 ppm compatibility range indicated by the dashed red lines). For CH4 we observed for ambient variability smaller than 10 ppb a mean difference of 0.18 ppb with a standard deviation of 0.74 ppb (n = 111). CO comparison data have not been evaluated here as the CRDS in situ data were not finally calibrated and thus not fully compatible with the flask results.
The test measurements in Heidelberg clearly showed that meaningful quality control results can best be obtained during situations of low ambient concentration variability. Individual concentration differences increase with increasing ambient variability within the one-hour comparison period. The reason for this increase may be uncertainties in the synchronization of the measurements (note that a shift of a few minutes in the timing of the integration already introduces a significant bias) or incorrect flow rates through the flasks in the 1/t sampling scheme. For the QC aim, flask samples should preferentially be collected during low variability situations. We therefore evaluated how frequently afternoon events with less than 0.5 ppm variability occur at typical ICOS stations. In the years 2016 to 2019, except for few stations and for few summer months, we find at all five stations at least 10 hours per month at mid-day (13 h local time (LT)) with hourly CO2 standard deviations smaller than 0.5 ppm. On average over the year more than half of all midday hours had CO2 standard deviations below 0.5 ppm. Based on this evaluation, we decided that we will not need to pre-select sampling days with low ambient variability but can pursue a very simple sampling scheme, e.g. sampling every three or four days, to be able to detect a mean bias larger than 0.1 ppm between flask and continuous measurements within a period of 4-5 weeks. On average we can expect that every second flask we sample is suitable for precise intercomparison with in situ measurements. This simple methodology will also help us meet aim 2 (see below).
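The expectation that about every second flask is usable can be turned into a rough planning estimate with a few lines of Python. The sketch assumes that low-variability conditions occur independently from one sampling day to the next with probability 0.5, which is a simplification (synoptic weather makes consecutive days correlated). The box size of 12 flasks is taken from the shipment schedule described earlier, the target of five usable comparisons is an illustrative choice, and the function name `prob_at_least` is hypothetical.

```python
from math import comb


def prob_at_least(k_needed, n_flasks, p_suitable):
    """Binomial probability that at least k_needed of n_flasks are suitable.

    Assumes each flask is suitable independently with probability p_suitable.
    """
    return sum(
        comb(n_flasks, k) * p_suitable ** k * (1.0 - p_suitable) ** (n_flasks - k)
        for k in range(k_needed, n_flasks + 1)
    )


flasks_per_box = 12      # one shipment box
p_low_variability = 0.5  # "every second flask" is suitable
expected_suitable = flasks_per_box * p_low_variability
p_five_or_more = prob_at_least(5, flasks_per_box, p_low_variability)
print(f"expected suitable flasks per box: {expected_suitable:.0f}")
print(f"probability of at least 5 suitable flasks: {p_five_or_more:.2f}")
```

With these inputs one box yields six usable comparisons on average, and at least five in roughly four out of five boxes.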

Flask - in situ CO2 comparisons at the ICOS station Hohenpeißenberg
A very first field test of our flask sampling scheme for QC was conducted at the ICOS station Hohenpeißenberg (HPB). Ambient air for continuous measurements as well as for flask sampling was collected via two separate lines from the highest level of the tall tower (131 m). Flask collection at HPB started in July 2019. The flasks were always sampled with a decreasing 1/t flow rate between 12:30 and 14:00 UTC, as we aim for conditions with low ambient variability, which are most likely when the atmosphere is well mixed during the afternoon. Up to now, 48 flasks have been collected that could be used for QC of this ICOS Class 1 station. The overall results of the concentration differences for CO2 for the complete test period are shown in Fig. 4 (left panel).
Our first results of the comparison between continuous measurements and flasks were available in October 2019 and showed larger differences between in situ and flask measurements than expected. A mean difference of 0.33 ppm with a standard deviation of ±0.13 ppm (n = 4) was determined for situations with an ambient variability of less than 0.5 ppm. Based on these results, the intake system and the entire CO2 instrumentation were carefully checked. Whilst the last regular leak test on April For the period after the leak elimination, the calculated differences between in situ and flask measurements for an ambient variability of less than 0.5 ppm all lie within the compatibility goal for CO2 (0.1 ppm), see blue dots in Fig. 4 (right panel).
The mean difference between flasks and in situ measurements is 0.01 ppm with a standard deviation of ±0.06 ppm (n = 5). These results of the first field test of the flask sampling scheme for QC are promising, e.g. in enabling the detection of potential leaks at the stations. Once the flask QC procedures are operational, potential system malfunctions can be detected within a month, complementing the compulsory half-yearly ICOS leak tests.
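To make the statistics behind such a comparison concrete, the following Python sketch computes the mean difference, its standard deviation and the standard error of the mean for a set of paired values. It is an illustration only: the sign convention (flask minus in situ), the function names and the use of the standard error as a detection measure are assumptions, not the procedure of the study. The values 0.06 ppm, n = 5 and the 0.1 ppm compatibility goal are taken from the text.

```python
from math import sqrt
from statistics import mean, stdev

COMPATIBILITY_GOAL_PPM = 0.1  # CO2 compatibility goal quoted in the text


def standard_error(sd, n):
    """Standard error of the mean for n independent comparisons."""
    return sd / sqrt(n)


def compare(flask_ppm, in_situ_ppm):
    """Return (mean difference, standard deviation, standard error) in ppm.

    Assumption: differences are defined as flask minus in situ.
    """
    diffs = [f - s for f, s in zip(flask_ppm, in_situ_ppm)]
    sd = stdev(diffs)
    return mean(diffs), sd, standard_error(sd, len(diffs))


# With the scatter reported after the leak elimination (0.06 ppm, n = 5),
# the standard error of the mean difference is about 0.027 ppm.
se = standard_error(0.06, 5)
# Size of a 0.1 ppm bias expressed in units of that standard error.
ratio = COMPATIBILITY_GOAL_PPM / se
```

Under these assumptions a bias of the size of the compatibility goal corresponds to more than three standard errors after five low-variability comparisons, which is consistent with the statement that malfunctions can be detected within about a month.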

Solutions and testing to meet aim 2: Representative flask sampling
In the preceding section we showed that low ambient variability situations are best suited to meet aim 1. Moreover, a potential bias between flask and in situ measurements can be detected with better confidence with an increased number of comparisons. However, for meeting aim 2, a scheme collecting flasks only during low variability situations may cause a significant bias in the sampled footprint. We have tested whether such a sampling bias would be visible in the European ICOS network and calculated with STILT all afternoon (13 h LT) footprints of the five selected stations for the year 2017. Figure 5 shows the respective aggregated footprints for the month of October 2017. The left panels in each of the five rows show the aggregations if every afternoon hour (13 h LT) was sampled, the middle panels show the aggregated footprints for every third day, and the right panels show the 10 footprints with the lowest variability during this month. As expected, regional coverage of the entire station footprint is generally better when sampling randomly every third day than when sampling the 10 days with the lowest variability.

In addition to the footprint analysis, which gives a visual, qualitative idea of the effect of different flask sampling schemes, we evaluated the first three years of continuous CO2 measurements from the five ICOS stations to quantify the effect of random sampling every three days versus sampling only low variability situations. Figure 6 shows, in the upper panels for each station, all available hourly atmospheric CO2 data as grey dots, while the blue lines, each shifted by one day, connect the 13 h LT data every three days. The red dots in the upper panels highlight the 10 lowest variability afternoon values in each month. As expected, all summer afternoon concentrations generally fall into the lower concentration range of the bulk of data. At all stations, the variability changes from a diurnal shape during the summer months to a more synoptic variability in the winter half-year (for more details see also Figs. 8 and 9). This synoptic variability is also represented in the afternoon sampling. In the five middle panels of Fig. 6 we have plotted as black dots the monthly means calculated from all afternoon hours between 11 h and 15 h LT, together with their standard deviations. The blue dots show the monthly mean values obtained from sampling every third day (the three different 3-day patterns are shown as individual shifted blue lines), while the red dots represent the monthly means calculated from the 10 samples with the lowest variability (the coloured dots were shifted by one day each for better visibility). It is obvious that regular sampling provides much more representative monthly means, deviating from the all-afternoon means in CO2 by more than 2 ppm in only a few cases (Fig. 6, lowest panels).
If samples were collected at low variability only, they would often underestimate monthly mean values, in some cases by more than 4 ppm (red lines in Fig. 6, lowest panels). Although regular sampling every third day also introduces some variable deviations from the correct afternoon means, sampling only at low variability may introduce rather large biases, mainly towards lower CO2 concentrations. Note that inversion models also select measured data for their inversion runs only by time of day, and not for low variability, to estimate fluxes (Rödenbeck, 2005).
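The comparison of sampling schemes can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the evaluation code of the study: the function name and parameters are hypothetical, and the example month is synthetic and deliberately constructed so that calm days coincide with low CO2 values. It therefore only demonstrates the mechanics of the comparison, not any observed result.

```python
from statistics import mean


def monthly_sampling_means(values, variability, n_low=10, step=3):
    """Compare monthly means under different afternoon sampling schemes.

    values      -- one afternoon CO2 value (ppm) per day of the month
    variability -- within-hour standard deviation (ppm) for the same days
    Returns (all-day mean, list of every-third-day means, low-variability mean).
    """
    all_days = mean(values)
    # The three possible every-third-day patterns, starting on day 0, 1 or 2.
    every_third = [mean(values[offset::step]) for offset in range(step)]
    # Rank days by variability and keep the n_low calmest ones.
    calmest = sorted(range(len(values)), key=lambda d: variability[d])[:n_low]
    low_var = mean([values[d] for d in calmest])
    return all_days, every_third, low_var


# Synthetic 30-day month: CO2 and variability both increase with the day index.
co2 = [400.0 + day for day in range(30)]
sd = [0.1 * (day + 1) for day in range(30)]
all_days, every_third, low_var = monthly_sampling_means(co2, sd)
```

In this constructed case the every-third-day means stay within 1 ppm of the all-day mean, while the mean of the 10 calmest days lies 10 ppm below it, which mirrors the direction of the bias discussed above.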

We have investigated here only potential sampling effects on CO2 concentrations; however, other tracer concentrations are expected to be affected in a similar way. For the ICOS atmosphere network we therefore choose the simpler sampling scheme of one flask every third day. This sampling scheme is expected to serve aims 1 and 2: those flasks with low within-hour variability (on average one flask per week, see Sect. 4.1) could be used for the quality control aim, while all flask samples would deliver data that are as representative as possible for all additional trace components analysed in the FCL solely on flasks.

Solutions and testing to meet aim 3: Catching potentially high fossil fuel CO2 events
First 14C analyses on integrated CO2 samples at ICOS stations showed rather low average fossil fuel CO2 (ffCO2) concentrations, thereby confirming that ICOS stations primarily monitor terrestrial biospheric signals. Figure 7 (upper panels of the graphs for the individual stations) shows our first 14CO2 results from the two-week integrated CO2 sampling at Hyltemossa, Křešín, Observatoire Pérenne de l'Environnement and Hohenpeißenberg. Particularly during summer, the monthly mean regional fossil fuel CO2 offsets, if compared to a background level calculated from the composite of two-week integrated 14CO2 measurements at Jungfraujoch in the Swiss Alps and Mace Head at the Irish coast, are often lower than a few ppm (Fig. 7, lower panels). Only during winter can regional ffCO2 offsets reach two-week mean concentrations of more than 5 ppm. These signals, although providing good mean ffCO2 results for the average footprints of the stations, are often too small to provide a solid top-down constraint of regional fossil fuel CO2 emission inventories and their changes when evaluated in regional model inversions (Levin and Rödenbeck, 2008; Wang et al., 2018). One of the aims of flask sampling in ICOS is, therefore, to explicitly sample air that has passed over fossil fuel CO2 emission areas. Ideally, we would like to obtain signals and analyse flasks for 14CO2 only in cases when the expected fossil fuel CO2 component is larger than 4-5 ppm. This would allow the ffCO2 component to be estimated with an uncertainty below 30 % (Levin et al., 2003; Turnbull et al., 2006). Further, as sample preparation for 14C analysis is very laborious and the capacity of the CRL is limited to about 25 flask samples per station per year, one should know beforehand whether a sample potentially contains a significant regional fossil fuel CO2 component.
This could be determined either with near-real-time transport model simulations or directly from the in situ observations at the station.

A good indicator for the potential regional fossil fuel CO2 concentration at a station is the ambient CO concentration (Levin and Karstens, 2007), a trace gas that is monitored continuously at all ICOS Class 1 sites. Estimating the expected ffCO2 concentration from measured CO then depends on the average CO/ffCO2 ratio of fossil fuel emissions in the footprint of the station. Mean CO/ffCO2 emission ratios can differ strongly between countries; they mainly depend on the energy production processes and on domestic heating systems. In this respect, the share of biofuel use may also be relevant. In our study we first analysed our selected ICOS stations for regional fossil fuel CO2 signals larger than 4 ppm and determined the frequencies of those events. Note that, in order for the flask results to be used in transport model investigations, 14CO2 flasks should, like all other flask samples, be collected during early afternoon, when atmospheric mixing can be modelled with good confidence. During these situations, however, any ffCO2 signals will be highly diluted. Similar to the approach in the previous section, we investigated the potential ffCO2 levels for the five stations Hyltemossa, Gartow, Křešín, Observatoire Pérenne de l'Environnement and Hohenpeißenberg, first theoretically with STILT model simulations transporting EDGARv4.3 emissions to the five measurement sites. As a second step, we evaluated real continuous CO2 and CO observations from 2017 and 2018 (see Table 1).  Fig. 8). At the same time, the modelled CO offset was elevated, but did not reach 0.04 ppm (second panel). CO offsets were estimated relative to the minimum modelled CO concentration of the last three days (grey line in second panel). In October 2017, the modelled (Fig. 9, upper panel) and measured CO (Fig. 9, lowest panel) offsets do, however, rather frequently exceed 0.04 ppm. The generally good correlation between simulated ffCO2 and CO offset can therefore be used as a criterion for ffCO2 in collected flasks, and 0.04 ppm may be a good threshold for Gartow to predict an ffCO2 signal of more than 4 ppm in sampled ambient air. This is supported by real observations displayed in the two lowermost panels of Figs. 8 and 9, where observed CO offsets > 0.04 ppm (marked by magenta crosses) coincide with high total CO2, and also with STILT-simulated ffCO2 (see for example the synoptic event on October 18-19, 2017).
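The CO criterion described above, an offset relative to the minimum CO concentration of the last three days compared against a threshold of 0.04 ppm, can be sketched in Python as follows. This is a hedged illustration, not the operational ICOS code: the function names are hypothetical, the input is assumed to be a gap-free hourly series, and the three-day window is assumed to be 72 hourly values ending with the hour of interest.

```python
CO_THRESHOLD_PPM = 0.04   # preliminary threshold suggested in the text for Gartow
WINDOW_HOURS = 72         # "last three days", assuming hourly data


def co_offset(co_ppm, i, window=WINDOW_HOURS):
    """CO offset (ppm) at index i relative to the minimum of the preceding window.

    Assumption: the window includes the value at index i itself, so the
    offset is never negative.
    """
    start = max(0, i - window)
    background = min(co_ppm[start:i + 1])
    return co_ppm[i] - background


def is_ffco2_candidate(co_ppm, i, threshold=CO_THRESHOLD_PPM):
    """True if the CO offset at index i exceeds the threshold."""
    return co_offset(co_ppm, i) > threshold


# Made-up series: three days of constant background followed by one sampling hour.
elevated = [0.12] * 72 + [0.17]   # offset of about 0.05 ppm
moderate = [0.12] * 72 + [0.15]   # offset of about 0.03 ppm
```

With these example values, the flask filled during the last hour of `elevated` would be flagged for 14CO2 analysis, whereas the one from `moderate` would not. A station-specific threshold would have to replace the Gartow value elsewhere.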

Investigation of afternoon fossil fuel CO2 events in 2017 at Gartow
The aggregated footprints of the three afternoon situations with STILT-simulated ffCO2 > 4 ppm in July 2017 are displayed in Fig. 10 (upper panels). They show south-westerly trajectories and a dominating surface influence from the highly populated German Ruhr area, but also some influence from large emitters (e.g., power plants) in north-western Germany and at the Dutch North Sea coast. The main influence area with high ffCO2 emissions in October 2017 (Fig. 10, lower panels) also shows Berlin as a significant emitter and some "hot spots" close to the German-Polish border in the south-east.

Investigation of afternoon fossil fuel CO2 events in 2017 and 2018 at Hyltemossa, Křešín, Observatoire Pérenne de l'Environnement and Hohenpeißenberg
Overlapping measurements and STILT model runs are also available for the other four ICOS stations. The general picture here is similar to that at Gartow, but the number of elevated ffCO2 events is often even smaller at these stations than at Gartow. For example, we find no ffCO2 events at HTM, GAT, KRE and HPB, and only three at OPE in July 2018 (Table 1). Simultaneously observed CO elevations relative to background are often small in summer and do not reach the (preliminary) threshold of 0.04 ppm. Starting in October or November, ffCO2 elevations become more frequent, coupled to the more synoptic variability of GHGs in the winter half-year (cf. Fig. 6, upper panels). Table 1 lists, for all months in 2017 and 2018, the number of modelled fossil fuel CO2 events larger than 4 ppm and the number of events based on observed CO offsets larger than 0.04 ppm, using the same estimate for the CO background as for the model results displayed in Figs. 8 and 9. Only in the winter half-year can we potentially sample well-measurable fossil fuel CO2 signals. Lower CO thresholds could be used for summer, at the cost of larger uncertainties in the ffCO2 component. A better alternative would probably be to restrict 14C analysis to flasks collected in autumn, winter and spring, with the additional advantage that the variability of biospheric signals is smaller during these seasons (cf. Fig. 9).
To give some indication of the main ffCO2 emission areas influencing the four stations, Fig. 11 shows aggregated footprints as well as the respective surface influence areas contributing to modelled ffCO2 concentrations larger than 4 ppm in October 2017. At all four stations, and also at Gartow (Fig. 10), the areas potentially contributing significantly to the fossil fuel signals are located rather far away, and many of them are associated with large coal-fired power plants or other point sources. A few big cities, such as Prague in the case of Křešín, also contribute occasionally.

Implementation of the flask sampling scheme at ICOS stations
Sampling one flask every third day, independent of ambient CO2 variability, can easily be implemented at ICOS stations, since sampling of all 24 flasks in the sampler can be programmed individually in advance. Assuming that flasks can be exchanged about once per month, 12 flasks would have been collected during this time span and could then be shipped in one box to the FCL for analysis. The remaining 12 flasks in the sampler would be reserved for ffCO2 event sampling. In order to have a realistic chance of catching all possible events at a station, the sampler would be set to fill one of these flasks on each day between the regular every-third-day samplings. As continuous trace gas measurement data are transferred from the station to the ATC every night, level-1 CO data are available on the morning after flask sampling. These data will then be automatically evaluated at the ATC for potentially elevated CO to decide whether the flask collected on the day before potentially has an elevated ffCO2 concentration and should be retained for 14CO2 analysis. If so, the flask sampler will receive a corresponding message from the ATC. If not, the flask can be re-sampled. Based on our analysis of modelled ffCO2 for the years 2017 and 2018, the likelihood is small that more than 12 ffCO2 events are sampled within one month. Also, some of the events may already have been sampled in one of the "regular" every-third-day flasks. If this is the case, these flasks will be marked so that they are passed on to the CRL after analysis of all other components in the FCL. In the future, the flask sampling strategy for ffCO2 events in particular might change once real-time GHG prediction systems or prognostic footprint products are available, which would allow more accurate targeting of certain emission areas.
First tests using prognostic trajectories to automatically trigger 14CO2 flask sampling are being made at the ICOS CRL pilot station and at selected ICOS Class 1 stations, but are not yet mature enough to be implemented in the entire ICOS network. It is, however, also worth mentioning that sampling flasks during night time as well could largely increase the significance of 14C-based ffCO2 estimates. Currently, we optimize our sampling strategy to accommodate the inability of transport models to digest night-time data. This situation is unfortunate and must urgently be improved in order to increase our ability to monitor, in a top-down way, long-term changes of the envisaged ffCO2 emissions in Europe.
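The decision logic of the sampling scheme (a regular flask every third day, an event flask on every other day, and a next-morning decision based on the CO offset) can be sketched as follows. This is a simplified illustration and not the sampler or ATC software: all names are hypothetical, the 0.04 ppm threshold is the preliminary Gartow value from the text, and the cycle length of 36 days is an assumption chosen so that exactly 12 regular flasks are filled.

```python
REGULAR_INTERVAL_DAYS = 3   # one regular flask every third day
EVENT_SLOTS = 12            # flasks reserved for ffCO2 event sampling
CO_THRESHOLD_PPM = 0.04     # preliminary threshold (Gartow)


def flask_type(day):
    """Return 'regular' on every third day, otherwise 'event'."""
    return "regular" if day % REGULAR_INTERVAL_DAYS == 0 else "event"


def run_cycle(co_offsets):
    """Simulate one sampler cycle.

    co_offsets -- next-morning CO offset (ppm) for each sampling day.
    Returns (regular days, retained event days, regular days marked for 14CO2).
    """
    regular, retained, marked = [], [], []
    for day, offset in enumerate(co_offsets):
        elevated = offset > CO_THRESHOLD_PPM
        if flask_type(day) == "regular":
            regular.append(day)
            if elevated:
                marked.append(day)      # pass on to the CRL after FCL analysis
        elif elevated and len(retained) < EVENT_SLOTS:
            retained.append(day)        # keep this flask for 14CO2 analysis
        # Otherwise the event flask is released and can be re-sampled.
    return regular, retained, marked


# Made-up cycle of 36 days with elevated CO on day 3 (regular) and day 4 (event).
offsets = [0.01] * 36
offsets[3] = 0.05
offsets[4] = 0.06
regular_days, retained_days, marked_days = run_cycle(offsets)
```

In this example 12 regular flasks are filled, one event flask is retained and one regular flask is marked for subsequent 14CO2 analysis. All other event flasks remain available for re-sampling.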

Conclusions
Developing a flask sampling strategy for a network like ICOS is a new approach, which, to our knowledge, has not yet been taken in any other sampling network. It may contribute to optimizing efforts at the (remote) ICOS stations as well as the analytical capacities and capabilities of the ICOS Central Analytical Laboratories. Our strategy was designed to meet, on the one hand, the requirements for quality control, making sure by comparison of flask results with the parallel in situ measurements that ICOS data are of the highest precision and accuracy. Our first results showed that this strategy of independent quality control is working successfully. At the same time, our sampling scheme will provide flask results that can be used optimally in current inverse modelling tasks to estimate continental fluxes, not only of core ICOS components such as CO2 and CH4, but also of trace substances that are not yet measured continuously. Trying to also monitor fossil fuel CO2 emission hot spots at ICOS stations during well-mixed afternoon hours will be a particular challenge, because the ffCO2 influence at that time of the day is often very small. There is thus an urgent need for transport model improvement so that night-time data can also be used for the inversion of fluxes. Experience in the coming years will show whether our current strategy succeeds in meeting all aims or needs further adaptation.

Author contributions: IL and UK designed the study, UK developed the Jupyter notebook and conducted the STILT model runs, ME built the flask sampler and developed its software, FM and SA conducted the flask sampling and evaluated the comparison data, DR was responsible for flask analysis and SH for 14CO2 analysis, MR was responsible for ICOS data evaluation, and GV, SC, MH, DK and ML were responsible for the measurements at the ICOS stations. IL and UK prepared the manuscript with contributions from all other co-authors.

Competing interests:
The authors declare that they have no conflict of interest.