Using atmospheric trace gas vertical profiles to evaluate model fluxes: a case-study of Arctic-CAP observations and GEOS simulations for the ABoVE domain

Accurate estimates of carbon-climate feedbacks require an independent means for evaluating surface flux models at regional scales. Bulk quantities derived from the Arctic Carbon Atmospheric Profiles (Arctic-CAP) project demonstrate the utility of an altitude-integrated enhancement (AIE) diagnostic that leverages background mole fraction values from the middle free troposphere, is agnostic to uncertainties in boundary layer height, and can be derived from model estimates of mole fractions and vertical gradients. To demonstrate the utility of the bulk quantity, six airborne profiling surveys of atmospheric carbon dioxide (CO2), methane (CH4) and carbon monoxide (CO) throughout Alaska and northwestern Canada between April and November 2017 were completed as part of NASA’s Arctic-Boreal Vulnerability Experiment (ABoVE). The Arctic-CAP sampling strategy involved acquiring vertical profiles of CO2, CH4 and CO from the surface to 5 km altitude at 25 sites around the ABoVE domain on a 4- to 6-week time interval. All Arctic-CAP measurements were compared to a global simulation using the Goddard Earth Observing System (GEOS) modeling system. Comparisons of the AIE bulk quantity from aircraft observations and GEOS simulations of atmospheric CO2, CH4 and CO highlight the fidelity of the modeled surface fluxes. The model-data comparison over the ABoVE domain reveals that while current state-of-the-art models and flux estimates are able to capture broad-scale spatial and temporal patterns in near-surface CO2 and CH4 concentrations, more work is needed to resolve the fine-scale flux features that are seen in the CO observations.

carbon neutral uncertainties allowing for a small to moderate sink or small 2016). These findings are also consistent with Wunch used satellite data and TCCON ground-based column measurements to determine that interannual variability in Northern Hemisphere CO2 uptake was dominated by changes in the boreal forest.


Introduction
There are many uncertainties in predicting the impact of increased emissions of CO2 and CH4 in the atmosphere. Among the most important is the uncertainty in our estimate of carbon-climate feedbacks (Arora et al., 2020). Without a better understanding of how changes in temperature, CO2 itself, water and nutrients are magnifying or reducing the impact of increased emissions of greenhouse gases (GHG), it will be difficult to use climate models to accurately predict climate change. This uncertainty stems not only from a poor mechanistic understanding of how the biosphere will respond at the smallest scales but also from a poor understanding of how changes in the landscape drive changes in local environments. The Arctic, in particular, is a region where carbon-climate feedbacks are critical to understand: the vast quantities of carbon sequestered in the permafrost soils of the northern high latitudes (Hugelius et al., 2014) have led to concerns about the potential for significant carbon emissions due to changes in ecosystems, permafrost and large-scale disturbances like fires (Schuur et al., 2015; McGuire et al., 2018; Turetsky et al., 2020). Our understanding of the magnitude and behavior of the carbon system response to these changes is rudimentary (Koven et al., 2015). For instance, release of carbon from the permafrost pool could result in increased emissions of CH4 from anaerobic degradation; increased emissions of CO2 from aerobic degradation; increased uptake of carbon due to new availability of nutrients and above-ground ecosystem growth; or an increase in mobilization of carbon through runoff. Alternatively, increases in disturbances such as fires may significantly impact below-ground carbon storage, uptake of CO2 and emissions of CH4, CO, and CO2.
Limitations in our understanding of the accuracy of modeled fluxes of CO2, CO and CH4 have increased uncertainties in predictions of the magnitude of Arctic carbon-climate feedbacks (e.g., Koven et al., 2011; Schneider von Deimling et al., 2012; Lawrence et al., 2015; Schuur et al., 2015). The lack of observations from which to build and evaluate models of the biosphere is a significant source of the problem and leads to both enhanced uncertainty and reduced fidelity in our model simulations. In general, land- and ocean-atmosphere fluxes from climate models are most commonly evaluated using flux measurements made with eddy covariance or flux chamber techniques (Sasai et al., 2007). While flux measurements of these types are widely available over many ecosystem types, they represent the impact of limited spatial domains that are rarely more than a 1000 m radius around a given site (Schmid, 2002; Gockede et al., 2005) and may be significantly smaller depending on topography, wind direction, boundary layer stability, and measurement approach. Land surface inhomogeneities within these small footprints (Baldocchi et al., 2005) and regional-scale (100-1000 km) variability of these ecosystems can lead to significant biases when eddy covariance measurements are scaled up to represent large areas (e.g., Mekonnen et al., 2016). This is especially true in the Arctic, where microtopography can result in fluxes varying by orders of magnitude on a scale of 1-100 meters (Johnston et al., 2014). An alternative to the "bottom-up" evaluation approach, which relies on the eddy covariance measurements, is the "top-down" approach, which makes use of atmospheric measurements of species like CO2, CH4 and CO and modeled atmospheric transport patterns to infer the surface fluxes needed to reproduce observed atmospheric concentrations over large regional scales (Pickett-Heaps et al., 2011, and Thompson et al., 2017, are examples in the Arctic).
In a data-limited region, this inverse approach generally takes a forward-flux model, or a set of observations that are likely correlated with the flux, as a prior or first guess. The inverse approach then estimates the flux by scaling the prior. While the inverse approach results in a flux estimate that meets the constraint of the trace gas measurements and modeled transport, the variability in surface flux from these analyses cannot be directly attributed to mechanisms such as temperature changes, CO2 fertilization, nutrient enrichment and water stress, and the resulting flux estimates therefore do not have any predictive capability. Also, inverse methods are influenced by errors in atmospheric transport and assumptions about error covariances, which are difficult to characterize (Gourdji et al., 2012; Lauvaux et al., 2012; Mueller et al., 2018; Chatterjee and Michalak, 2013). In this study, a hybrid approach is taken to evaluate and benchmark the accuracy of current state-of-the-art bottom-up land-surface flux models using a bulk quantity calculated from atmospheric vertical profiles of trace gas mole fractions. The goal is to present an approach to evaluate land-surface flux models that capture complex carbon cycle dynamics over the northern high latitudes. NASA's Goddard Earth Observing System (GEOS) general circulation model (GCM) is used with a combination of surface flux components for CO2, CH4 and CO to create 4D atmospheric fields; these fields are subsequently evaluated using the altitude-integrated enhancements (AIE) calculated from profiles collected during the Arctic Carbon Atmospheric Profiles (Arctic-CAP) airborne campaign. Both the Arctic-CAP project and the GEOS model runs for the domain are part of NASA's Arctic Boreal Vulnerability Experiment (ABoVE, www.above.nasa.gov), a decade-long research program focused on evaluating the vulnerability and resiliency of the Arctic tundra and boreal ecosystems in western North America.
One of the primary objectives of the ABoVE program is to better understand the major processes driving observed trends in Arctic carbon cycle dynamics, in order to understand how the ecosystem is responding to environmental changes and to characterize the impact of climate feedbacks on greenhouse gas emissions. ABoVE has taken two approaches to better understand critical ecosystem processes vulnerable to change. The first is through ground-based surveys and monitoring sites in representative regions of the ABoVE domain. These multi-year studies provide a backbone for intensive investigations, such as airborne deployments. The Arctic-CAP campaign discussed here was one such airborne deployment, conducted during the spring, summer and fall of 2017 (Section 2.1). The subsequent analysis described here illustrates how improvements in surface models developed through ground-based surveys and monitoring sites can be evaluated and tested over larger spatial scales using aircraft profiles (Section 3). This study uses the bulk quantity from Arctic-CAP aircraft profiles to directly evaluate the terrestrial surface flux models of CO2, CH4 and CO. For the sake of demonstration, we rely on one transport model and one flux scenario for each tracer (i.e., CO2, CH4 and CO) to show the utility of the three carbon species to diagnose and identify deficiencies in the land flux models. Ongoing and future studies build upon the results discussed here and further diagnose transport and flux patterns from multiple models based on additional aircraft and ground-based observations throughout the ABoVE domain. This approach demonstrates the value of aircraft profiles.

Arctic-CAP Flight Planning and Sampling Strategy
Arctic-CAP was designed to measure vertical profiles of atmospheric CO2, CH4 and CO mole fraction to capture the spatial and temporal variability of carbon cycle dynamics Parazoo et al., 2016)

Lake and Scotty Creek flux tower sites were overflown on the way to and from the Yellowknife area. The Canadian Circuit expands upon the ecoregions covered in the CARVE missions to include the Boreal Cordillera, Taiga Plain, Taiga Shield and the Southern Arctic Tundra ecoregions. Approximately 25 vertical profiles were acquired during each campaign (Fig. 2). The majority of each flight day was spent in the well-mixed boundary layer with 2-4 vertical profiles up to altitudes of 5000 m above sea level (masl). Using missed approaches to get as near to the ground as possible, profiles diagnosed temporal changes in the boundary layer and residual layers above where surface fluxes may have recently (< 3 days) influenced that atmospheric column.

Sweeney et al., 2015). The focus of this study will be on the CO2, CH4 and CO data acquired during Arctic-CAP and, in particular, on using the profiles acquired during each flight to separate signals from near-field surface fluxes from large-scale deviations in a way that is agnostic to model errors due to inaccurate vertical transport.

Aircraft and Payload
Arctic-CAP flights were performed with a Mooney Ovation 3 aircraft (tail number N617DH, Scientific Aviation). The Mooney operated at a cruise speed of 170 kts and reached profile altitudes of 5 km (17,000 feet) on each flight, with most legs lasting 4-5 hours and covering an average distance of ~1350 km. The average ascent and descent rates were limited to ~100 m/min to minimize hysteresis in the temperature and relative humidity measurements. The basic research payload flown on all six research missions included continuous in situ CO2, CH4, CO, H2O, temperature and horizontal winds. The in situ measurements (Sweeney and McKain, 2019) followed the methodology described in Karion et al. (2013), and wind measurements followed the protocol outlined in Conley et al. (2014). During Arctic-CAP, in situ measurements of CO2, CH4 and CO were made every ~2.4 s and aggregated to 10 s averages for comparison with the GEOS 4D fields (latitude, longitude, altitude and time). Sampling at the 10 s resolution reduces the spatial representativeness error between the model grid cell and the aircraft observations. Programmable flask packages (PFPs; Sweeney et al., 2015) provided an independent check of the calibration scale of the continuous in situ CO2, CH4 and CO measurements, as well as samples for more than 50 different species including N2O, SF6, and a variety of hydrocarbons, halocarbons and isotopes of carbon. Carbonyl sulfide measured in the flask samples can be used as a tracer of gross primary productivity (GPP) (Montzka et al., 2007), while ethane, propane and the C-13 isotope of CH4 provide another constraint on the source of the CH4 emissions. Each flight sampled a single 12-flask package, providing a total of ~84 flasks per research mission to better understand the factors controlling local fluxes of CO2, CH4 and CO and the long-range transport of these species from low latitudes.
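The aggregation of the ~2.4 s in situ samples to 10 s averages can be illustrated with a minimal sketch. The function name `aggregate_to_bins`, the fixed bin boundaries starting at time zero and the sample values are assumptions made for illustration; the source does not describe how the averaging windows were aligned.

```python
from collections import defaultdict

def aggregate_to_bins(times_s, values, bin_s=10.0):
    """Average samples into fixed-width time bins.

    Returns a list of (bin_start_time_s, mean_value) pairs, one per
    non-empty bin. Bin alignment at t = 0 is an assumption.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for t, v in zip(times_s, values):
        k = int(t // bin_s)          # index of the bin containing t
        sums[k] += v
        counts[k] += 1
    return [(k * bin_s, sums[k] / counts[k]) for k in sorted(sums)]

# Synthetic CO2 samples (ppm) at a 2.4 s cadence; not real Arctic-CAP data.
times = [0.0, 2.4, 4.8, 7.2, 9.6, 12.0, 14.4]
co2 = [410.0, 410.2, 410.4, 410.6, 410.8, 411.0, 411.2]
binned = aggregate_to_bins(times, co2)
```

With a 2.4 s cadence, each 10 s bin holds four or five samples, so the averaged series is smoother and closer in scale to a model grid cell than the raw series.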

GEOS Earth System Model & Atmospheric CO2, CO and CH4 Modelling
The GEOS (Rienecker et al., 2011; Molod et al., 2015) model is a complex yet flexible modeling system that describes the behavior of the land and atmosphere on a variety of spatial (~12.5-100 km) and temporal (hourly to decadal) scales. GEOS includes both an atmospheric General Circulation Model (GCM) and a data assimilation system that have been used to produce the widely used Modern-Era Retrospective Analysis for Research and Applications (MERRA) (Rienecker et al., 2011) and MERRA-2 (Bosilovich et al., 2015; Gelaro et al., 2017). The GEOS Forward Processing (GEOS FP) system produces atmospheric analyses and 10-day forecasts in near real time, which are used to provide forecasting support to NASA field campaigns and satellite instrument teams. GEOS has also been used extensively to study atmospheric carbon species (e.g., Allen et al., 2012; Ott et al., 2015; Weir et al., 2020). The GEOS setup utilized in this work simulates CO2, CO and CH4 simultaneously at a nominal 0.5° horizontal resolution with 72 vertical layers (up to ~0.1 hPa) and trace gas output saved every 3 hours. For CO2, the surface fluxes consist of five different components from a Low-order Flux Inversion (LoFI) package (Weir et al., 2020): 1) net ecosystem exchange (NEE) from the Carnegie Ames Stanford Approach - Global Fire Emissions Database (CASA-GFED) model with a parametric adjustment applied to match the atmospheric growth rate (Weir et al., 2020), 2) anthropogenic biofuel burning emissions, i.e., harvested wood product (

snow and freeze/thaw cycles on soil temperature and moisture, and thus the CH4 emissions. Table 1 provides a summary of the flux components, their specifications and associated references.

AIE calculation
As will be explained in the following results section, the surface fluxes of CO2, CH4 and CO in GEOS are compared to aircraft observations by first subtracting the average daily free tropospheric value (XFT; >3000 m for CO2 and CH4 and >4000 m for CO) from each measurement below 3000 m and then comparing the altitude-integrated sum (Eq. 1), where ΔX is the altitude-integrated sum of the mole fraction of species X minus XFT, divided by nBL, and n is the atmospheric number density. It is assumed that the mole fraction of each trace gas species measured at the lowest point in each profile is constant to the ground level. Ground level altitude is taken from USGS (USGS, 2017). Thus, the AIE is equivalent to the average enhancement in the boundary layer after accounting for altitude changes in number density.

More recent studies, such as Welp et al. (2016) and Commane et al. (2017), have also used atmospheric inversions to highlight that >90% of the carbon sink in the northern high latitudes resides in the boreal forests. Our simple forward model simulations and the Arctic-CAP data provide a unique opportunity to assess the validity of these previous findings over the ABoVE domain. Sub-regional flux estimates within the ABoVE domain are part of ongoing investigations and will be captured in future studies.

1 in Saunois et al., 2019); instead, we adopt a much simpler approach of repeating the EDGARv4.3.2 emissions from 2012 for the year 2017. In contrast to the emissions from the coal, oil and gas sector, our wetland methane emissions are obtained from the LPJ-wsl model (Table 1). LPJ-wsl is one of the prognostic models that provide wetland emission estimates to the global methane budget (Table 2 in Saunois et al., 2019). It is not surprising, then, that our global wetland CH4 emission estimate for 2017 is in line with both the bottom-up (100-183 Tg CH4 yr⁻¹) and top-down (155-217 Tg CH4 yr⁻¹) estimates used in the global methane budget estimate.

(Table 4).
Figure 4 presents the composite vertical profile data for each campaign. The monthly composite CO2, CH4 and CO vertical profiles capture the expected variations in the seasonal cycle. For CO2 and CH4, the composite profiles also show more variability in the boundary layer (altitudes < 3000 masl), both within each month and across months, than in the free troposphere (altitudes > 3000 masl). Unlike CO2 and CH4, CO variability in the free troposphere is significantly greater in July and October than in the boundary layer, indicating either long-range transport of CO or CO injected high (>3000 masl) into the troposphere by local wildfires. A clearer picture of the vertical gradients between the free troposphere and the boundary layer can be seen by subtracting free tropospheric means from measurements below 3000 masl. The CO2 gradients between the measurements below 3000 masl and average daily free troposphere values show a drawdown in the boundary layer for most of the profiles starting in June and lasting until the end of the September campaign (Fig. 5). The drawdown signal in CO2 over the Northern Alaska Tundra (often referred to as the "North Slope") was most pronounced in mid-July and continued through the September campaign. The CO2 drawdown in the more southerly regions of the Boreal Cordillera and Alaskan Boreal Interior peaked in August. By the October campaign, many regions were showing significant enhancements in the boundary layer CO2 mole fraction relative to the free troposphere. On the other hand, for both CH4 and CO, significant enhancements were observed from June through early November. Methane enhancements over the Northern Alaska Tundra were observed from July onward, consistent with patterns observed at the long-term surface monitoring station in Utqiaġvik. Similarly, boundary layer CO2 and CH4 are both most enhanced in September and October on the North Alaska Tundra.
Due to the high variability in CO above 3000 masl during July and October (Fig. 4), it is more difficult to use this approach to derive CO enhancements from surface fluxes. To avoid the impact of fire-based CO that has been injected into the free troposphere, the mean background value is taken from measurements above 4000 masl. This analysis shows that the Canadian Taiga and Alaskan Boreal Interior are the predominant sources of boundary layer CO emissions, reflecting fires in these regions at that time. It should be noted that large enhancement values for CO2, CH4 and CO were observed within the Alaskan Boreal Interior; these were the result of samples taken in the early morning (10:00 local time) before the boundary layer had fully developed (typically around 11:00-12:00 local time). This trapping of night-time emissions results in significant enhancements that quickly taper off with altitude. These measurements were typically taken during the first profile out of Fairbanks, where the majority of the Arctic-CAP flights originated.

Model Data Comparisons
Aircraft profiles that measure the gradient from the boundary layer into the free troposphere are particularly useful for evaluating atmospheric models and for separating errors and uncertainties related to atmospheric vertical transport and surface flux model simulations. This is demonstrated by comparing surface flux models for CO2, CH4, and CO using a single GCM to evaluate the land surface flux model.

Point by Point Comparison
In the GEOS model run used for these comparisons, an effort was made to match the global atmospheric burdens of CO2, CH4 and CO; however, given the uncertainties in the sources and sinks of these trace gases and in the representation of long-range and local atmospheric transport, it is not uncommon to have mean offsets between the observed and the modeled mole fractions. To evaluate surface fluxes in the ABoVE domain, it is important to consider both the impact of regional-scale fluxes and long-range transport processes that control the mole fractions of CO2, CH4 and CO throughout the ABoVE domain. A time series comparison of the modeled and the observed CO2, CH4 and CO mole fractions (Fig. 6) suggests that gross features of the seasonal cycles are matched, although some significant differences require detailed analysis by considering different elements of each vertical profile.

Free Troposphere Comparisons
As demonstrated by the analysis of the boundary layer enhancements (Fig. 6) observed during Arctic-CAP, it is useful to subtract the average free tropospheric mole fraction from each profile to better understand the local influences within a particular profile. Differences in the mean free tropospheric values, however, can be a valuable indicator of how large-scale biases in the model influence point-to-point comparisons.
In the case of CO2, the mean daily CO2 mole fraction in the observed free troposphere increases faster than the modeled values over the course of the six research missions. The largest offset exceeds a mean value of ~2 ppm (observed - modeled) during the September campaign (Fig. 7). Based on the available model runs, it is difficult to diagnose what causes this offset, although a few hypotheses can be put forward. Given the decreasing latitudinal gradient for CO2 in the free troposphere at this time of year, the offset could be explained by sluggish meridional transport in the model. Alternatively, exaggerated biological uptake in the model in regions outside the study area could be pulling down the CO2 in the modeled free troposphere more rapidly than the drawdown observed over the ABoVE domain. Likewise, measured CH4 increases faster than modeled CH4 over the course of the campaign. Given the decreasing meridional gradient for CH4 that exists during the summer months, sluggish transport could explain the difference between model and observations. Alternatively, modeled June-July-August emissions of CH4 in areas contained by the ABoVE domain could be underestimated, leading to a slower increase in modeled free tropospheric CH4. Finally, the difference between modeled and observed mole fractions of CO in the free troposphere is mainly driven by inaccuracies in the modeled CO from fire plumes both within and outside the ABoVE domain. Figures 4, 6 and 7 show observations of large CO enhancements above 4000 masl during the July, August and October/November campaigns. Given the large excursions in the free tropospheric CO between different profiles, local fires were likely responsible for these enhancements. Accurately simulating the injection height of fire plumes is challenging (Freitas et al., 2007).
The GEOS model distributes biomass burning emissions throughout the planetary boundary layer (PBL) to represent injection above the surface layer, but this method can result in underestimated local emissions for fire plumes detraining in the free troposphere. In regions remote to the ABoVE domain, emissions can be mixed and lofted by large-scale weather systems, which may explain why the model performs better in simulating long-range CO plume transport than it does in capturing the CO enhancements from local fires. The observation-model mismatch is likely compounded by the inability of the model to accurately simulate the subgrid-scale vertical mixing necessary for capturing vertical profiles for local sources.

Boundary Layer Comparisons
Accurately modeling boundary layer mole fractions of CO2, CH4 and CO depends on an accurate representation of two key factors. First, there is a need to accurately model the local surface-atmosphere flux; second, there is a need to correctly model the physical evolution of the PBL, as well as horizontal transport and vertical mixing out of the PBL into the free troposphere. GCMs have limited horizontal and vertical resolution and require parameterizations to predict both the rate of change and the absolute value of the PBL height over the course of the day. Errors in PBL mixing directly impact the tracer mole fraction estimate. Overestimation of the PBL height causes an artificial dilution of the impact of a surface flux. Conversely, underestimation of the PBL height results in amplification of the impact of a surface flux on the simulated PBL mole fraction. Additionally, GCMs typically simulate large-scale horizontal gradients more accurately than PBL height unless there are large topographic changes that occur on horizontal scales smaller than the model resolution (for GEOS, 0.5 degree). This is because such large-scale patterns are generally well constrained by the millions of in situ and satellite observations incorporated into meteorological analyses, while PBL mixing is represented by highly simplified parameterizations. The three carbon species that we investigate in this study provide different diagnostic information about the model transport and flux specifications. In the case of a gas like CO that often comes from a specific point source in the Arctic, accurate placement of the emissions, both in the horizontal and the vertical, and the modeled wind direction are critical factors. The ABoVE domain is made up of large expanses of forest and tundra in which CO2 fluxes are more uniformly distributed, making the transport accuracy of individual plumes a less critical factor for simulating CO2.
Accurately estimating CH4 mole fractions may be more sensitive to horizontal transport in the PBL if CH4 emissions are dominated by specific features such as lakes or wetlands, or anthropogenic point sources from oil and gas production such as those observed on the North Slope (Floerchinger et al., 2019). However, we observed consistent PBL CH4 enhancements throughout each campaign (Fig. 5), suggesting a spatial homogeneity in CH4 emissions rather than emissions from specific point sources.
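The dilution and amplification effect of PBL height errors can be made concrete with a minimal box-model sketch. It assumes a perfectly mixed layer of uniform air density with no entrainment or advection; the function name, the molar air density of 42 mol m⁻³ and the flux value are illustrative assumptions, not quantities taken from GEOS or Arctic-CAP.

```python
def pbl_enhancement_ppm(flux_umol_m2_s, hours, pbl_height_m, air_mol_m3=42.0):
    """Mole-fraction change (ppm) in a well-mixed box of depth pbl_height_m.

    flux_umol_m2_s : surface flux in umol m-2 s-1 (negative = uptake)
    air_mol_m3     : assumed uniform molar density of air
    """
    added_umol_m2 = flux_umol_m2_s * hours * 3600.0      # tracer added per m2
    column_air_mol_m2 = air_mol_m3 * pbl_height_m        # air in the box per m2
    return added_umol_m2 / column_air_mol_m2             # umol/mol = ppm

# Same hypothetical uptake flux, two different PBL heights.
shallow = pbl_enhancement_ppm(-5.0, hours=6.0, pbl_height_m=1000.0)
deep = pbl_enhancement_ppm(-5.0, hours=6.0, pbl_height_m=2000.0)

# Depth-weighted (column) signal is the same in both cases.
column_shallow = shallow * 1000.0
column_deep = deep * 2000.0
```

Doubling the PBL height halves the simulated mole-fraction change, while the depth-integrated signal is unchanged. Under these simplifying assumptions, this is why a vertically integrated quantity is less sensitive to PBL height errors than a point comparison.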

Altitude-integrated Enhancements (AIEs)
While individual mole fraction measurements are challenging to reproduce given errors in both modeled surface fluxes and transport, the vertical profile provides a unique opportunity for removing significant uncertainties in transport in order to better assess the surface flux model of a specific long-lived tracer. Assuming that horizontal transport is a relatively small source of bias and the upper part of the free troposphere (>3000 masl) is largely unaffected by local processes, it is possible to use the information in the vertical profile to reduce the effects of vertical transport. This can be estimated by vertically integrating the net change in the PBL due to a surface flux from the surface to a specific altitude that is well above the boundary layer. For this study, almost all the enhancements for CO2 and CH4 were observed below 3000 masl. By subtracting the average free tropospheric (FT) values in both the model and the measurements and averaging the resulting enhancements or depletions for each profile mapped on equal altitude bins from the surface to 3000 masl (Eq. 1), we quantify a total enhancement (AIE) resulting from the surface flux (Fig. 8). The resulting measured and modeled AIEs show good correlations for CO2 and CH4, but the CO correlations are not as promising. The average measured enhancement in CO2 and CH4 below 3000 masl is correlated with the forward model such that more than 50% and 36%, respectively, of the observed variability is captured by the model (Fig. 8). The average CO enhancement below 3000 masl is captured by the model with lesser accuracy: the model captures only 26% of the observed variability and shows a significant bias throughout the growing season.
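The AIE calculation described above can be sketched as a number-density-weighted average of the enhancement over equal altitude bins. This is one possible reading of Eq. 1, not the authors' code: it assumes that nBL is the sum of the number density over the same bins, that relative number density follows an exponential profile with an 8 km scale height, and that 100 m bins are used. The function names and the sample profiles are hypothetical.

```python
import math

def relative_density(z_m, scale_height_m=8000.0):
    """Relative air number density; an assumed exponential profile."""
    return math.exp(-z_m / scale_height_m)

def interp_profile(z, zs, xs):
    """Linear interpolation; end values are held constant outside the data,
    so the lowest measurement is extended down to the ground."""
    if z <= zs[0]:
        return xs[0]
    if z >= zs[-1]:
        return xs[-1]
    for i in range(1, len(zs)):
        if z <= zs[i]:
            w = (z - zs[i - 1]) / (zs[i] - zs[i - 1])
            return xs[i - 1] + w * (xs[i] - xs[i - 1])

def altitude_integrated_enhancement(alt_m, mole_frac, ground_m,
                                    ft_floor_m=3000.0, top_m=3000.0,
                                    bin_m=100.0):
    """Density-weighted mean of (X - X_FT) from the ground to top_m."""
    pairs = sorted(zip(alt_m, mole_frac))
    zs = [z for z, _ in pairs]
    xs = [x for _, x in pairs]
    ft = [x for z, x in pairs if z > ft_floor_m]   # free-troposphere samples
    x_ft = sum(ft) / len(ft)
    weighted_sum = 0.0
    density_sum = 0.0                              # plays the role of n_BL here
    z = ground_m + bin_m / 2.0                     # centre of the lowest bin
    while z < top_m:
        n = relative_density(z)
        weighted_sum += (interp_profile(z, zs, xs) - x_ft) * n
        density_sum += n
        z += bin_m
    return weighted_sum / density_sum

# Synthetic profiles (altitude in masl); not real Arctic-CAP data.
alt = [500.0, 1000.0, 2000.0, 3000.0, 3500.0, 4500.0]
ch4 = [1952.0, 1952.0, 1952.0, 1952.0, 1940.0, 1940.0]   # ppb
co2 = [400.0, 400.0, 405.0, 405.0, 405.0, 405.0]         # ppm
aie_ch4 = altitude_integrated_enhancement(alt, ch4, ground_m=200.0)
aie_co2 = altitude_integrated_enhancement(alt, co2, ground_m=200.0)
```

For CO, the same function could be called with `ft_floor_m=4000.0` to follow the higher background threshold described in the text. Applying the identical function to observed and model-sampled profiles yields paired AIE values that can then be compared.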

CO2 AIE
To understand the true value of the aircraft profile in evaluating the ability of the surface flux model to reproduce observed fluxes over large regional expanses, it is useful to rigorously compare the differences between modeled and observed near-surface enhancements. The enhancements of CO2 below 3000 masl shown in Fig. 8 for both data and the GEOS model are well correlated. As expected, during April/May we see very little change in the AIEs below 3000 masl, while June, July and August show significant drawdown, followed by enhancements in September and October/November (Figs. 6 and 8). The modeled AIEs below 3000 masl reproduce the observations, suggesting that the surface flux of CO2 throughout most of the ABoVE domain is accurately modeled by GEOS. Despite the overall agreement indicated by aggregated statistics, a closer look shows significant differences in observed and modeled CO2 enhancements for many individual flight days (Fig. 9). Inspection of individual profiles (Fig. 10) reveals that in some cases the model is not capturing near-ground stratification observed in the river valleys of the interior parts of the ABoVE domain. This is not surprising given that the observations have a much higher vertical resolution than the model, whose vertical resolution is ~100 m in the PBL. Consequently, the observed mole fraction values are much higher than the model estimates because the model is not able to capture the stratification. However, the overall modeled vertical gradients in CO2 match the observations, suggesting that the large-scale vertical transport of emissions is accurately simulated above ~1000 masl. As an example, the set of profiles from July 10 (Fig. 10) demonstrates that, although infrequent, high PBL heights and emissions from fires (as indicated by large (>400 ppb) enhancements in CO) add some uncertainty to the AIE values.
Both of these factors impact the mean free tropospheric correction and altitude of integration that we have chosen to accurately capture the total CO2 enhancement from the surface fluxes.

CH4 AIE
Although the correlation between the observed and modeled AIEs of CH4 is significant, it is not as strong as it is for CO2. In particular, we see some clear biases in the seasonality, where the enhancements in the early part of the season are underestimated by the model while the enhancements in the later part of the season are overestimated. This is demonstrated both by the comparisons of AIEs (Fig. 8) and of mole fraction enhancements below 3000 masl (Fig. 9), where the mean difference (observed - modeled) switches from positive to negative over the course of the study period. The Arctic-CAP profile observations provide a critical point of comparison against which future surface flux models of CH4 can be evaluated, helping to identify areas where process improvements are needed.
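Two summary statistics used in this comparison, the fraction of observed variability captured by the model and the mean (observed - modeled) difference per campaign, can be computed as in the following minimal sketch. It assumes the captured variability is the squared Pearson correlation, which the text does not state explicitly; the helper names and the AIE values are invented for illustration.

```python
from statistics import mean

def r_squared(obs, mod):
    """Squared Pearson correlation between observed and modeled values."""
    mo, mm = mean(obs), mean(mod)
    cov = sum((o - mo) * (m - mm) for o, m in zip(obs, mod))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_m = sum((m - mm) ** 2 for m in mod)
    return cov * cov / (var_o * var_m)

def mean_difference_by_group(groups, obs, mod):
    """Mean (observed - modeled) difference for each group label."""
    diffs = {}
    for g, o, m in zip(groups, obs, mod):
        diffs.setdefault(g, []).append(o - m)
    return {g: mean(d) for g, d in diffs.items()}

# Synthetic CH4 AIE values (ppb) for two campaigns; not real data.
campaign = ["Apr/May", "Apr/May", "Sep", "Sep"]
observed = [10.0, 14.0, 20.0, 24.0]
modeled = [6.0, 8.0, 26.0, 30.0]

r2 = r_squared(observed, modeled)
bias = mean_difference_by_group(campaign, observed, modeled)
```

In this synthetic case the correlation is high even though the mean difference changes sign between campaigns, which illustrates why both statistics are examined together.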

CO AIE
The comparison of observed and modeled AIEs of CO is less useful because some of the critical assumptions made for this comparison are designed to shed light on surface processes affecting CO2 and CH4. The biggest limitation in the CO simulation for interpreting vertical profile observations appears to be in the accuracy of the vertical distribution of CO emissions. While the model shows an increase in mole fractions during the July and October/November campaigns, the extreme mole fractions in the observations are twice those of the model (Fig. 6). A good example of how the modeled and the observed mole fractions differ can be seen on July 10, 2017 (Fig. 10) during a flight up the Mackenzie River in the Northwest Territories of Canada. Here, large enhancements of CO (>400 ppb) are observed at altitudes between 3000 and 5000 masl, while CH4 and CO2 boundary layer enhancements are observed below 3000 masl in most of the profiles measured that day. The ~100 ppb CO/ppm CO2 ratio and the large CO enhancement support the idea not only that a fire is the source but also that the fire is nearby (<100 km). Both the magnitude and altitude of the CO enhancement point to a few critical limitations in the model that were less important for CO2 and CH4. First, most GCMs, including GEOS, do not take into account the massive heat source that fires provide, which is needed to correctly model the injection of fire emissions above the boundary layer. Second, the fire radiative power observations used to estimate emissions can be obscured by thick clouds or aerosols, resulting in the emissions estimates missing some fire hotspots. Third, the heterogeneous nature of fires as a surface source of CO means that any inaccuracies in horizontal transport or location of the fire will play a large role in the ability of the model to accurately reproduce the observations.
Fourth, the lack of a diurnal cycle in biomass burning emissions from the emission database (QFED; Table 1) may result in 'temporal aggregation errors', whereby the model simulations may miss the high emission values that coincide with the daytime aircraft observations.
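The CO/CO2 ratio used above as a fire indicator is an enhancement ratio: the CO excess over background divided by the CO2 excess over background. A minimal sketch of that arithmetic follows. The function name and the sample mole fractions are hypothetical, chosen only so that the result matches the ~100 ppb/ppm value quoted in the text; they are not Arctic-CAP measurements.

```python
def enhancement_ratio(co_ppb, co2_ppm, co_bg_ppb, co2_bg_ppm):
    """Return the CO:CO2 enhancement ratio in ppb CO per ppm CO2.

    All four inputs are assumed to come from the same air mass
    (plume sample and a nearby background sample).
    """
    d_co = co_ppb - co_bg_ppb        # CO enhancement (ppb)
    d_co2 = co2_ppm - co2_bg_ppm     # CO2 enhancement (ppm)
    if d_co2 == 0:
        raise ValueError("CO2 enhancement is zero; ratio is undefined")
    return d_co / d_co2


# Hypothetical plume: 420 ppb CO against a 120 ppb background,
# 408 ppm CO2 against a 405 ppm background.
ratio = enhancement_ratio(420.0, 408.0, 120.0, 405.0)
print(f"{ratio:.0f} ppb CO per ppm CO2")
```

In practice the ratio is usually estimated from a regression over many plume samples rather than from a single pair, which is less sensitive to the choice of background.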

Model-data mismatch over ecoregions
The bulk quantity AIE can be used to evaluate surface flux models with aircraft profiles at the regional scale (Fig. 11). For most regions and times of year, the difference in CO2 AIEs is not statistically significant; however, there are certain regions, such as the Northern Tundra of Alaska, where the modeled CO2 AIEs are significantly different and amplify a pattern that is observed over other regions. In early spring, the model slightly overestimates observed boundary layer enhancements, but a month later the model underestimates drawdown. Figures 6 and 11 suggest that the peak in early-summer model drawdown of CO2 precedes the observed CO2 drawdown. The difference between observed and modeled enhancements changes sign again during the July flight in the Northern Tundra of Alaska, with an underestimation of the drawdown. Similar patterns can be observed in the Canadian Boreal Cordillera, suggesting that the timing of the summertime drawdown is too early in the model in this region. Over the same period, however, comparisons over the Western Alaska Tundra depict opposite (although far more subtle) patterns. While the offsets in the fall months are smaller, there is the suggestion that the enhancements in the Southern Arctic and Canadian Taiga ecoregions are both underestimated in the model. For CH4, the seasonal bias (underestimation in the spring and overestimation between July and September) in the AIEs between observations and models stands out as the most significant feature. The notable exceptions are again the Northern Tundra of Alaska and the Canadian Boreal Cordillera, where CH4 AIEs in July and at the end of October are significantly underestimated. For reasons explained earlier, the CO comparison is less informative.
However, if one were to analyze data from the month of September, which had no significant influence from fires in the free troposphere, it would suggest that the model continues to underestimate the impact of CO emissions across all regions.
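The regional comparison summarized in Fig. 11 amounts to grouping observed-minus-modeled AIE differences by ecoregion and campaign and reporting their mean and standard deviation. The sketch below illustrates that bookkeeping under stated assumptions: the record layout, the function name and the numerical values are hypothetical and are not the processing code or data used in this study.

```python
from collections import defaultdict
from statistics import mean, stdev


def summarize_differences(records):
    """Group (observed - modeled) AIE differences by ecoregion and campaign.

    records: iterable of (ecoregion, campaign, observed_aie, modeled_aie).
    Returns {(ecoregion, campaign): (mean_difference, std_difference)}.
    """
    groups = defaultdict(list)
    for region, campaign, observed, modeled in records:
        groups[(region, campaign)].append(observed - modeled)

    summary = {}
    for key, diffs in groups.items():
        # A sample standard deviation needs at least two profiles.
        spread = stdev(diffs) if len(diffs) > 1 else 0.0
        summary[key] = (mean(diffs), spread)
    return summary


# Hypothetical CO2 AIEs (ppm m) for two profiles in one region and campaign.
records = [
    ("Northern Tundra", "July", -9000.0, -12000.0),
    ("Northern Tundra", "July", -7000.0, -12000.0),
]
for key, (avg, spread) in summarize_differences(records).items():
    print(key, round(avg, 1), round(spread, 1))
```

A positive mean difference here means the observed enhancement is less negative than the modeled one, i.e. the model draws down more CO2 than was observed for that region and campaign.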

Separating local, regional and global vertical gradients
By extracting enhancements below 3000 masl from the observations and the model, we have largely separated out two major sources of bias and uncertainty in a model-data comparison: vertical transport and offsets in background mole fraction. However, it should be acknowledged that gradients between the boundary layer and free troposphere are not controlled exclusively by local fluxes and that, in the Arctic in particular, vertical gradients can be controlled by non-local influences. To explore the impact of long-range transport, Parazoo et al. (2016) performed three simulations to better understand the drivers of the vertical gradient over Alaska and found that 48% of the amplitude (April/May-July/August) in the seasonal vertical gradient was driven by local fluxes from Alaska, while the rest was driven by fluxes from the rest of the Arctic (11%) and low latitudes (<60N, 41%). For CO2, the impact of long-range transport on the vertical gradient is complicated by the difference in timing of the initial drawdown in the spring and the uptick in the fall at low latitudes versus that at high latitudes. The earlier drawdown of CO2 at low latitudes and the transport of that air via the free troposphere to the Arctic significantly reduce the negative vertical gradient in the Arctic. At the same time, the early uptick of CO2 mole fraction in the Arctic relative to the low latitudes enhances the positive vertical gradient in the early fall (Parazoo et al., 2016). For CH4 entering the contiguous US, Baier et al. (2020) and Lan et al. (2019) subtracted 12-15 ppt from the vertical gradient to account for a preexisting gradient in CH4 coming onto the continent. Analysis of the background gradient suggests that this preexisting vertical gradient is a combination of upstream emissions and wind shear, which separates the origin of the boundary layer air from that of the free troposphere.
Large meridional gradients in CH4, such as those observed in the mid latitudes, will drive depletion of the free troposphere relative to that of the boundary layer over the Arctic. Similarly, CO vertical gradients will also be affected by non-local fluxes and wind shear between the boundary layer and the free troposphere. In the case of CO and CH4 there is also likely to be a vertical gradient that is influenced by the oxidation of these molecules.
However, given the relatively long residence time of these molecules and the low sampling altitude in the free troposphere (between 3000 and 5000 masl) of this experiment, this effect is small. From this perspective, the preexisting vertical gradient outside the domain of interest illustrates the importance of model accuracy in non-local fluxes and the importance of long-range transport in the analysis. One approach to ensuring better boundary conditions is to use a global inversion (e.g. CarbonTracker; Peters et al., 2007) to initialize the local region where the prognostic flux model is then run to simulate local fields, as is done to initialize regional Lagrangian inversion models (e.g. Hu et al., 2019).
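The enhancement extraction discussed in this section can be illustrated with a short calculation: take the mean mole fraction in the free troposphere as the background, subtract it from the samples at or below 3000 masl, and integrate the difference over altitude. The sketch below is a simplified illustration and not the processing code used in this study. The function name, the trapezoidal integration and the sample profile are assumptions; the 3000 masl integration top and the 3000 to 5000 masl background layer (4000 to 5000 masl for CO) follow the values given in the text and in the Fig. 11 caption.

```python
from statistics import mean


def altitude_integrated_enhancement(alt_m, mole_fraction, top_m=3000.0,
                                    bg_bottom_m=3000.0, bg_top_m=5000.0):
    """Integrate (profile - free-troposphere background) up to top_m.

    alt_m: sample altitudes in masl; mole_fraction: matching values.
    Returns the enhancement in mole-fraction units times metres.
    """
    pairs = sorted(zip(alt_m, mole_fraction))
    bg_samples = [x for z, x in pairs if bg_bottom_m <= z <= bg_top_m]
    if not bg_samples:
        raise ValueError("no free-troposphere samples to define a background")
    background = mean(bg_samples)

    # Enhancements relative to background for samples at or below top_m.
    lower = [(z, x - background) for z, x in pairs if z <= top_m]

    # Trapezoidal rule over consecutive altitude levels.
    aie = 0.0
    for (z0, e0), (z1, e1) in zip(lower, lower[1:]):
        aie += 0.5 * (e0 + e1) * (z1 - z0)
    return aie


# Hypothetical summertime CO2 profile (ppm) showing boundary layer drawdown.
alt = [0.0, 1000.0, 2000.0, 3000.0, 4000.0, 5000.0]
co2 = [400.0, 402.0, 404.0, 406.0, 406.0, 406.0]
print(altitude_integrated_enhancement(alt, co2))  # negative value: net drawdown
```

For CO, the same function could be called with bg_bottom_m=4000.0 so that the background is taken from the layer above 4000 masl. Because the background comes from the same profile, a constant offset in mole fraction cancels out of the result.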

AIEs as a tool for benchmarking fluxes
This comparison of AIEs from Arctic-CAP and GEOS demonstrates one of the many values of aircraft profiles as a metric for evaluating model performance. In a similar vein, Stephens et al. (2007) used the vertical gradient to evaluate model performance, which pointed to significant errors both in the surface flux models and in the vertical transport of the Transcom 3 inversions (Gurney et al., 2002; 2004). The AIE approach has also been used extensively in the Amazon and Arctic as a means of optimizing fluxes in an inversion framework. Zhou et al. (2002),  and Gatti et al. (2010; have all used some form of AIE from aircraft profiles to estimate surface fluxes of CO2 and CH4 in the Amazon basin. Similarly, Zhang et al. (2014), Hartery et al. (2018) and Commane et al. (2017) use the AIE to produce a set of optimized fluxes of CH4 and CO2 in the Alaska region. This approach to quantifying regional fluxes has significant advantages over other approaches because it is less dependent on an accurate simulation of vertical transport and boundary layer height, as pointed out in Section 3.2.3. However, even in this instance there is a need to calculate the average influence of the boundary layer enhancements, and this can change dramatically depending on the accuracy of the modelled boundary layer height relative to the integration height of the AIE. In the comparison between observed and modelled AIE presented in this study, the focus is on benchmarking a given model's ability to reproduce the AIE in different regions and seasons, to objectively quantify how this model might perform as conditions change, as is expected with a changing climate. From this perspective, the need for an accurate simulation of vertical transport largely disappears because the near-field fluxes are not being computed but just evaluated.
The obvious caveat to this approach is that a changing climate will bring with it different covariations in temperature, water, radiation and nutrient availability that cannot be reproduced over this time and space domain. While this approach does not replace model benchmarking using eddy covariance measurements, it provides an important view of how modelled processes reproduce observations over scales of 1-3 days and 10-100s of km.

Conclusions
The Arctic-CAP campaign was composed of 6 different research missions from April to November 2017. It sampled CO2, CH4 and CO vertical profiles from the surface to 5000 masl across the ABoVE domain in Alaska and Northwestern Canada, covering 6 major Arctic ecoregions. Arctic-CAP airborne surveys included large Tundra and Boreal ecosystems that are the likely sources of large changes in the seasonal cycle of CO2 and have been the subject of great speculation about future emissions of CH4.
Arctic-CAP's CO2, CH4 and CO profiles provide an excellent basis for evaluating the surface flux models used within state-of-the-art atmospheric transport models, and thus are an important tool for understanding carbon cycle feedbacks. Comparisons of Arctic-CAP CO2, CH4 and CO observations against the GEOS model show the following main results. For CO2, the flux model (land and ocean biosphere and fossil fuel) reproduces seasonal and regional depletions and enhancements observed by aircraft profiles after adjusting for small systematic offsets. For CH4, the model simulations agree reasonably well with the observed vertical profiles, but the model underestimates CH4 in the spring and overestimates it in the fall. Modeled North Slope CH4 is underestimated throughout the measurement period, pointing to deficiencies in the wetland flux specifications over this ecoregion. For CO, the comparison between modeled and observed values was confounded by large biomass burning enhancements in the free troposphere that were not captured in the model. Despite these shortcomings, the forward model estimates for CO2 and CH4 represent a marked improvement in model-data differences compared to those done previously for CARVE (Chang et al., 2014; Commane et al., 2017). Results and the flux budgets demonstrate that model representation of CO2 and CH4 for northern high-latitude ecosystems has advanced significantly since the state-of-the-science survey by Fisher et al. (2014). Inversions of the Arctic-CAP data using these fluxes as the prior estimate should further refine the flux estimates and the budget for the ABoVE domain. We note that our comparisons used only GEOS forward model values, and slightly different model-data mismatches may be obtained by using a different transport model.
This study highlights the value of collocated airborne CO2, CH4 and CO vertical profiles for quantifying model strengths and weaknesses and for benchmarking fluxes over larger spatial and temporal scales than are offered by EC comparisons. Such evaluation information is essential to improve model characterization of surface-atmosphere fluxes and to improve our confidence in the accuracy of projections of future conditions. We strongly recommend regular, systematic CO2, CH4 and CO vertical profile observations across the Arctic as an important and cost-effective method to monitor the Arctic for abrupt transformations or potential tipping points in the permafrost-carbon system.

Competing interests
The authors declare that they have no conflict of interest.

Acknowledgements
This research was supported by the NASA Terrestrial Ecology Program award #NNX17AC61A, "Airborne Seasonal Survey of CO2 and CH4 Across ABOVE Domain", as part of the Arctic-Boreal Vulnerability Experiment (ABoVE). A portion of the research presented in this paper was performed at the Jet Propulsion Laboratory, California Institute of Technology, under contract with the National Aeronautics and Space Administration. GEOS model runs and the work of AC were supported by funding from the NASA ROSES-2016 Grant/Cooperative Agreement NNX17AD69A.

Figure 11. Average observation-model integrated enhancement differences by ecoregion. Standard deviations of differences for each region are shown with black and red bars. Red (black) bars signify a negative (positive) average enhancement below 3000 masl relative to the daily mean tropospheric value above 3000 masl for CO2 and CH4 and above 4000 masl for CO.