Spatial distributions of X CO 2 seasonal cycle amplitude and phase over northern high latitude regions

. Satellite-based observations of atmospheric carbon dioxide (CO 2 ) provide measurements in remote regions, such as the biologically sensitive but under sampled northern high latitudes, and are progressing toward true global data coverage. Recent improvements in satellite retrievals of total column-averaged dry air mole fractions of CO 2 ( X CO 2 ) from the NASA Orbiting Carbon Observatory 2 (OCO-2) have allowed for unprecedented data coverage of northern high latitude regions, while maintaining acceptable accuracy and consistency relative to ground-based observations, and ﬁnally providing sufﬁcient data in 5 spring and autumn for analysis of satellite-observed X CO 2 seasonal cycles across a majority of terrestrial northern high latitude regions. Here, we present an analysis of X CO 2 seasonal cycles calculated from OCO-2 data for temperate, boreal, and tundra regions, subdivided into 5 ◦ latitude by 20 ◦ longitude zones. We quantify the seasonal cycle amplitudes (SCA) and the annual half drawdown day (HDD). OCO-2 SCA is in good agreement with ground-based observations at ﬁve high latitude sites and OCO-2 SCA show very close agreement with SCA calculated for model estimates of X CO 2 from the Copernicus Atmospheric Monitoring Services (CAMS) global inversion-optimized greenhouse gas ﬂux model v19r1 and the CarbonTracker2019 model (CT2019B). Model estimates of X CO 2 from the GEOS-Chem CO 2 simulation version 12.7.2 with underlying biospheric ﬂuxes from CarbonTracker2019 (GC-CT2019) yield SCA of larger magnitude and spread over a larger range than those from CAMS, CT2019B, or OCO-2; however, GC-CT2019 SCA still exhibit a very similar spatial distribution across northern high latitude regions to that from CAMS, CT2019B, and OCO-2. Zones in the Asian Boreal Forest were found to have exceptionally large SCA and early HDD, and both OCO-2 data and model estimates yield a distinct longitudinal gradient of increasing SCA from west to east across the Eurasian continent. In northern high latitude regions, spanning latitudes from 47 ◦ N to 72 ◦ N, longitudinal gradients in both SCA and HDD are at least as pronounced as latitudinal gradients, suggesting a role for global atmospheric transport patterns in deﬁning spatial distributions of X CO 2 seasonality across these regions. GEOS-Chem surface contact tracers show that the largest X CO 2 SCA occurs in areas with the greatest contact with land surfaces, integrated over 15-30 days. The correlation of X CO 2 SCA with these land surface contact tracers are stronger than the correlation of X CO 2 SCA with the SCA of CO 2 ﬂuxes or the total annual CO 2 ﬂux within each 5 ◦ latitude by 20 ◦ longitude zone. This indicates that accumulation of terrestrial CO 2 ﬂux during atmospheric transport is a major driver of regional variations in X CO 2 SCA. EM27/SUN of compar- and TCCON observations with biases less than 0.25 ppm on average. In some cases offsets between EM27/SUN and TCCON observations to be as large as 2 ppm, proven stability of the EM27/SUN allow for a bias correction would yield good agreement between TCCON and EM27/SUN retrievals. The EM27/SUN X CO 2 in a of validate OCO-2 other satellite-based good agreement between observations in the eastern zones of North America result in opposing longitudinal gradients in SCA and HDD across the North American continent, such that OCO-2 observations yield increasing SCA from east to west, while model estimates yield increasing SCA from west to east. In order to assess the relative inﬂuences of the accumulation of CO 2 exchanges during atmospheric transport or the magnitudes of ﬂuxes within 5 ◦ latitude by 20 ◦ longitude zones, we compare GEOS-Chem surface contact tracers with 5 observed spatial distributions of SCA and HDD from OCO-2. GEOS-Chem surface contact tracers revealed that the largest X CO 2 SCA occur in areas with the greatest inﬂuence from land tracers with 15 or 30 day lifetimes. The correlations of X CO 2 SCA with land contact tracers are stronger than the correlations of observed X CO 2 SCA with SCA of CO 2 ﬂuxes or with the total annual CO 2 ﬂux within a given 5 ◦ latitude by 20 ◦ longitude zone. This indicates that accumulation of terrestrial CO 2 ﬂux during atmospheric transport on roughly monthly timescales is a major driver of regional variations in X CO 2 SCA, which is 10 at least as important in shaping observed X CO 2 seasonality as the terrestrial ﬂux magnitudes within zones. However, there is some correlation between the total average annual ﬂuxes used in the GC-CT2019 and X CO 2 SCA, and the Asian Boreal region was still determined to have by far the largest negative ﬂuxes of any of the regions in addition to having the largest X CO 2 SCA and earliest HDD. Three important insights about the drivers inﬂuencing X CO 2 seasonality come out of this analysis. First, a combination of ﬂuxes within zones and the accumulation of CO 2 ﬂux during atmospheric transport affects the observed spatial 15 distributions of X CO 2 seasonal cycle parameters. Second, a robust understanding of atmospheric transport patterns on roughly monthly timescales is essential for accurate interpretation of X CO 2 seasonality for northern high latitudes. Third, seasonality in X CO 2 in northern high latitude regions is almost completely dictated by seasonality in the

There is still some controversy regarding what mechanisms are driving changes in CO 2 SCA and how spatial distributions or temporal trends in CO 2 seasonality are influenced by atmospheric transport patterns or regional changes in carbon exchange.
Recent work by Liu et al. (2020) suggests that global increases in CO 2 SCA since the 1960's are a result of increases in growing season mean temperatures, and polar amplified warming would then explain the increase in the latitudinal gradients in SCA.
Studies by Piao et al. (2017), Forkel et al. (2016), and Graven et al. (2013) used global models to show that increasing latitudinal 5 gradients in SCA are driven by the ecological effects of climate change and changes in vegetation, primarily suggesting CO 2 fertilization as the dominant mechanism. This point is confirmed by findings from Bastos et al. (2019) that attribute enhanced SCA in Boreal Asia and Europe to increases in net biome productivity as a result of CO 2 fertilization. Although they do not address the increase in latitudinal gradients over time, Zeng et al. (2014) and Gray et al. (2014) argue that agricultural expansion in the Northern Hemisphere midlatitudes has resulted in increases in seasonal carbon exchange, which, in turn, result in larger 10 SCA of CO 2 concentrations on a global scale. Barnes et al. (2016) suggest that it is actually the Temperate Forest between 30 • N and 50 • N that is the dominant driver of seasonal carbon exchange on global scales. Yet another study by Yin et al. (2018) found evidence that challenged previous assumptions about the relationship between seasonal cycle amplitude and spring and autumn temperatures in northern high latitudes, emphasizing the need for continued data-driven model validation for these regions. Despite their disagreements, most agree that the seasonality in atmospheric CO 2 at northern high latitudes, 15 and specifically the Boreal Forest, require continued attention as carbon dynamics continue to change. While this paper does not consider temporal changes in SCA, an assessment of spatial distributions of SCA implied by satellite-based observations over northern high latitude terrestrial regions can provide a good foundation for exploring temporal changes in these spatial distributions in later analyses.
Satellite-based infrared spectrometers like the NASA Orbiting Carbon Observatory 2 (OCO-2) (O'Dell et al., 2018;Wunch 20 et al., 2017;Crisp et al., 2017), SCIAMACHY Bovensmann et al., 1999;Burrows et al., 1995), and GOSAT (Basu et al., 2013;Yoshida et al., 2013;Hamazaki et al., 2005) provide global measurements of column-averaged dry air mole fractions of CO 2 (X CO2 ), and particularly can quantify X CO2 in remote, un-instrumented regions. Retrievals and instrument technologies have been advancing rapidly, and boreal-forest-specific methods of X CO2 bias correction and quality control filtering have been developed and validated where ground truth exists (Jacobs et al., 2020b;Kiel et al., 2019;O'Dell et al., 25 2018). In addition, the development of collaborative networks of ground-based solar-viewing spectrometers, including the Total Carbon Column Observing Network (TCCON) and the Collaborative Carbon Column Observing Network (COCCON), has provided a framework for robust global validation of similar passive satellite-based observations (Frey et al., 2019;Wunch et al., 2011). These combined efforts of satellite-based and ground-based total atmospheric column measurements of CO 2 offer a wealth of opportunities for gaining insights into the global climate system as a whole. 30 In this manuscript, we quantify and analyze seasonal cycle parameters derived directly from satellite-based observations of X CO2 , across the northern high latitude terrestrial regions. This work represents progress in the application of global monitoring of atmospheric CO 2 to the continued evaluation of global scale carbon dynamics, and shows how satellites like OCO-2 can be used to monitor CO 2 biospheric exchange. In this analysis, OCO-2 data over terrestrial northern high latitudes is used to explore spatial distributions of seasonal cycle amplitude (SCA) and seasonal cycle phase. Interpretation of these spatial dis-35 tributions can be used to test previous claims and provide new insights into what is driving carbon exchange at northern high latitudes. In particular, we explore how seasonality in X CO2 differs for the North American, European, and Asian Boreal Forest regions, and how the Boreal Forest fits within the broader context of northern high latitude regions. In addition, seasonal cycle parameters derived from OCO-2 observations are combined with those from ground-based TCCON and COCCON observations, then compared with seasonal cycle parameters from three model frameworks: the Copernicus Atmospheric Monitoring 5 Services (CAMS) global inversion-optimized greenhouse gas flux model estimates of X CO2 (Chevallier, 2020b), with in situ data assimilation, CarbonTracker2019 posterior X CO2 estimates (CT2019B; Jacobson et al. (2020a)), and the GEOS-Chem CO 2 simulation (Nassar et al., 2010) with underlying biospheric fluxes from CarbonTracker2019 (GC-CT2019). Ultimately, we use simulations of GEOS-Chem surface contact tracers and flux estimates from CAMS, CarbonTracker2019, as well as estimates of fossil fuel emissions from the Community Emissions Data System (CEDS; Hoesly et al., 2018) and estimates of 10 biomass burning emissions from the Global Fire Emissions Database, Version 4.1 (GFED4.1s; Randerson et al., 2018;van der Werf et al., 2017) to address the question of how much spatial variability in X CO2 seasonal cycle parameters may be attributed to magnitudes of fluxes within the observation zones and how much may be attributed to the regional and continental scale accumulation of CO 2 fluxes during atmospheric transport.

OCO-2 data
The NASA Orbiting Carbon Observatory 2 (OCO-2) launched in 2014 and began collecting data in September of that year.
Daily averages of X CO2 are calculated for each zone using observations from OCO-2 B9 Lite files (OCO-2 Science Team/Michael Gunson, Annmarie Eldering, 2018). Ongoing improvements in the ACOS retrieval algorithm and previous efforts by Jacobs et al. (2020b) to develop quality control thresholds tailored to OCO-2 B9 retrievals over Boreal Forest regions (Boreal QC) have 20 allowed sufficient data over our 5 • latitude by 20 • longitude zones to construct X CO2 time-series that yield robust seasonal cycle parameterization. The Boreal QC was evaluated for use with terrestrial OCO-2 B9 retrievals north of 50 • N (Jacobs et al., 2020b), and the zones considered here cover the majority of land north of 50 • N. The southern boundaries of the southern-most zones of North America are at 47 • N, but the 3 • of latitude is not expected to significantly impact the effectiveness of the Boreal QC filtering. Instead of the standard B9 bias correction, we use a modified bias correction that includes temperature at 700 25 hPa (T700), as discussed by Jacobs et al. (2020b), because it was found in previous results to reduce the seasonality of OCO-2 bias relative to ground-based TCCON and EM27/SUN measurements. Seasonal cycle fits to OCO-2 retrievals of X CO2 with the standard B9 bias correction, as well as fits to OCO-2 B10 retrievals, were also calculated and compared to model-derived seasonal cycle fits in the supplement (see Sect. S2). The spatial distribution of seasonal cycle parameters across northern high latitude regions is similar for all three types of OCO-2 retrievals, but the alternative bias correction yields improved agreement 30 with model-derived seasonal cycle parameters.

TCCON and EM27/SUN data
The Total Carbon Column Observing Network (TCCON) is a ground-based network of sites observing X CO2 using high spectral resolution solar-viewing Fourier transform infrared spectrometers (FTS). Data are included from four TCCON sites: East Trout Lake, Canada in North American Boreal zone 3 ; Sodankylä, Finland in European Boreal zone 6 (Kivi et al., 2014;Kivi and Heikkinen, 2016); Białystok, Poland in European Temperate zone 2 ; Bre-5 men, Germany in European Temperate zone 3 ) (see site details in Table 1 and locations mapped in Fig. 1).
The Collaborative Carbon Column Observing Network (COCCON) is a network of sites observing with the Bruker EM27/SUN FTS (Gisi et al., 2012), which are lower resolution mobile solar-viewing spectrometers that serve as complement to TCCON measurements. EM27/SUN observations have been compared to TCCON observations in multiple studies, most notably Sha et al. (2020), Tu et al. (2020), Frey et al. (2019, Velazco et al. (2018), and. In most of these compar-10 isons EM27/SUN and TCCON observations agree with biases less than 0.25 ppm on average. In some cases offsets between EM27/SUN and TCCON observations are reported to be as large as 2 ppm, but the proven stability of the EM27/SUN should allow for a bias correction that would yield good agreement between TCCON and EM27/SUN retrievals. The EM27/SUN instruments have measured X CO2 in a number of campaigns to validate OCO-2 and other satellite-based observations, including work by Jacobs et al. (2020b), Velazco et al. (2018), and Klappenbach et al. (2015), suggesting good agreement between 15 EM27/SUN observations and satellite-based observations. In this analysis, observations with an EM27/SUN FTS in Fairbanks, Alaska, USA (65.859 • N, 147.850 • W;Jacobs et al. (2020a)) are used as a fifth ground-based comparison in the Boreal Forest.
Fairbanks is an established COCCON site as of 2018, so the instrument participates regularly in performance and calibration checks at the central facility operated by KIT and data processed in compliance with COCCON recommendations are available.
In this study, we use the GGG2014 retrieval algorithm coupled with the EM27/SUN GGG interferogram processing suite (EGI; 20 Hedelius and Wennberg (2017)) instead of the standard COCCON retrieval methods for consistency with TCCON retrievals and because this data product has already been bias corrected to TCCON using side-by-side observations at Caltech, as described in detail by Jacobs et al. (2020b). Seasonal cycle fits for ground-based observations at these five sites use daily averages of retrievals collected within two hours of local solar noon, weighted by retrieval error, as described by Jacobs et al. (2020b).
We refer to these daily averages as near noon ground-based (NNG) observations, and this time frame is chosen because OCO-2 25 overpasses tend to occur within approximately one hour of local solar noon over these northern high latitude regions.

Regions and zones
We define regions in North America, Europe, and Asia, which are further subdivided into 5 • latitude by 20 • longitude zones.
For the purposes of this analysis, the classification of zones as temperate, boreal, or tundra, as well as the longitudinal division between the European and Asian regions are guided by maps of ecoregions from Hayes et al. (2011) and Euskirchen et al. 30 (2007) (see Fig. 1). The Boreal Forest zones considered cover a narrower range of latitudes than the boreal regions defined for the Transcom 3 ecoregions (Gurney et al., 2000), which include all high Arctic tundra and portions of temperate Asia as part of the boreal regions. Otherwise, the North American Boreal region and Eurasian Boreal region defined by Transcom 3 are very Also shown in Fig. 1 are the locations of five ground sites where long-term observations of X CO2 have been collected (see Table 1 for details). These ground sites include two sites in the European Temperate region (Białystok and Bremen), one site in the European Boreal region (Sodankylä), and two sites in the North American Boreal region (East Trout Lake and Fairbanks).
Ground-based data are explained further in Sect. 2.2 and seasonal cycles of ground-based data are compared to satellite and model-derived seasonal cycles in Sect. 3.1.

X CO2 seasonal cycle modeling and parameters
The primary focus of our analysis is characterizing seasonality in X CO2 and exploring how and why this seasonality differs across northern high latitude regions, with particular emphasis on the Boreal Forest. To this end, time-series are constructed using daily averaged X CO2 from satellite retrievals, ground-based solar-viewing FTIR spectrometers, and model estimates (see previous methods sections for details). Seasonal cycles are characterized following methods used by Lindqvist et al. (2015), in 15 which daily X CO2 are fit to a skewed sine wave of the form where t is days, ω = 2π 365.25 , the interannual trend is defined by a 0 + a 1 t, and the seasonal cycle amplitude (SCA) is defined by 2|a 2 |. As a metric for seasonal timing we define half drawdown day (HDD) as the day of year when the detrended seasonal cycle fit, f (t) − a 0 − a 1 t, crosses zero from positive to negative. The fit is calculated using nonlinear least squares optimization 20 with the standard error, σ defined as the mean of daily standard deviations in X CO2 over all days in the time-series. If there is no daily standard deviation, as is the case with single point model estimates near ground sites from the CT2019B and GC-CT2019 models, standard error is assumed to be 0.25 ppm. Specifically, we implemented a least squares fitting algorithm (we use the Levenberg-Marquardt algorithm for improved convergence behavior), which seeks to minimize and yields variance for each parameter in the fit equation given by where J is the Jacobian and W = V −1 xi , the inverse of the covariance matrix. In this case, the variances are scaled by χ 2 , so 5 that V xi is defined as χ 2 σ 2 I. The variance in fit parameters are taken to be a direct estimate of fit uncertainty and translate to uncertainty in SCA, defined as 2σ a2 and depicted in figures as errorbars, where appropriate.
The seasonal cycle fitting approach used by Lindqvist et al. (2015) was found to be more numerically stable than fitting to a truncated Fourier Series, as has been employed in previous studies (Wunch et al., 2013;Thoning et al., 1989), because periods of missing data can produce unrealistic oscillations in a fit to a truncated Fourier series. Even in cases with continuous 10 data, the fit to a truncated Fourier series can yield unrealistic oscillations within the overall seasonal cycle that do not appear to accurately depict variability in the data. These unrealistic oscillations are more pronounced when there are gaps in the timeseries of data. The fits to Eq. 1 also show a degradation of the seasonal cycle shape with larger gaps in the time-series, but yield more stability in fitting time-series with some data gaps than the fits to a truncated Fourier series. This is an important consideration for high latitude regions, which have winter gaps in observations for most passive atmospheric remote sensing 15 measurements, due to lower solar elevation angles or night at satellite overpass time (near solar noon).

CAMS model estimates
Model estimates of X CO2 from the Copernicus Atmospheric Monitoring Services (CAMS) global inversion-optimized greenhouse gas flux model v19r1 are used here as a model comparison to OCO-2 and NNG data. The modeling framework for CAMS CO 2 flux inversions is described in detail by Chevallier (2020b). Quality assessments for the Northern Hemisphere 20 by Chevallier (2020a) report that nearly all biases in both CAMS estimates of in situ CO 2 relative to unassimilated aircraft observations and CAMS estimates of X CO2 relative to TCCON observations are within 1 ppm, with standard deviation in bias around 2 ppm. CAMS estimates of X CO2 are available as 3-hourly estimates with 1.9 • latitude by 3.75 • longitude spatial resolution, which is sufficient for providing multiple grid-points within each zone and coincidence with most ground sites within approximately 100 km (see Table S1 in the supplement for exact coordinates of grid-points nearest to the ground sites). 25 Daily averages and standard deviation in CAMS X CO2 , used to calculate seasonal cycle fits are taken from combined spatial aggregations within zones or coincidence regions and temporal aggregations for each 24 hour date in UTC. We use CAMS model estimates with data assimilation from a global network of surface in situ observations at 119 locations, but without any satellite data assimilation. In addition to the X CO2 estimates, the CAMS model output includes surface flux estimates, which will be considered further in the Discussion, Sect. 4.2. Both CAMS flux estimates and CAMS X CO2 estimates use the same 30 atmospheric transport modeling framework.

CarbonTracker2019 model estimates
The CarbonTracker2019 (CT2019) model is an inverse model that provides estimates of global CO 2 fluxes with a 1 • by 1 • spatial resolution and estimates of global X CO2 fields with a atmospheric transport simulated by the TM5 model (Krol et al., 2005) with a spatial resolution of 2 • latitude by 3 • longitude (Jacobson et al., 2020a, b). The model assimilates in situ measurements of atmospheric CO 2 concentrations from aircraft, AirCore (Karion et al., 2010), tall tower, and surface 5 measurement platforms at 460 sites around the world. For CT2019 fluxes considered in this analysis and used in GEOS-Chem simulations we use the original CT2019 release from May 2020 (Jacobson et al., 2020a), but for the posterior estimates of X CO2 we use results from an updated release, CT2019B (Jacobson et al., 2020b). This was a matter of practical necessity because of the timing of the updated CT2019B release and the fact that X CO2 estimates from the previous CT2019 version were unavailable after this release. The purpose of the release of CT2019B was reported as, "Correction of a minor bug  (1993). In this analysis we calculate seasonal cycle fits using daily CT2019B posterior X CO2 and compare these to corresponding seasonal cycle fits from ground-based and satellite-based observations. The CT2019 estimates of biospheric CO 2 exchange are also used within a GEOS-Chem model framework, described in Sect. 2.7, and considered in an assessment of the role of fluxes in the shaping X CO2 seasonality.

GEOS-Chem CO 2 and Transport Tracer simulations 20
The GEOS-Chem atmospheric transport model version 12.7.2 (more detailed information at www.geoschem.org) has 2 • latitude by 2.5 • longitude spatial resolution, using MERRA-2 meteorology (Gelaro et al., 2017). We use the GEOS-Chem CO 2 simulation (Nassar et al., 2010) and GEOS-Chem surface contact tracers, for 2014-2016, to examine the relationships between seasonal cycle parameters and atmospheric transport patterns and speculate on the role of atmospheric transport in determining spatial distributions of X CO2 seasonality across northern high latitudes. The GEOS-Chem CO 2 simulation provides daily 25 X CO2 estimates and source attributions for total column CO 2 , with biospheric fluxes for land and ocean taken from the CT2019 model (Jacobson et al., 2020a), so we refer to this combination of GEOS-Chem atmospheric transport and CT2019 biospheric exchange as GC-CT2019 throughout this paper. Similar to CT2019, this GC-CT2019 simulation uses GFED4.1 to estimate emissions from biomass burning, but unlike CT2019, it uses the Community Emissions Data System (CEDS; Hoesly et al., 2018) for fossil fuel emissions and results from Bukosa et al. (2021) for the chemical production of CO 2 in the atmosphere. emissions to match the global total in those later years, as reported by ODIAC. We used this approach, rather than using ODIAC alone, because the CEDS inventory includes anthropogenic biofuel emissions that are not in ODIAC. The GC-CT2019 simu-lation is initialized on January 1, 2007 with observed global ocean surface mean CO 2 (Dlugokencky and Tans, NOAA/GML, www.esrl.noaa.gov/gmd/ccgg/trends/, accessed 2020-12-15), giving the model 7 years of spinup before generating the output for 2014-2016 used here. GC-CT2019, like CT2019B posterior X CO2 estimates are obtained as daily values for grid points with daily averages and standard deviation calculated spatially for each 5 • latitude by 20 • longitude zone or 5 • latitude by 10 • longitude satellite coincidence region; there is no temporal averaging performed on these estimates for this analysis. Unlike the 5 CAMS model and CT2019B X CO2 estimates, which provide optimized CO 2 flux and X CO2 estimates using internally consistent atmospheric transport models, GC-CT2019 uses CT2019 fluxes that are estimated using TM5 to simulate atmospheric transport rather than the GEOS-Chem transport model.

Treatment of CO 2 flux estimates
The role of fluxes in determining X CO2 seasonality at northern high latitudes is assessed using flux estimates from CAMS, as well as the sum of CO 2 flux estimates from the CEDS and GFED4.1s inventories (Hoesly et al., 2018;Randerson et al., 2018;van der Werf et al., 2017) and CarbonTracker2019 biospheric fluxes from land and ocean (Jacobson et al., 2020a) used to generate the GC-CT2019 X CO2 estimates. While the CAMS v19r1 and CT2019B model frameworks have internally consistent atmospheric transport because both CO 2 flux and X CO2 estimates are generated using the same atmospheric transport model (Chevallier, 2020b;Jacobson et al., 2020a), GC-CT2019 includes biospheric fluxes from CarbonTracker2019 using the TM5 transport model, but then applies GEOS-Chem atmospheric transport modeling to estimate X CO2 . In addition, the fossil fuel 5 fluxes from CEDS and biomass burning fluxes from GFED4.1s may include other assumptions about atmospheric transport.
First, CO 2 fluxes from each model are averaged spatially within each 5 • latitude by 20 • longitude zone (see Fig. 1

Results
3.1 X CO2 seasonal cycles near ground sites 15 Before attempting to interpret spatial distributions of seasonal cycle parameters on continental scales, it is of value to get a better idea of how seasonal cycle parameters from observations at a single location compare to those from spatially averaged data. To this end, five high latitude sites are considered with long term ground-based observations, as described in Sect. 2.2.
There are three spatial scales considered with seasonal cycle fits to near noon ground-based (NNG) observations and seasonal cycle fits to spatially averaged data over a commonly used satellite coincidence region of 5 • latitude by 10 • longitude centered 20 on the location of the ground site, as well as over the 5 • latitude by 20 • longitude zone in which the ground site is located (reference zones in Fig. 1). In Fig. 2, observed SCA and HDD from NNG and OCO-2 are correlated against model-derived SCA and HDD from CAMS, CT2019B, and GC-CT2019 at each of the three spatial scales, and the corresponding linear regression equations and correlation coefficients are reported in Table 2  All three models tend to yield slightly larger SCA than observations at all sites, as well as later HDD than observations with the exception of CAMS HDD at Bremen and Sodankylä. These results stand in contrast to earlier work by Yang et al. (2007) who found that the Transcom model underestimated SCA of CO 2 mixing ratios relative to aircraft observations at nearly every altitude. Details of the full time-series, plots of seasonal cycle fits, and seasonal cycle fit parameters, with uncertainties, for these ground sites, coincidence regions, and encompassing zones are reported in supplemental materials.
In Fig. 3 and Table 3  Trout Lake, Sodankylä, and Fairbanks. Results in the supplement show that the alternative bias correction improved agreement in both SCA and HDD between NNG seasonal cycles and coincident OCO-2 seasonal cycles. The results in Fig. 3 and Table   3 indicate that HDD correlations across scales tend to be slightly weaker and markedly different depending on whether one 10 considers observed or model-derived seasonal cycles. For the coincidence region and the encompassing zone, OCO-2 data consistently yield later HDD than NNG, while spatially averaged model estimates tend to yield HDD that is in good agreement or slightly earlier than the point nearest to the ground site. SCA for ground sites are well correlated and mostly in close agreement with both SCA for 5 • latitude by 10 • longitude coincidence regions and SCA for encompassing 5 • latitude by 20 • longitude zones, demonstrating that SCA scales well with spatial averaging and is a credible metric to consider in the context  Fig. 2, implies that HDD may display more random variability than SCA, and may therefore be a less useful metric in this context.
3.2 X CO2 seasonal cycles by zone 20 The full set of seasonal cycle fit parameters, corresponding uncertainty estimates, as well as plots of time-series and seasonal cycle fits for each zone and ground site in Fig. 1 are reported in the supplement. While the fits to model estimates are generally similar in shape with fits to observations, there are some zones that yield an unrealistic drop in wintertime values in the OCO- cycle fits when selecting only days with OCO-2 observations available rather than the full continuous time-series of model estimates is presented in the supplement, which indicates that shifts in CAMS SCA due to imposed data gaps are -0.044 ± 0.197 ppm, with shifts in SCA for all zones less than 0.5 ppm, while biases in CAMS SCA for the full time-series relative to OCO-2 SCA are 0.547 ± 0.720 ppm and range from -1.0 to 3.0 ppm. These shifts in CAMS SCA are also small relative to the 10 approximately 5 ppm variability in SCA seen across the northern high latitude regions. In addition, the Lindqvist method of fitting (Lindqvist et al., 2015) still provides a more constrained shape than a fit to a truncated Fourier series, which can yield highly unrealistic oscillations in time-series with only minimal gaps in data coverage. The close similarities between spatial distributions of seasonal cycle parameters from OCO-2, CAMS, CT2019B, and GC-CT2019 are apparent in Fig. 4. with observed SCA, but the slope of this correlation is steeper than for CAMS and CT2019B model estimates.

Spatial distributions of SCA and HDD
Seasonal cycle amplitudes (SCA) and half drawdown day (HDD) are mapped in Fig. 4, showing results from OCO-2 observa- Seasonal cycles of direct observations are expected to display more heterogeneity than seasonal cycles of model estimates, which depend on mathematical modeling of atmospheric transport to calculate X CO2 , even if the underlying fluxes are based 15 on in situ data assimilation. Overall, the spatial distributions in SCA and HDD from OCO-2 agree more with those from CAMS and CT2019B than those from GC-CT2019. GC-CT2019 yields larger magnitudes of SCA for many regions, as well as SCA and HDD spread across a larger range for northern high latitude regions, relative to SCA from OCO-2, CAMS, or CT2019B.
In addition, OCO-2 observations yield notably smaller SCA than CAMS, CT2019B, or GC-CT2019 in the western zones of the Asian Boreal Forest, in the Asian Tundra, and in the eastern zones of North America.

20
Panels (a), (c), (e), and (g) of Fig. 6 show a clear increase in SCA from west to east across the Eurasian continent in both model-derived and observational results. In North America, longitudinal gradients are more subtle. While OCO-2 observations exhibit a slight gradient in SCA across North America that increases east to west, CAMS, CT2019, and GC-CT2019 yield SCA that increase from west to east, in the same direction as gradients across Eurasia. This discrepancy in North America hinges primarily on the zones of Boreal Forest and Tundra in eastern North America, which have the smallest SCA for that 25 continent when using OCO-2 data, but have the largest SCA for that continent when using CAMS, CT2019B, or GC-CT2019 model estimates. As expected, panels (a), (c), (e), and (g) in Fig. 6 show latitudinal gradients with increasing SCA from south to north for both observed and model-derived seasonal cycles. However, the Asian Boreal Forest zones stand apart from other regions in all panels of Fig. 6, particularly when plotted against latitude, with larger SCA than other data at similar latitudes or longitudes. Results in Fig. 7 demonstrate that HDD are far more scattered and do not follow the distinct trends with latitude 30 and longitude that SCA does. HDD spatial gradients seem to be inverted relative to SCA with a vague tendency toward later HDD at more northern latitudes and more western longitudes, and HDD also exhibits similar discrepancies between observed and model-derived longitudinal gradients for North America.   Table 4 show that there is a negative correlation between HDD and SCA for both observed and model-derived seasonal cycle fits, such that earlier HDD corresponds to larger SCA. CAMS model estimates yield correlations between HDD and SCA that are more similar to those from OCO-2 and NNG measurements, Furthermore, the separate linear regressions for the temperate, boreal, and tundra zones plotted in Fig. 8 have much larger R 2 than the linear regressions for all the zones together (see Table 4), suggesting that there are different dynamics in different 10 biomes that affect relationships between extent and timing of apparent CO 2 uptake. The strength of this correlation in observed and CAMS seasonal cycles was highest for tundra, and lowest for temperate zones with the boreal zones falling in between.

Comparing GEOS-Chem surface contact tracers to observed SCA and HDD
GEOS-Chem surface contact tracers were used to simulate the release of tracers from land and ocean yielding relative concentrations of tracers with lifetimes of 5, 15, 30, and 90 days for a given grid point and day. In these results the relative  Fig. 10. Figure 10 shows that the observed SCA are most correlated with land-based surface contact tracers with 15 day and 30 day atmospheric lifetimes, while observed HDD are most correlated with ocean-based surface contact tracers with 15 day and 30 day atmospheric lifetimes. Correlations between HDD and land tracers are weak (see supplement Fig. S45) and seem to follow a curve rather than a line, or may be representative of two different linear relationships for different groups of zones. The correlations with ocean tracers were always inverted relative to those with land tracers, such that reduced contribution from 5 ocean tracers and increased contributions from land tracers consistently correspond with larger SCA and often correspond with earlier HDD.

Discussion
In this analysis, methods described by Lindqvist et al. (2015) were used to fit daily average time-series of daily X CO2 to a skewed sine wave (see Eq. 1) and subsequently calculate seasonal cycle amplitude (SCA) and half drawdown day (HDD), 10 as described in Sect. 2.4. These fitting methods have been found to yield more stable and realistic fits for time-series with winter gaps than fitting to a truncated Fourier Series. Increased OCO-2 throughput O'Dell et al., 2018;Osterman et al., 2018) and use of a bias correction and quality control methods tailored to northern high latitudes (Jacobs et al., 2020b) improve the availability of OCO-2 data at the edges of the growing season and assist in generating stable and realistic seasonal cycle fits. Results in Fig. 3 and Table 3 indicate close agreement between SCA from NNG observations at 15 five ground sites and corresponding SCA from OCO-2 data in the 5 • latitude by 10 • longitude spatial coincidence regions.
Relatively weak correlations in HDD at the five ground sites, both across spatial scales and between observed and modelderived seasonal cycles, are likely to be at least partly attributable to spatial heterogeneity in seasonal cycle timing within 5 • latitude by 20 • longitude zones. It is possible that HDD as a phase metric is not as well constrained by the seasonal cycle fitting methods used here as SCA. Many of the greatest discrepancies in HDD when comparing coincidence regions or zones 20 to ground sites (see Fig. 3 and Table 3) occur in observed seasonal cycles and may arise from disagreement between NNG and OCO-2 observations. Discrepancies between observed and model-derived HDD, which are most apparent for GC-CT2019 X CO2 estimates, could reflect an inaccurate representation of ecosystem respiration in the CASA terrestrial biosphere model, which underlies the CarbonTracker2019 biospheric fluxes used in the GC-CT2019. Byrne et al. (2018) found that differences in the seasonal timing of NEE maximum drawdown were primarily driven by differences in the timing of ecosystem respiration 25 in spring and fall, while the amplitude of NEE was largely influenced by the magnitude of peak gross primary production (GPP). This could explain the higher degree of spatial heterogeneity in seasonal cycle timing because ecosystem respiration is driven by soil temperature, soil moisture, and litter accumulation, which could all be expected to exhibit a higher degree of spatial heterogeneity than ambient temperature and sunlight,. If there is, in truth, more spatial heterogeneity in the timing of seasonal CO 2 uptake, then a failure to accurately represent these spatial distributions in timing could be exacerbated by errors 30 or differences in atmospheric transport modeling and result in larger or more variable discrepancies between observed and model-derived HDD. In this case, the relatively spatially homogeneous values of SCA would be easier to accurately predict in the models than HDD. Alternatively, if SCA is primarily defined by the magnitude of maximum GPP, it may be better constrained in the models because data products like SIF and NDVI can be used to validate model estimates of GPP, while ecosystem respiration does not have a direct data proxy.
OCO-2, CAMS, CT2019B, and GC-CT2019 all yield very similar spatial distributions of SCA and HDD across the northern high latitude regions (see map in Fig. 1), and both had large SCA and early HDD in the Asian Boreal Forest as well as a clear increase in SCA from west to east across the Eurasian continent. We found an inverse linear relationship between SCA 5 and HDD across northern high latitudes for both observed and model-derived seasonal cycle fits (see Fig. 8 and Table 4), and these correlations were stronger when separating by ecological type (Temperate, Boreal, or Tundra). This relationship is an interesting result of this analysis that warrants a more in depth exploration in future work, though we cannot speculate on the cause of this phenomenon in the context of this study. Discrepancies between SCA from GC-CT2019 and SCA from CAMS, CT2019B, and observations are consistent with the assessments of seasonal bias resulting from the GEOS-Chem transport 10 modeling framework with MERRA-2 meteorology, as described by Schuh et al. (2019), but may also be partly explained by the lack of internally consistent atmospheric transport modeling for CO 2 flux and X CO2 estimates in the GC-CT2019 framework. Schuh et al. (2019) found that the GEOS-Chem CO 2 simulation overestimates X CO2 in winter and underestimates X CO2 in summer relative to the TM5 transport model, yielding an exaggerated seasonal oscillation and larger SCA. Byrne et al. (2018) found that CarbonTracker2016, using CASA to constrain biospheric carbon exchange, estimated later NEE drawdown than flux 15 inversions with either GOSAT or TCCON data assimilation, and this appears to be consistent with the later HDD estimated by the GC-CT2019. Despite these discrepancies, we contend that the strong correlations between GC-CT2019 and observational results (see panels (e) and (f) for Fig. 5) suggest that GEOS-Chem simulations remain a useful tool for investigating the broader implications of spatial distributions in SCA and HDD from OCO-2 observations. Two limiting hypotheses for the origin of the spatial patterns in X CO2 SCA shown in Fig. 4 are that they arise from differ-20 ences in flux magnitudes within the 5 • latitude by 20 • longitude zone or that they arise from transport patterns accumulating CO 2 exchanges across multiple zones or regions. To investigate the relative influences of atmospheric transport or fluxes within zones on X CO2 seasonal cycles, we consider source apportionment from the GEOS-Chem CO 2 simulation (Nassar et al., 2010) and GEOS-Chem surface contact tracers, as well as surface CO 2 flux estimates from CarbonTracker2019 and CAMS models.

4.1
The role of atmospheric transport in shaping X CO2 seasonality 25 Fisher et al. (2010) and Wang et al. (2011) demonstrated the ability of GEOS-Chem to represent synoptic transport from mid-latitudes to the Arctic. Schuh et al. (2019) evaluated meridional transport of CO 2 and SF 6 in GEOS-Chem, using the same MERRA-2 meteorology used here, and suggested that the model may underestimate vertical and meridional mixing, which slightly increases CO 2 seasonal cycle amplitude in GEOS-Chem in high northern latitudes compared to some other models. Despite some discrepancies in SCA and HDD from GC-CT2019, relative to observed SCA and HDD, which may 30 arise from differences between the GEOS-Chem and TM5 atmospheric transport models, the GEOS-Chem transport model reproduces surface temperature and pressure patterns and thus is a reasonable representation of atmospheric transport (Wang et al., 2011;Fisher et al., 2010). Therefore, comparing these GEOS-Chem surface contact tracers with observed SCA and HDD should provide useful insights into the influence of atmospheric transport patterns on observed X CO2 seasonality. The higher correlation coefficients obtained when comparing OCO-2 SCA to land and ocean tracers with 15 day and 30 day lifetimes suggest that accumulation of CO 2 flux due to atmospheric transport on roughly monthly timescales plays an important role in affecting X CO2 SCA.

4.2
The role of CO 2 fluxes in X CO2 SCA Figure 11 shows maps of average annual fluxes and flux SCA for GC-CT2019 and CAMS, which show that flux SCA in 5 panels (c) and (d) are distributed without the apparent longitudinal gradient seen in X CO2 SCA in panels (a), (c), (e), and (g) of Fig. 4. The weak correlations (R 2 < 0.2) between flux SCA and X CO2 SCA, shown in panels (a), (c), (e), and (g) of Fig.   12 and quantified in Fig. 13, combined with the relatively strong correlations (R 2 > 0.6) between X CO2 and surface contract tracers with 15 and 30 day lifetimes, suggest that accumulation of CO 2 flux due to atmospheric transport on roughly monthly timescales is more influential in determining X CO2 SCA than fluxes within a 5 • latitude by 20 • longitude zone. However, there 10 are slightly stronger correlations between average annual flux and X CO2 SCA, shown in panels (b), (d), (f), and (h) of Fig. 12 and quantified in Fig. 13, which suggest some possible link between X CO2 SCA and the relative source or sink strength of a given zone. Panels (a) and (b) of Fig. 11 show that both models predict large negative average annual fluxes for Asian Boreal zones, designating the Asian Boreal region as a major sink for CO 2 , and suggesting that anomalously large X CO2 SCA in the Asian Boreal region may be partially influenced by enhancements in CO 2 uptake within that region. In another instance, the 15 European Temperate zone 3 and Bremen TCCON site both have exceptionally large positive average annual CO 2 flux due to a significant contribution from fossil fuel emissions (as shown in Sect. S5), but they did not yield anomalously large flux SCA because the fossil fuel emissions do not exhibit significant seasonal variability. The European Temperate zone 3 and Bremen also yielded smaller X CO2 SCA and later HDD than most of the other zones and ground sites for both observed and modelderived seasonal cycles. In this case, the large fossil fuel emissions may be indirectly influencing the X CO2 SCA despite the 20 fact that these emissions are not directly contributing to the seasonal variability in atmospheric CO 2 in this zone or at this site.

Conclusions
Satellite-based instruments, such as OCO-2, open the possibility to study CO 2 exchange and transport throughout the vast and largely un-instrumented northern high latitudes. Improvements in retrieval and quality control methods for satellite-based observations of atmospheric CO 2 have allowed for a data-driven investigation of X CO2 seasonality over regions, like Siberia, 25 that have previously been largely inaccessible and unobserved. Model-derived X CO2 SCA from CAMS, CT2019B, and GC-CT2019 agree (R 2 > 0.68) with observed SCA patterns from around the northern high latitudes (see Fig. 5). Correlations in Fig.   5 show that CAMS and CT2019B have near unity slopes of model predicted SCA as compared to OCO-2, while GC-CT2019 has a higher than unity slope but still strongly correlated. Our results show that the Asian Boreal Forest region is distinct from other northern high latitude regions with larger seasonal cycle amplitude (SCA) and earlier half drawdown day (HDD) (see 30 Sect. 2.4), and gradients of increasing SCA and earlier HDD span from west to east across the Eurasian continent. Longitudinal gradients in SCA and HDD across the North American continent are more subtle than longitudinal gradients across the Eurasian continent. Discrepancies between observed (OCO-2) and model-derived (CAMS CT2019B, and GC-CT2019) SCA and HDD in the eastern zones of North America result in opposing longitudinal gradients in SCA and HDD across the North American continent, such that OCO-2 observations yield increasing SCA from east to west, while model estimates yield increasing SCA from west to east. In order to assess the relative influences of the accumulation of CO 2 exchanges during atmospheric transport or the magnitudes of fluxes within 5 • latitude by 20 • longitude zones, we compare GEOS-Chem surface contact tracers with 5 observed spatial distributions of SCA and HDD from OCO-2. GEOS-Chem surface contact tracers revealed that the largest X CO2 SCA occur in areas with the greatest influence from land tracers with 15 or 30 day lifetimes. The correlations of X CO2 SCA with land contact tracers are stronger than the correlations of observed X CO2 SCA with SCA of CO 2 fluxes or with the total annual CO 2 flux within a given 5 • latitude by 20 • longitude zone. This indicates that accumulation of terrestrial CO 2 flux during atmospheric transport on roughly monthly timescales is a major driver of regional variations in X CO2 SCA, which is 10 at least as important in shaping observed X CO2 seasonality as the terrestrial flux magnitudes within zones. However, there is some correlation between the total average annual fluxes used in the GC-CT2019 and X CO2 SCA, and the Asian Boreal region was still determined to have by far the largest negative fluxes of any of the regions in addition to having the largest X CO2 SCA and earliest HDD. Three important insights about the drivers influencing X CO2 seasonality come out of this analysis. First, a combination of fluxes within zones and the accumulation of CO 2 flux during atmospheric transport affects the observed spatial 15 distributions of X CO2 seasonal cycle parameters. Second, a robust understanding of atmospheric transport patterns on roughly monthly timescales is essential for accurate interpretation of X CO2 seasonality for northern high latitudes. Third, seasonality in X CO2 in northern high latitude regions is almost completely dictated by seasonality in the exchange of CO 2 with the terrestrial biosphere. In future work, it would be of value to expand this analysis to assess both long-term temporal trends in X CO2 seasonality, as well as interannual anomalies that may result from global weather patterns such as the polar vortex or El Niño.

20
Code availability. TEXT Data availability.
Code and data availability. OCO-2 data and quality control parameters used here are taken from OCO-2 Lite files (version 9, "B9"), and quality filtering and bias corrections are applied following Jacobs et al. (2020b) Table 1).