The historical ozone trends simulated with the SOCOLv4 and their comparison with observations and reanalysis

. There is evidence that the ozone layer has begun to recover owing to the ban on the production of halogen-containing ozone-depleting substances (hODS) accomplished by the Montreal Protocol and its Amendments and Adjustments (MPA). However, recent studies, while reporting an increase in tropospheric ozone from the anthropogenic NO x and CH 4 and conﬁrming the ozone recovery in the upper stratosphere from the effects of hODS, also indicate a continuing decline in the lower tropical and 5 mid-latitudinal stratospheric ozone. While these are indications derived from observations, they are not reproduced by current global chemistry-climate models (CCMs), which show positive or near-zero trends for ozone in the lower stratosphere. This makes it difﬁcult to robustly establish ozone evolution and has sparked debate about the ability of contemporary CCMs to simulate future ozone trends. We applied the new Earth system model SOCOLv4 to calculate long-term ozone trends between 1985 and 2018 and compare them with trends derived from BASIC ozone composite and MERRA-2, ERA-5, and MSRv2 10 reanalyzes. We designed the model experiment with a six-member ensemble to account for the uncertainty

There is evidence that the restrictions on halogenated ozone-depleting substances (hODS) introduced by Montreal Protocol and its Amendments and Adjustments (MPA) have clearly taken effect over the past 25 years, leading to observable ozone 25 increases at certain latitudes and altitudes (WMO, 2014(WMO, , 2018McKenzie et al., 2019). The positive role of MPA has also been confirmed by model simulations of ozone depletion under the "world avoided" scenario Garcia et al., 2012;Egorova et al., 2013;Goyal et al., 2019). Using the future simulation of multiple CCMs, Dhomse et al. (2018) estimated the approximate time of the return of ozone concentrations to pre-1980 levels for different regions, again highlighting the success of MPA in protecting ozone. 30 In the mid-to-late 1990s, hODS mixing ratios reached their maximum and thereafter started dropping again by virtue of the MPA. The annual mean total column ozone in the mid-1990s is quantified to be about 5% below its pre-1960 level globally, and about 17% lower over Antarctica (WMO, 2018). Since the turn of the century, total ozone has been expected to increase, but given the long lifetimes of some hODS and the simultaneously changing climate, it is not easy to project how long it will 35 take for ozone to return to the natural pre-1960 level. The near-global total column ozone (stratospheric plus tropospheric) remained stable since 2000, showing none or a slightly positive trend, albeit statistically not significant (WMO, 2014(WMO, , 2018Ball et al., 2018). Nonetheless, ozone concentrations are expected to exceed the pre-ozone hole level in the mid-and high-latitudes because of the increasing greenhouse gas concentrations (WMO, 2010). 40 Although the evolution of the total column ozone is well studied, the evolution of ozone in different atmospheric layers has its own characteristics. For instance, in the troposphere, the ozone concentration has been continuously increasing (Mickley et al., 2001) due to the continuous enhancement in tropospheric ozone precursors, mainly NO x and CO (Griffiths et al., 2021).
Hence, a positive trend in tropospheric ozone is expected (Ziemke et al., 2019). At the same time, simulating the evolution of tropospheric ozone remains challenging because its changes depend on complex, interdependent interactions among precur-45 sor emissions, atmospheric transport, photochemical production, deposition, and stratosphere-troposphere exchange. Recent results indicate that models cannot properly capture the tropospheric ozone trends and that tropospheric ozone variability diverges widely among models (Parrish et al., 2014;Revell et al., 2018;Zhang and Cui, 2022).
In the upper stratosphere (above 10 hPa), the ozone abundance declined by about 4-8% between 1980 to the late 1990s due 50 to a continuous increase in stratospheric chlorine loading (e.g., Steinbrecht et al., 2017;WMO, 2018), but since the late 1990s, it has been steadily increasing (Ball et al., 2018;WMO, 2018). The positive trends in ozone in the upper stratosphere are statistically robust according to various analyses of satellite observations and model simulations (Harris et al., 2015;Steinbrecht et al., 2017;Ball et al., 2018;Godin-Beekmann et al., 2022).The main driver of the upper stratospheric ozone increase is the reduction of halogen loading. In addition, the temperature decrease caused by increases in greenhouse gasses (GHGs), 55 primarily CO 2 , slows down the temperature-dependent catalytic ozone loss cycles (Hitchman and Brasseur, 1988;Zubov et al., 2013;WMO, 2018). Via the reaction CH 4 + Cl → CH 3 + HCl, the increase in CH 4 further promotes the enhancement of the ozone layer in the upper stratosphere (Hitchman and Brasseur, 1988). It should be noted that not all greenhouse gasses (like CO 2 or CH 4 ) contribute to increases in ozone concentrations and, for instance, N 2 O is expected to be the most important anthropogenic compound that will delay the return of ozone to pre-1960 level (Chipperfield, 2009). 60 Lower stratospheric ozone (LSO) has increased more slowly than expected, if at all, despite its recovery from the effects of hODS (Ball et al., 2018. Ball et al. (2018) based their analysis on dynamical linear regression modeling (DLM), a class of non-stationary time series models, which are the next generation of multiple linear regression (MLR) that considers the level of trend non-linearity, thereby increasing the accuracy of the estimate. Ball et al. (2018) applied DLM to the homogenized BAyeSian Integrated and Consolidated (BASIC) composite ozone time series, which was derived from satellite observations (Ball et al., 2017). Using partial ozone columns, a statistically robust negative tendency was found for the layer between 146 and 32 hPa on the near-global scale (55°N-55°S). Ball et al. (2018) reasoned that the detection of this negative trend in LSO was previously prevented by not properly excluding the contribution from increasing tropospheric ozone, which partially compensates for the negative LSO changes in TCO analyses (Ziemke et al., 2019). The results of Ball et al. (2018) 70 have sparked intense debate in the scientific community regarding the nature of the decline in extratropical LSO, and also expressed doubts about its existence. Chipperfield et al. (2018) found a strong increase in LSO in 2017 by about 60% in the Southern Hemisphere, prompting them to suggest that large natural inter-annual variability might be behind the apparent negative tendencies in LSO. However, in a subsequent study, Ball et al. (2019) have analyzed the extended ozone time series, showing that the short-term LSO increase in 2017 could be traced back to changes in the quasi-biennial oscillation (QBO), but 75 that the negative trend in LSO continued to be observed after 2017. This indicates that the LSO is still poorly understood in terms of its long-term variability.
Besides a change in the relative strengths of the lower and upper branches of the meridional Brewer-Dobson circulation (BDC, Butchart et al., 2006) resulted in ozone changes (Keeble et al., 2017;Li et al., 2009;Oman et al., 2009) there might 80 be additional reasons for the decline in LSO, including: the recent reduction in solar activity (Arsenovic et al., 2018); the influence of halogen-containing very short-lived species and other gases unaccounted for by the MPA (Hossaini et al., 2015;Oman et al., 2016;Oram et al., 2017); increased emissions of inorganic iodine (Cuevas et al., 2018;Karagodin-Doyennel et al., 2021); increased aerosol loading (Andersson et al., 2015); unexpected increase in emissions of CFC-11 violating the MPA (Fleming et al., 2020); altitude changes in the extratropical tropopause (Bognar et al., 2022). 85 Recently, the ozone simulated by 31 CCMs participating in the Chemistry-Climate Model Initiative Phase-1 (CCMI-1) has been analyzed, and it was shown that the pattern of the signal in LSO as a function of latitude and pressure levels varies greatly even between different realizations of the experiment performed with the same CCM (Dietmüller et al., 2021). The question why models cannot reproduce the observed persistent and statistically robust declining trends in near-global LSO remained 90 open. Stone et al. (2018) investigated the impact of changes in atmospheric dynamics on the stratospheric ozone trends. To this end, they performed linear regression analysis on a nine-member ensemble of free-running and nudged simulations with the Whole Atmosphere Community Climate Model, version 4 (WACCM4). They have concluded that the large dynamical vari-95 ability in the extratropical lower stratosphere prevents the detection of a stable ozone trend. These results suggest that for the validation of the past long-term ozone trends in the dynamically controlled regions, it is essential to have many ensemble members and analyze these experiments separately to establish whether there is any member reproducing observed trends better. In contrast, Dietmüller et al. (2021) suspect that models have difficulties reproducing the observed LSO trend because of extreme natural variability at the beginning or end of the observed period , or because natural variability in QBO and BDC 100 is not adequately simulated. In addition, in models, insufficient treatment of diffusion and transport processes (Dietmüller et al., 2017(Dietmüller et al., , 2018 may decrease the accuracy of LSO trends simulations. Thus, small inadequacies in atmospheric dynamics might be the reason why models do not demonstrate robust LSO negative tendencies, amplified by the strong internal ozone variability (Shangguan et al., 2019). However, even when using models with assimilated dynamics (i.e., specified dynamics or nudged simulations), the negative tendencies in LSO are still not revealed in model simulations (Ball et al., 2018). At this 105 point, it should be mentioned that reanalysis data used for nudging in CCMs can introduce noise, e.g., in the vertical fluxes, which may impact model performance, as shown by the CCMI-1 analysis (Chrysanthou et al., 2019). Thus, the inter-model spread of nudged simulations in the stratosphere can result in a similar or even larger spread than in free-running simulations.
Therefore, it is still unclear whether LSO will reach the natural 1960-level in the future. An intensification of air parcel rising 110 associated with an acceleration of the BDC (Butchart et al., 2006) that causes less time available for photochemical ozone production, was suggested as the primary mechanism for the declining ozone trend in tropical LSO (Avallone and Prather, 1996).
In addition, the enhanced meridional transport to the mid-latitudes contributes to this decline in LSO (Wargan et al., 2018;Orbe et al., 2020). A warming of the tropical upper troposphere increases the tropical-to-extratropical temperature gradient, which in turn pushes the subtropical jet upward, lifting the tropopause and accelerating the meridional transport, especially via data sets. Section 2 introduces the SOCOLv4 model, the ensemble reference experiment setup, as well as data sets used in our study. Section 3 outlines the dynamic linear approach employed in this study to derive the trends. Section 4 provides the results, starting with the DLM analysis of trends in reactive species and temperature from SOCOLv4 and continuing with the comparison of ozone trends from SOCOLv4 to those from the BASIC observational composite and from the reanalysis data.
A discussion of the results and general conclusions are presented in Section 5.  (Rozanov et al., 1999;Egorova et al., 2003) and the size-resolving sulfate aerosol microphysical module AER (Weisenstein et al., 1997;Sheng et al., 2015;Feinberg et al., 2019). MPI-ESM1. tral dynamical core and contains modules to compute radiation, convection, cloud processes, and atmospheric transport. The 140 transport scheme is based on the flux-form semi-Lagrangian scheme (Lin and Rood, 1996). The chemical module MEZON includes roughly 100 chemical species linked by 216 gas-phase reactions, 72 photolysis reactions, and 16 heterogeneous reactions in/on aqueous sulfuric acid aerosols and polar stratospheric clouds (STS, NAT, ICE types) using the implicit iterative Newton-Raphson scheme (Ozolin, 1992;Stott and Harwood, 1993). The photolysis rates are calculated using an online lookup-table approach (Rozanov et al., 1999), including the effects of the solar irradiance for the spectral interval 120-700 nm. Conventionally, the SOCOLv4 is formulated on the horizontal spectral resolution grid with T63 triangular truncation (96 x 192 or 1.9°x 1.9°grid spacing) and 47 vertical levels from the surface to 0.01 hPa in a sigma-pressure coordinate system. The In this study, we analyze long-term ozone time series from the SOCOLv4 ensemble reference experiment carried out using standard conditions. The runs were initiated from the MPI-ESM1.2 restart files for the year 1970, while chemistry was initiated from the SOCOLv3 runs (Revell et al., 2016). This experiment was performed for the period 1949-2018. In 1980, the 155 experiment was branched into six ensemble members which are initialized with slightly varying initial conditions namely a first-month small (about 0.1%) perturbation in the CO 2 concentration in order to have different realizations of further ozone evolution thereby describing the internal model variability.
The boundary conditions and forcing data for the SOCOLv4 reference experiment are based on CMIP6 recommendations 160  and given by the input data sets from the Model Intercomparison Projects (input4MIPs) database 1 . All climate forcings are historical before 2015 and for the years 2015 to 2018, they are switched to the SSP2-4.5 scenario (O'Neill et al., 2016). A detailed description of all modules incorporated into SOCOLv4 as well as more details on the conducted reference experiment can be found in the SOCOLv4 validation paper (Sukhodolov et al., 2021). In this study, the SOCOLv4 runs are in a free-running mode except for QBO, which is nudged to observations. 165 Here, we analyze the historical period from 1985 to 2018 which is well-covered by observations. In our trend analysis, we split this period into two parts: 1985-1997 is the ozone depletion phase, and 1998-2018 is the ozone recovery phase. Also, in our study, to compare the obtained stratospheric ozone trends in SOCOLv4, we use the BASIC ozone composite. The BASIC composite is the ozone time-series dataset built from a Bayesian joint self-calibration analysis of several composite ozone 170 datasets 2 . A detailed description of how the BASIC ozone composite was produced can be found in Ball et al. (2017). The BASIC was specifically designed for the trend analysis, hence it is a good candidate to compare the modeled ozone trends.
For a number of atmospheric layers between the ground and the mesosphere we compare the SOCOLv4 simulations with

The description of the DLM approach
To derive ozone changes from the model and other datasets, we applied the regression analysis using DLM with the state-space approach described in Laine et al. (2014); Ball et al. (2018); Alsing and Ball (2019), and a tutorial on the method can be found 180 here 6 . DLM is an advanced stochastic model for diagnosing long-term changes in the prognostic variables imposed by wellknown processes driven by explanatory/proxy variables and unaccounted processes. The DLM consists of the slowly varying background level, seasonal components, reaction to the external forcing of well-known processes modeled by explanatory variables, and stochastic noise allowing for residuals in a first-order autoregressive (AR1) process. Compared to simpler statistical multivariate analysis (such as multiple linear regression (MLR)), the application of DLM allows more accurate conclusions 185 1 More information on the input4MIPs database can be found here: https://esgf-node.llnl.gov/search/input4mips/ 2 The BASIC composite dataset can be downloaded here: https://data.mendeley.com/datasets/2mgx2xzzpk/3 3 The MSRv2 reanalysis dataset can be found here:https://www.temis.nl/protocols/O3global.php 4 The MERRA-2 reanalysis dataset can be found here: https://disc.gsfc.nasa.gov/datasets?project=MERRA-2 5 The ERA-5 reanalysis dataset can be found here: https://cds.climate.copernicus.eu/!/search?text=ERA5type=dataset 6 Dynamic linear model tutorial: https://mjlaine.github.io/dlm/dlmtut.html, about the variability due to considering the degree of trend non-linearity.
In this work, the used proxies (see Figure A1 in the Appendix) of atmospheric variability are chosen following Ball et al. (2018). We use the solar forcing represented by 30 cm solar radio flux (F30, sfu) (Dudok de Wit et al., 2014), and equatorial zonal winds at 30 and 50 hPa (in m/s) which are the two principal components of the Quasi-biennial oscillation (QBO) 7 variabil-190 ity, a latitude-dependent stratospheric aerosol optical depth (SAOD, dimensionless) to represent volcanic activity (Thomason et al., 2018); the El Niño-Southern Oscillation variability represented by ENSO's 3.4 index (ENSO, degree K) 8 , and the Arctic (AO) and Antarctic (AAO) Oscillation indexes (hPa) 9 are used to explain the variability in total/partial ozone.
The above regressors (except solar forcing) are used only to analyze the ozone changes in the BASIC composite and re-195 analysis data sets. In SOCOLv4, for every ensemble member, the AO, AAO, ENSO, and SAOD proxy variables are calculated directly from the simulated meteorological fields of geopotential height (at 1000 and 700 mb), sea surface temperature (SST), and aerosol extinction at 300-500 nm band. To analyze the model data, we took the QBO proxies at 25 hPa instead of at 30 hPa, because the low model resolution caused a high correlation between QBO at 30 and 50 hPa, which is inappropriate for regression analysis. The solar forcing used to derive ozone trends from SOCOLv4 is the same as for observations (Solar F30 200 index).
The independence of regressors is a crucial point of such an analysis. To this end, we performed a correlation test for each proxy from the six ensemble members of the experiment and 2 periods of analysis. Figure 1 illustrates the results of proxies correlation analysis.  (a) for ozone depletion phase [1985][1986][1987][1988][1989][1990][1991][1992][1993][1994][1995][1996][1997]; (b) for ozone recovery phase . Error bars represent the 2-σ standard deviation of correlation coefficients between ensemble members.
The correlation test indicates generally low mutual dependencies, except for the correlation between solar radio flux and SAOD for the ozone depletion phase. The correlation between solar fluxes and volcanic activity has been investigated before (Kuchar et al., 2017). It is related to strong volcanic eruptions like Pinatubo and El Chichón that affected the atmosphere during the 1985-1997 period, which happened to synchronize with the solar cycle. The remaining proxies show the average correlation coefficients between -0.2 and 0.2, which is considered to be a "weak" mutual dependency.

210
It is important to note that in this study, all DLM calculations using model data as well as ozone composite and reanalyzes were performed for the entire 1985-2018 period. The evolution of the predicted variable is characterized in DLM by the socalled "trend-term" or background level. In simple terms, it is the evolution of a variable filtered from the known part of its variability, induced by external forcings, which are represented in DLM by proxies. The state of the predicted variable for the 215 one-step-ahead is predicted by the Kalman filter, and the Kalman smoother is used for the marginal probability distribution of the state. The Markov Chain Monte Carlo (MCMC) method  is used to infer the posterior distributions of the background level. Then, 200 samples of the background level were drawn from the DLM states, which describe the posterior distribution uncertainty. It was done to determine the standard deviation of background level between these samples used to calculate the statistical significance of the results. Afterwards, the ozone trends per decade are calculated separately for 220 the phases of the ozone evolution ([between 1985 and 1997] and [between 1998 and 2018]) from the mean background level by applying the conventional linear regression (αx + β). In linear regression, α means a slope term, and, hence, a trend per decade at each grid point (latitudes x heights) are calculated as: α*length of month-to-month time series/number of decades. Finally, the statistical significance is determined for each ensemble member using the mean background level and its standard deviation between MCMC samples by the Student's t-test. The ensemble mean trends of the SOCOLv4 experiment are calculated as 225 an average of the trends from all individual ensemble members. The statistical significance for the ensemble mean trends is calculated using a standard deviation of trends between individual ensemble members. Trends in BASIC ozone composite and reanalysis data sets were calculated using the same methodology but applying observed proxy variables in DLM, the same as presented in Ball et al. (2018) as it was mentioned above.

230
4.1 Simulated long-term trends in O 3 , ClO x , BrO x , NO x , HO x , and temperature for the ozone depletion and recovery phases The explanation of simulated ozone trends for different atmospheric layers cannot be properly done without demonstrating the trends trends in reactive species involved in the ozone destruction cycles. In addition, the temperature is accounted for, because it controls the rate of chemical reactions for the formation of radicals and is an indicator of ozone change, since ozone is a 235 radiatively active gas. Decadal trends in ozone as well as radicals and temperature from the SOCOLv4 ensemble mean results in both the ozone depletion and ozone recovery phases are presented in Figure 2.   (Chameides et al., 1977;Sauvage et al., 2007). The NO x due to aircraft emissions may also contribute to the positive trend in upper tropospheric NO x (Köhler et al., 1997). Also, some tropospheric NO x decrease related to the improved air quality is visible over the Northern Hemisphere. In addition, more intensive convective mixing can also intensify the upward transport of ozone precursors. In the mesosphere, NO x strongly decreases in both periods. This is not 245 related to solar activity, DLM excludes its contribution from evolution. It can be explained by the suppressed NO x production due to the CO 2 -related cooling in the mesosphere, which accelerates cannibalistic NO + N → N 2 + O reaction. The same can be said about the negative trends in HO x in the mesosphere, which are more pronounced and statistically significant in the region where the temperature decreases. In the upper stratosphere, HO x trends follow ozone behavior because the HO x production from water vapor strongly depends on the O( 1 D) produced from the ozone. In addition, the decline in HO x is observed in 250 the upper tropical troposphere, which may be related to some dehydration processes (Ueyama et al., 2014). and ERA-5 are more reasonable but still hardly resemble those from BASIC and SOCOLv4.
due to an additional heterogeneous reactive uptake of N 2 O 5 onto sulfuric acid particles, leading to a suppression of the NO x catalytic ozone cycle (Prather, 1992;Solomon, 1999;Rozanov et al., 2002).BASIC does not show a similar increase in ozone, which might be because the effect of the Pinatubo eruption was filtered from the data due to some reported problems in satellite ozone retrieval during strong stratospheric aerosol loading (Davis et al., 2016;Ball et al., 2017). Unlike SOCOLv4, MERRA-2 and ERA-5 show a continuous decrease in middle stratospheric ozone until 1998 in MERRA-2 and around 2003 in ERA-5.

280
In ERA-5, the start of ozone recovery due to the hODS decrease is biased relative to the BASIC and SOCOLv4 towards the beginning of the 2000s.
In the lower stratosphere, the ozone decline is seen in all presented data sets for the ozone depletion phase. The ozone minimum in this region has been visible for a few years after 1991 because of the enhancement of chlorine activation after  Overall, the total column ozone evolution modeled with SOCOLv4 is within the range of evolution from other data sets, for which the total column ozone data are available. The total column ozone evolution simulated with SOCOLv4 agrees better with MSRv2. For ERA-5, we see a much stronger increase in total column ozone than in the other data presented, resulting from the abnormal ozone evolution in the stratosphere. MERRA-2 shows the marginal decline of total column ozone compared to SOCOLv4 and other reanalyses, which may also be due to some flaws in this data, primarily for the stratosphere.

305
Nevertheless, the presented time series display a general increase of the extra-polar total column ozone. Yet, based on the obtained results, the important message is that the ERA-5 and MERRA-2 reanalyses still do not fit well for ozone trend analysis. We cannot demonstrate the source of this discrepancy, but it could be related to some issues with their underlying models, specific observational data processing, or assimilation approaches. We can be confident in saying that this is not related to DLM, as the evolution of ozone from these reanalyses prior to the application of DLM (dashed lines in Figure 3) also exhibits 310 this anomalous behavior. Yet, the MSRv2 reanalysis behavior is much smoother and more understandable, which agrees rather well with the SOCOLv4 simulations.

Simulated and observed long-term ozone trends for the ozone depletion and recovery phases
In this study, we first applied the regression model to the whole period , and then ozone changes for two phases of ozone evolution were computed from the DLM results. Figure (Griffiths et al., 2021). The chemical reactions that involve these precursors lead to ozone formation in the troposphere. We would like to emphasize that the positive trend of tropospheric ozone is getting stronger in the tropics [20°N-20°S], showing an increase of more than 4%/decade that is attributed to increasing NO x (see Figure 2).

325
In high latitudes of both hemispheres, strong negative changes in ozone are observed in the lower stratosphere. Even if the Antarctic ozone hole is not a continuous event and occurs only during austral springtime, the pronounced and statistically significant negative changes of more than 6%/decade are seen in the Southern Hemisphere, indicating the expansion of the ozone towards lower latitudes. The decline in ozone in the lower stratosphere of the Southern Hemisphere appears in SOCOLv4 even 330 in the mid-latitudes [55°S-40°S]. As shown in Sukhodolov et al. (2021), the expansion of the ozone hole in SOCOLv4 is larger than in observations. It is related to overestimated isolation of the southern polar vortex area during winter time (Sukhodolov et al., 2021). In the Northern Hemisphere, the negative lower stratospheric ozone changes also appeared. It should be noted that the strong ozone hole events over the Arctic are rather irregular, and usually, the Arctic ozone depletion is weaker than the one over the Antarctic (WMO, 2018). We obtained prominent, statistically significant negative ozone changes of about 335 4-5%/decade over mid-to-high latitudes in the upper stratosphere of both hemispheres. This effect is observed because during this period the continuous increase in halogen content strongly influenced the loss of ozone in these regions, and the catalytic ozone destruction cycle involving chlorine is more effective in the upper stratosphere due to the presence of a sufficient number of oxygen atoms (Chipperfield et al., 2018). Negative ozone trends from BASIC revealed in the upper stratosphere and over tropical lower stratosphere exceed those from SOCOLv4, showing depletion in ozone even more than 6%/decade.

340
As it was mentioned above, the notable feature can be seen in the mesosphere, where SOCOLv4 demonstrates positive changes in ozone in the upper part of the mesosphere. It might be resulting from the NO x and HO x decline as was discussed above (see Figure 2).

345
The profile of decadal ozone trends between 1998 and 2018 from all individual ensemble members and ensemble mean of SOCOLv4 reference experiment and BASIC observational composite are depicted in Figure 5. For the recovery phase, we are particularly interested in analyzing and comparing the trend patterns from individual ensemble members to reveal how slightly different evolutions of the system, resulting in changes in natural unforced variability, affect the trend patterns. For the ozone recovery phase, the tropospheric positive ozone changes with a similar magnitude remained stable, as the 350 production of ozone precursors continued to increase for this period too (see Figure 2). The tropical free-tropospheric ozone shows an increase of ∼4-5%/decade. In the lower stratosphere, in defiance of recovery, the ozone in high latitudes of both hemispheres is not showing the increase in all presented ensemble members of the experiment and even demonstrates signs of negative changes. In three out of six ensemble members, there are regions with a manifestation of negative signal in ozone either in the northern or in the southern high latitudes, which might be artifacts as they are not significant. This may be related 355 to an underestimation of ozone production or abnormal dynamical perturbations in these ensemble members during the recovery phase. The modeled extra-polar LSO changes are controversial. Yet, some slightly negative trends of ∼1-1.5%/decade are observed in all ensemble members. It should be stated that positive statistically significant trends in the extratropical lower stratosphere are also not observed and the ozone trends in this region diverge significantly between ensemble members as a result of the natural unforced variability. Despite the statistical significance being less than 90% for a major part of the lower stratosphere and the magnitude being lower, the simulated trend patterns in 4 and 5 ensemble members largely correspond to the trend distribution revealed in the observations.
The middle stratospheric ozone from both SOCOLv4 and BASIC demonstrates a near-zero state, showing some signs of positive and negative changes but in general, the trend is not so different from zero.

365
In the upper stratosphere, the ozone increase of about 3-4%/decade in mid-to-high latitudes is visible and statistically robust and observed in all presented realizations of the SOCOLv4 reference experiment and BASIC composite as expected because of hODSs limitation by Montreal Protocol, mainly ClO x as well as temperature decrease (see Figure 2). The simulated ozone changes in the upper stratosphere are slightly overestimating the BASIC by ∼1% and have a more pronounced tendency 370 to increase in a poleward direction. Note that the simulated ozone trends in the upper stratosphere do not vary much between ensemble members, showing generally similar distributions and magnitudes of trends. The mesospheric ozone shows an upward trend with a higher significance over the mid-latitudes of the Southern Hemisphere in most realizations, where a pronounced decline in NO x and HO x is observed (see Figure 2). We would like to stress again that the possible ozone increase in the mesosphere is worth considering, since it can slightly contribute to the total column ozone increase and compensate for some 375 ozone decline in lower atmospheric levels to some extent.

Discussion and summary
In this paper, we examine long-term ozone trends computed with the SOCOLv4 in an ensemble with six members of a reference experiment and compare these realizations to data from the BASIC ozone composite and from a number of available reanalyses. The evolution of long-term ozone changes is quantified using dynamic linear modeling. In general, the ozone trends 380 integrated across atmospheric layers (i.e., troposphere; lower, middle, and upper stratosphere; mesosphere) are well captured by SOCOLv4, although there are also some discernible deviations from the available observations and reanalyses. The model shows a continuous ozone increase during the entire period in the troposphere and for the recovery phase in the mesosphere, upper, and middle stratosphere, which is important to consider because they contribute positively to the overall enhancement of the total column ozone.

385
Although ozone trends in SOCOLv4 are well simulated in most atmospheric layers, our study infers that the modeled ozone trends in the extra-polar lower stratosphere show significant variability in these trends across ensemble members. Indeed, detecting clear and robust ozone trends outside the polar regions in model simulations is difficult because of the high dynamical variability perturbing the individual ensemble members . This justifies analyzing each ensemble member 390 separately to study extra-polar LSO variability, as we do in the present work. We show that the model cannot fully reproduce the observed trends in LSO from 1998 to 2018. It may just be that our particular ensemble members do not follow the same dynamical trajectory that happened in reality. Therefore, the use of ensemble mean is not going to help if what happened in Earth's atmosphere 1998-2018 history is a statistical outlier. A better way to proceed is to conduct a larger ensemble of simulations and identify those simulations which do look a lot like what happened in reality, even if there are relatively very few of 395 them. Then diagnose that subset for the underlying causes of the shown LSO trends, as was done by Smith et al. (2020). Our analysis of six individual realizations for the ozone recovery phase revealed that the LSO trend patterns in ensemble members 4 and 5 show a better resemblance with observations than other members of the ensemble, suggesting that the natural variability in simulations may coincide, by chance, with the dynamical storyline that we have in our reality i.e. in observations. Increasing the number of ensemble members to 100 or 1000 will increase the probability to get this pattern in the model atmosphere, 400 which would allow obtaining ozone trends as closely as possible to the observed ones. Such complication comes from the fact that the considered period is rather short and therefore such small, not properly resolved dynamical fluctuations cause a very strong influence on trends. We cannot perform 1000 simulations, due to obvious computational limitations, however, the obtained range of results for this region in our and Stone et al. (2018) work already allows us to state that internal variability is a strong player in this game and its specific combination can produce various "technically" statistically significant results. found to be more pronounced in DLM than in MLR, and significant in the tropics (>95% confidence), but not necessarily at 420 mid-latitudes (>80% confidence). They further highlight that in tropopause relative coordinates, most of the negative trends in the tropics lose significance, highlighting the impacts of a warming tropopause and increasing tropopause altitudes. We should emphasize the need to find out what the ozone changes in the extra-polar lower stratosphere are related to.
They may be to a certain degree induced by uncertainties in emissions, in particular on greenhouse gases, ozone-depleting 425 substances, anthropogenic very short-lived substances (Alejandro Barrera et al., 2020) including iodine-containing species (Karagodin-Doyennel et al., 2021), and volcanic aerosol loading. Other factors might concern structural uncertainties of the models, such as their horizontal and vertical resolution, in the upper troposphere and lower stratosphere as well as insufficient treatment of diffusion and transport processes (Dietmüller et al., 2017(Dietmüller et al., , 2018.
Regardless of the lower stratospheric ozone issues in the model, the modeled total column ozone shows a continuous in-430 crease during the ozone recovery phase that is in good agreement with the MSRv2 reanalysis. Yet, we should stress that other reanalyses used in our study, such as the MERRA-2 and ERA-5, are less suitable for ozone trend analysis because they have artifacts from the unknown origin that interfere with a correct trend estimate. It is beyond the scope of this study to explain the biases in the different reanalysis data sets. They could stem from some issues with their underlying models, from observational uncertainties, specific data retrievals, or be dependent on the assimilation technique.

435
In summary, the trends identified by SOCOLv4 are largely consistent with previous findings and confirm the general understanding of ozone recovery, including the effects of climate change on the ozone layer. The model results further confirm that there are marginally significant negative ozone changes in parts of the low latitude lower stratosphere. This result agrees in general with the negative trends extracted from satellite data composite, however, the simulated magnitude and significance are 440 lower than in observations. Thus, further efforts should be directed to carrying out model experiments with a higher number of ensemble members to address the higher part of the system uncertainty, which have to increase the chances to reproduce the observed pattern of atmospheric variability in the model. It is also necessary to continue work on reducing the uncertainty of the boundary conditions. Nevertheless, the presented results indicate a good level of agreement between the model and observations, allowing the production of reliable estimates of future ozone evolution using contemporary Earth system models. (b): ozone recovery phase . The dashed line indicates the region with significance at the 90% level for positive or negative changes, the solid line is the same at the 95% level.