A semi-empirical potential energy surface and line list for H2O extending into the near-ultraviolet

Accurate reference spectroscopic information for the water molecule from the microwave to the near-ultraviolet is of paramount importance in atmospheric research. A semi-empirical potential energy surface for the ground electronic state of H2O has been created by refining to almost 4 000 experimentally determined energy levels. These states extend into regions with large values of rotational and vibrational excitation. For all states considered in our refinement procedure, which extend to 37 000 cm⁻¹ and J = 20, the average root mean squared deviation is approximately 0.05 cm⁻¹. This potential energy surface offers significant improvements over recent models by accurately predicting states with high values of J. This feature should improve calculated line positions for high-temperature spectra, where transitions between high-J states become more prominent. By combining this potential with the latest dipole moment surface for water vapor, we have calculated a line list that extends reliably to 37 000 cm⁻¹. Obtaining reliable results in the ultraviolet is of special importance, as it is a challenging spectral region for the water molecule both experimentally and theoretically. Comparisons are made against several experimental sources of near-ultraviolet cross sections, and discrepancies are observed. In the near-ultraviolet our calculations agree with recent atmospheric retrievals and with the upper limit obtained using broad band spectroscopy by Wilson et al. (J. Quant. Spectrosc. Radiat. Transf., 2016, 170, 194), but do not support recent suggestions of very strong absorption in this region.


Water vapor is a major absorber of light in the terrestrial atmosphere and it interferes with atmospheric retrievals from the microwave to the near-ultraviolet (Lampel et al., 2015). The water molecule dissociates at 41 145.92 cm⁻¹ (Boyarkin et al., 2013), and there are almost no rovibrational transitions above this energy. Although the absorption of water vapor in the near-ultraviolet is known to be weak, particularly when compared to features in the infrared, it obscures retrievals of the electronic spectra of molecules that are present in trace abundances in the terrestrial atmosphere and are important for atmospheric and pollution monitoring (Fleischmann et al., 2004; Cantrell et al., 1990; Stutz et al., 2000). Retrievals performed in the visible and near-ultraviolet have a long record of success (Gonzalo Gonzalez Abad et al., 2019). Water vapor is one such molecule where retrievals have already been performed in the visible spectrum using the Ozone Monitoring Instrument (OMI) (Levelt et al., strong absorption features provided by Du et al. (2013). Wilson et al. report an upper bound on the water vapor absorption in this region of 5 × 10⁻²⁶ cm² molecule⁻¹, which is at least a factor of ten lower than the peaks reported by the other studies.
In this work we create a new semi-empirical potential energy surface that accurately models the rotational behavior of high-J states while also predicting states near dissociation to a reasonable degree of accuracy. With this surface, a new line list that extends into the near-ultraviolet is calculated and used to investigate the available laboratory and atmospheric measurements of water vapor absorption in the blue and near-ultraviolet.

Fitting the Ab Initio Surface
Approximately 16 000 electronic structure calculations were previously performed for a dipole moment surface at the MR-CI (multi-reference configuration interaction) level of theory, utilizing an aug-cc-pCV6Z basis set (Dunning, 1989; Woon and Dunning Jr., 1995; Peterson and Dunning, 2002) and the Douglas-Kroll-Hess Hamiltonian to second order (DKH2) (Conway et al., 2018). These calculations span bond lengths of 1.3–4.0 a₀ and bond angles of 30–178°. With the energy at the equilibrium configuration, r_e = 1.8141 a₀ and θ_e = 104.52°, set to zero, the maximum energy of the ab initio points that we consider is 57 423 cm⁻¹.
These points need to be fitted to a functional form to obtain an ab initio PES; in the fit each data point was weighted as a function of its energy, with weights w_i smoothly reducing towards zero as energy increases. The weighting function considered here is similar to the function used by Partridge and Schwenke (1997 surface could not accurately predict energies from the bottom of the well up to dissociation, hence they follow the procedure of Varandas (1996) and define a piece-wise potential. The same methodology was recently used to create a PES for the C₃ molecule (Rocha and Varandas, 2018). We are also interested in accurately predicting energies that extend into the near-ultraviolet and so we too use a piece-wise defined potential as given by where χ_E is a switching function dependent upon energy (E): and r₁, r₂ and θ are the corresponding values of the bond lengths and inter-bond angle. This function ensures smoothness, and the parameters ζ_s and β control the range of the switch. Our values are similar to those of the POKAZATEL PES, except for the switching point ζ_s. By lowering ζ_s from the 35 000 cm⁻¹ value of POKAZATEL to 30 000 cm⁻¹, we allow the high-order parameters in V_low to have greater influence on the upper levels.
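To make the roles of χ_E, ζ_s and β more concrete, here is a minimal Python sketch of an energy-switched blend of two potentials. It assumes, purely for illustration, the form V = (1 − χ)V_low + χV_up with a tanh switch χ(E) = [1 + tanh(β(E − ζ_s))]/2, where the switching energy E is supplied by the caller. The actual χ_E used in this work and in POKAZATEL may differ, and the value of β below is hypothetical; only ζ_s = 30 000 cm⁻¹ is taken from the text.

```python
import math

ZETA_S = 30_000.0  # switching point zeta_s in cm^-1 (value quoted in the text)
BETA = 1.0e-3      # hypothetical steepness in (cm^-1)^-1, for illustration only


def switch(energy: float, zeta_s: float = ZETA_S, beta: float = BETA) -> float:
    """Smooth switch: ~0 well below zeta_s, exactly 0.5 at zeta_s, ~1 well above."""
    return 0.5 * (1.0 + math.tanh(beta * (energy - zeta_s)))


def blended_potential(v_low: float, v_up: float, energy: float) -> float:
    """Blend V_low and V_up using the switch evaluated at the given energy."""
    chi = switch(energy)
    return (1.0 - chi) * v_low + chi * v_up


if __name__ == "__main__":
    for e in (10_000.0, 25_000.0, 30_000.0, 35_000.0, 50_000.0):
        print(f"E = {e:8.0f} cm^-1  chi = {switch(e):.4f}")
```

Because tanh is infinitely differentiable, the blended surface stays smooth across the switch; in this sketch a smaller β simply widens the energy window in which both parts contribute.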
Due to the difficulty of fitting data in different energy regions, it is helpful to begin with a well-defined functional form; hence the starting point for V_up in our new PES is the V_up function of the POKAZATEL potential. For V_low, however, we employ a new functional form defined as where r_ie = (r_i − r_e) for i = 1, 2. r₁₂ is the separation between the two hydrogen atoms, while r_e = 1.8141 a₀ is the equilibrium bond length and θ_e = 104.52° is the angle at equilibrium. α was determined from a series of optimizations and its optimal value was found to be 1.24. D₁ and D₂ were also floated during our initial linear least-squares fits and are set to 42778.44 and 683479.329404 cm⁻¹, respectively. The expansion variables ζ₁, ζ₂ and ζ₃ are defined as

$$\zeta_1 = \frac{r_1 + r_2}{2} - r_e, \qquad \zeta_2 = \frac{r_1 - r_2}{2}, \qquad \zeta_3 = \cos\theta - \cos\theta_e.$$
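The expansion variables are straightforward to evaluate. The Python sketch below computes ζ₁, ζ₂, ζ₃ and the H–H separation r₁₂ (from the law of cosines) and sums a generic polynomial Σ C_ijk ζ₁^i ζ₂^j ζ₃^k. The monomial form of this sum is an assumption made for illustration, because the full V_low expression, including the damping functions G and F and the terms involving α, D₁, D₂ and r₁₂, is not reproduced here. Any coefficients passed in are hypothetical. Since ζ₂ changes sign when r₁ and r₂ are exchanged, only even powers of ζ₂ keep such a sum symmetric, as required for H₂O.

```python
import math
from typing import Dict, Tuple

R_E = 1.8141                    # equilibrium bond length in a0 (from the text)
THETA_E = math.radians(104.52)  # equilibrium bond angle (from the text)


def expansion_variables(r1: float, r2: float, theta: float) -> Tuple[float, float, float]:
    """Return (zeta1, zeta2, zeta3) for bond lengths in a0 and theta in radians."""
    zeta1 = 0.5 * (r1 + r2) - R_E
    zeta2 = 0.5 * (r1 - r2)
    zeta3 = math.cos(theta) - math.cos(THETA_E)
    return zeta1, zeta2, zeta3


def hh_distance(r1: float, r2: float, theta: float) -> float:
    """H-H separation r12 from the two O-H bond lengths (law of cosines)."""
    return math.sqrt(r1 * r1 + r2 * r2 - 2.0 * r1 * r2 * math.cos(theta))


def polynomial_part(coeffs: Dict[Tuple[int, int, int], float],
                    r1: float, r2: float, theta: float) -> float:
    """Sum C_ijk * zeta1**i * zeta2**j * zeta3**k (hypothetical illustration).

    Damping functions and the other terms of the real V_low are omitted.
    """
    z1, z2, z3 = expansion_variables(r1, r2, theta)
    total = 0.0
    for (i, j, k), c in coeffs.items():
        if j % 2:  # odd powers of zeta2 would break r1 <-> r2 exchange symmetry
            raise ValueError("only even powers of zeta2 keep V symmetric")
        total += c * z1**i * z2**j * z3**k
    return total


if __name__ == "__main__":
    print(expansion_variables(1.9, 1.7, math.radians(100.0)))
    print(hh_distance(R_E, R_E, THETA_E))  # r12 at the equilibrium geometry, in a0
```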
G(θ) and F(r₁, r₂) are dimensionless damping functions that constrain the potential in the limits θ → 0 and r₁, r₂ → ∞.
These are defined as:

The number of parameters C_ijk was optimized to give the lowest RMS deviation from the underlying ab initio data while ensuring that no 'holes' were created through over-fitting. A 'hole' is an unphysical feature of a PES that typically appears as a dip in the surface, often (although not always) continuous, where the surface should instead be smooth. We found that using 250 parameters provided the lowest RMS deviation, 35 cm⁻¹, from the electronic structure calculations. This value is large because of the large discrepancy between our ab initio data points and V_up from POKAZATEL, rather than because of our fit of V_low. The Born-Oppenheimer diagonal corrections (BODC), also known as the adiabatic correction, did not agree with those calculated by Zobov et al. (1996). The two calculations did, however, show better agreement for different quanta of bend in ν₂. The adiabatic correction is known to be large for highly excited stretching modes, particularly those in the visible and near-ultraviolet, which we are interested in. However, neither set of corrections is well tested for, or suited to, such energetic states; hence we chose to omit this correction from our surface and rely on fitting to experiment to incorporate its effect.
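Selecting the number of C_ijk terms by monitoring the RMS deviation can be illustrated with a weighted linear least-squares fit. This Python/NumPy sketch uses hypothetical tanh-shaped energy weights (not the Partridge and Schwenke function), solves the weighted problem by scaling each row by √w_i, and reports a weighted RMS. The toy one-dimensional data and the function names are assumptions for illustration only.

```python
import numpy as np


def energy_weights(energies: np.ndarray, e_ref: float = 30_000.0,
                   width: float = 5_000.0) -> np.ndarray:
    """Hypothetical smooth weights: ~1 at low energy, falling towards 0 above e_ref."""
    return 0.5 * (1.0 - np.tanh((energies - e_ref) / width))


def weighted_fit(design: np.ndarray, energies: np.ndarray,
                 weights: np.ndarray):
    """Minimise sum_i w_i (E_i - (A c)_i)^2; return (coefficients, weighted RMS)."""
    sqrt_w = np.sqrt(weights)
    coeffs, *_ = np.linalg.lstsq(design * sqrt_w[:, None], energies * sqrt_w, rcond=None)
    residuals = energies - design @ coeffs
    rms = float(np.sqrt(np.sum(weights * residuals**2) / np.sum(weights)))
    return coeffs, rms


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = np.linspace(-0.5, 0.5, 200)  # toy 1D coordinate
    energies = 40_000.0 * x**2 + 5_000.0 * x**3 + rng.normal(0.0, 10.0, x.size)
    w = energy_weights(energies)
    for n_terms in (2, 3, 4, 6):
        design = np.vander(x, n_terms, increasing=True)  # columns 1, x, x^2, ...
        _, rms = weighted_fit(design, energies, w)
        print(n_terms, round(rms, 2))
```

In this toy case the RMS stops improving once the model can represent the cubic; in a real PES fit a low RMS alone does not exclude 'holes', which must also be checked by inspecting the fitted surface.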

Nuclear Motion Calculations
We use the DVR3D suite of programs to solve the nuclear motion problem. For these calculations we use Radau coordinates with a bisector embedding and a 55 × 40 discrete variable representation (DVR) grid, with Morse oscillator-like functions in r and associated Legendre polynomials in θ. The DVR for these basis sets is constructed using Gaussian quadrature schemes based on associated-Laguerre and associated-Legendre polynomials in r and θ, respectively. For the Morse oscillator-like functions, we take r_E = 3.0, ω = 0.007 and β = 0.25 (all in a.u.), which are the values used to compute the POKAZATEL line list. For the vibrational problem, matrices of dimension 3 500 are diagonalized and the resulting eigenfunctions are used as a basis for the full rovibrational problem. For this, matrices of dimension 600(J + 1 − p) are diagonalized, where J is the total angular momentum and p is the parity (p = 0 or 1). Nuclear masses have been used throughout.
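The basis-set bookkeeping quoted above can be checked with a few lines of Python. The sketch evaluates the rovibrational matrix dimension 600(J + 1 − p) and, for the k = 0 case in which associated Legendre polynomials reduce to ordinary Legendre polynomials, the 40 Gauss–Legendre DVR angles. It is an illustration only and does not reproduce DVR3D's internals; the function names are hypothetical.

```python
import numpy as np

N_ANGULAR = 40     # angular DVR points quoted in the text
ROVIB_FACTOR = 600  # multiplier in the 600(J + 1 - p) dimension quoted in the text


def rovib_dimension(j: int, p: int) -> int:
    """Dimension 600 * (J + 1 - p) of the rovibrational matrix for parity p."""
    if p not in (0, 1):
        raise ValueError("parity p must be 0 or 1")
    return ROVIB_FACTOR * (j + 1 - p)


def angular_grid(n: int = N_ANGULAR) -> np.ndarray:
    """Gauss-Legendre DVR angles in degrees (the k = 0 case)."""
    nodes, _weights = np.polynomial.legendre.leggauss(n)  # nodes in cos(theta)
    return np.degrees(np.arccos(nodes))


if __name__ == "__main__":
    for j in (0, 5, 10, 20):
        print(j, rovib_dimension(j, 0), rovib_dimension(j, 1))
    print(angular_grid()[:3])
```

For J = 20 this gives matrices of dimension 12 600 (p = 0) and 12 000 (p = 1), about 3.6 times the size of the 3 500-dimensional vibrational problem.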
These parameters have been optimized for the initial J = 0 problem such that vibrational energies below 27 000 cm⁻¹ are converged to better than 0.01 cm⁻¹, while for energies at 37 000 cm⁻¹ the convergence error is less than 0.03 cm⁻¹.

(Mant et al., 2018). In this procedure, one maintains the overall structure of the underlying ab initio surface while simultaneously optimizing the parameters of the fit. This prevents the development of unwanted 'holes' during refinement.

Overall, we are trying to minimize: is the typical observed minus calculated DVR3D rovibrational energy and, similarly, ∆_j^(ai) is the difference between the ab initio and calculated potential energies. The factor f is the 'weight' of our semi-empirical PES relative to our initial ab initio surface. Setting f too large can result in over-fitting if the sum over j and/or i is too small.
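The objective can be sketched in Python as a weighted sum of squared observed and ab initio residuals. The exact placement of f is set by the paper's own equation, which is not reproduced here; this sketch assumes that f multiplies the observed (experimental) term, which is consistent with the remark that too large an f can lead to over-fitting. The function name and argument layout are hypothetical.

```python
import numpy as np


def refinement_objective(delta_obs, w_obs, delta_ai, w_ai, f):
    """Return f * sum_i w_i (dE_i^obs)^2 + sum_j w_j (dV_j^ai)^2.

    Assumption: f scales the observed-minus-calculated term, so increasing f
    lets the experimental levels pull the surface further from ab initio.
    """
    delta_obs = np.asarray(delta_obs, dtype=float)
    delta_ai = np.asarray(delta_ai, dtype=float)
    obs_term = np.sum(np.asarray(w_obs, dtype=float) * delta_obs**2)
    ai_term = np.sum(np.asarray(w_ai, dtype=float) * delta_ai**2)
    return float(f * obs_term + ai_term)


if __name__ == "__main__":
    # hypothetical residuals in cm^-1: two observed levels, three ab initio points
    value = refinement_objective([0.1, -0.2], [1.0, 0.1],
                                 [5.0, -3.0, 1.0], [1.0, 1.0, 0.5], f=1_000.0)
    print(value)
```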

The Hellmann-Feynman theorem allows us to efficiently calculate the derivative of an energy level with respect to a particular parameter in our potential, as required for the least-squares fit. With this, we can iterate and optimize the parameters of the PES to reduce the deviation of our semi-empirical energies from the observed levels. The MARVEL (measured active rotational-vibrational energy levels) procedure (Furtenbacher et al., 2007; Császár et al., 2007; Furtenbacher and Császár, 2012) was originally constructed for an IUPAC study of water spectra (Tennyson et al., 2014). The resulting empirical energy levels for H₂¹⁶O have subsequently been updated in response to both improvements to the MARVEL algorithm (Tóbiás et al., 2019) and the availability of new data (Furtenbacher et al., 2020). We refine our potential to the updated MARVEL energy levels with J = 0, 2, 5, 10, 15 and 20, representing approximately 4 000 states. The more recent potentials for water vapor (Shirin et al., 2003; Polyansky et al., 2018; Mizus et al., 2018; Bubukina et al., 2011) have been limited to refinement using states with J = 0, 2 and 5, which is not sufficient to accurately predict high-J levels.
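The Hellmann–Feynman step can be checked numerically. For a parameter p_k that enters the Hamiltonian linearly, as an expansion coefficient does, ∂H/∂p_k = V_k, so ∂E_n/∂p_k = ⟨ψ_n|V_k|ψ_n⟩ for a normalized, non-degenerate eigenstate. In this Python/NumPy sketch, small random symmetric matrices stand in (hypothetically) for the DVR3D Hamiltonian and potential-term matrices, and the analytic derivatives are compared with a central finite difference.

```python
import numpy as np


def energies_and_derivatives(h0, terms, params):
    """Eigenvalues of H = H0 + sum_k p_k V_k and their derivatives dE_n/dp_k.

    Uses the Hellmann-Feynman result dE_n/dp_k = <psi_n|V_k|psi_n>, valid for
    normalized, non-degenerate eigenvectors when H depends linearly on p_k.
    """
    h = h0 + sum(p * v for p, v in zip(params, terms))
    energies, vecs = np.linalg.eigh(h)  # eigenvectors are the columns of vecs
    derivs = np.stack(
        [np.einsum("in,ij,jn->n", vecs, v, vecs) for v in terms], axis=1
    )  # derivs[n, k] = dE_n / dp_k
    return energies, derivs


def random_symmetric(rng, n):
    """Random real symmetric matrix (stand-in for a Hamiltonian term)."""
    a = rng.normal(size=(n, n))
    return 0.5 * (a + a.T)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    h0 = random_symmetric(rng, 6)
    terms = [random_symmetric(rng, 6) for _ in range(3)]
    p = np.array([0.3, -0.1, 0.05])
    _, d = energies_and_derivatives(h0, terms, p)
    step = 1e-5
    plus, minus = p.copy(), p.copy()
    plus[1] += step
    minus[1] -= step
    e_plus, _ = energies_and_derivatives(h0, terms, plus)
    e_minus, _ = energies_and_derivatives(h0, terms, minus)
    # central-difference estimate vs Hellmann-Feynman value for parameter k = 1
    print(np.max(np.abs((e_plus - e_minus) / (2 * step) - d[:, 1])))
```

One diagonalization therefore yields the derivatives of every level with respect to every linear parameter, which is what makes iterative least-squares refinement affordable.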

The only near-ultraviolet energy levels available for H₂¹⁶O come from the multiphoton experiments of Grechko et al. (2009, 2010) and span states below J 7. The small number of measurements in the blue-violet and near-ultraviolet makes V_up particularly difficult to refine accurately. More high-resolution experimental work in these regions would be welcome.

For our initial unrefined ab initio PES, the average deviation from the MARVEL J = 0 vibrational band origins (VBOs) below 37 000 cm⁻¹ is approximately 2 cm⁻¹, a figure dominated by overtones in ν₂. Refining to the VBOs alone is known not to produce accurate results (Schryber et al., 1997). However, fits to J = 0 levels are significantly faster and provide a good starting point for refining using non-zero J states.
For the first refinement, to the J = 0 VBOs, we set the weight of all levels with energies greater than 26 000 cm⁻¹ to 0.1, while those below this energy carry a weight of 1. This 10:1 ratio was chosen so that we could include all states in the refinement without degrading the residuals of the lower states. The weight f of our semi-empirical PES relative to the underlying ab initio surface was fixed at 1 000, which is large enough to provide accurate results while also small enough to prevent the formation of undesirable 'holes'. For this process, V_up was held constant. This allowed us to reduce our average RMS error with respect to the MARVEL VBOs to only 0.08 cm⁻¹. For the second step, the weights of the states below and above 26 000 cm⁻¹ are swapped relative to the previous refinement of V_low. The 61 lowest-order parameters in V_up are optimized to improve agreement with both our ab initio data points and the MARVEL levels, while V_low is held fixed. For this refinement of V_up, f retains the value of 1 000 used in the previous step.
For the third stage, we return to V_low and focus on the refinement of energies in higher J states, notably J = 2, 5, 10, 15 and 20. The weighting criteria remain the same as in step one and V_up is not optimized. Next, for step four, we apply the weighting criteria of step two, refine V_up to states with J = 0, 2, 5, 10, 15 and 20, and hold V_low fixed. Although there are no known near-ultraviolet states with J = 10, 15 and 20, the low-order parameters in V_up may interact, albeit very weakly, with the lower states, so it is important to include these states in the optimization so that we do not lose the rotational dependence of these levels. This step is repeated several more times, each time gradually increasing f towards 10⁵. Increasing f beyond this value provided no improvement in the RMS error, which concluded the refinement of V_up.
For the final optimization of our potential, we refine V_low to states with J = 0, 2, 5, 10, 15 and 20 using the 10:1 weighting of step one, while gradually increasing f to 10¹⁰. Going beyond this value offered no improvement in the final RMS error and only increased the risk of over-refining. This f value is significantly larger than that used in the final refinement of V_up, which is justified by the significantly smaller number of known states in the near-ultraviolet.
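The five-stage refinement described above can be summarized as a small data structure. This Python sketch records, for each stage, which part of the surface is refined, the J values included, the direction of the 10:1 weighting about 26 000 cm⁻¹, and the final f quoted in the text. It reflects one reading of the text (for example, that the second stage still uses only J = 0 levels); the third-stage f is left as None because the text does not state it, and the class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

THRESHOLD = 26_000.0  # cm^-1, boundary used for the 10:1 weighting


def level_weight(energy: float, low_first: bool = True) -> float:
    """Weight of one MARVEL level: low_first=True gives 1 below and 0.1 above 26 000 cm^-1."""
    above = energy > THRESHOLD
    if low_first:
        return 0.1 if above else 1.0
    return 1.0 if above else 0.1


@dataclass(frozen=True)
class Stage:
    surface: str               # part of the PES being refined ("V_low" or "V_up")
    j_values: Tuple[int, ...]  # J values of the MARVEL levels included
    low_first: bool            # weighting direction passed to level_weight
    f_final: Optional[float]   # final f quoted in the text; None where not stated


SCHEDULE = (
    Stage("V_low", (0,), True, 1e3),
    Stage("V_up", (0,), False, 1e3),                    # 61 lowest-order V_up parameters
    Stage("V_low", (2, 5, 10, 15, 20), True, None),
    Stage("V_up", (0, 2, 5, 10, 15, 20), False, 1e5),   # repeated, f raised towards 1e5
    Stage("V_low", (0, 2, 5, 10, 15, 20), True, 1e10),  # f raised gradually to 1e10
)

if __name__ == "__main__":
    for n, stage in enumerate(SCHEDULE, start=1):
        print(n, stage.surface, stage.j_values, stage.f_final)
```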

It is common to provide a breakdown of residuals for the VBOs in a long table; however, as already described, these states alone cannot be used to measure how well a potential calculates energy levels. Hence, in Table 1, we provide the average deviations from the MARVEL states with J ≤ 20 of the energy levels calculated using our new potential, the POKAZATEL potential and the PES15K potential. Firstly, we must acknowledge that PES15K is excellent at reproducing energy levels below 15 000 cm⁻¹ with J ≤ 9, but above this J threshold its residuals begin to increase and eventually surpass ours. A similar situation occurs for the POKAZATEL surface, but its RMS error increases much more rapidly with J. This is most likely because these potentials were only refined to states with J = 0, 2 and 5. Our new potential offers lower residuals for the high-J states while also providing relatively accurate energies into the near-ultraviolet.

To generate transition intensities, we require an accurate dipole moment surface. The CKAPTEN (Conway et al., 2018) surface has previously been shown to provide reliable dipole values (Conway et al., 2020a), and hence we use this DMS to calculate our spectra. We compute a line list for H₂¹⁶O that extends to 41 200 cm⁻¹, i.e. beyond 290 nm, the shortest wavelength that will be accessible to the NASA TEMPO mission (Zoogman et al., 2017). The accuracy of this line list is not verified for transitions with wavenumbers beyond 37 000 cm⁻¹, and this region may be susceptible to basis set convergence issues. In HITRAN (Gordon et al., 2017) units, the minimum intensity considered here is 10⁻³² cm molecule⁻¹ and J_max = 20, all at 296 K. No near-ultraviolet transitions involving J = 20 have intensities above our 10⁻³² cm molecule⁻¹ threshold.
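The line-list thresholds translate directly into a simple filter and a unit conversion, λ (nm) = 10⁷ / ν̃ (cm⁻¹). In this Python sketch the numerical thresholds are taken from the text, while the function names and the idea of testing one transition at a time are hypothetical.

```python
NU_MAX = 41_200.0       # upper wavenumber limit of the line list, cm^-1
NU_VERIFIED = 37_000.0  # limit up to which accuracy has been verified, cm^-1
S_MIN = 1.0e-32         # minimum intensity at 296 K, cm molecule^-1 (HITRAN units)
J_MAX = 20              # maximum total angular momentum


def wavenumber_to_nm(nu_cm: float) -> float:
    """Vacuum wavelength in nm for a wavenumber in cm^-1."""
    return 1.0e7 / nu_cm


def nm_to_wavenumber(lam_nm: float) -> float:
    """Wavenumber in cm^-1 for a vacuum wavelength in nm."""
    return 1.0e7 / lam_nm


def keep_transition(nu_cm: float, intensity: float, j_upper: int, j_lower: int) -> bool:
    """Apply the wavenumber, intensity and J cut-offs quoted in the text."""
    return nu_cm <= NU_MAX and intensity >= S_MIN and max(j_upper, j_lower) <= J_MAX


def is_verified(nu_cm: float) -> bool:
    """True for transitions in the range where accuracy has been checked."""
    return nu_cm <= NU_VERIFIED


if __name__ == "__main__":
    for lam in (290.0, 350.0, 400.0):
        print(lam, round(nm_to_wavenumber(lam), 1))
```

With this conversion, 290 nm corresponds to about 34 483 cm⁻¹, the verified limit of 37 000 cm⁻¹ to about 270 nm, and the 41 200 cm⁻¹ cut-off to about 243 nm.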
We then 'MARVELize' this line list: where possible, we replace our calculated energy levels with empirical ones from MARVEL, which also allows us to add extra quantum labels (K_a, K_c, ν₁, ν₂, ν₃) on top of the rigorous labels J, parity and symmetry. This process is described in more detail by Conway et al. (2020a).
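In outline, MARVELization is a lookup-and-replace over energy levels, followed by recomputing each transition wavenumber from its upper- and lower-state energies. In this Python sketch the level key (J, parity, symmetry, counting index) and the example energies are hypothetical; the matching procedure actually used is the one described by Conway et al. (2020a).

```python
from typing import Dict, List, Tuple

Level = Tuple[int, int, str, int]  # hypothetical key: (J, parity, symmetry, index)


def marvelize(calc: Dict[Level, float],
              marvel: Dict[Level, Tuple[float, str]]) -> Tuple[Dict[Level, float], Dict[Level, str]]:
    """Use MARVEL energies and labels where available, calculated energies elsewhere."""
    energies: Dict[Level, float] = {}
    labels: Dict[Level, str] = {}
    for key, e_calc in calc.items():
        if key in marvel:
            energies[key], labels[key] = marvel[key]
        else:
            energies[key] = e_calc
    return energies, labels


def wavenumbers(transitions: List[Tuple[Level, Level]], energies: Dict[Level, float]) -> List[float]:
    """Transition wavenumber = E(upper) - E(lower), in cm^-1."""
    return [energies[upper] - energies[lower] for upper, lower in transitions]


if __name__ == "__main__":
    # illustrative numbers only, not real H2O levels
    calc = {(0, 0, "A1", 1): 0.0, (0, 0, "A1", 2): 1600.0, (1, 1, "B1", 1): 24.0}
    marvel = {(0, 0, "A1", 1): (0.0, "Ka=0 Kc=0 v=(0,0,0)"),
              (0, 0, "A1", 2): (1599.5, "Ka=0 Kc=0 v=(0,1,0)")}
    energies, labels = marvelize(calc, marvel)
    print(wavenumbers([((0, 0, "A1", 2), (0, 0, "A1", 1))], energies))  # [1599.5]
```

Levels without a MARVEL counterpart keep their calculated energies and only the rigorous labels, which is why the extra quantum labels appear only on matched levels.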
In an earlier study (Conway et al., 2018), we generated near-ultraviolet spectra with the POKAZATEL potential and CKAPTEN DMS, although the thresholds used were different from those used here. The maximum transition wavenumber considered in the previous study was 35 000 cm⁻¹ with J_max = 14, and the minimum intensity considered was 10⁻³⁰ cm molecule⁻¹, but these criteria should be sufficient for comparisons in the near-ultraviolet. Comparing these calculations to our new ones will allow us to ascertain how different potential energy surfaces influence intensities.
In Figure  is reasonable to assume POKAZATEL also under-absorbs at 19 000 cm⁻¹. At the 400 nm limit of HITRAN2016, we begin to notice larger differences in the intensities, although our new data agree much better with POKAZATEL.
Comparing our new line list to the old calculations indicates that the new potential does not greatly alter the intensities. This was expected since, for stable transitions, the DMS controls the magnitude of the absorption (Lodi and Tennyson, 2012).
Hence, the differences observed in the near-ultraviolet are due to differences in the underlying dipole surfaces. The states, however, the nearest electronic state is an unbound ¹B₁ state, which corresponds to the spectral feature at approximately 170 nm, as confirmed by numerous experiments (Chung et al., 2001; Mota et al., 2005; Cantrell et al., 1997b, a). These experiments show that absorption decreases exponentially with increasing wavelength (i.e. decreasing wavenumber), as expected given that the upper state is unbound. For these electronic transitions to absorb further to the red, high vibrational levels of the ground state would need to be populated, which is not possible at atmospheric temperatures. At room temperature, this band is therefore unlikely to affect absorption in the 290–350 nm interval to the degree quoted by Pei et al. Conversely, our line list, which predicts greatly reduced cross sections in this region, appears to be in line with atmospheric observations. We are currently collaborating with atmospheric scientists at the Center for Astrophysics | Harvard and Smithsonian (Wang et al., 2014, 2019; Gonzalo Gonzalez Abad et al., 2019) to further investigate this near-ultraviolet absorption by water vapor, but this effort would greatly benefit from further experimental research. Initial tests will focus on data obtained from the Ozone
Our calculated line list is available in the supplementary material and assumes 100% H₂¹⁶O isotopic abundance.
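Comparing a line list with measured cross sections requires converting line intensities into a cross section on a wavenumber grid. The Python/NumPy sketch below does this with a normalized Doppler (Gaussian) profile, whose half width at half maximum is α_D = (ν̃₀/c)√(2k_BT ln2/m). It assumes Doppler broadening only (no pressure broadening or instrument function), intensities valid at 296 K (T here only sets the Doppler width), and an abundance factor to undo the 100% H₂¹⁶O assumption, for example about 0.9973, the HITRAN natural-abundance value for H₂¹⁶O. The function names are hypothetical.

```python
import numpy as np

K_B = 1.380649e-23        # Boltzmann constant, J/K
AMU = 1.66053906660e-27   # atomic mass unit, kg
C_LIGHT = 2.99792458e8    # speed of light, m/s
MASS_H2O = 18.010565      # approximate mass of H2(16)O in u


def doppler_hwhm(nu0, temperature=296.0, mass_u=MASS_H2O):
    """Doppler half width at half maximum (cm^-1) for line centres nu0 (cm^-1)."""
    return nu0 / C_LIGHT * np.sqrt(2.0 * K_B * temperature * np.log(2.0) / (mass_u * AMU))


def cross_section(grid, nu0, intensity, temperature=296.0, abundance=1.0):
    """Doppler-broadened cross section (cm^2 molecule^-1) on a wavenumber grid.

    intensity is in cm molecule^-1 (HITRAN units); each profile integrates to 1.
    """
    grid = np.asarray(grid, dtype=float)[:, None]
    nu0 = np.asarray(nu0, dtype=float)[None, :]
    alpha = doppler_hwhm(nu0, temperature)
    profile = (np.sqrt(np.log(2.0) / np.pi) / alpha
               * np.exp(-np.log(2.0) * ((grid - nu0) / alpha) ** 2))
    strengths = np.asarray(intensity, dtype=float)[None, :]
    return abundance * (profile * strengths).sum(axis=1)


if __name__ == "__main__":
    grid = np.linspace(29_999.0, 30_001.0, 2_001)
    sigma = cross_section(grid, [30_000.0], [1.0e-26])
    print(doppler_hwhm(30_000.0), sigma.max())
```

For a full line list the grid-by-line array grows quickly, so a practical version would process lines in chunks or only within a few widths of each centre.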

Conclusions
A new semi-empirical potential energy surface for the main water vapor isotopologue is created by refining (Yurchenko et al., 2003)  near-ultraviolet would interfere with atmospheric retrievals in a manner which is simply not observed (Lampel et al., 2017).
Further experimental work on the near-ultraviolet absorption by water vapor is therefore required to resolve these issues.

Given the improvements this new potential energy surface offers for high-temperature spectra, future work on such spectra is planned. The potential energy surface is available in the supplementary material as a Fortran 90 file, along with the calculated line list assuming 100% abundance. This line list will form the basis of the HITRAN2020 line list in the visible and UV, where it will be supplemented with the best available experimental data, especially for broadening. The calculated line list will also be added to the ExoMol (Tennyson et al., 2016) website in the ExoMol format.

Data availability. The data for this article are provided in the supplementary material.
Code availability. The Fortran code for the potential energy surface is provided in the supplementary material.