Estimation of rate coefficients for the reactions of O 3 with unsaturated organic compounds for use in automated mechanism construction

Reaction with ozone (O 3 ) is an important removal process for unsaturated volatile organic compounds (VOCs) in the atmosphere. Rate coefficients for reactions of O 3 with VOCs are therefore essential parameters for chemical mechanisms used in chemistry transport models. Updated and extended structure–activity relationship (SAR) methods are presented for the reactions of O 3 with mono- and poly-unsaturated organic compounds. The methods are optimized using a preferred set of data including reactions of O 3 with 221 unsaturated compounds. For conjugated dialkene structures, site-specific rates are defined, and for isolated poly-alkenes rates are defined for each double bond to determine the branching ratios for primary ozonide formation. The information can therefore guide the representation of the O 3 reactions in the next generation of explicit detailed chemical mechanisms.


Introduction
Volatile organic compounds (VOCs) are emitted to the atmosphere from both biogenic and anthropogenic sources. Many of these compounds are unsaturated (i.e. contain at least one double bond), including the ubiquitous biogenic VOCs isoprene and monoterpenes (Sindelarova et al., 2014). Chemical degradation of these compounds in the atmosphere leads to a variety of secondary pollutants including ozone and secondary organic aerosol (SOA). Unsaturated compounds are generally highly reactive and react with the oxidant ozone (O 3 ), which is typically present in the troposphere at mixing ratios in the range 10-200 ppb. The ozonolysis reaction involves the concerted addition of O 3 to the double bond, followed by decomposition of the short-lived primary ozonide to yield a carbonyl compound and a carbonyl oxide, commonly referred to as a Criegee intermediate (Criegee, 1975). The reaction is important as a non-photolytic source of radicals and reactive intermediates, including the hydroxyl radical (e.g. Johnson and Marston, 2008; Cox et al., 2020). Ozonolysis of large alkenes (e.g. monoterpenes and sesquiterpenes) is also particularly efficient at producing SOA (Hallquist et al., 2009), including as a result of the formation of low-volatility products from reactions of Criegee intermediates with atmospheric trace gases (e.g. Heaton et al., 2007; Sakamoto et al., 2013; Zhao et al., 2015; Mackenzie-Rae et al., 2018; Chhantyal-Pun et al., 2018), and from auto-oxidation mechanisms involving peroxy radicals formed from decomposition of the Criegee intermediates (e.g. Ehn et al., 2014; Jokinen et al., 2015).
Previous assessments using explicit organic degradation mechanisms have demonstrated that the atmosphere contains an almost limitless number of organic compounds (e.g. Aumont et al., 2005), for which it is impractical to carry out experimental kinetics studies. This has resulted in the development of estimation methods for rate coefficients (e.g. see Calvert et al., 2000, 2011; McGillen et al., 2008; Vereecken et al., 2018, and references therein), which have been applied widely in chemical mechanisms and impact assessments. As part of the present work, a set of preferred kinetic data has been assembled for the reactions of O 3 with 221 unsaturated organic compounds, based on reported experimental studies (see Sect. 2 for further details). Updated structure-activity relationship (SAR) methods are presented for the initial reactions of O 3 with unsaturated organic compounds. In the cases of poly-alkenes, the rate coefficient is defined in terms of a summation of partial rate coefficients for O 3 reaction at each relevant site in the given organic compound, so that the attack distribution is also defined. Application of the methods is illustrated with examples in the Supplement.
The information is currently being used to guide the representation of the O 3 -initiation reactions in the next generation of explicit detailed chemical mechanisms, based on the Generator for Explicit Chemistry and Kinetics of Organics in the Atmosphere (GECKO-A; Aumont et al., 2005) and the Master Chemical Mechanism (MCM; Saunders et al., 2003). It therefore contributes to a revised and updated set of rules that can be used in automated mechanism construction and provides formal documentation of the methods. This paper is part of a series of publications, including rules for the estimation of rate coefficients and branching ratios for the reactions of OH with aliphatic (Jenkin et al., 2018a) and aromatic (Jenkin et al., 2018b) organic compounds, and for peroxy radical reactions (Jenkin et al., 2019). Rules governing the decomposition of the primary ozonides, formed initially from the O 3 -initiation reactions, and the subsequent chemistry of the Criegee intermediates formed are considered in a further paper (Mouchel-Vallon et al., 2020).

Preferred kinetic data
A set of preferred kinetic data has been assembled from which to develop and validate the estimation methods for the O 3 rate coefficients. The complete set includes 298 K data for 221 compounds, comprising 111 alkenes and 110 unsaturated oxygenated compounds. Temperature dependences are also defined for a subset of 39 compounds. In three cases, the preferred rate coefficient is an upper-limit value, and in three cases a lower-limit value. The information is provided as a part of the Supplement (spreadsheets SI_1 and SI_2). As described in more detail in Sect. 4, the oxygenates include both monofunctional and multifunctional compounds containing a variety of functional groups that are prevalent in both emitted VOCs and their degradation products, namely -OH, -C(=O)H, -C(=O)-, -O-, -C(=O)OH, -C(=O)O-, -OC(=O)-, -ONO 2 and -C(=O)OONO 2 . For a core set of 30 reactions, preferred kinetic data are based on the evaluations of the IUPAC Task Group on Atmospheric Chemical Kinetic Data Evaluation (Cox et al., 2020; http://iupac.pole-ether.fr/, last access: September 2020). The remaining values are informed by recommendations from other key evaluations with complementary coverage (e.g. Atkinson and Arey, 2003; Calvert et al., 2011, 2015) and have been revised and expanded following review and evaluation of additional data not included in those studies (as identified in spreadsheets SI_1 and SI_2).

Alkenes
As discussed in detail previously (e.g. Calvert et al., 2000; Vereecken et al., 2018, and references therein), the data indicate that the rate coefficients are highly sensitive to alkene structure and depend on the degree of alkyl substitution of the unsaturated bond(s), on steric effects and on ring strain effects in cyclic compounds. The set of preferred kinetic data has been used to update and extend a SAR method that can be used to estimate the rate coefficients when no experimental determinations are available. Similar to previous appraisals (e.g. Jenkin et al., 1997; Calvert et al., 2000), reference rate coefficients (k) are defined for addition of O 3 to a series of alkene and conjugated dialkene structures, based on the preferred data for relevant sets of alkenes and conjugated dialkenes.

Acyclic monoalkenes
The set of preferred values contains data for the reactions of O 3 with 43 acyclic monoalkenes. The generic rate coefficients for O 3 addition to C=C bonds in acyclic monoalkene structures with differing extents of alkyl (R) substitution are given in Table 1 (k A1O3 to k A6O3 ). These rate coefficients are based on averages of the preferred values of k at 298 K and of the preferred temperature coefficients, E/R, for the identified sets of alkenes (as described in detail in the Table 1 comments), and are defined for "R" being a linear alkyl group (i.e. -CH 3 or -CH 2 R ′ ). In practice, reported data for sets of alk-1-enes (CH 2 =CHR) and 2-methylalk-1-enes (the majority of the CH 2 =CR 2 dataset) show small systematic increases in k with the size of the alkyl group (Mason et al., 2009), although the preferred data for the other structural alkene groups do not apparently show such dependences (see Fig. 1). In view of the high sensitivity of k to the degree of alkyl substitution of the double bond, the use of single size-independent values of k for each of these structural groups is considered acceptable for the present SAR.
... substituent in the molecule. A factor, F α (alkyl), describing the effect of each (acyclic) alkyl group at the α carbon atom was determined by minimizing the summed square deviation, Σ((k calc − k obs )/k obs )^2, for the set of relevant branched alkenes (the resultant value is given in Table 2, along with those for selected oxygenated groups discussed below). It should be noted that the reported value of k obs for 3,4-diethyl-hex-2-ene is substantially lower than the reference value of k A5O3 for the CHR=CR 2 structure (i.e. by 2 orders of magnitude; see Table 1), and this compound was therefore excluded from the optimization procedure for F α (alkyl). Confirmatory measurements of that rate coefficient, and data for other α-branched alkenes, are therefore required to test and refine the method proposed here. In the absence of reported rate coefficients as a function of temperature for α-branched alkenes, the temperature dependence of F α (alkyl) is assumed to be described by F α (alkyl) = exp(298 × ln(F α (298) (alkyl))/T ); further temperature-dependent data are also required for this assumption to be fully tested. The limited data for more remotely branched alkenes suggest no significant effect on the rate coefficient. In those cases, the rate coefficients in Table 1 are applied as a default. The corresponding absolute deviations, (k calc − k obs )/k obs , at 298 K for 40 acyclic monoalkenes in the set of preferred values indicate that the estimation method reproduces the observed values to within about +50 %/−30 % (see also Fig. 2).

Table 2. Substituent factors, F α (298) (X), describing the effect of the given substituent at the α carbon atom in R groups in alkenes and in allylic oxygenated compounds at 298 K a .

The three monoalkenes excluded from the procedure were ethene (a unique structure, for which no value needs to be calculated), 3,4-diethyl-hex-2-ene (as indicated above) and 3,4-dimethyl-hex-3-ene (for which only a lower-limit preferred value is available). In the final case, k calc is a factor of 3 higher than the lower-limit value.
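The scaling by F α (alkyl) can be illustrated with a short Python sketch. It assumes that the estimate takes a multiplicative form, i.e. the Table 1 reference rate coefficient multiplied by one F α (alkyl) factor per α alkyl group, together with the stated temperature dependence F α (alkyl) = exp(298 × ln(F α (298) (alkyl))/T ). The function names and the numerical inputs are hypothetical placeholders, not values from Tables 1 and 2.

```python
import math

def f_alpha(f_298: float, temperature: float) -> float:
    """Substituent factor at temperature T (K), using the form stated in
    the text: F(T) = exp(298 * ln(F(298)) / T)."""
    return math.exp(298.0 * math.log(f_298) / temperature)

def k_calc(k_ref: float, f_298: float, n_alpha_alkyl: int,
           temperature: float = 298.0) -> float:
    """Estimated rate coefficient, ASSUMING a multiplicative form
    k_calc = k_ref * F_alpha(alkyl) ** n, where n is the number of
    alpha alkyl groups. k_ref must be evaluated at the same temperature."""
    return k_ref * f_alpha(f_298, temperature) ** n_alpha_alkyl

# Placeholder inputs for illustration only (not taken from Tables 1 or 2).
K_REF = 1.0e-17   # cm3 molecule-1 s-1
F_298 = 0.5       # hypothetical F_alpha(298)(alkyl)

print(k_calc(K_REF, F_298, n_alpha_alkyl=2))  # two alpha alkyl groups at 298 K
print(f_alpha(F_298, 273.0))                  # a factor below 1 becomes smaller at lower T
```

With this form, the factor reduces to its 298 K value at T = 298 K, and a value below unity deviates further from unity as the temperature decreases.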

Cyclic monoalkenes
The set of preferred values contains data for reactions of O 3 with 14 simple monocyclic monoalkenes containing endocyclic double bonds, including cyclopentenes, cyclohexenes, cycloheptenes, cyclooctenes and cyclodecenes, with the temperature dependence also defined in seven cases. The values of k for these sets of compounds show systematic deviations from those observed for acyclic monoalkenes with the same level of substitution, likely resulting from the effects of ring strain (e.g. Calvert et al., 2000). Table 3 provides a series of ring factors, F ring , based on optimization to the 298 K rate coefficients and E/R values within this dataset (as described in detail in the Table 3 comments). The rate coefficients for cyclic alkenes with endocyclic double bonds are therefore determined from the following expression, where k AO3 is the appropriate reference rate coefficient (k A3O3 to k A6O3 ) in Table 1. For polycyclic alkenes, a value of F ring needs to be applied for each ring for which the given C=C bond is a component. In addition to the F ring values given in Table 3, it is also possible to infer a tentative value of 12 for F ring (298) for 11-member rings, based on the reported rate coefficient for the sesquiterpene α-humulene (which contains a 1,4,8-cycloundecatriene ring), with the assumption that the values for F ring can be applied to cyclic systems with unconjugated multiple double bonds. The reported rate coefficient for the sesquiterpene β-caryophyllene (which contains a trans-cyclononene ring) then allows a tentative value of F ring (298) = 2.1 for nine-member rings, although it is noted that the level of ring strain, and therefore F ring , likely depends on the cis-/trans-conformation. Clearly, additional data for cyclononenes and cycloundecenes are required to confirm these tentative values of F ring . As with the acyclic alkenes above, a value of F α (alkyl) is also applied for each (acyclic) alkyl group at the α carbon atom in both monocyclic and polycyclic alkenes, where appropriate. For this procedure, the term "acyclic" is taken to mean that the first carbon atom in the substituent group is not part of a cycle that also contains the α carbon atom. To avoid ambiguity in defining the number of α acyclic alkyl groups, the base structure is taken to be cyclic as a default, and values of F α (alkyl) are applied as appropriate to each acyclic alkyl group. This rule applies whether the double bond is endocyclic or exocyclic and has the effect of maximizing the number of acyclic alkyl groups (e.g. see example calculations B2-B5 in the Supplement).

Notes to Table 3:
a Based on data for cyclopentene, 1-methylcyclopentene and 3-methylcyclopentene.
b Based on data for cyclohexene, 1-methylcyclohexene, 3-methylcyclohexene, 4-methylcyclohexene and 1,2-dimethylcyclohexene.
c Based on data for cycloheptene and 1-methylcycloheptene.
d Based on data for cis-cyclooctene, 1-methylcyclooctene and 3-methylcyclooctene.
e Based on data for cis-cyclodecene.
f Tentative values of F ring (298) of 2.1 and 12 can be derived for 9- and 11-member rings, respectively, based on limited data for structurally complex sesquiterpenes (see Sect. 3.2). These can be applied with an approximate average value of A F (ring) = 0.3, and B F (ring) values of −580 and −1100 K, respectively.

Table 4. Arrhenius parameters (k = A exp(−(E/R)/T )) for the rate coefficients for O 3 addition to generic conjugated dialkene structures, and the rate coefficient values at 298 K a .

Dialkene structure
Parameter 10 0 1000 l
a k 298 K for bold structures based on data for the compounds identified in subsequent comments, with other values based on trends in the data. A value of A = 10 −14 cm 3 molecule −1 s −1 adopted in all cases, based on the reported parameters for buta-1,3-diene, trans-penta-1,3-diene, isoprene (2-methyl-buta-1,3-diene) and 2,3-dimethyl-buta-1,3-diene. E/R = 298 × ln(A/k 298 K ). Rate coefficients are also default values for conjugated dialkenes possessing remote substituents (i.e. β or higher), including isolated double bonds.
b k 298 K based on data for isoprene (2-methyl-buta-1,3-diene).
c k 298 K based on data for 2,3-dimethyl-buta-1,3-diene.
d k 298 K based on rounded average of data for cis-penta-1,3-diene and trans-penta-1,3-diene; parameters are assumed to apply to both cis- and trans-isomers.
e k 298 K based on data for 2-methyl-penta-1,3-diene. The same value of k 298 K is also adopted for CH 2 =CHC(R)=CHR and CH 2 =CHCH=CR 2 .
f k 298 K for CH 2 =C(R)C(R)=CHR taken to be a factor of 2.1 greater than that for CH 2 =CHC(R)=CHR, based on the trend in k observed on going from buta-1,3-diene (0.63) to CH 2 =C(R)CH=CH 2 and CH 2 =C(R)C(R)=CH 2 , and from CH 2 =CHCH=CHR to CH 2 =C(R)CH=CHR. The same value of k 298 K is also adopted for CH 2 =C(R)CH=CR 2 and CH 2 =CHC(R)=CR 2 .
g k 298 K for CH 2 =C(R)C(R)=CR 2 taken to be a factor of 3 greater than that for CH 2 =C(R)C(R)=CHR, based on the trend in k observed on going from CHR=CHCH=CHR to CR 2 =CHCH=CR 2 .
h k 298 K based on average of data for cis-,trans-hexa-2,4-diene and trans-,trans-hexa-2,4-diene; parameters are assumed to apply to all cis- and trans-isomer combinations.
i k 298 K for CHR=CHCH=CR 2 taken to be a factor of 3 lower than that for CR 2 =CHCH=CR 2 , based on the trend in k observed on going from CHR=CHCH=CHR to CR 2 =CHCH=CR 2 ; the same value of k 298 K is also adopted for CHR=CHC(R)=CHR.
j k 298 K based on data for 2,5-dimethyl-hexa-2,4-diene; the same value of k 298 K is also adopted for CHR=C(R)C(R)=CHR, CHR=C(R)CH=CR 2 and CHR=CHC(R)=CR 2 .
k k 298 K for CR 2 =C(R)CH=CR 2 taken to be a factor of 2.1 greater than that for CR 2 =CHCH=CR 2 , based on the trend in k observed on going from buta-1,3-diene (0.63) to CH 2 =C(R)CH=CH 2 and CH 2 =C(R)C(R)=CH 2 , and from CH 2 =CHCH=CHR to CH 2 =C(R)CH=CHR.
l k 298 K for CR 2 =C(R)C(R)=CR 2 assigned the same value as A, this being compatible with the expected increase in k 298 K relative to that for CR 2 =CHCH=CR 2 .

Acyclic conjugated dialkenes
The generic rate coefficients for O 3 addition to C=C-C=C bond structures in acyclic conjugated dialkenes with differing extents of alkyl (R) substitution are given in Table 4, based on reported data for nine compounds (k D1O3 to k D11O3 ). These rate coefficients are based on the preferred values of k for the identified dialkenes, with those for some structural groups being inferred from the observed trends in the impact of successive alkyl substitution on k, as described in detail in the Table 4 comments. The values are generally based on data for conjugated dialkenes for which "R" is a linear alkyl group (these making up almost all of the reported data). The limited information on dialkenes possessing branched substituent groups (5-methyl-hexa-1,3-diene and 5,5-dimethyl-hexa-1,3-diene) suggests that there is a less pronounced reducing effect on k, compared with that observed for the monoalkenes above. Similar to the approach used for the reactions of OH with conjugated dialkenes (Jenkin et al., 2018a), the following expression is therefore applied, where k DO3 is the appropriate reference rate coefficient in Table 4, and a value of F α (X) (Table 2) is applied for each α substituent in the molecule.

Table 5. Ring factors, F ′ ring (298) a
Ring size        F ′ ring (298)   Comment
6-member ring    4.5              b
7-member ring    0.44             c
8-member ring    0.06             d
a These factors apply to conjugated dialkene systems that are completely within the given ring structure. In cases where the conjugated dialkene is only partially within the ring (e.g. as in the case of β-phellandrene), the appropriate value of F ring given in Table 3 should be applied. In the absence of data, the temperature dependence is assumed to be described by F ′ ring = exp(298 × ln(F ′ ring (298) )/T ).
b Based on data for cyclohexa-1,3-diene, 5-isopropyl-2-methyl-cyclohexa-1,3-diene (α-phellandrene) and 1-isopropyl-4-methyl-cyclohexa-1,3-diene (α-terpinene).
c Based on data for cyclohepta-1,3-diene.
d Based on data for cis-,cis-cycloocta-1,3-diene.

Temperature-dependent recommendations are available for four acyclic conjugated dialkenes (see Table 4 comments), with the recommended pre-exponential factor, A, being close to 10 −14 cm 3 molecule −1 s −1 in each case. This value of A is therefore adopted for all of the generic rate coefficients k D1O3 to k D11O3 , with E/R given by 298 × ln(A/k 298 K ).
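As a minimal sketch of how these Arrhenius parameters follow from a 298 K value, the Python snippet below fixes A = 10 −14 cm 3 molecule −1 s −1 and derives E/R = 298 × ln(A/k 298 K ). The function names are illustrative, and the example input assumes that the value of 0.63 quoted for buta-1,3-diene in the Table 4 comments is in units of 10 −17 cm 3 molecule −1 s −1 .

```python
import math

A_DIENE = 1.0e-14  # cm3 molecule-1 s-1, pre-exponential factor adopted in the text

def e_over_r(k_298: float, a: float = A_DIENE) -> float:
    """E/R in K from the 298 K rate coefficient: E/R = 298 * ln(A / k_298)."""
    return 298.0 * math.log(a / k_298)

def k_arrhenius(temperature: float, k_298: float, a: float = A_DIENE) -> float:
    """k(T) = A * exp(-(E/R) / T), with E/R fixed by the 298 K value."""
    return a * math.exp(-e_over_r(k_298, a) / temperature)

# Assumed input: 0.63 x 10^-17 cm3 molecule-1 s-1 (the units are an assumption).
k298 = 0.63e-17
print(round(e_over_r(k298)))      # E/R in K
print(k_arrhenius(298.0, k298))   # recovers k298 by construction
print(k_arrhenius(273.0, k298))   # smaller value at lower temperature
```

By construction, a structure whose k 298 K equals A has E/R = 0, which is the limiting case described in the final Table 4 comment.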

Cyclic conjugated dialkenes
The set of preferred values contains 298 K data for the reactions of O 3 with five monocyclic conjugated dialkenes, including cyclohexa-1,3-dienes, cyclohepta-1,3-diene and cycloocta-1,3-diene. The values of k for these sets of compounds all show systematic deviations from those observed for acyclic conjugated dialkenes with the same level of substitution, again likely resulting from the effects of ring strain (e.g. Calvert et al., 2000; Lewin et al., 2001). In each case, these also differ substantially from those for the same-sized cyclic monoalkenes (as shown in Table 3), and Table 5 shows a series of ring factors, F ′ ring , based on optimization to the cyclic conjugated dialkene dataset. The rate coefficients for cyclic conjugated dialkenes are therefore determined from the following expression:

k calc = k DO3 × Π F ′ ring × Π F α (X) (4)

In the absence of data, the temperature dependence is assumed to be described by F ′ ring = exp(298 × ln(F ′ ring (298) )/T ). For polycyclic systems, a value of F ′ ring needs to be applied for each ring for which the given C=C-C=C bond structure is a component. As for acyclic conjugated dialkenes, a value of F α (X) is applied for each α substituent in the molecule, where appropriate. In the cases of cyclic (or polycyclic) conjugated dialkenes with α acyclic alkyl substituents, the base structure is once again taken to be cyclic, and values of F α (alkyl) are applied as appropriate (e.g. see example calculations D1 and D2 in the Supplement). Note that for the special case of conjugated dialkenes for which only one of the double bonds is within the ring (e.g. β-phellandrene: example D2 in the Supplement), a modified version of Eq. (4) is applied, in which F ′ ring is replaced by the appropriate value of F ring .
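The procedure for cyclic conjugated dialkenes can be sketched in Python as follows. The 298 K ring factors are those listed in Table 5 (4.5, 0.44 and 0.06 for 6-, 7- and 8-member rings). The multiplicative form of the estimate, the function names and the reference rate coefficient used in the example are assumptions made for illustration.

```python
import math

# F'_ring(298) for dialkene systems completely within the ring (Table 5).
F_RING_PRIME_298 = {6: 4.5, 7: 0.44, 8: 0.06}

def scale_factor(f_298: float, temperature: float) -> float:
    """Assumed temperature dependence: F(T) = exp(298 * ln(F(298)) / T)."""
    return math.exp(298.0 * math.log(f_298) / temperature)

def k_cyclic_diene(k_ref, ring_sizes, alpha_factors_298=(), temperature=298.0):
    """ASSUMED form: k_calc = k_ref * prod(F'_ring) * prod(F_alpha(X)).

    ring_sizes lists every ring that contains the whole C=C-C=C unit;
    k_ref is the Table 4 reference value at the same temperature."""
    k = k_ref
    for size in ring_sizes:
        k *= scale_factor(F_RING_PRIME_298[size], temperature)
    for f_298 in alpha_factors_298:
        k *= scale_factor(f_298, temperature)
    return k

# Hypothetical reference rate coefficient (not a Table 4 value).
K_REF = 1.0e-16  # cm3 molecule-1 s-1
print(k_cyclic_diene(K_REF, ring_sizes=[6]))  # enhanced relative to K_REF
print(k_cyclic_diene(K_REF, ring_sizes=[8]))  # strongly reduced relative to K_REF
```

For a system in which only one double bond lies within the ring, the same function could be reused with the Table 3 value of F ring substituted for F ′ ring .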

Other alkenes and poly-alkenes
The remainder of the alkene dataset consists of preferred values for 30 acyclic and cyclic compounds containing various combinations of isolated double bonds and conjugated dialkene structures, for which the methods described above can be used to estimate rate coefficients. There are also preferred values for a limited set of three conjugated poly-alkenes (one acyclic and two cyclic) and four alk-1-enyl-substituted aromatics (styrenes), for which there are insufficient data to attempt development of a SAR method. The observed and calculated rate coefficients for 100 alkenes and poly-alkenes are compared in the correlation plot in Fig. 2. These are all the compounds for which preferred values are available in the reference database (spreadsheet SI_1), less those not covered by the SAR methods, i.e. the three conjugated poly-alkenes and four styrenes referred to above, and ethene and buta-1,3-diene, which are unique structures. As shown in Fig. 2, the SAR methods perform well for the sets of acyclic monoalkenes and conjugated dialkenes, and for the monocyclic monoalkenes and conjugated dialkenes. This is because the data show well-defined variations with structure, and because most of those rate coefficients were used as the basis of the SAR methods (as identified in Tables 1-5 and in spreadsheet SI_1).
The data for the remaining 30 compounds are subdivided into acyclic (6 compounds), monocyclic (11 compounds) and polycyclic (13 compounds) in Fig. 2, with the observed data from the first two categories also generally well described by the SAR methods. The observed values of k for the remaining polycyclic compounds are also reasonably well correlated, although with much more scatter than for the simpler structures. This is almost certainly due to a combination of ring strain and steric effects in these complex structures that cannot be fully accounted for by the SAR methods developed here. The alkenes in all these categories are identified in spreadsheet SI_1.
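For compounds with more than one reactive site, the total rate coefficient is the sum of the partial rate coefficients, which also fixes the attack distribution. A minimal Python sketch of that bookkeeping is given below; the site labels and partial rate coefficients are hypothetical values chosen only to illustrate the arithmetic.

```python
def attack_distribution(partial_k: dict[str, float]) -> tuple[float, dict[str, float]]:
    """Return the total rate coefficient and the branching ratio for each site.

    partial_k maps a site label to its partial rate coefficient
    (cm3 molecule-1 s-1), each estimated separately for that C=C bond
    or conjugated C=C-C=C system."""
    k_total = sum(partial_k.values())
    ratios = {site: k / k_total for site, k in partial_k.items()}
    return k_total, ratios

# Hypothetical partial rate coefficients for a compound with two isolated C=C bonds.
sites = {"endocyclic C=C": 3.0e-16, "exocyclic C=C": 1.0e-16}
k_total, ratios = attack_distribution(sites)
print(k_total)
print(ratios)
```

The branching ratios sum to unity by construction, so they can be used directly as the relative yields of the primary ozonides formed at each site.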

Unsaturated oxygenated compounds
The set of preferred values contains data for the reactions of O 3 with 110 unsaturated oxygenated compounds, possessing -OH, -C(=O)H, -C(=O)-, -O-, -C(=O)OH, -C(=O)O-, -OC(=O)-, -ONO 2 and -C(=O)OONO 2 substituents. The SAR methods applied to these compounds depend on the location of the substituent oxygenated group relative to the C=C bond and fall into three categories. For those possessing oxygenated substituents at the α position (i.e. allylic oxygenates), the methods described above for alkenes and dialkenes are modified to take account of the effect of the given substituent (see Sect. 4.1). More remote oxygenated substituents are assumed to have no effect, and the appropriate alkene or dialkene rate coefficient is applied unmodified in these cases (see Sect. 4.2). When the oxygenated group (including -C(=O)H, -C(=O)- and -C(=O)O-) is a substituent of the C=C group itself (i.e. vinylic oxygenates), the method is based on a series of reference rate coefficients for those specific structures, which are derived from the preferred data for the relevant sets of oxygenated compounds (see Sect. 4.3).

Allylic oxygenated compounds
Preferred kinetic data at 298 K are available for 25 allylic oxygenated compounds, containing the following substituents at the α carbon atom: -OH (14 compounds), -C(=O)H (3 compounds), -C(=O)R (1 compound), -OR (4 compounds, 1 possessing a remote -OH group), -OC(=O)R (1 compound), and both -ONO 2 and -OH (2 compounds, with 1 having only a lower-limit recommendation). These data were used, in conjunction with the methods described for alkenes and dialkenes in Sect. 3, to optimize the corresponding values of F α (X) given in Table 2. It was found that the effect of the -C(=O)H and -C(=O)R groups could reasonably be described by a single factor, F α (-C(=O)-), and the further assumption was made that the same factor applies to groups containing the -C(=O)O- sub-structure. It is noted that several of the factors are based on data for limited sets of compounds (in some cases a single compound), and further data are clearly required to test the approach fully. As shown in Fig. 3, however, the method appears to work very well for most of the relevant compounds containing -OH groups (the largest subset of allylic oxygenates), providing some support for the approach.
There are almost no temperature-dependent data for allylic oxygenated compounds, and the temperature dependence of F α (X) is therefore assumed to be described by F α (X) = exp(298 × ln(F α (298) (X))/T ). The recent study of Kalalian et al. (2020) reports temperature dependences for the reactions of O 3 with cis-pent-2-en-1-ol and pent-1-en-3-ol. In the former case, the reported value of (E/R) obs = 902 K is very well described using the above assumption, which leads to (E/R) calc = 895 K, whereas in the latter case the observed and calculated values differ by about a factor of 2, (E/R) obs = 730 K and (E/R) calc = 1590 K. Clearly, further temperature-dependent data are required for a variety of allylic oxygenated compounds for the method to be fully tested and refined.
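The comparison of (E/R) obs and (E/R) calc rests on how the assumed temperature dependence of F α (X) shifts the effective temperature coefficient. As a short worked derivation, assume that the reference rate coefficient follows k ref = A exp(−(E/R) ref /T ) and that the substituent factors are applied multiplicatively, one per α substituent X i (both are assumptions made here for illustration):

```latex
\begin{align}
k_{\mathrm{calc}}(T) &= A \exp\!\left(-\frac{(E/R)_{\mathrm{ref}}}{T}\right)
  \prod_i \exp\!\left(\frac{298 \,\ln F_{\alpha(298)}(X_i)}{T}\right) \\
 &= A \exp\!\left(-\frac{(E/R)_{\mathrm{ref}} - 298 \sum_i \ln F_{\alpha(298)}(X_i)}{T}\right), \\
(E/R)_{\mathrm{calc}} &= (E/R)_{\mathrm{ref}} - 298 \sum_i \ln F_{\alpha(298)}(X_i).
\end{align}
```

Under these assumptions the pre-exponential factor is unchanged, a factor F α (298) (X) greater than unity lowers (E/R) calc , and a factor below unity raises it.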

Unsaturated compounds containing remote oxygenated substituents
The preferred data include rate coefficients for 22 unsaturated compounds possessing remote oxygenated substituents. In these cases, the oxygenated substituent is assumed to have no effect, and the corresponding alkene or dialkene rate coefficient, calculated as described in Sect. 3, is applied unmodified. As shown in Fig. 4, this assumption provides 298 K values of k calc that are generally within about a factor of 2 of the values of k obs , and therefore within the scatter of the methods when applied to unsubstituted alkenes and dialkenes.
In the majority of cases, the presence of the remote oxygenated group appears to reduce the value of the rate coefficient slightly, compared with that of the generic alkene or dialkene rate coefficient. In the cases of a series of cis-hex-3-enyl esters (i.e. cis-CH 3 CH 2 CH=CHCH 2 CH 2 OC(=O)R) the rate coefficient is reported to depend systematically on the size of the remote R group, rather than displaying a consistent influence of the -OC(=O)- substructure itself (Zhang et al., 2018). Within this series, the rate coefficient for the largest compound (with R = n-C 3 H 7 ) agrees well with the reference rate coefficient, whereas that for the smallest (with R = H) is about a factor of 3 lower than the reference rate coefficient. Clearly, further information is required for unsaturated compounds possessing remote oxygenated substituents before refined estimation methods can be developed that take account of this type of effect.
The preferred data include a rate coefficient for methyl chavicol (1-prop-2-enyl-4-methoxy-benzene), which contains a remote methoxy group as part of a methoxyphenyl substituent at the carbon atom α to the alkene double bond. The rate coefficient, reported by Gai et al. (2013), is well described by the generic rate coefficient k A1O3 (Table 1). This suggests that the aromatic substituent at the α carbon atom has no effect on the rate coefficient. However, further data on alk-2-enyl substituted aromatic compounds are ideally required to confirm this.

Vinylic oxygenated compounds
When the oxygenated group (including -C(=O)H, -C(=O)- and -C(=O)O-) is a substituent of the C=C group itself (i.e. vinylic oxygenates), the data indicate that the rate coefficients are much less sensitive to the presence of other alkyl groups attached to the C=C group (and in some cases actually decrease upon additional substitution). In contrast, the data for some classes clearly show a greater influence of substituent size. It is therefore not possible to treat these compounds using modifications to the SAR methods presented for alkenes in Sect. 3, and it is necessary to assign generic rate coefficients for addition of O 3 to a series of vinylic oxygenate structures. Three categories of vinylic oxygenate structure are considered, namely vinyl aldehydes and ketones (Table 6), vinyl esters and acids (Table 7) and vinyl ethers (Table 8).
Notes to Table 6:
b Based on preferred data for methacrolein.
c Based on data for trans-but-2-enal, trans-pent-2-enal, trans-hex-2-enal, trans-hept-2-enal, trans-oct-2-enal and trans-non-2-enal.
d Based on data for 3-methyl-2-butenal.
e Based on data for trans-2-methyl but-2-enal and 2-methyl pent-2-enal.
f Based on data for methyl vinyl ketone and pent-1-en-3-one.
g Based on data for 3-methyl-3-buten-2-one.
h Based on data for pent-3-en-2-one, hex-4-en-3-one and 3-methyl-pent-3-en-2-one.
i Based on data for 4-methyl-pent-3-en-2-one. Also inferred to apply to CR 2 =C(R)C(=O)R.
j Based on data for 4-oxo-pent-2-enal and cis- and trans-3-hexen-2,5-dione. Value assumed to apply to all corresponding vinylic keto-aldehydes, dialdehydes and diketones, except for the unique case when all unspecified groups are H, i.e. butenedial.

The influence of substituent group size is clearly apparent in the data for vinyl aldehydes and ketones, vinyl ethers and alk-1-enoic acid alkyl esters, for example as discussed by Ren et al. (2019) for the esters. Accordingly, the following expression is used to describe the 298 K data,

k 298 K = k • 298 K × (1 + α s × Σ i (n i − 1)) (5)

where k • 298 K and α s are constants. n i is the number of carbon atoms in the ith substituent group, where each relevant substituent group is represented by "R" in the structures shown in Tables 6-8. k • 298 K therefore quantifies the rate coefficient when each R group in the given structure is CH 3 , and α s × k • 298 K is the incremental increase for each additional carbon in any substituent. As defined, therefore, the same incremental increase is assumed to apply to each R group in the given structure, although the trends in the preferred data are generally based on information for particular R groups. Additional data are therefore required to test this assumption. It was also found that there was only marginal benefit in using independent values of α s for the different vinylic oxygenate categories, based on the data currently available. A single category-independent value of α s = 0.19 was therefore optimized for simplicity. The calculated and observed rate coefficients are compared on the scatter plot in Fig. 5. There is only limited information available on the temperature dependences of these reactions. Where data are available (e.g. for methacrolein and methyl vinyl ketone), the data suggest that the temperature dependence can reasonably be represented by k = A × exp(−(E/R)/T ), with A = 10 −15 cm 3 molecule −1 s −1 and E/R = 298 × ln(A/k 298 K ), and this approach is adopted in the present work.
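The size dependence for vinylic oxygenates can be sketched in Python. The functional form used here, k 298 K = k • 298 K × (1 + α s × Σ(n i − 1)), is inferred from the definitions in the text (k • 298 K applies when every R group is CH 3 , and each additional carbon adds α s × k • 298 K ). The function names and the value of k • 298 K in the example are hypothetical.

```python
import math

ALPHA_S = 0.19        # category-independent value optimized in the text
A_VINYLIC = 1.0e-15   # cm3 molecule-1 s-1, pre-exponential factor adopted in the text

def k298_vinylic(k0_298: float, carbons_per_r: list[int],
                 alpha_s: float = ALPHA_S) -> float:
    """INFERRED form: k_298 = k0_298 * (1 + alpha_s * sum(n_i - 1)),
    where n_i is the number of carbon atoms in the i-th R group."""
    extra_carbons = sum(n - 1 for n in carbons_per_r)
    return k0_298 * (1.0 + alpha_s * extra_carbons)

def k_vinylic(temperature: float, k_298: float, a: float = A_VINYLIC) -> float:
    """k(T) = A * exp(-(E/R)/T) with E/R = 298 * ln(A / k_298)."""
    e_over_r = 298.0 * math.log(a / k_298)
    return a * math.exp(-e_over_r / temperature)

# Hypothetical reference value for a structure with two R groups
# (not taken from Tables 6-8).
K0 = 1.0e-17  # cm3 molecule-1 s-1
k_methyl = k298_vinylic(K0, [1, 1])  # both R = CH3, so the result equals K0
k_butyl = k298_vinylic(K0, [4, 1])   # one R = n-C4H9: three additional carbons
print(k_methyl, k_butyl)
print(k_vinylic(273.0, k_butyl))
```

Because every R group shares the same α s , the estimate depends only on the total number of additional carbon atoms and not on which substituent carries them.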
There are very limited data for conjugated dialkenes containing vinylic oxygenated substituents and for cyclic vinylic oxygenates, and it is not possible to propose SAR methods for most oxygenated groups at the present time. The data include rate coefficients for some relevant conjugated dienals/dienones and cyclic vinyl ketones (hexa-2,4-diendial, cyclohex-2-en-1-one, β-ionone and acetyl-cedrene). In contrast to the compounds discussed above, the rate coefficients for these species are all reasonably well described by applying the appropriate conjugated dialkene or alkene rate coefficient determined by the methods presented in Sect. 3, reduced by a factor of 50 for each alkyl group replaced by a -C(=O)H or -C(=O)-group. This assumption is provisionally applied in the current work, although further data are clearly required. There are also limited data for some furans and dihydrofurans. The rate coefficients for these species are influenced by the compounds being aromatic (in the cases of the furans) and also by ring strain effects, and it is difficult to extend the methods developed here for unsaturated ethers to cover these species. The methods presented here are therefore not applicable to heterocyclic compounds with endocyclic double bonds, such as furans and dihydrofurans.
Table 7. Reference rate coefficients for O 3 addition to acyclic vinylic esters and acids a .
Oxygenate structure Parameter k° 298 K (10 −17 cm 3 molecule −1 s −1 ) Comment
Alk-1-enoic alkyl esters 0.23 f
a Determined from data for the compound or sets of compounds identified in subsequent comments. Rate coefficients are also default values for related compounds possessing other remote oxygenated substituents. Data for peroxymethacryloyl nitrate (MPAN) suggest that parameters for alk-1-enoic alkyl esters can reasonably be applied to corresponding unsaturated PANs and, by inference, peracids. The values of k° 298 K should be used in Eq. (5), with the globally optimized value of α S = 0.19.
b Based on preferred data for methyl and n-butyl acrylate.
c Based on data for methyl, ethyl, n-propyl, i-propyl, n-butyl and i-butyl methacrylate, ethyl crotonate and ethyl 3,3-dimethyl acrylate. Also inferred to apply to ROC(=O)C(R)=CHR and ROC(=O)C(R)=CR 2 .
d Based on data for vinyl acetate, vinyl propionate and 2-methylpropenyl acetate. Also inferred to apply to RC(=O)OCH=CHR.
e Based on data for i-propenyl acetate. Also inferred to apply to RC(=O)OC(R)=CHR and RC(=O)OC(R)=CR 2 .
f Based on data for methacrylic acid and trans-pent-2-enoic acid. Also inferred to apply to bracketed structures shown.

There are no data for compounds containing a number of vinylic oxygenated functional groups (e.g. -ONO 2 and -OOH), although such compounds are not expected to be prevalent in atmospheric chemistry. Data for the reactions of O 3 with vinylic alcohols are very limited, because kinetics studies tend to be complicated by keto-enol tautomerism.
A recent theoretical study of the reaction of O 3 with 4-hydroxy-pent-3-en-2-one (Ji et al., 2020), the enolic tautomer of pentane-2,4-dione (acetyl acetone), suggests that the presence of the hydroxy group has a limited effect, the reported rate coefficient at 298 K (2.4 × 10 −17 cm 3 molecule −1 s −1 ) being comparable with k VC7O3 . However, it is noted that this is more than an order of magnitude greater than the laboratory determination of Zhou et al. (2008) for the reaction of O 3 with the tautomeric mixture of pentane-2,4-dione and 4-hydroxy-pent-3-en-2-one. Further information is clearly required to allow the effects of vinylic hydroxy groups to be defined with confidence. Until then, they are provisionally assumed to have the same influence as vinylic H atoms in the present work.

Combinations of groups
Data for compounds containing two different vinylic oxygenated substituents listed in Tables 6-8 are limited to 4-methoxy-but-3-ene-2-one. This compound therefore falls into both of the CHR=CHC(=O)R and ROCH=CHR generic structure categories, and the rate coefficients for these categories, k VC7O3 and k VO3O3 , differ by an order of magnitude. The rate coefficient for 4-methoxy-but-3-ene-2-one (1.3 × 10 −17 cm 3 molecule −1 s −1 ) is actually a factor of three lower than that for the less reactive category (k VC7O3 ). Based on this, it is tentatively suggested that the estimated rate coefficient for compounds containing two different vinylic oxygenated substituents should be based on the less reactive category. Data for compounds containing both vinylic and allylic oxygenated groups are limited to 2-methyl-4-nitrooxy-cis-2-buten-1-al, which contains a vinyl -C(=O)H group and an allyl -ONO 2 group. In this case, the rate coefficient (4.4 × 10 −18 cm 3 molecule −1 s −1 ) is in reasonable agreement with that of the relevant vinylic category, k VC4O3 (Table 6), suggesting that the allyl -ONO 2 group has almost no additional deactivating effect in this case. This is consistent with the relative insensitivity of the vinylic rate coefficients to the presence of additional substituents, and it is therefore tentatively suggested that the appropriate rate coefficient in Tables 6-8 can be applied, with no additional effect from an allylic substituent in relevant cases (i.e. the factors in Table 2 are only applied with rate coefficients derived from those shown in Tables 1 and 4).
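The provisional rule for combined groups can be sketched as below. This is an illustration only: the function name is hypothetical, and the two rate coefficients are rough placeholders inferred from the factor-of-three and order-of-magnitude statements in the text, not the tabulated values in Tables 6 and 8.

```python
def combined_vinylic_rate(category_rates):
    """Return the reference rate coefficient for a compound that falls into
    several vinylic oxygenate categories: the least reactive category is used.

    Allylic substituent factors (Table 2) are deliberately not applied here,
    following the tentative assumption stated in the text.
    """
    if not category_rates:
        raise ValueError("at least one vinylic category is required")
    return min(category_rates.values())


# Placeholder values (cm3 molecule-1 s-1), NOT the tabulated coefficients
rates = {"k_VC7O3": 4.0e-17, "k_VO3O3": 4.0e-16}
print(combined_vinylic_rate(rates))
```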

Initial products and branching ratios
It is well established that the addition of O 3 to a C=C bond leads to initial formation of a primary ozonide (POZ), or 1,2,3-trioxolane product (e.g. Calvert et al., 2000;Johnson and Marston, 2008). In compounds with multiple C=C bonds, the SAR methods developed here define k calc in terms of a summation of the various alkene and/or conjugated dialkene structures within the poly-unsaturated compound, and therefore also provide the basis for estimating branching ratios for the formation of the various isomeric POZs (e.g. see calculations B4, B5 and C2 in the Supplement). Using β-ocimene as an example, the present SARs provide a value of k calc = k A5O3 + k D4O3 = 5.5 × 10 −16 cm 3 molecule −1 s −1 at 298 K, which agrees well with the preferred value of k obs = 5.1 × 10 −16 cm 3 molecule −1 s −1 . The component rate coefficients also indicate that the reaction is expected to occur predominantly (85 %) at the isolated C=C bond, leading to the formation of POZ1, as shown in the schematic in Fig. 6. This conclusion is also supported by comparison of k obs with that reported for the reaction of O 3 with the β-ocimene oxidation product 4-methyl-hexa-3,5-dienal, which retains the conjugated dialkene structure (Baker et al., 2004). The addition of O 3 to conjugated dialkene structures leads to the formation of either of two primary ozonides, as shown in Fig. 6 for the example of the minor channel of β-ocimene ozonolysis (POZ2 and POZ3). In the cases of symmetrically substituted conjugated dialkene structures (i.e. CH 2 =C(R)C(R)=CH 2 , CHR=CHCH=CHR, CR 2 =CHCH=CR 2 and CHR=C(R)C(R)=CHR, where "R" represents any alkyl group or remotely substituted oxygenated group) it is reasonable to assume that the addition of O 3 occurs equally at the two possible sites. There has only been limited information reported on the products of the reactions of O 3 with unsymmetrically substituted conjugated dialkenes.
Most of this information relates to the reaction of O 3 with isoprene (CH 2 =C(CH 3 )CH=CH 2 ), but with selected product yields reported for subsequently formed carbonyl compounds in a few other cases (Lewin et al., 2001). In the case of isoprene, product information indicates that the addition of O 3 occurs significantly at both sites, but with about 60 % at the less substituted -CH=CH 2 bond (e.g. Aschmann and Atkinson, 1994;Nguyen et al., 2016), and the same branching ratio can therefore reasonably be assigned to CH 2 =C(R)CH=CH 2 structures in general. The information for other conjugated dialkenes also suggests preferential addition of O 3 at the less substituted C=C bond (Lewin et al., 2001;Mackenzie-Rae et al., 2016). On this basis, it is tentatively assumed that 60 % of addition occurs at a less substituted C=C bond that contains one fewer alkyl substituent (e.g. as in CH 2 =CHCH=CHR or CHR=CHC(R)=CHR), 70 % at a less substituted C=C bond that contains two fewer alkyl substituents (e.g. as in CH 2 =CHC(R)=CHR or CHR=CHC(R)=CR 2 ) and 80 % at a less substituted C=C bond that contains three fewer alkyl substituents (i.e. as in CH 2 =CHC(R)=CR 2 alone). Clearly, further information is required to allow these addition ratios to be assigned with greater certainty. In the absence of reported mechanistic data, the same rules are also applied to conjugated dialkene structures with vinylic or allylic oxygenated groups.
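The tentative site-specific addition ratios for conjugated dialkenes can be expressed as a simple lookup. The sketch below is hypothetical in its naming and structure; it counts alkyl substituents on each C=C bond of the conjugated system (the neighbouring C=C bond itself is not counted). The 50:50 split for equal counts is stated in the text only for symmetrically substituted structures, so its use for any other equal-count case is an assumption.

```python
# Fraction of O3 addition at the less substituted C=C bond, keyed by the
# difference in the number of alkyl substituents between the two bonds.
# Key 0 (equal counts) is an assumption outside the symmetric cases.
FRACTION_AT_LESS_SUBSTITUTED = {0: 0.5, 1: 0.6, 2: 0.7, 3: 0.8}


def dialkene_branching(n_alkyl_bond_1, n_alkyl_bond_2):
    """Return (fraction at bond 1, fraction at bond 2) for O3 addition
    to the two C=C bonds of a conjugated dialkene structure."""
    for n in (n_alkyl_bond_1, n_alkyl_bond_2):
        if n not in range(4):
            raise ValueError("each C=C bond can carry 0-3 alkyl substituents")
    frac_less = FRACTION_AT_LESS_SUBSTITUTED[abs(n_alkyl_bond_1 - n_alkyl_bond_2)]
    if n_alkyl_bond_1 <= n_alkyl_bond_2:
        return frac_less, 1.0 - frac_less
    return 1.0 - frac_less, frac_less


# Isoprene, CH2=C(CH3)CH=CH2: -CH=CH2 has 0 alkyl groups, CH2=C(CH3)- has 1
print(dialkene_branching(0, 1))
# CH2=CHC(R)=CR2: 0 versus 3 alkyl substituents
print(dialkene_branching(0, 3))
```

For a poly-alkene that also contains isolated double bonds, these fractions would be multiplied by the share of the total rate coefficient assigned to the conjugated dialkene structure.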

Conclusions
Updated and extended SAR methods have been developed to estimate rate coefficients for the reactions of O 3 with unsaturated organic species. The group contribution methods were optimized using a database including a set of preferred rate coefficients for 221 species. The overall performance of the SARs in determining log k 298 K is now summarized.
The distribution of errors (log k calc /k obs ), the root-mean-squared error (RMSE), the mean absolute error (MAE) and the mean bias error (MBE) were examined to assess the overall reliability of the SAR. The RMSE, MAE and MBE are here defined as

\[ \mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\log k_{\mathrm{calc},i} - \log k_{\mathrm{obs},i}\right)^{2}} \]
\[ \mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|\log k_{\mathrm{calc},i} - \log k_{\mathrm{obs},i}\right| \]
\[ \mathrm{MBE} = \frac{1}{n}\sum_{i=1}^{n}\left(\log k_{\mathrm{calc},i} - \log k_{\mathrm{obs},i}\right) \]

where n is the number of species in the dataset. A total of 198 of the 221 species in the database contributed to the statistical analysis. Six species could not be included, because only upper- or lower-limit recommendations are available. In addition, the SAR methods do not currently include styrenes, heterocyclic species and conjugated poly-alkenes (11 species), and the smallest species in some homologous series (i.e. ethene, buta-1,3-diene, acrolein, butenedial and acrylic acid) are not covered by the SAR categories because the double bonds do not contain (additional) organic substituents (see Sects. 3 and 4). Finally, because of the factor-of-60 difference between k calc and k obs , 3,4-diethylhex-2-ene was also excluded from the statistical analysis as an outlier (see Sect. 3.1). However, it is emphasized that there is no firm basis for believing that the reported rate coefficient for 3,4-diethylhex-2-ene (Grosjean and Grosjean, 1996) is any less reliable than many other rate coefficients in the database. Given the substantial disagreement between k calc and k obs , confirmatory measurements of that rate coefficient, and data for other similar branched alkenes, would clearly be valuable to help test and refine the methods presented here. Figure 7 summarizes the statistics for the full set of 198 species, for acyclic and cyclic species collectively, and for various alkene and unsaturated oxygenate subsets. With the exception of the poly-alkene and remotely substituted oxygenate subsets, the calculated log k 298 K for all categories shows no significant bias, with MBE at or below 0.06 log units and with median values of the error distributions close to 0. Overall, the SAR methods overestimate k 298 K for poly-alkenes and remotely substituted oxygenates by a factor of about 1.5. This is likely due to a number of contributory factors that are not fully accounted for in the SAR methods, including effects of remote substituents on double bond reactivity (e.g. see Sect. 4.2) and possible systematic ring strain effects in cyclic poly-alkenes that are incompatible with the factors derived from simpler compounds (see Sects. 3.2 and 3.4).

Figure 7. Root-mean-square error (RMSE), mean absolute error (MAE), mean bias error (MBE) and box plot for the error distribution in the estimated log k 298 K values for the full set and subsets of the unsaturated species in the database. The bottom and the top of the boxes are the 25th (Q1) and 75th percentiles (Q3); the black band is the median value. The whiskers extend to the most extreme data point, which is no more than 1.5× (Q3-Q1) from the box. The black dotted lines correspond to agreement within a factor 2.
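The error statistics used in this assessment can be sketched as follows, assuming the conventional definitions based on the per-species error log 10 (k calc /k obs ). The function names are hypothetical. The last helper converts an error in log units into a relative error in k 298 K , which appears to be the conversion behind the percentage values quoted in the discussion (e.g. 0.11 and 0.27 log units corresponding to about 29 % and 86 %).

```python
import math


def log_errors(k_calc, k_obs):
    """Per-species errors, log10(k_calc / k_obs)."""
    return [math.log10(c / o) for c, o in zip(k_calc, k_obs)]


def sar_statistics(k_calc, k_obs):
    """Return RMSE, MAE and MBE (all in log units) for paired rate coefficients."""
    err = log_errors(k_calc, k_obs)
    n = len(err)
    if n == 0:
        raise ValueError("no data")
    return {
        "RMSE": math.sqrt(sum(e * e for e in err) / n),
        "MAE": sum(abs(e) for e in err) / n,
        "MBE": sum(err) / n,  # positive when the SAR overestimates on average
    }


def relative_error_percent(log_error):
    """Convert an error in log units to a relative error in k (percent)."""
    return (10.0 ** log_error - 1.0) * 100.0


for rmse in (0.11, 0.15, 0.21, 0.27, 0.34):
    print(rmse, round(relative_error_percent(rmse)))
```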
The RMSE for the various alkene and unsaturated oxygenate subsets covers the range from 0.11 to 0.27 log units; i.e. the relative errors for the calculated k 298 K lie in the range 29-86 %. Of these, the poly-alkene and remotely substituted oxygenate subsets again have values towards the high end of the range (0.27 and 0.23, respectively). The RMSE for the mono-alkene subset is also elevated (0.26), this being mainly due to the influence of polycyclic species on the overall statistics. Accordingly, the RMSE of cyclic species collectively (0.34) is substantially higher than that for acyclic species (0.15), corresponding to relative errors for the calculated k 298 K of about 120 % and 40 %, respectively. The large errors for cyclic species result from the difficulties in accounting fully for ring strain and steric effects in polycyclic alkenes and cyclic poly-alkenes, as also illustrated in Fig. 2. Finally, for the full database, the SARs give fairly reliable k 298 K estimates, with an MAE of 0.13 and an RMSE of 0.21, corresponding to an overall agreement of the calculated k 298 K within about 60 %. Although this level of agreement is considered reasonable, it is noted that the methods generally do not perform as well as those for the reactions of OH with alkenes and unsaturated organic oxygenates (Jenkin et al., 2018a). This may be explained by the O 3 reaction being a concerted process, which is more influenced by orientational effects, ring strain and steric hindrance than the OH reaction (e.g. see Johnson et al., 2000), and therefore less easy to represent with a practical SAR. As discussed in Sects. 3 and 4, and highlighted by Vereecken et al. (2018), additional kinetics studies would be highly valuable for some classes of alkene and unsaturated oxygenate to help the SAR methods to be further assessed and refined, including data for multifunctional oxygenated species in particular.

Data availability
All relevant data and supporting information have been provided in the Supplement.