of isoprene emission rates using a neural network approach

Using a statistical approach based on artificial neural networks, an emission algorithm (ISO-LF) accounting for high to low frequency variations was developed for isoprene emission rates. ISO-LF was optimised using a database (ISO-DB) specifically designed for this work, which consists of 1321 emission rates collected from the literature and 34 environmental variables, measured or assessed using National Climatic Data Center or National Centers for Environmental Prediction meteorological databases. ISO-DB covers a large variety of emitters (25 species) and environmental conditions (10° S to 60° N). When only instantaneous environmental regressors (instantaneous air temperature T0 and photosynthetic photon flux density L0) were used, a maximum of 60% of the overall isoprene variability was accounted for, with the highest emissions being strongly underestimated. ISO-LF includes a total of 9 high (instantaneous) to low (up to 3 weeks) frequency regressors and accounts for up to 91% of the isoprene emission variability, whatever the emission range, species or climate investigated. ISO-LF was found to be mainly sensitive to air temperature cumulated over 3 weeks (T21) and to L0 and T0 variations. T21, T0 and L0 alone account for only 76% of the overall variability.


Introduction
Chemistry-transport models are commonly used to assess, at local or global scales, the distribution of tropospheric species such as ozone. Appropriate and accurate emission data are needed to initialise their chemical modules. Emissions of gaseous compounds into the atmosphere can be related to human activities and natural processes. Volatile organic compounds emitted from vegetation, usually referred to as biogenic or BVOC, are key species in atmospheric chemistry processes. Indeed, global biogenic volatile organic compound fluxes are believed to exceed their anthropogenic counterparts by a factor of 10 (Müller, 1992; Guenther et al., 1995) and, due to their high reactivity, they were shown, on regional to global scales, to significantly influence atmospheric chemistry and climate (Fehsenfeld et al., 1992; Simpson, 1995; Poisson et al., 2000; Steinbrecher et al., 2000; Sanderson et al., 2003). Therefore, the assessment of accurate and highly resolved BVOC emission fluxes, and in particular of fluxes of isoprene (C₅H₈), the major BVOC (Guenther et al., 1995; Simpson et al., 1999), represents a major goal for environmental issues.

Correspondence to: C. Boissard (boissard@lisa.univ-paris12.fr)
However, due in part to a variability ranging over several orders of magnitude, isoprene emission assessments remain critical and uncertain. These variations result from a complex set of biophysical regulations in response to changes in ambient conditions. Indeed, isoprene emission variability is closely linked to leaf developmental stage, and emissions occur only when leaves are grown or are growing. For deciduous trees, induction of isoprene emissions was observed to happen 200, 300, and 400 cumulated degree days (d.d., °C) after bud break for Quercus macrocarpa (Petron et al., 2001), Quercus alba (Geron et al., 2000) and Populus tremuloides (Monson et al., 1994) respectively. The highest emissions are generally observed for fully developed leaves. For Quercus alba and Quercus macrocarpa, maximal isoprene emissions were observed 600 and 700 d.d. respectively after bud break. Depending on local environmental conditions, such d.d. values were reached within a period ranging from a few days to 3 weeks. With leaf senescence, isoprene emissions decrease to non-detectable levels. Moreover, when a leaf is emitting, rapid adaptations of enzymatic activity can lead to an additional type of fast (seconds to minutes) variation of isoprene emissions. Such "instantaneous" variations are well described by specific emission algorithms based on instantaneous photosynthetic photon flux density (PPFD) and air temperature (G93 algorithm, Guenther et al., 1993), or on the previous day PPFD and air temperature values (Lehning et al., 1999; Zimmer et al., 2000; Fischbach et al., 2002). Another source of emission variations, on some occasions even more critical than leaf developmental stage, originates from the acclimation of a plant to more or less long-term environmental changes. For instance, the time to onset of kudzu isoprene emissions was observed to be shortened by one week under elevated temperature growth conditions compared to cold growth conditions (Wiberley et al., 2005). Light acclimation was found to be more complex for oak species, with a first impact observed within a few hours and a second one after 4-6 days (Hanson and Sharkey, 2001). Similarly, isoprene emissions from mature oak leaves were found to be significantly correlated with air temperatures averaged over the previous 1, 2 and 7 days (T1, T2 and T7 respectively) and with photosynthetically active radiation averaged over the previous 2 days (PPFD2), the strongest correlation being with T2×PPFD2 (Sharkey et al., 1999).
Most of the general parameterisations developed so far for isoprene emissions (e.g. Tingey et al., 1979; Guenther et al., 1991; Guenther et al., 1993; Sharkey and Loreto, 1993; Lehning et al., 2001; Guenther et al., 2006; Arneth et al., 2007) assign an emission factor to an emitter or a group of emitters, which is then modulated by relevant environmental parameters (air temperature, light intensity and CO₂) prevailing over a period ranging from a few minutes to 10 days before the measurement. However, these parameterisations mainly describe the most rapid variations of isoprene emissions and do not consider acclimation over more than 10 days (Guenther et al., 2006). Nevertheless, lower frequency (e.g. seasonal) variations in the capacity of a tree to release isoprene were observed to account for a significant, in some cases the major, part of the overall observed emission fluctuations, and to reach up to 3 orders of magnitude in standardised emission rates (Monson et al., 1994; Geron et al., 2000; Boissard et al., 2001; Petron et al., 2001). If not correctly assessed, this low frequency variability can represent a major source of discrepancies in isoprene emission assessments (Guenther et al., 1995).
Artificial neural networks (ANNs) have shown on various occasions their capacity to account for complex sets of environmental interactions. For instance, a multiple non-linear regression technique based on ANNs was employed by Lasseron (2001) to assess variations of Ulex europaeus isoprene emission rates using environmental parameters integrated over a few days to a few weeks prior to the measurements. Simon et al. (2005) used a similar technique to couple isoprene and monoterpene emissions measured from Amazonian tree species with physiological and environmental regressors. ANNs were also used to provide kilometre-scale emission maps of European forest carbon fluxes (Papale and Valentini, 2003), and to improve assessments of biogenic soil NOₓ emission variations (Delon et al., 2007).
In this study, using an appropriate database specifically built for this work (ISO-DB), ANNs were implemented in order to develop an isoprene emission rate algorithm (ISO-LF) accounting for high (instantaneous) to low (weeks) frequency variations of ambient conditions and for a large set of species. ISO-LF development, performance and sensitivity are presented and discussed.

The overall strategy
A methodology similar to the one employed by Lasseron (2001) for Ulex europaeus was used for this work, but applied to a wider range of isoprene emitters and environmental conditions. Briefly, non-linear regressions between isoprene emission rates reported in the literature and a set of environmental parameters were calculated and examined. Physiological parameters, such as net assimilation, transpiration and stomatal conductance, were not considered because they are difficult to assess afterwards when not directly provided in the literature reference. Moreover, when both environmental and physiological input parameters were considered, the best assessments of isoprene and monoterpene emissions in tropical conditions were found to be obtained when environmental information (instantaneous PPFD and air temperature, and averaged temperature of the preceding 48-18 h) was employed (Simon et al., 2005).
Non-linear regressions were assessed using ANNs. Among the available statistical methods, ANNs present the advantage of being the most parsimonious (Dreyfus et al., 2002). Moreover, the ANN approach, like other non-linear regression methods, is not, or not very, sensitive to regressor co-linearity (Bishop, 1995; Dreyfus et al., 2002).

Neural network description and setting
The neural network developed in this study was used as a Multi Layer Perceptron (MLP). Further details concerning MLP theory can be found in Aleksander and Morton (1990) and White (1992). Briefly, an MLP consists of a network of processing units (the neurons or artificial neurons) N_j, all connected to each other and arranged in different layers (input, hidden and output layers, Fig. 1). Such a neuron arrangement learns (or approximates) during a training phase the information contained in a set of experimental data or output data y_mes (in our case the isoprene emission rates). When the MLP is trained, a set of n input data x_i (in our case, the environmental parameters) is processed several times in order to adjust a weighted sum x_i·w_{i,j}, where w_{i,j} represents the optimised weights calculated by non-linear regressions to y_mes. x_i·w_{i,j} is then modified (or transferred) by a transfer function f in order to calculate y_calc as follows:

$$y_{calc} = w_0 + \sum_{j=1}^{N} w_{j,k}\, f\left(w_{0,j} + \sum_{i=1}^{n} w_{i,j}\, x_i\right)$$

where w_0 is the connecting weight between the bias (initial random values optimised to obtain the co-ordinate at the origin of the neuronal regression) and y_calc, N the number of neurons N_j, w_{j,k} the connecting weight between the neuron N_j and y_calc, w_{0,j} the connecting weight between the bias and the neuron N_j, and w_{i,j} the connecting weight between the input x_i and the neuron N_j. The transfer function f consists of a parameterised asymptotic "S" shape function such as the sigmoid or the hyperbolic tangent. The training process starts from randomised values of weights, which are iteratively adjusted using a second order Quasi-Newton back-propagation technique until the difference E between y_mes and y_calc reaches a minimum, i.e. the point where the first derivative of E equals zero. For this study, E was calculated as follows: where z is the number of output values. For our study, a large number (300) of iterations was selected for every ANN run in order to make sure that E did not correspond to a local error minimum. During a validation phase, or blind validation, ANN performance is assessed by the root mean square error (RMSE) obtained for data which were not used during the training phase. A dedicated set of validation data is thus required before training starts.
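The forward pass of such a one-hidden-layer perceptron can be sketched in a few lines of Python. This is only an illustrative sketch, not the Netral NeuroOne implementation used in the study: the function names (`mlp_forward`, `rmse`) and the argument layout are assumptions, the transfer function is taken to be tanh, and the RMSE is written in its usual textbook form.

```python
import math
from typing import List, Sequence


def mlp_forward(
    x: Sequence[float],
    w_in: Sequence[Sequence[float]],  # w_in[j][i] plays the role of w_i,j
    b_hidden: Sequence[float],        # b_hidden[j] plays the role of w_0,j
    w_out: Sequence[float],           # w_out[j] plays the role of w_j,k
    b_out: float,                     # plays the role of w_0
) -> float:
    """Return y_calc for one input vector x (hypothetical helper)."""
    hidden: List[float] = []
    for weights_j, bias_j in zip(w_in, b_hidden):
        # Weighted sum of the inputs reaching neuron N_j, plus its bias weight.
        activation = bias_j + sum(w * xi for w, xi in zip(weights_j, x))
        # Transfer function f, assumed here to be the hyperbolic tangent.
        hidden.append(math.tanh(activation))
    # Output: bias weight plus weighted sum of the hidden neuron outputs.
    return b_out + sum(w * h for w, h in zip(w_out, hidden))


def rmse(y_mes: Sequence[float], y_calc: Sequence[float]) -> float:
    """Root mean square error between measured and calculated outputs."""
    n = len(y_mes)
    return math.sqrt(sum((m - c) ** 2 for m, c in zip(y_mes, y_calc)) / n)
```

Training would then consist of adjusting `w_in`, `b_hidden`, `w_out` and `b_out` to minimise the error on the training data, with `rmse` evaluated separately on the validation data.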
The training/validation data splitting represents a key step in the neural approach. For this work, the training-validation division was first carried out by considering different climates (tropical, temperate with dry summer, temperate without dry summer, and cold and humid). For every climate, data were then classified according to their emission strength (strong, medium and small, as in Guenther et al., 1995). Each of the 11 sub-datasets thus obtained was finally split between training (80%) and validation (20%) data using a Kullback-Leibler distance function (Kullback, 1951)  1.87, 14.57 and 38.53 µg C (g foliar dry weight)⁻¹ h⁻¹ (hereafter, µg C g⁻¹ dwt h⁻¹) respectively, close to the validation values of 30.39, 2.75, 16.08 and 38.50 ) and smallest (5.0×10⁻⁴ µg C g⁻¹ dwt h⁻¹) isoprene emission rates were forced into the training database, since the neural approach is only valid within interpolation.
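A stratified 80/20 split of this kind can be sketched as follows. This is a simplified illustration under stated assumptions: it uses a seeded random shuffle inside each climate/emission-strength stratum instead of the Kullback-Leibler distance criterion used in the study, and the function name `stratified_split` and the stratum labels are hypothetical. The step that forces the extreme emission rates into the training set mirrors the interpolation constraint described above.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple


def stratified_split(
    rates: Sequence[float],
    strata: Sequence[Hashable],
    train_frac: float = 0.8,
    seed: int = 0,
) -> Tuple[List[int], List[int]]:
    """Return (training indices, validation indices), split stratum by stratum."""
    rng = random.Random(seed)
    groups: Dict[Hashable, List[int]] = defaultdict(list)
    for idx, label in enumerate(strata):
        groups[label].append(idx)

    train: List[int] = []
    valid: List[int] = []
    for members in groups.values():
        shuffled = members[:]
        rng.shuffle(shuffled)  # assumption: random draw, not the KL criterion
        cut = round(train_frac * len(shuffled))
        train.extend(shuffled[:cut])
        valid.extend(shuffled[cut:])

    # Force the smallest and largest emission rates into the training set,
    # so that validation data never require extrapolation.
    indices = range(len(rates))
    smallest = min(indices, key=lambda i: rates[i])
    largest = max(indices, key=lambda i: rates[i])
    for extreme in (smallest, largest):
        if extreme in valid:
            valid.remove(extreme)
            train.append(extreme)
    return sorted(train), sorted(valid)
```

A distribution-matching criterion such as a Kullback-Leibler distance between the training and validation histograms could replace the random shuffle; the sketch only shows the stratification and the forcing of extremes.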
The neural network developed in this study was based on a commercial version of the Netral NeuroOne software (v. 6.0, http://www.netral.com, France).

ISO-DB description
The isoprene database ISO-DB designed for this study consists of:
1. isoprene emission rate values (n=1321) extracted from the literature and obtained from previous in-situ studies. Most of the data collected were available only as figures, which were digitised. All emission rates were expressed in ISO-DB in µg C g⁻¹ dwt h⁻¹. Leaf based emission rates were considered and converted into mass based emission rates only when a specific leaf mass conversion factor was provided together with the data. Isoprene emissions being negligible at night, only daytime data were used. All emission rates represent branch level measurements carried out at the top of the canopy, except for Liquidambar, whose emission rates were additionally measured 12 m under the top of the 22 m canopy. Most (93%) of the measurements were obtained using a branch enclosure technique, the other 7% using a leaf cuvette system. A total of 25 broadleaved and coniferous tree species, grown under environmental conditions ranging from tropical (10° S) to boreal (60° N) climates, were considered (Table 1). Most of these species are representative of moderate to high isoprene emitters (i.e. standardised emission rates higher than 35 and 70 µg C g⁻¹ dwt h⁻¹ respectively, as in Guenther et al., 1995). Emission rate values were shown to vary over more than 4 orders of magnitude, from approximately 5×10⁻⁴ to 3×10² µg C g⁻¹ dwt h⁻¹ (Fig. 3), with mean and median values of 30.1 and 14.8 µg C g⁻¹ dwt h⁻¹ respectively.
2. the temperature (T0) and PPFD (L0) values, hereafter referred to as "instantaneous", recorded during the sampling time. T0 values were found to range from 2 to 42 °C, with mean and median values of 25.1 and 25.5 °C respectively; L0 values ranged from 0 to 2400 µmol m⁻² s⁻¹, with mean and median values of 680 and 590 µmol m⁻² s⁻¹ respectively.
3. 32 other environmental regressors, which were examined for their ability to account for environmental changes during and before the emission measurements (Table 2). They were integrated over 1 to 21 days preceding the measurements using daily mean values extracted from NCDC meteorological data for air temperatures and rainfall, or NCEP reanalysis data for soil variables and solar radiation. All the selected meteorological stations were within a 30 km distance of each measurement site, except for the Kuhn et al. (2002, 2004)

Fig. 3. Measured isoprene emission rates (µg C g⁻¹ dwt h⁻¹) compiled in ISO-DB vs. the day of year of their measurement, according to the tree type (coniferous evergreen, CEV; broadleaved deciduous, BLD; and broadleaved evergreen, BLEG) and to the climate (cold, C; temperate, T; temperate with dry summer, Td; and tropical, TR).
where λ is the latitude, δ=23.45° the Earth inclination, r the day of the year, and n the annual number of days.

Data pre-processing
Values of each input were compiled in ISO-DB using a single unit per input (Table 2). Because each input is expressed in a different unit, absolute values are highly variable from one input to another. To prevent any input regressor x_i from getting an artificially stronger weight in the neural regressions, every input was centred and normalised as follows:

$$x_i' = \frac{x_i - \bar{x}_i}{s_{x_i}}$$

where x_i' denotes the normalised value, x̄_i is the mean of x_i and s_{x_i} its associated standard deviation, both calculated over the entire database. Isoprene emission rates were treated similarly. In addition, due to their large range of variation (5 orders of magnitude), log values were used in the ANNs.
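The pre-processing described above can be sketched as follows. This is a minimal illustration, with several assumptions that are not stated in the text: the logarithm is taken in base 10, it is applied before the centring and normalisation, and the sample standard deviation is used. The function names are hypothetical.

```python
import math
import statistics
from typing import List, Sequence, Tuple


def standardise(values: Sequence[float]) -> Tuple[List[float], float, float]:
    """Centre and normalise one regressor: (x - mean) / standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # assumption: sample standard deviation
    return [(v - mean) / sd for v in values], mean, sd


def preprocess_emissions(rates: Sequence[float]) -> Tuple[List[float], float, float]:
    """Log-transform emission rates, then centre and normalise them."""
    # Assumption: base-10 log, applied before normalisation. Rates must be
    # strictly positive, which holds for the range reported in ISO-DB.
    logs = [math.log10(r) for r in rates]
    return standardise(logs)
```

The returned mean and standard deviation need to be kept, since the same values have to be applied to any new input before it is passed to the trained network.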

Results and discussion
Unless otherwise stated, the results presented hereafter were obtained for the validation data.
3.1 Are L0, T0, L1 and T1 sufficient to account for the overall BVOC emission variability?
In order to make sure that the variability of ISO-DB emissions is not only triggered by high frequency environmental changes, the impacts of L0, T0, L1 and T1, recognised for their role in describing short-term acclimation of isoprene emissions (Guenther et al., 1993; Geron et al., 2000; Lehning et al., 2001), were evaluated. Two series of ANN tests were conducted: with L0 and T0 only (ANN0 case) and with T0, T1, L0 and L1 (ANN01 case). ANN0 accounted for a maximum of 60% of the isoprene emission variability (Table 3). When L1 and T1 were additionally considered in the ANNs, a further 10% of the isoprene variability could be accounted for, which represents, at the 95% confidence level, a significant improvement compared to the ANN0 case. Most of the remaining 30% of the variability was associated with the highest isoprene emissions, which were underestimated by up to two orders of magnitude (results not shown), whatever the species or the environmental conditions.

ISO-LF development
The development of ISO-LF was carried out by training the ANNs until the best combination between a relevant set of environmental regressors x i and a network structure
In terms of neuronal structure, the number of iterations was fixed at 300 and a second order Quasi-Newton back-propagation was employed. Among the different transfer functions available, the hyperbolic tangent (tanh) was used. Networks of 1 to 7 neurons were tested. The validation RMSE was shown to decrease with an increasing number of neurons until a minimum value of 0.293 was reached for N=4. When more than 4 neurons were used, the validation RMSE was shown to increase again, indicating an overtraining phenomenon (data not shown).
Table 2. Tested ISO-DB environmental input regressors assessed using daily mean values, except for T0 and L0 (instantaneous). Daylight length D1 is in h, air and soil temperatures in °C, L0 in µmol m⁻² s⁻¹, solar fluxes L1-L21 in W m⁻², precipitations in mm and soil water contents in fraction of volume (0-1). * are regressors rejected using the variable probe technique and the covariance analysis. • are regressors showing a weak influence on the overall isoprene variability. In bold, the regressors eventually considered in ISO-LF.

Table 3. Comparison of the performances (slope s, correlation coefficient r², root mean square error RMSE and mean bias error MBE) obtained using (L0, T0) - ANN0 case, (L0, T0, L1, T1) - ANN01 case, (L0, T0, T21) - ANN021 case, (L0, T0, L1, T1, T21) - ANN0121 case, and ISO-LF.

Using the statistical probe technique (Chen et al., 1989) and a covariance analysis, 15 of the 34 inputs x_i were rejected since they were found to have no statistical influence on isoprene emissions (Table 2). For each of the 19 remaining x_i, the slope of y_calc = f[(x_i)_{x̄_j}] (where x̄_j is the mean of every other input x_j, j ≠ i) was examined. For 10 of them, a slope close to zero was obtained, indicating their weak influence on the overall isoprene variability. They were no longer considered in the ANN trainings and, as shown in Table 2, a total of 9 x_i were eventually considered in ISO-LF: the instantaneous air temperature T0 and light intensity L0; the (d-1) day mean (T1) and minimum (T1m) air temperatures, solar radiation (L1) and soil temperature (ST1u); the precipitation cumulated over 14 and 21 days (P14 and P21 respectively); and the air temperature cumulated over 21 days (T21). When any single one of these 9 inputs was excluded from the statistical analysis, the isoprene emission assessment error was, at the 95% confidence level, significantly increased. One third of ISO-LF inputs represent adaptations on a time scale of at least one week, and more than half of them (T0, L0, L1, T1m and T21) were previously reported as positively influencing isoprene emissions under in-situ conditions.

The general equation obtained for ISO-LF is given in Appendix A.

ISO-LF performances
As shown in Fig. 4a and b, ISO-LF was found to account for 90% of the overall isoprene emission variability, a result which is, at the 95% confidence level, significantly better than for the G93 algorithm (55%, Fig. 4a, b), and the ANN0 (56%) or ANN01 cases (70%) (Table 3). Moreover, this good performance was obtained over the whole emission range, including the highest emission values, and whatever the climate or the species type. The few outliers correspond to statistically poorly represented situations (e.g., for some of the Ulex europaeus measurements, sudden cloud occurrences during sampling or summer late afternoon samplings with low light intensity but still elevated temperature).
As shown in Table 3, the air temperature cumulated over several weeks (T21), previously observed to account for some of the seasonal variations of isoprene emissions (Monson et al., 1994; Geron et al., 2000), was found, at the 95% confidence level, to significantly improve the results obtained for the ANN0 and ANN01 cases: 76 and 85% of isoprene variability was accounted for in the ANN021 (T0, L0, T21) and ANN0121 (T0, L0, T1, L1, T21) cases respectively. However, ISO-LF performance was, at the 95% confidence level, found to remain better (r²=0.90), in particular for the highest and lowest emission rates, which were poorly assessed in the ANN021 and ANN0121 cases (data not shown).
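The four performance indicators compared in Table 3 (slope, r², RMSE and MBE) can be computed as in the following sketch. The definitions used here are the conventional ones and are assumptions with respect to the study: the slope is that of the least-squares regression of calculated on measured values, and the mean bias error is taken as calculated minus measured. The function name `performance` is hypothetical.

```python
import math
from typing import Dict, Sequence


def performance(y_mes: Sequence[float], y_calc: Sequence[float]) -> Dict[str, float]:
    """Slope, r2, RMSE and MBE of calculated vs. measured values."""
    n = len(y_mes)
    mean_m = sum(y_mes) / n
    mean_c = sum(y_calc) / n
    # Sums of squares and cross-products around the means.
    sxx = sum((m - mean_m) ** 2 for m in y_mes)
    syy = sum((c - mean_c) ** 2 for c in y_calc)
    sxy = sum((m - mean_m) * (c - mean_c) for m, c in zip(y_mes, y_calc))
    return {
        "slope": sxy / sxx,              # regression of y_calc on y_mes
        "r2": sxy ** 2 / (sxx * syy),    # squared correlation coefficient
        "rmse": math.sqrt(sum((c - m) ** 2 for m, c in zip(y_mes, y_calc)) / n),
        "mbe": sum(c - m for m, c in zip(y_mes, y_calc)) / n,  # sign: calc - mes
    }
```

Reporting the slope and MBE alongside r² is useful because a model can correlate well with the measurements while still being systematically biased, as with the underestimated high emissions discussed above.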

ISO-LF sensitivity
When non-linear regressions are used, the weight of each individual factor in the global variance (here, the sensitivity of isoprene emission rate variability to each of the 9 x_i used in ISO-LF) is rather complex to assess, and the interpretation of the results is not straightforward. However, in order to indicate which environmental variables x_i ISO-LF is most sensitive to, the sensitivity s_{x_i} was calculated as follows:

$$s_{x_i} = \frac{\Delta y_{calc}}{\Delta x_i}$$

where Δy_calc is the variation of the predicted isoprene emission rate obtained for a given variation Δx_i of the input x_i, while the other inputs were set to their mean x̄_j (j ≠ i). s_{x_i} was calculated (i) for the entire dataset and (ii) for data related to 4 different climates (temperate with and without dry summer, tropical, and cold and humid) and for every season (only summer for cold climate).
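A one-at-a-time sensitivity of this kind can be sketched as follows. This is an illustration under assumptions: the sensitivity is taken as the ratio of the change in the model output to the change in one input, evaluated with a central difference around the mean of that input, while all other inputs stay at their means. The `model` callable, the step `delta` and the function names are hypothetical, not the exact procedure of the study.

```python
from typing import Callable, List, Sequence


def sensitivity(
    model: Callable[[Sequence[float]], float],
    means: Sequence[float],
    i: int,
    delta: float = 1.0,
) -> float:
    """Change in model output per unit change of input i, others at their mean."""
    low = list(means)
    high = list(means)
    low[i] -= delta / 2.0   # vary only input i around its mean
    high[i] += delta / 2.0
    return (model(high) - model(low)) / delta


def all_sensitivities(
    model: Callable[[Sequence[float]], float],
    means: Sequence[float],
    delta: float = 1.0,
) -> List[float]:
    """Sensitivity of the model to each input in turn."""
    return [sensitivity(model, means, i, delta) for i in range(len(means))]
```

With centred and normalised inputs, `means` is simply a vector of zeros and `delta` is expressed in standard deviations, which makes the sensitivities of different regressors directly comparable.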
As shown in Fig. 5a, ISO-LF is, overall, mainly sensitive to T21 (s_T21=0.46), and to T0 (s_T0=0.32) and L0 (s_L0=0.25), whatever the climate or the season. T1, T1m and P21 were found to have a smaller weight on the predicted isoprene emission variability (s_xi < 0.2), and the lowest sensitivity of ISO-LF was observed for ST1u and L1 (s_xi of 0.02 and 0.04 respectively).
s_xi values appeared to be correlated with the magnitude of the environmental condition fluctuations (Fig. 5b-e): the lowest s_xi were generally associated with data measured under tropical climate (s_xi < 0.15, Fig. 5b), whereas higher s_xi values were obtained for more contrasted climates. The overall s_xi pattern of every climate remains, however, similar to the one obtained with all data, except for P14 in autumn for temperate climate with dry summer data (Fig. 5d), and for ST1u (the upper layer soil temperature of the preceding day), which represents the second most important contributor in winter under temperate climate (Fig. 5e). Soil nutrient uptake, such as that of nitrogen, is known to be strongly dependent on microorganism activity, which is itself directly controlled by soil temperature (e.g. Bassirirad, 2000). However, since direct impacts of soil temperature on isoprene emissions have not been reported so far, the reason for the observed ST1u predominance remains unclear. Moreover, this result cannot be generalised due to the poor statistical representation of winter conditions under temperate climate, which relies only on Ulex europaeus measurements. Under temperate climate with dry summer, isoprene emission regulation was found to be critically dependent on autumnal conditions, with most of the highest s_xi obtained for this season (Fig. 5d). Unexpectedly high monoterpene emissions have previously been measured on several occasions in the Mediterranean area in October (Bertin et al., 1997; Owen et al., 1998). This observation and our finding suggest that BVOC emissions from plants growing within Mediterranean climates may be quite sensitive to autumnal conditions, in particular to low frequency variations of air temperature (s_T21 > 2). For temperate climate data, T21 remains the dominant ISO-LF input (Fig. 5e), in particular during winter and spring (s_T21 of 0.76 and 0.59 respectively).

Conclusions
Multiple non-linear regressions based on ANNs were implemented to develop an isoprene emission algorithm (ISO-LF) accounting for high (instantaneous) to low (weeks) frequency variations. 1321 isoprene emission rates extracted from the literature were specifically compiled in a database (ISO-DB), which covers a large variety of isoprene emitters (25 deciduous and coniferous species) grown under latitudes ranging from 10° S to 60° N, and describes a set of 34 high to low frequency environmental regressors. The instantaneous air temperature T0 and light intensity L0 alone were found to account for a maximum of 60% of ISO-DB isoprene emission variability. When the preceding day information was additionally considered, this figure increased to 70%, the remaining 30% being mostly associated with the highest emission rates.
The ISO-LF algorithm, obtained from a best combination made of the (d-1) day minimum air and soil temperatures, the precipitations cumulated over 2 and 3 weeks, and the cumulated air temperature over 21 days (T21), accounts for up to 90% of the overall isoprene variability. None of these inputs was artificially selected in any of the ANN optimisation processes. ISO-LF was found to be mainly sensitive to T0, L0 and T21. More precisely, T21 was found to be particularly critical during spring for temperate climates, and during autumn for temperate climates with dry summers.
These findings are in agreement with previous experimental findings and demonstrate the ability of ANNs to help in accounting for the complex regulations of BVOC emissions. This work also confirms that some parameters other than L0 and T0 can be successfully considered to significantly reduce the uncertainties on isoprene emission assessments and that, among the different parameters, environmental ones represent a relatively straightforward solution.
ISO-LF can be routinely updated and improved by adding new emission data to ISO-DB. In particular, factors not available for this study (e.g. the nitrogen content or the soil characteristics known to affect plant water and nutrient uptake) or only broadly assessed with meteorological datasets should be tested and better assessed during ad-hoc seasonal campaigns (measurement of isoprene emissions and environmental information before and between samplings).
Such an approach could also be extended to other BVOC emissions such as monoterpenes or sesquiterpenes. BVOC canopy fluxes, rather than emission rates, would also be good candidates for such a neuronal approach.

Fig. 1. Structure and principle of a Multi Layer Perceptron. Output values y_calc are assessed by a weighted sum of input parameters x_i. w_0 is the connecting weight between the bias (initial random values optimised to obtain the co-ordinate at the origin of the neuronal regression) and y_calc, w_{j,k} the connecting weight between the neuron N_j and y_calc, w_{0,j} the connecting weight between the bias and the neuron N_j, and w_{i,j} the connecting weight between the input x_i and the neuron N_j.

Fig. 2. Comparison of the statistical characteristics of ISO-DB isoprene emission rates for the training (n=1065) and validation (n=259) databases.The lower, medium and upper horizontal bars correspond to the first, median and 3rd quartile respectively.Mean values are represented by crosses.Minimum and maximum values are represented by the vertical bars.

Fig. 4. Comparison between the log of isoprene emission rates calculated using ISO-LF (black circles) and G93 (open squares, Guenther et al., 1993, with I s =5 µg C g −1 dwt h −1 ) vs. the measured isoprene emission rates for (a) validation data and (b) training data.The 1:1 line is shown (dotted line).

i
cumulated over p days preceding the d-day (included)

Fig. 5. ISO-LF sensitivity (s_xi) to the inputs x_i, for (a) all ISO-DB data, (b) tropical climate data (wet and dry season), (c) cold climate data (summer only), (d) temperate climate with dry summer (all seasons), and (e) temperate climate (all seasons). s_xi was calculated by varying x_i, while all the other inputs x_j (j ≠ i) were fixed to their mean values.

Table A1. w: the optimised weights as follows:

Table A2. x_i: the selected input regressors as follows: T0, L0, T1m, T1min, T21, P14, P21, ST1u, L1