Estimation of the terms acting on local surface one-hour temperature variations in Paris region: the specific contribution of clouds

. Local short-term temperature variations at the surface are mainly dominated by small-scale processes coupled through the surface energy balance terms, which are well known but whose specific contribution and importance on the hourly scale still need to be further analyzedLocal temperature variations at the surface are mainly dominated by small-scale processes coupled through the surface energy budget terms, which depend mostly on radiation availability and thus cloud processes. A 10 method to determine each of these terms based almost exclusively on observations is presented in this paper, with the main objective to estimate their importance in hourly near-surface temperature variations at the SIRTA observatory, near Paris. Almost all terms are estimated from the multi-year dataset SIRTA-ReOBS, following a few parametrizations. The four main terms acting on temperature variations are radiative forcing (separated into clear -sky and cloudy sky radiation), atmospheric heat exchange, ground heat exchange, and advection. Compared to direct measurements of hourly temperature variations, it is 15 shown that the sum of the four terms gives a good estimate of the hourly temperature variations, allowing a better assessment of the contribution of each term to the variation, with an accurate diurnal and annual cycles representation, especially for the radiative terms. A random forest analysis shows that whatever the season, clouds are the main modulator of the clear sky radiation for 1-hour temperature variations during the day, and mainly drive these 1-hour temperature variations during the night. Then, the specific role of clouds is analyzed exclusively in cloudy conditions considering the behavior of some classical 20 meteorological variables along with lidar profiles. Cloud radiative effect in shortwave and longwave and lidar profiles show a consistent seasonality during the daytime, with a dominance of mid- and high-level clouds detected at the SIRTA observatory, which also affects near-surface temperatures and upward sensible heat flux. During the nighttime, despite cloudy conditions by the circulation of large-scale air masses (such as the NAO or blocking regimes) but also by smaller-scale processes such as cloud radiation and surface-atmosphere interactions located within the atmospheric boundary layer. In this paper, a method is developed and evaluated to quantify each process that affects hourly 2 m-temperature variations on a local scale (near Paris), based almost exclusively on observations. The method exhibits good accuracy and is able to quantify well the realistic diurnal and monthly-hourly cycles of each term involved, especially in summer where statistics are the best. The clear-sky radiative term represents the biggest positive (negative) contribution during daytime (nighttime) with values reaching up to 4 ° h -1 (-1.7 ° h -1 ), whereas clouds cool (warm) up to -3.7 ° h -1 (1.4 ° h -1 ). The atmospheric heat exchange becomes predominant in the late afternoons when turbulent 545 fluxes develop due to the increase in atmospheric instability. Some of the biases found are due to the difficulty of the model to well reproduce smaller-scale processes such as the cold pool events.

variability. Temperature and pressure conditions are then modulated by the complex terrain (Mediterranean sea, topography, 30 surface heterogeneities): extreme events and temperature anomalies are generally not exclusively explained by the presence of these large-scale air mass circulations (Vautard and Yiou, 2009). Indeed, synoptic and meso-scale atmospheric processes have been previously studied to explain interannual temperature changes in some parts of Europe (Efthymiadis et al., 2011;Xoplaki et al., 2003), or even precipitation occurrence (Xoplaki et al., 2004;Bartolini et al., 2009). It is, therefore, necessary to consider small-scale processes such as surface-atmosphere interactions and cloud feedbacks to better explain near-surface 35 temperature variations (e.g. Chiriaco et al., 2014).
Temperature variations at the surface are related to the Surface Energy Balance (SEB) following surface-atmosphere interactions and solar radiation (Wang and Dickinson, 2013) that are separated into different components: latent and sensible heat fluxes (F lat and F sens , respectively), ground heat flux, air advectiontemperature advection, and atmosphere radiation.
Studies have been made to parametrize these terms when direct measurements are not available (Miller et al. 2017;Arnold et 40 al. 1996) but their exact contribution is uncertain. Depending on the time-scale considered, the importance of each SEB term on temperature variations will change Ionita et al. 2015).
The first objective of the current paper is to quantify the specific local contribution of the primary SEB terms acting on shortterm (i.e. hourly) temperature variations in a Western European location and to determine their importance and the conditions when a certain term predominates over the others, based on a simple linear model. The current study is inspired by Bennartz 45 et al. (2013) who implemented a temperature variation model to study the influence of low-level liquid clouds over the Arctic on the ice surface melt period in July 2012, by estimating all of these SEB terms for a case study. Here, the same approach is used for a different location (mid-latitude) and on a long-time period to get robust statistics. To do so, the study is based on direct measurements from the SIRTA-ReOBS dataset , which includes many variables collected since 2002 at SIRTA (Site Instrumental de Recherche en Télédetection Active; Haeffelin et al. 2005), an observatory located in a 50 semi-urban area in the Southwest of Paris, France. This dataset is well suited to the current study objective because: (i) it allows the use of a multi-variable synergistic compilation to study and compare different characteristics of the atmosphere or the surface (e.g. Dione et al. 2017;Cheruy et al. 2013, Chiriaco et al., 2014, (ii) it is located in Western Europe and allows access to the hourly time scale, i.e. the scales of the local processes of the current study. The SIRTA-ReOBS dataset, along with some variables retrieved from ERA5 Reanalysis, enables the development of a model to estimate all the 55 terms involved in the near-surface temperature variations using predominantly real observations. The use of the model developed in the current study considers all the variables acting within the ABL and controlling surface temperature variations, all of them estimated almost exclusively from surface-based observations. Thus, it allows to study separately the influence of each SEB term in a local scale. This indeed allows to have a realistic and reliable estimation of the contribution of each term (radiative fluxes, turbulent heat fluxes, etc.) on hourly temperature variations, and it would be possible to have that at different 60 sites since each term will present a different behavior and importance. These estimations could help improving the parametrizations already existing of the SEB terms and better understanding their spatial evolution as a function of local conditions. Furthermore, a comparison between multi-model regional climate simulations and these estimations can be Commenté [OR4]: RC1-12: We added new references to mention how large-scale processes affect temperature and precipitation, and its temporal scale Commenté [OR5]: RC1-13: Air advection is replaced by temperature advection performed to evaluate if the simulations are able to well reproduce these behaviors, in particular in a warming climate where these processes are expected to change. The weight of each term of the SEB on hourly temperature variations is analyzed using 65 a powerful random forest analysis, whose maximal advantageone of its attributes is its capacity to handle up to thousands of input variables and identify the most significant ones.
The main result of this first objective is the fact that clouds are the main modulator on hourly temperature variations for most of the hours of the day and all the seasons Clouds are well known to modify directly near-surface temperatures and other nearsurface variables in multiple time-scales (Parding et al., 2014;Broeke et al., 2006;Kauppinen et al., 2014). Hence the second 70 objective of the current study is to understand the specific role of clouds and their associated characteristics on hourly temperature variations. Indeed, the influence of clouds on the temperature at the surface can vary depending on their physical properties and the altitude where they are formed (Hartmann et al., 1992;Chen et al., 2000). In general, low-level clouds tend to cool the surface, whereas high-level clouds, such as cirrus clouds, tend to warm it by absorbing a significant amount of Earth's outgoing radiation. The average contribution of clouds is to decrease the near-surface temperature, combined with the 75 damping effects of soil moisture, from a global point of view, by reflecting the solar radiation to space. But their damping effects vary depending on the season and the state of the atmosphereby time of day. For instance, the reduction in near-surface temperature is the highest in fall in northern mid-latitudes for specific cases when precipitation is not significant (Dai et al. 1999). The local contribution of clouds to temperature variations is thus an important topic to assess how this intake affects local climate variability and extreme local events, such as the extreme heatwave and drought during summer in EU in 2006 80 (Chiriaco et al., 2014;Rebetez et al., 2009) or the sudden melt of the ice sheet in Greenland on July 2012 (Bennartz et al., 2013). Several studies were made primarily focusing on the large scale effects of clouds on the radiative energy balance on a global scale, either at the top of the atmosphere (Arkin and Meisner, 1987;Raschke et al., 2005;Dewitte and Clerbaux, 2017;Willson et al., 1981;Allan et al., 2014;Cherviakov, 2016) or at the surface (Wild et al., 2015;Hakuba et al., 2013;Wild, 2017) or both at the same time (Hartmann, 1993;Kato et al., 2012;Li and Leighton, 1993), but a lack of studies investigating their 85 impact on a smaller scale is noted due to limited reliable ground-based measurement availability. In this study, to understand how clouds influence the one-hour temperature variations, cases with a predominance ofparticular cloud effect on temperature variations are deeper analyzed specifically withusing lidar profiles. to understand the role of the vertical distribution of clouds and their radiative effect on the state of the atmosphere.
To answerachieve these two objectives, the current paper is organized as follows: the dataset is presented in Section 2. Section 90 3 (and Appendix A) consists of describingdescribes how the different terms acting on hourly temperature variations are estimated and evaluating how well the model fits the hourly observations based on statistics. Section 4 presents an assessment to determine which term dominates at different times by performing a random forest analysis firstly for day and night cases, and then separated in a diurnal cycle perspective, along with a mean monthly-hourly and annual cycles analysis of the contribution of each term. Section 5 focuses on a discussion to study the specific role of clouds and the atmospheric conditions 95 under which they develop by assessing only the cloudy cases, which gives an overview of the type of clouds and surface 4 conditions damping or enhancing near-surface temperature for both day and nighttime. Section 6 draws conclusions and provides perspectives opened by this work.

Data used for the temperature variations estimation model
This study is mainly based on the SIRTA-ReOBS dataset analysis. The SIRTA observatory (Haeffelin et al. 2005) is located in a semi-urban area 20-km Ssouthwest of Paris (48.71° N, 2.2° E) and has collected long-term meteorological variables since 2002. The ReOBS project aims to synthesize all observations available at a single observatory at an hourly time scale with an exhaustive data-quality control, calibration, and rigorous treatment, into a single NetCDF file. The SIRTA-ReOBS dataset 105 , https://doi.org/10.5194/essd-10-919-2018) contains more than 60 variables.
All the necessary hourly variables requested by the current study are available for the period going from January 2009 to February 2014 and are listed in Table 1, allowing the performance of a multi-year analysis. Some variables that were retrieved from measurements at the SIRTA observatory sometimes present gaps due to instrumental issues Pal and Haeffelin 2015). In particular F lat and F sens variables are limited (with only 5% and 73% availability, respectively): Sect. 110 3 and Appendix A show how this issue is handled. Figure 1 shows the complete dataset availability in the SIRTA-ReOBS dataset for the five-year study period. The availability of data is quasi-homogeneous around 60% for all hours (Figure 1a). Gaps are most of the time due to the absence of the Mixing Layer Depth (MLD) variable for some complete days, extended sometimes for more than two months (not shown), restricting 71% of MLD data availability. Other gaps are caused by the absence of radiative variables (9% of missing data, see 115 Table 1). Summer is the season with the best data coverage (70%) and winter with the least data coverage (56% - Figure 1b), whose absences are due precisely to gaps in the MLD variable.
To complete this dataset, ERA5-Land hourly Reanalysis (spatial resolution of 31 km horizontal resolution of 0.1 ° x 0.1 °, Copernicus Climate Change Service, 2019)and 137 levels up to 1 hPa; ERA5: Fifth generation of ECMWF atmospheric reanalyses of the global climate, 2020) is used to estimate the horizontal wind at 10 m (useful to estimate the advectionsee 120 Section 3) and the temperature near the surface T2m, as required for the advection term (see Section 3 and Appendix A). Furthermore, the temperature at the mixing layer depth ( ) -see Section 3.1 and Appendix) is also retrieved from the ERA5 Reanalysis (spatial resolution of 31 km and 137 levels up to 1 hPa; ERA5: Fifth generation of ECMWF atmospheric reanalyses of the global climate, 2020).

Variables used for the cloud contribution analysis 125
In order to get vertical information about clouds (see Section 5), SIRTA-ReOBS lidar profiles are also used, retrieved from a LNA lidar (532 and 1064 nm) whose vertical resolution is 15 m (for further details, see Chiriaco et al., 2018), and their analyzed is based on hourly lidar Scattering Ratio altitude-intensity histograms, calculated as follows: Where ATB(z) is defined as the total attenuated backscatter lidar signal and ATB(z)mol is the signal in clear-sky conditions. 130 These altitude-intensity histograms are used to estimate the mean cloud fraction percentage at a given altitude level z. The intensity axis which contains SR(z) thresholds as well as the three vertical atmospheric layers (i.e. low, middle and high-level layers) for cloud detection and characterization are defined in (Chepfer et al. (2010). The value of SR(z) = -999 corresponds to non-normalized profiles, the value -777 represents the profiles that cannot be normalized due to the presence of a very low opaque cloud and the value of -666 is set as invalid data. Then, bins located in the range 0.01<SR(z)<1 are for clear-sky 135 conditions, 1.2<SR(z)<5 is defined as unclassified data, and for cloud detection, a threshold of SR(z)>5 is set. Details are given in Chiriaco et al. (2018). Even though the instrument does not operate uninterrupted (it does not operate when it is raining, when it's nighttime, and on the weekends), it is very powerful to get information on the vertical structure of the atmosphere. and so understand the category of clouds acting on the temperature variations.

Estimation of the terms acting on near-surface temperature variations 140
The exchange of energy between the surface and the overlying atmosphere involves four terms: radiation, heat exchange with ground, heat exchange with the free atmosphere, and advection. In this section, a model which estimates these four terms is presented, based on several meteorological observations retrieved from the SIRTA-ReOBS dataset. Each term involved in this model is described (further descriptions are presented in Appendix), and then a statistical evaluation is performed to assess how well the model follows the real observations of temperature variations. Finally, the mean monthly-hourly and annual 145 cycles of the averaged contribution of each term and the same for temperature 1-hour variations ( 2 ) are presented (split into day and night).

Model description
The temperature variation at the surface is estimated from the sum of four terms: Where R is the net radiative flux at the surface, HG is the ground heat exchange, HA is the atmospheric heat exchange, and Adv is the air advection. These four terms control the changes in temperature over the surface, but depending on the temporal scale, one will dominate over the others (Ionita et al., 2012). Following the Appendix and partly based on (Bennartz et al. (2013),)these four terms are expressed as: Then: Where 2 is the near-surface temperature, is the time (in hours), is a coefficient characterizing the form of the temperature vertical profile in the boundary layer, is the average air density of the boundary layer, is the specific heat of air, MLD is the mixing layer depth, ∆ is the net radiative flux at the surface, is the temperature in the ground at 20 cm depth, is the temperature at the top of the boundary layer (or mixed layer depth), 10 and 10 are the zonal and meridional wind components respectively at 10 m above the ground, x is the zonal wind component towards the east and y is the meridional 165 component wind towards the nord, and and are defined as relaxation timescales for heat exchange processes in the ground and the atmosphere, respectively.
If the model is realistic, then the sum of the four terms in the left side of Eq. (2), denoted as 2 , is as close as possible to the observed temperature variations denoted as 2 .
The net radiative flux at the surface is calculated as the difference between the radiative longwave (LW) and shortwave (SW) 170 fluxes leaving and arriving at the surface as follows: Where the upward and downward arrows represent the upwelling and downwelling radiation, respectively. It is possible to add and subtract the clear-sky (CS) downwelling radiation fluxes to Eq. (8), to obtain the flux radiative components in CS and cloudy (CL) conditions (i.e. the specific contribution of clouds), as follows: 175 Here, the ↑ flux is the same for clear-sky and cloudy conditions, because it mostly depends on the near-surface temperature 180 (i.e. the Stefan-Boltzmann law, ↑ = 2 4 , where is the Stefan-Boltzmann constant), but note that in an annual global mean, As shown by Eq.
(3) to Eq. (6) and in the Appendix, temperature variations are estimated from variables described in Section 2, i.e. predominantly based on observations retrieved from the SIRTA-ReOBS, but also on ERA5 datasets. 190 Concerning the advection term, the computation of this term requires the extraction of the temperature at 2 m ( 2 ) and northward ( 10 ) and eastward ( 10 ) wind components at 10 m, at the SIRTA grid point and the surrounding grid points.
The temperature at the mixing layer depth is retrieved using both SIRTA-ReOBS andcombined with ERA5 datasets.
First, radiosoundings available in SIRTA-ReOBS (twice a day) are used to get the pressure at the MLD level and then is retrieved from the vertical temperature profile in ERA5 at the nearest grid box from SIRTA Observatory (48.7° N and 2.2° E) 195 and at this pressure level but at the closest time. Note that uncertainty remains in because it is the temperature at a certain pressure level, which is only available twice a day.

Statistical evaluation of the model
Statistics are considered to assess how close the total model ( 2 ) is to the observed temperature variations ( 2 ).
The PDFs of each term in Eq.
(2) are shown in Figure 2a. Firstly, the radiative term R is dominating and contributes the most 200 in the temperature variations at an hourly scale since its distribution presents different significant peaks for negative and positive values of temperature variations. In clear-sky conditions, the radiative term contributes the most to warm the surface whereas the radiative effect of clouds has an opposite effect. A negative significant contribution by the HA term is also observed (green line), meaning that the mixing with an atmosphere of higher levels contributes to decrease surface temperature variations, even if surface temperatures continue increasing along the day.cooling the surface atmosphere The two other terms 205 (HG and Adv) have a similar but weak impact on the temperature variation model at this time scale. In addition, the PDF of the 2 is compared to 2 in one hour in Fig. 2b: differences occur for cases where the temperature decreases during the hour, but differences occur for cases where the temperature decreases during the hour, but this difference corresponds to some cases where the model presents more negative values than the observations, around -1 °C h -1 the difference remains low.
The modeled PDF fits very well with the observed PDF for cases where temperature increases during the hour (a smaller 210 number of cases as shown by the PDFs).  Figure 2c shows the scatter plot of the observations versus the model. The correlation between the two datasets is good (0.79) and the bias remains small (-0.20 °C h -1 ). However, the model has difficulties in reproducing the extremes of temperature change within an hour, e.g. -6 °C h -1 which corresponds to the 5 th of July 2011 at 20:00 UTC, or -8 °C h -1 which corresponds to the 2 nd of July 2010 at 15:00 UTC. This high decrease of temperature could be associated with a cold pool event, which is 215 more detectable in summer thanks to higher near-surface temperatures than those in winter, and bring along with it storms and heavy precipitations, as detected for these two cases (not shown). The hourly values in the observations (as well as in ERA5) do not allow the capture of this cold pool event properly in the model developed, since it is related to rapid cloud formation (<1 h) that is not well captured by the R CL term. However since this type of event is rare and ephemeral on a local scale (Llasat and Puigcerver, 1990;Conangla et al., 2018), its presence does not bias significantly the general performance of the model. 220 These statistics are estimated for each season as presented in Table 2. Correlation is high (0.82) and bias is low in summer.
Nevertheless, the standard deviation remains high for this season, probably due to the higher values of temperature variation and the contribution of all terms involved for the summer in comparison to other seasons. For the spring and fall seasons, the correlation coefficient is still high (0.80). However, in winter the correlation coefficient is the lowest (0.67) with a higher bias (-0.31 °C h -1 ). Removing the extreme values of the observations and the model (i.e. taking the values that are within the 5th 225 and 95th percentile) gives better statistics (values in brackets in Table 2), confirming that the model encounters difficulties in reproducing these extreme events. Figure 2d presents the hourly evolution of the five terms on the right side of Eq. (10), their sum (i.e. 2 in blue solid line) and 2 (pink solid line) for the first days of September 2010. As expected, a diurnal cycle is identified for the radiative (both cloud and clear-sky terms) and the heat atmosphere exchange terms. For the radiative terms, it is quite expected to have 230 a positive (negative) contribution during daytime (nighttime) due to the presence (absence) of solar radiation. In addition, there are some days when R CL (dotted red lines) plays an important modulating role in temperature variations, noticed during the day on September 6 th and 8 th , when clouds manage to provide a maximum cooling effect of -2.9 °C h -1 .
Evaporation and thermal conduction at the surface increase as the day goes on and the larger these terms are the more they will prevent the increase in temperature linked to the sensible and latent turbulent fluxes. But overall the temperature increases 235 during the day when these fluxes increase. However, the atmospheric heat exchange term (HA) is on average a negative contribution to temperature variations in one hour when F lat and F sens increase in the afternoons. This negative contribution could be mostly associated with the mixing of masses of air at the top of MLD, where depending on the state of temperature inversion, is likely to be very negative and this cold air could cool the surface. When R CL presents an important contribution, HA becomes weaker and its diurnal cycle is attenuated due to the absence of solar radiation and hence fewer 240 surface fluxes are developed. The ground heat exchange and the advection terms play a minor role at this time scale and contribute negligibly to the model without a significant diurnal cycle detected. Finally, the model follows well the observations, with a better agreement for daytime than for night-time, and with a better correlation for the first case (not shown). Reasons that may explain the bias are 1) the limited availability of data which is estimated using only two radiosoundings per day and not hourly continuous values and 2) the hypothesis and assumptions 245 in some variables and atmospheric conditions outlined in the Appendix. Further, the temperature variations at night might be hard to quantify accurately due to the minor contribution of each of the non-radiative terms, for which contributions are close to zero, especially for the HA term, and where F lat and F sens are most of the time very low (Section 3.3).

Annual-and Monthly-hourly cycles of the different terms
The mean contributions of each term are averaged monthly for both day and night ( According to Figure 3b and 4b for nighttime, the clear sky and cloud radiative forcing terms dominate in magnitude the hourly temperature variations at the surface and the other terms remain close to zero. Still during the night, 2 monthly mean is 255 negative throughout the entire year, with higher negative values during warm months than during winter. Indeed, R CL is always positive during the night, and has an annual cycle more pronounced than the other terms, having a stronger effect in the cold months, due to the increase in cloud cover, especially low-level clouds which enhance an increase of the downwards flux of LW radiation and whose effect gets weaker while approaching summer, due to the modification of the radiative effect of clouds for this season. This decrease of R CL in summer explains the 2 annual cycle during night. 260 A diurnal cycle is pronounced for all the terms ( Figure 4). R CS contributes the most in magnitude to local temperature variations during daytime (Figure 3a), and the other terms damp its effect by providing a negative contribution. Furthermore, all the terms are mainly driven by the solar radiation intensity during the day. The diurnal and annual cycles of the R CS (R CL ) term are important, by having a positive (negative) contribution during the day, mostly dominating the temperature variations at this scale (as stated in (Gaevskaya, 1962). The contribution of R CL on 2 during daytime is more important during May and 265 June, and finds its minimum for December and January, when the solar radiation is weak, thus preventing clouds to strongly reduce it. Further, a maximum negative (positive) contribution to temperature variations of R CL (R CS ) is found during the late mornings for all the months, cooling (warming) the surface with a maximum mean value of -1.3 °C h −1 (3.2 °C h -1 ) in May and June at 10:00 UTC and 11:00 UTC.
In addition, the HA term plays an important role in temperature variations for spring and fall seasons, when there is an increase 270 in hourly solar radiation ( Fig. 3a and 4d). These two seasons are characterized by an important contribution of this term in the late morning and the afternoon, especially during spring, with an hourly maximum averaged contribution of -1.1 °C h -1 . The difference between day and night for the HA term is the largest during these months when the MLD can reach high values in the afternoon due to increased turbulence. For winter, its contribution is minimal due to the low development of the MLD.
This result is in contrast with the negative (and very low) contribution of the HG term, which is maximal in summer in the 275 afternoon with an average smaller value of -0.25 °C h -1 when the near-surface temperature is the highest and thus strengthens its cooling action during daytime. An opposite effect is found in winter when the ground is usually warmer than the surface, yielding to a positive contribution ( Fig. 3a and 4c). The advection term does not present a strong monthly-hourly cycle compared to the other terms, although one can distinguish a mean positivenegative action (still very low) to local temperature variations at all seasons except winter during daytimewith a mean minimum in July in the afternoon of -0.12 °C h -1 , as shown 280 in Figure 3a and 4e.
Lastly, the residual, defined as the difference between the model (sum of all terms) and the observations (i.e. 2 − 2 ), also has a seasonal variability and is mainly negative, but a minimum difference of -0.45 °C h -1 between the two datasets is found in April around noonbetween 07:00 and 09:00 UTC, which is partly related to the negative increase of the HA term at that time. This is due to the important absence of latent heat flux data especially for this month, which implies an 285 increase in the bias when is estimated (see Table 1 and Appendix A). Nevertheless, during the months when solar radiation is strong, the residual reaches positive values (with a maximum value around 0.4 °C h -1 in June in late morning) but remains a low overestimation. These values for the residual are low compared to the magnitudes of R CS , but it remains almost at the same order of magnitude as the R CL and HA terms and a good correlation in a diurnal cycle approach is found. Generally, this hourly mean residual could also be due to the simplifications and assumptions made in the model, under-sampling and energy 290 imbalance.
Focusing on the transition periods (sunrise and sunset, black lines in Fig. 4), the residual presents low values at these times.
Indeed, there is a slight underestimation of the model of about -0.13 °C h -1 for some months (e.g. February) at sunrise hours, whereas a low overestimation with close-to-zero residual mean values are found for May and June. For the sunset, a similar behavior is found (with very similar values for the residual term). Therefore, a good agreement is found between the model 295 and the observations for these specific hours.

Weight of the different terms acting on temperature variations: the random forest method
In this section, a random forest (RF) evaluation (James et al., 2013;Manish, 2016;Brownlee, 2016;Loh, 2002) is carried out at different time-scales to estimate the relative weight (i.e. importance) of each term in temperature changes according to the hour, month or seasons. This machine learning method consists of bootstrapped-aggregated decision trees, a method that 300 combines and gathers together the results of these trees to construct more powerful prediction models.

General behavior
Here, the random forest method is used to establish which term dominates the temperature changes in the model ( 2  ) during day and nighttime. This machine learning method consists of bootstrapped-aggregated decision trees, a method that combines and gathers together the results of these trees to construct more powerful prediction models. 305 One of the most impressive features of RF is here used, which consists on the ability to provide a fully nonparametric estimation of the importance of each term (or predictor) on the model. One of the main advantages of this method is that it allows covering not only the impact of each term individually in the model but also the multivariate interactions with other predictors. Here, the model (i.e. ∂T 2m ∂t mod ) has been already developed and defined as the sum of five terms. Therefore, to determine the importance of each term, the input data (the five terms) in the RF method are trained to predict the modeled temperature 310 changes ( ∂T 2m ∂t mod ), and so here the output of the RF method is still ∂T 2m ∂t mod . To know more on how the hyperparameters are tuned, how is the data split up into training and testing, and further information on the RF method, please refer to Appendix B.Since in the present study the model has already been constructed, one of its most impressive features is used, which consists of the ability to provide a fully nonparametric estimation of the importance of each term (or predictor) on the model. One of the main advantages of this method is that it allows covering not only the impact of each term individually in the model but 315 also the multivariate interactions with other predictors.
Indeed, oOther approaches to estimateion theof predictor importance (e.g. simple squared marginal correlations method, squared standardized coefficients) do not give reliable results when the problem involves correlated predictors (Grömping, 2007). Additionally, this method is used as a result of the nonlinear relationship between each separated term and the model (not shown), which does not yield to an estimation and quantification on how each process at the surface affects the temperature 320 variations, an approach suggested and used by Miller et al. (2017) who estimated the response and importance of some surface processes (such as F lat and F sens ) to the forcing radiative terms in Summit, Greenland.

General behavior of the weight of the different terms
FirstlyHere, the random forest method is used to establish which term dominates the temperature changes in the model 325 ( 2 ) during day and nighttime. The predictor importance estimate value is defined as the sum of the mean square error (MSE) of each term averaged over all decision trees used and normalized by the standard deviation taken over the trees. This principle consists of permuting the values of a termpredictorin the decision trees where this term was left out-of-bag and assess how much worse the MSE becomes after the permutation (James et al. 2013, Chapter 8). Thus, the larger this value, the more important the term. This importance estimation feature from the random forest method has been previously used to 330 calculate the variable importance in different datasets for many applied problems in sciences or health fields for both regression and classification studies (e.g. Archer et Kimes 2008;Genuer, Poggi, et Tuleau-Malot 2012;Strobl et al. 2007). (2)) importance estimate value for daytime and nighttime respectively. Figure 5a corroborates that R CS is the most important term, followed by and then HA, whereas HG is the least important term in the model developed for the timescale considered. Next, Figure 5b illustrates that 335 dominates hourly temperature variations during nighttime, followed by HG and . These results agree with (Kukla and Karl (1993),)who suggested that the key regulator of nighttime warming temperatures is downward thermal radiation when cloudy cases are presented, an effect that is damped most of the time by the radiative cooling effect of the ground.

Diurnal cycle of the weights
The importance estimation previously calculated for both day and nighttime periods considers all the processes occurring 340 during each case and thus gives a global importance estimation. In order to separate the influence of each term on hourly temperature variations, an importance estimation value is performed for each hour independently. separation of daytime/nighttime does not allow the consideration of the importance of processes occurring during transition times of the diurnal cycle when the increase/decrease in temperature is the strongest. So, to better evaluate the influence of each term on hourly temperature variations at different times of the day, the importance estimation value is determined for each hour using 345 the random forest method described previously in Section 4.1. Figure 6 presents the results of this method for each season.
This diurnal cycle is estimated by applying the random forest to each hour separately. Figure 6 presents the results of this method for each season. As expected (and previously exposed), for all the seasons is the term dominating during nighttime, just after sunset, and before sunrise (indicated by vertical dashed lines in black).
After sunrise, the surface heating produced by the sun in the early morning enhances a high growth rate in the temperature 350 variations, whose effect makes R CS the dominant term driving 2 at those hours of the day for all seasons, except for winter. For this latter season (Figure 6a), the growth rate of R CS is almost the same as that of R CL and thus it does not expose an important estimation value. This effect is due to the weak mean solar zenith angle (SZA) for this season, and the surface heating by the sun in clear-sky conditions is not strong enough to modulate dominate temperature variations. On the contrary, R CL importance reaches its minimum just after sunrise and before sunset, where, depending on the season, either or HA 355 are the main terms controlling temperature variations.
A shift in importance between R CL and R CS occurs during the rest of the day for the four seasons of the year. This effect is explained by the variation of the data for these two terms: R CS standard deviation for a given hour/season is weak, and thus its influence to explain the difference between one day to another at a specific hour remains minimal and it is not a strong predictor at diurnal cycle scale, especially in the summer and spring (Figure 6b and c, respectively) when other variables will modulate 360 temperature variations. On the other hand, R CL turns into the main modulator of 2 for all the seasons due to its strong standard deviation for a given hour/season, reaching its maximum importance in summer (Figure 6c). Therefore, the hourly Commenté [OR27]: RC1-5: This paragraph is modified to better explain the message we want to convey at first. Also, details on how the diurnal cycle of the weights of each term for each season are given.

Commenté [OR28]
: RC1-20: Changed temperature variations are more sensitive to cloud changes rather than solar radiation which does not vary significantly for a specific hour from day to day.
Concerning the other terms, HA importance grows along the day. Its variation is greater than that of the other terms making it 365 the second most important modulator most of the time (except for autumn, Figure 6d). Its contribution is weak in winter due to the lack of solar radiation and vegetation whose absence will diminish the turbulent heat fluxes measured at the surface, along with a weak MLD. Note that it becomes sometimes the most important term in the late afternoon just before sunset. This is explained by an increase in instability of the ABL enhanced by a large boundary layer and near-surface temperature gradient (i.e. the difference between − 2 reaches its maximum, see the third term of the right-side of Eq. (7)) in the late 370 afternoon (not shown) which is generated by the increase in turbulence fluxes near the surface.
Concerning the Adv term, it does not show a significant contribution during daytime but it Regarding the Adv term, it shows an important weight in some hours in the late afternoon in winter, which makes it the term controlling on average hourly temperature variations at that time (then it is HA who becomes more important). In summer (Fig. 6c), it presents an important increase as the day goes on similar to HA after 10:00 UTC, but HA is even more important thanks to a development of turbulent 375 heat fluxes at the surface in the late afternoon. Adv becomes the second most important modulator during nighttime before sunrise for all seasons except summer; probably a thermally-induced circulation linked to the urban heat island which is set up in Paris is at the origin of this importance: this circulation is likely to create a greater variation of temperature some nights when cooler air from rural areas is advected to warmer zones towards the city center. The SIRTA observatory, located in a suburban area at 20 km from the center of Paris could be indeed regularly affected by this modulation of circulation. Indeed, 380 in this area the two predominant winds come from S-E regime (the Siberian High) bringing mostly cold air temperatures, and S-W regime (air masses coming from the Atlantic Ocean) with warmer and more humid air (not shown). Figure 6 supports a clear and reliable estimation of the importance of each term split into seasons for every hour of the day, which is not seen by estimating the diurnal and annual cycle contribution of each term to temperature variations (cf. Section 3). Indeed, depending on the hour of the day (and thus the state of the atmosphere), one term will become important over the 385 rest and temperature variations will be more sensitive to its change even if its contribution in terms of magnitude remains smaller compared to other terms. For instance, R CS absolute contribution during nighttime is higher than that due to R CL (Fig.   3b), and yet the latter is more important during nighttime due to its seasonal and hourly (not shown) contribution variability to 2 . 390

Validation of the random forest method
However, this machine learning method is generally used in other studies to train and to have better estimations of a particular model. In order to validate the random forest method skill on predicting new observed temperature variations, the method is used to predict In the following, only cloudy cases are considered. These cases are identified based on a criterion on the absolute value of CRE that must be higher than 5 W m -2 in SW and LW  during daytime, whereas for nighttime only LW is considered for this criterion. The SWCRE (and LWCRE) is calculating following Eq. (11): According to this threshold, cloudy conditions correspond to 82% of the total cases from January 2009 to February 2014, and unique-clear-sky conditions represent the remaining 18%.

Daytime analysis
Since the R CS term dominates during daytime, the ratio of 2 divided by R CS is created to estimate how much these two terms are driven by the solar radiation: when this ratio is close to 1, it means that the variations of temperature are the ones 420 that would be expected in clear-sky conditions with no other modulators, and when this ratio deviates from 1 it means that the temperature variations are damped or enhanced by the other terms. The R CL /R CS ratio is also estimated, of which the Commenté [OR31]: RC1-4-4: Validation of the random forest method by showing the scatterplot between the observed and the predicted data (Fig. 7). distribution (not shown) has two peaks, one between -0.5 and -1 and one slightly negative with a tail in positive values. Thus, three bins are created from the highest cooling effect to the warming effect: (i) −1.0 ≤ < −0.5 (bin 1), (ii) −0.5 ≤ < 0 (bin 2) and (iii) 0 < ≤ 0.5 (bin 3). The cases with negatives or very close-to-zero values of R CS (10% of the cases), 425 which occurred in early mornings and late afternoons, are excluded since they affect the sign of the distribution or give divergent values. This histogram, along with the one used in Section 5.2, are presented in Appendix C. Figure 8a shows the values of the observed temperature variations divided by R CS and Figure 9 shows the distribution (represented as box-and-whiskers) of different relevant meteorological observations available in the SIRTA-ReOBS dataset, for each bin created, split into seasons for daytime. To add cloud information, lidar data are also considered but because they 430 are not available all the time, and in particular when it is raining (see Sect. 2.2), they are presented as additional boxplots (light color ones) that correspond to the lidar sampling (whereas the dark color boxplots are for the total cloudy sampling). This difference of sampling mostly affects the first two bins when clouds have a cooling effect, for which both the occurrence and the amount of precipitations are the highest (not shown). Lidar SR(z) histograms presented in Figure 10 are estimated by cumulating all lidar SR(z) observations available for one bin and one season in particular. The red horizontal lines on each 435 histogram correspond to low-(P > 680 hPa), mid-(440 < P < 680 hPa), and high-level (P ≥ 440 hPa) clouds limits (Chepfer et al. 2010;Chiriaco et al. 2018). Note that a Noise bin is presented here, which is simply the sum of the -999 and -777 bins that correspond to noisy and non-normalized profiles (cf. Section 2.2). Figure 8a shows that the values of temperature variations (normalized by R CS ) are the lowest for clouds having the most 440 cooling effect (bin 1 in blue) for all the seasons. In winter this ratio stays mostly positive (temperature increase in one hour) but less than 0.5 (less than half the increase that could be reached if the sky was clear) and it can become negative during summer and fall, i.e. the warming induced by R CS can be counterbalanced by the clouds in these seasons. Besides, the cloud cover is almost total and the radiative effects are strong, as seen in Figures 9d, e and f. In addition, the seasonal variability of SWCRE is directly related to the seasonal variability of , ↓ , and not (or only slightly) to the seasonal variability of cloud 445 properties, since the ratio , ↓ ⁄ remains almost constant during the entire year (not shown) and cloud cover presents the highest mean values (Fig. 9d).

Case with strong cloud cooling effect -bin 1
The light blue bin is the same as the dark one except for lidar sampling, i.e. for non-rainy cases exclusively. Indeed, lower RH values (Fig. 9b) favor lower precipitation occurrence (not shown), higher amount of F sens (Fig. 9c), and higher (i.e. less negative) values of SW CRE (Fig. 9e) for the lidar sampling compared to the total one. Logically, lower LWCRE is found for 450 the lidar sampling. As expected, temperatures are higher for the lidar sampling for all the seasons except summer, as shown in Fig. 8a, along with a more frequent negative temperature variation (Fig. 8a). For this sampling that excludes rainy cases, both higher and lower values of SWCRE are found for spring rather than for summer (Fig. 9e): the minimal value is around ~− Commenté [OR33]: RC1-22: Histograms shown in Appendix C has a lower value of SWCRE than spring). Figure 10b shows that low-and high-level clouds are more frequent for spring than 455 for summer (Fig. 10c) and thus stronger negative values of SWCRE are found for spring. Then, the important presence of midlevel clouds with a high value of lidar SR(z) (> 80) spotted in summer in Fig. 10c are potentially at the origin of the strong negative values of SWCRE despite the very low presence of other clouds. These strong and negative SWCRE could be associate with a presence of nimbostratus clouds, due to the high SR detected for lidar, clouds which are more likely to form in summer because of the strong convective systems developed during that time due to higher surface temperatures. In addition, 460 despite the strong negative values of SWCRE for this bin for all seasons, near-surface temperatures are not so low (Fig. 9a), partly due to the high LWCRE values (Fig. 9f) that slightly dampen the strong SWCRE. Fig. 10c exhibits an important presence of high-level thin clouds (especially above 8 km), along with mid-level thick clouds. Figure 10b also presents an important presence of high-level clouds in spring but within a smaller vertical range (7 < z < 10 465 km) compared to fall. Indeed, one reason explaining the presence of these high-level clouds at these two transition seasons could be the meet of a warm air with a cold air mass (which occur more often at spring and fall), where the lighter warm air rises up to several km from the ground and could form some cirrus clouds.Indeed, these high-level clouds could correspond to cirrostratus or cirrocumulus which form when a mass of warm air meets a mass of cold air, where the lighter warm air rises and could form these cirrus clouds, an event that occurs more often in the transition seasons, such as spring and fall. In addition, 470 spring exhibits high amounts of low-level thick clouds (SR>40) that are not detected the rest of the year, which could correspond to the moments just before a storm is set. Indeed, the highest amount of precipitation rates are found for this season (not shown) and the cloud cover minima in Fig. 9d for the lidar sample is 90% (the highest among the other seasons) and less clear-sky conditions are found (0.01 < SR(z) < 1.2).

Cases with weak cloud radiative effect: cooling or warming 475
When R CS becomes dominant with respect to R CL (bins 2 and 3), temperature variations are most of the time positive, especially in winter (Fig. 8a). In comparison with the first bin, the air becomes drier (higher sensitive heat flux and lower RH, Figure 9b and c), which agrees with less cloud cover (Fig. 9d), and lower LWCRE and less negative SWCRE ( Fig. 9e and f). Figure 10 shows fewer differences between bin 1 and bin 2 (R CL < 0) than between bin 2 and bin 3 (R CL > 0) because this figure only considers non-rainy situations and in section 5.1.1. it is noted the important differences between the two samplings 480 (total and non-rainy) for bin 1 for most of the variables. This difference between the two samplings is weaker for bins 2 and 3, despite its existence in winter and fall. Figure 10 is thus more representative of the full sampling for these two bins than it is for the first bin.
For the lidar sampling, LWCRE in winter and fall is slightly higher than for the two other seasons for the second bin (Fig. 9f). This is partly due to the presence of high-level thin clouds (5 > SR(z) > 20), such as cirrus, detected in the histograms of SR(z) 485 for this bin in Fig. 10e and h for these two seasons, which is slightly more than for spring and summer.

Commenté [OR34]: RC1-24: Discussion about the presence of nimbostratus clouds
In bin 3, the absence of mid-level clouds and the lower occurrence of high clouds compared to bin 2 is obvious, while the difference for low-level clouds is not clear. This is consistent with lower LWCRE (Fig. 9f) but not necessarily with the very low values (i.e. close to zero) of SWCRE observed for bin 3 (Fig. 9e). The SWCRE values close to zero are explained by the fact that most of the hours corresponding to this bin 3 coincide just after sunrise (not shown), when solar radiation is still weak, 490 thus explaining that ↓ ≈ , ↓ . These low SW and LWCRE values for the lidar sampling also correspond to low mean cloud cover (Fig. 9d) and thus a lower amount of clouds are detected by the SR(z) histograms (Figure 10i-l). Note that some of the SWCRE values are even positive, revealing that ↓ > , ↓ , an effect observed when the direct part of solar radiation is not fully attenuated and when some diffuse radiation reaches the surface in addition to the direct flux, radiation which is strengthened by scattering processes in clouds and atmospheric particles. This effect thus increases observed radiation beyond 495 a respective clear-sky scenario, and it mostly happens in winter (but could occur the rest of the year) in early mornings and/or late afternoons hours when the solar elevation angle is weak (Stapf et al., 2020;Wendisch et al., 2019). This distribution towards early morning hours explains that despite the high values of the ratio between temperature variations and ( Fig.   8a) and lower cloud cover and cloud effects, near-surface temperatures are not always maximal for this bin (these maximal temperatures happen only for spring and fall). 500

Nighttime analysis
During nighttime, a similar procedure is followed to analyze and characterize different meteorological variables under different cloudy conditions. Since R CL is the term dominating and controlling temperature variations (Section 4) and since R CS is constant, it is not necessary to divide it by R CS . Thus, two bins are created from the PDF of : 0 < R CL < 0.75 °C h −1 and 0.75 ≤ R CL ≤ 1.5 °C h −1 . During nighttime there is no SW radiation so clouds always have a warming effect on near-surface 505 temperature. The distribution of 2 for the two bins created for all the seasons is presented in Fig. 8b. Figure 11 shows the same meteorological variables shown in Fig. 9 but for nighttime cases. Cloud fraction profiles are not shown because the lidar does not operate during nighttime. Figure 8b shows that the temperature variations are generally stronger for the first bin than for the second bin, i.e. the negative temperature variations induced by longwave cooling during nighttime are damped when the effect of clouds is stronger. This 510 agrees with what is expected since, at hourly time scales, clouds are the most important factor in modulation of temperature variations during nighttime (see section 4). As expected, the second bin has stronger LWCRE values (Fig. 11e). The effect of clouds seems enhanced when cool and moist air is present over SIRTA (Fig. 11a, b). More details for each bin are given in the following subsection.

Case with weak cloud warming effect 515
For the first bin (0 < R CL < 0.75 °C h −1 ), LWCRE presents a wide range of values going from 5 W m -2 up to 75 W m -2 with no seasonal variability detected (Fig. 11d), and yet temperatures are the highest for this bin (Fig. 11a) except in winter. As expected, upward sensible heat flux is most of the time negative due to surface cooling, as values of 2 are predominantly negative ( Fig. 11c and 7b, respectively), with drier air conditions since RH has the lowest values for all the seasons (Fig. 11b).

Case with strong cloud warming effect 520
The temperature distribution has the lowest values for all seasons (except spring) for the category of clouds that have the strongest warming effects (second bin, in purple on Fig. 11a). This category of clouds is tightly linked to strong LWCRE ranging from 55 W m -2 up to 90 W m -2 (Fig. 11d). These low temperatures could be associated with situations when the incoming air masses are colder, which is linked to large-scale situations (Pinardi and Masetti, 2000;Wang et al., 2005). Note that here hourly temperature variations are studied, and indeed on multi-day time scales, it is the large-scale atmospheric 525 processes that determine on the first order the daily temperature value.
Finally, the distribution of sensible heat flux is positive in winter with values reaching up to 40 W m -2 (purple box plot, Fig.   11c). These unusual and positives values during nighttime are explained by the presence of a strong stable atmospheric layer, where near-surface temperatures are the lowest except for spring (Fig. 11a) and the atmosphere is warmer at higher altitudes since clouds contribute to warming the most at this time, and therefore a positive upward sensible heat flux develops. Similar 530 behavior was previously found by Miao et al. (2012) in Beijing also during nighttime in cloudy conditions, where sensible and latent heat fluxes were close to zero due to the presence of a stable atmospheric layer, especially in winter when the sensible heat flux is found to be slightly greater than the latent heat flux. Indeed, almost half of temperature variation values are found to be positive for this season, as illustrated in Fig. 8b, and along with the positive upward sensible heat flux found, higher temperatures are found for this bin which is not seen in the other seasons where the sensible heat flux is close to zero or mostly 535 negative and temperatures are higher for the first bin (cloud with a low warming effect).

Conclusions
The European climate and most temperature anomalies are affected not only by the circulation of large-scale air masses (such as the NAO or blocking regimes) but also by smaller-scale processes such as cloud radiation and surface-atmosphere interactions located within the atmospheric boundary layer. In this paper, a method is developed and evaluated to quantify 540 each process that affects hourly 2 m-temperature variations on a local scale (near Paris), based almost exclusively on observations. The method exhibits good accuracy and is able to quantify well the realistic diurnal and monthly-hourly cycles of each term involved, especially in summer where statistics are the best. The clear-sky radiative term represents the biggest positive (negative) contribution during daytime (nighttime) with values reaching up to 4 ° h -1 (-1.7 ° h -1 ), whereas clouds cool (warm) up to -3.7 ° h -1 (1.4 ° h -1 ). The atmospheric heat exchange becomes predominant in the late afternoons when turbulent 545 fluxes develop due to the increase in atmospheric instability. Some of the biases found are due to the difficulty of the model to well reproduce smaller-scale processes such as the cold pool events.
The use of a random forest analysis makes it possible to identify which term dominates over the others. Clear-sky radiation most influences the 1-hour temperature variations during the day, whereas it is the cloud radiation during the night followed by the ground heat exchange term. Nevertheless, separating the dataset into hours and seasons shows that the cloud radiation 550 effect becomes predominant in several hours for all the seasons because it presents a greater variation of its hourly values than the clear sky radiative effect, especially in spring and summer during daytime when more local cloud cover variability occurs.
This cloud dominance is still found despite the fact that the clear-sky radiation magnitudes are the highest on a monthly-hourly scale. Temperature variation is then more sensitive to cloud changes within an hour rather than the large contribution of clearsky radiation which mostly dominates the early mornings when surface heat fluxes are not yet developed. The other terms 555 remain important as there are times when some of them modulate temperature variations, especially the turbulent heat fluxes in the late afternoons. These terms also remain necessary to obtain the best coefficient estimator between the directly measured observations and the method developed.
To better understand the importance of clouds on temperature variations at this scale, a deeper analysis is performed partly based on lidar profile observations. The atmosphere presents a high amount of cloud fraction detected by both the lidar and 560 the sky imager during daytime which correlates well with the strong cooling contribution found for the cases when clouds have a strong cooling effect. Indeed, an important presence of mid-level thin and thick clouds is spotted for all the seasons (except spring) for these cases. These thick clouds could correspond to nimbostratus which are very opaque and often produce continuous moderate rain, but the lidar detects them just before the rain starts. Indeed, temperatures at this time are very low, which contributes to these clouds forming since they usually form ahead of a warm front. On the contrary, a weak cooling 565 effect is predominantly associated with low-and high-level thin clouds for all seasons. Situations with daytime positive cloud radiative effect occur when the atmosphere is close to clear-sky conditions and corresponds to low-and high-level thin clouds mostly in early mornings and late afternoons. In addition, situations with weak cloud effect (either negative or positive) coincide with an important amount of high-level thick clouds for all the seasons (except winter) whose LWCRE is high, but SW clear sky radiation controls temperature variations (Fig. 8a, bins 2 and 3). These high-level clouds are more present in 570 weak cloud cooling and warming effects (bins 2 and 3) than the times of strong cooling effect (bin 1)In addition, situations with weak cloud effect (either negative or positive) are associated with an important presence of high-level thick clouds for all the seasons (except winter), whose LWCRE is strong but SW clear-sky radiation dominates and controls temperature variations along with other surface meteorological variables, high-level clouds which are more present than the times of strong cooling effect. Overall, a dominance of mid-and high-level clouds at the SIRTA observatory is detected for all cases. The dominant 575 presence of this specific type of cloud is linked to the geographical position of the SIRTA observatory since it is located in a zone where warm and wet air coming from the Atlantic Ocean can nudge against the cold and dry masses of air coming from the Siberian region. This encounter of air masses (whether it is the warm air that encounters cold air or the opposite) tends to form mostly high-level clouds. Similar behavior has been found by Chakroun et al. (2018) during nighttime and Mariotti et al. (2015) over the Euro-Mediterranean area where the variability of cloud fraction (CF) is driven mostly by the encounter of two 580 different air masses coming from Northern Europe and Southern Africa. During nighttime, even when clouds have a positive important radiative effect, temperatures are low and they are more controlled by large-scale processes since surface turbulent fluxes are most of the time low and there is no shortwave radiation. Since neither cloud fraction variable nor lidar data are available during nighttime, no information about the percentage of cloud presence can be known, yet variability on LWCRE is still found between clouds whose warming effect is weak and those whose warming effect is strong. This LWCRE variation 585 is on the first-order controlled by the presence of clouds, but it could also be due to the sensitivity of LW radiation arriving at the surface to the presence of atmospheric gases and aerosols and not only to cloud radiative properties, as shown previously The approach developed in this study is innovative because it is predominantly based on observations. It opens several 590 perspectives on the possibility for continuing work: (i) use of the approach in combination with weather regimes to better relate the small-scale processes to the large-scale atmospheric dynamics; (ii) application of the approach to other locations to understand the spatial variability of the results and specific local conditions affect each of the terms involved in near-surface temperature variations; (iii) identification of the moments when one term becomes predominant over the rest by integrating the time into larger scales (such as weekly and monthly); and (iv) estimation of how well these estimations are represented in 595 the present and future climate simulations to evaluate models and/or understand the evolution of the different terms in a warming climate. Further, the ground heat exchange contribution is very site-dependent (more than the other surface terms), with different behavior in an urban environment, therefore it will be interesting to study its impact in the attenuation/amplification of maximum near-surface temperatures in periods of heatwaves.

600
Author contributions. OR carried out the data analysis, methodology and prepared all the figures. MC, SB and JR contributed to the data analysis, advised on methods, validation and interpretation of results. OR wrote the manuscript with contributions from MC and SB.
Acknowledgements. The authors would like to acknowledge Mesocentre ESPRI team at IPSL for providing storage resources 605 and giving access to SIRTA Re-OBS dataset, and also the SIRTA observatory for collecting and providing the geophysical variables and lidar data used in this study. The SIRTA-ReOBS NetCDF file used in this study is available at https://sirta.ipsl.fr/reobs.html under https://doi.org/10.14768/4F63BAD4-E6AF-4101-AD5A-61D4A34620DE. The advection term as well as the temperature at the mixing layer depth were estimated using Copernicus Climate Change Service Information ERA5-Land andERA5 [2009-2014], respectively. 610 Competing interest. The authors declare that they have no conflict of interest.           The prognostic model used to study the temperature change at the surface, considering all the components driving the surface energy balance, can be simply written as the sum of four processes: Where R is defined as the radiative forcing term, HG as ground heat exchange term, HA as atmospheric heat exchange term, and Adv as advection. Each of these terms represents a process involved in the variation of near-surface temperature 2 , and can be calculated, using different measured and available variables as follows: Where 2 is the near-surface temperature, is the time, is a coefficient characterizing the form of the temperature profile 910 in the boundary layer, is the average air density of the boundary layer, is the specific heat of air, MLD is the height of the boundary layer, ∆ is the net radiative flux at the surface, is the temperature in the ground at 20 cm depth, is the temperature at the top of the boundary layer, 10 and 10 are the zonal and meridional wind components, respectively, at 10m above the ground, x is the zonal wind component towards the east and y is the meridional component wind towards the nord, and and are defined as relaxation timescales for heat exchange processes in the ground and the atmosphere, respectively. 915 Before explaining how each term is estimated, considerations are necessary about the stability of the atmosphere and related temperature profiles. Note that these assumptions will not affect the physical behavior of the developed method; they are made in order to have a more quantitative treatment of the studyNote that these assumptions will not affect the physical behavior of the method developed; they are made to have a more quantify treatment of the study.
Temperature near the surface and its behavior in lower layers are mostly driven by the quantity of net radiative flux arriving 920 on it, which also depends on what moment of the day the temperature is observed; thus, it is important to differentiate day and night. During the first case mentioned, radiative flux is significant as a result of solar incoming flux radiation (shortwave radiation), and the vertical temperature profile will depend on the amount of this radiative flux (among other components). In the absence of any incoming shortwave radiation, which corresponds to nighttime, clouds and other processes control temperature variations at the surface and its vertical profile. 925 Figure A1 confirms that the Planetary Boundary Layer (PBL) is on average unstable during daytime and stable during nighttime in the area of study. This figure is built from twice-daily radiosoundings (at 11:00 and 23:00 local time) observations available in the SIRTA-ReOBS file from a METEO-France station located in Trappes (48.77° N, 2.01° E), 16 km away from the SIRTA observatory, to retrieve (every 15 m) the temperature and pressure up to 15 km above the ground. Figure A1a and b provide an overview of temperature profiles retrieved from these radiosoundings at 11:00 and 23:00 LT, respectively, for 930 July of 2011 (each color represents a day of the month). Then, the monthly mean temperature profiles from 2003 to 2017, each month represented as one color, are plotted in Fig. A1c and A1d. As expected, the temperature is decreasing from the surface This said, a daytime and a nighttime equation is used to parametrize the temperature profile in the surface layer (SFL). 935 For the daytime approach, a linear temperature profile is established as follows: Where is the temperature gradient, negative during day.
For the nighttime stable case, the temperature profile can be defined by a polynomial approach (Stull, 1988): A shape parameter = 1.5 shows a quasi-linear behavior of the temperature profile with a weak positive gradient near the surface, fitting well with the profiles in Fig. A1c and A1d (not shown).
The assumption of working with an ideal gas in an isobar environment, and applying the second law of thermodynamics, yield to an expression of the enthalpy as described below (Malardel, 2009;Wallace and Hobbs, 2006): Knowing the vertical temperature profile, Eq. (A5) can be integrated all over the MLD to find the total enthalpy per square meter at a given level (assuming that the density of the air is approximately constant at lower layers).
It is found that for a nighttime case, the total enthalpy can be expressed as: By deriving this equation by the near-surface temperature and total enthalpy, the ability to assess the change in near-surface 950 temperature per unit change in enthalpy is estimated as: For the daytime, proceeding as above, it is found the same equation for the nighttime with the only exception that = 0.
As explained before, distinguishing between night and daytime is very important for the study due to the differences in the radiative flux arriving at the surface, which condition the behavior of the PBL (i.e. MLD) and SEB terms. 955

A1.1 Radiative term
This term is calculated as: The contribution of the radiative forcing on the temperature variations at SIRTA can be estimated by using the second law of thermodynamics, which states that the only energy a particle exchanges with its surroundings is heat. That said, it is assumed 960 that the only heat exchanged for a particle of air in the low atmosphere with its surrounding is the net radiative flux divergence, yielding to: Where ↓ − ↑ is the only source of energy of the particle. By replacing ℎ in Eq. (A7) by (A9), it is found that: With Eq. (A10) and (A11) corresponding to night and daytime cases, respectively. These two equations described above will allow estimating the first term of the temperature variations model at SIRTA, the radiative term. MLD is a scale of height corresponding here to the height of the PBL retrieved from SIRTA-ReOBS. This value is set at this threshold because all the turbulent processes affecting the temperature variations are found within this layer. The PBL thickness varies depending, 970 among other processes, on the amount of solar radiation, which enhances thermal and turbulent processes, making this thickness ranges from tens of meters to 2 km or more. Thus, an average PBL thickness value (i.e. MLD) in an hourly and monthly scale is set (an assumption that doesn't affect the physical behavior of the model), as seen in Fig. A1c. Since during nighttime the boundary layer depth is difficult to estimate because of the weak turbulence due to the absence of solar radiation and its very complex dynamical system (Shi et al., 2005;Walters et al., 2007;McNider, 2011), a fixed value of 350 m is set 975 (Fig. A1d). Figure A2b shows the behavior of the radiative term calculated by using Eq. (A8) assuming = 1 kg m -3 and = 1006 J kg -1 K -1 . A seasonal cycle is marked for this term, where during summers a peak of maximum contribution is found, whereas in winter its impact on temperature variations decreases significantly.

A1.2 Atmospheric heat exchange term
This term is estimated as: 980 In this equation, can be calculated at equilibrium by setting the left side of the Eq. (A2) to zero and neglecting the ground heat exchange and advection terms (i.e. for the temperature not to vary, a balance between these two latter terms must be set), leading to: For the atmospheric heat exchange, ∆ = + . However, F lat has lots of missing values in SIRTA-ReOBS dataset (material defection), and therefore does not allow performing a complete analysis for the five years of study. Hence, an hourly and monthly look-up table (LUT) is created to have an estimation of . Moreover, is not directly available in the SIRTA-ReOBS dataset. To retrieve that temperature, the mixing layer depth is first retrieved from the SIRTA-ReOBS dataset, by looking for the nearest radiosounding in time (two per day) and founding the pressure corresponding to this altitude, then it is 990 possible to have the temperature at that pressure levels by looking them in the ERA5 reanalysis dataset. Figure A3 shows an average for all the months of the year. During night, the relaxation time scale shows a higher month-to-month variability mainly due to the absence and gaps of the latent and sensible heat fluxes, which does not allow to reflect a marked tendency of , whereas for daytime a more clearly behavior of is spotted for all the months thanks to an increase of the availability of the data for these hours (not shown). Sometimes during nighttime, this relaxation time exceeds 30 h, probably because the 995 turbulence is so weak at that time that the heat does not reach completely the surface boundary layer (SBL) established. Stull The atmospheric heat exchange term hourly values are shown in Fig. A2d. Most of the values are negative, indicating that 1000 this term will modulate the positive contribution made by the radiative forcing term.

A1.3 Ground heat exchange term
The ground heat exchange term is defined as the energy lost by heat conduction through the lower boundary. This term can be then determined as follows: On average, the near-surface temperature during daytime (nighttime) is higher (lower) than that at low depth, as it's been shown previously (Al-Hinti et al., 2017;Popiel and Wojtkowiak, 2013). Hence, the assumption that the vertical profile of temperature within the ground at low depths follows approximatively the same behavior as the atmospheric temperature profile can be established, and therefore the approach adopted for the atmospheric heat exchange term (c.f. A1.2) can be here implemented, to evaluate the relaxation time scale for the ground heat exchange at a depth of = 0.2 m below the ground. It 1010 is considered the surface to be warmer than the temperature at 20 cm below the ground during the day thanks to the solar radiation arriving. The opposite happens during night, when the ground losses important longwave radiation and it gets cold faster than . Knowing the temperatures at the top of the ground ( 2 ) and at 20 cm below it ( ), the relaxation time scale results in: The soil type at the SIRTA observatory is a mix of clay and limestone (obtainable from a regional predominant type of soil map in France; Wulf et al., 2015). The approximative values for ground density and heat capacity , for this type of soil are then: = 1300 kg m -3 , = 1140 J kg -1 K -1 1020 A mean value of = 20 h for the day and night cases is found. Figure A2c presents the contribution of this term to the temperature variations, which is very minor but remains important to have a better agreement between the developed model and the directly measured observations, by the fact that adding or removing some units could make a difference in estimating the correlation between the two datasets.

A1.4 Advection term 1025
= ( 10 2 + 10 2 ) (A16) The advection term is estimated as presented in Eq. (A16) using ERA5 Reanalysis dataset of horizontal wind components at 10 m and temperature at 2 m above the ground. The purpose of using this dataset is to have another point in each horizontal axis to calculate the transport of the mass of air in zonal and meridional directions, between the SIRTA observatory and the immediately following grid box. For the period and time scale considered, the advection term plays in the majority a minor 1030 role, as shown in Fig. A2e, compared to the other terms.

A1.5 Observed hourly temperature variations term
The left side of Eq. (A2) is defined as: Where ℎ is the temperature at 2 m above the surface at hour h and ℎ−1 is the temperature at the previously considered hour. 1035 Figure A2a presents the evolution of this term, denoting, as expected, a seasonal cycle. Temperature variations have stronger negative than positive values, reaching few times a rate of -6 °C h -1 or even -8 °C h -1 for someday in summer 2010.
The sum of all terms on the right side of Eq. (A2) is hereafter called 2 . main interest of using this machine learning is to determine the importance of the terms on the model developed, as it is known that random forest protect against overfitting by constructing training samples through bootstrapping.
Some hyperparameters are tuned in order to optimize the analysis: • The random forest method is set to have 150 decision trees, because at that number the error converges to a small 1055 value, as seen in the Fig.B-1 (converging value of 0.25 during daytime, 0.12 during nighttime).
• For the split criteria, since our model is a simple one-degree regression, the method is set to use the mean square error (MSE) to do the split at each leaf.
• The number of random features to consider at each split and the number of bootstrapped dataset used to train each decision tree in the random forest method is approximatively to be 2/3 of the total of predictors and 2/3 of the total of 1060 sample, respectively (James et al., 2013).

1065
In Section 5, different bins are created to analyze the specific contribution of clouds for both day and nighttime. Figure D1a shows the distribution of the ratio RCL/RCS for daytime hours, where two peaks can be easily identified: one between -0.5 and -1 and one slightly negative with a tail in positive values. From this distribution, three different bins can be created in order to separate the effect of clouds on observed near-surface temperature variations, going from the highest cooling effect to the warming effect (rectangular semi-transparent brown boxes in Fig. C1a): (i) −1.0 ≤ < −0.5 (bin 1), (ii) −0.5 ≤ < 0 1070 (bin 2) and (iii) 0 < ≤ 0.5 (bin 3).
As for nighttime cases, bins are created only considering the distribution of RCL, which is presented in Fig. C1b. Two peaks are spotted in this figure when RCL > 0 and thus the following two bins are created: 0 < R CL < 0.75 °C h −1 and 0.75 ≤ R CL ≤ 1.5 °C h −1 . During nighttime there is no SW radiation so clouds always have a warming effect on near-surface temperature.