Application of WRF/Chem-MADRID and WRF/Polyphemus in Europe – Part 1: Model description, evaluation of meteorological predictions, and aerosol–meteorology interactions

Comprehensive model evaluation and comparison of two 3-D air quality modeling systems (i.e., the Weather Research and Forecast model (WRF)/Polyphemus and WRF with chemistry and the Model of Aerosol Dynamics, Reaction, Ionization, and Dissolution (MADRID) (WRF/Chem-MADRID)) are conducted over Western Europe. Part 1 describes the background information for the model comparison and simulation design, the application of WRF for January and July 2001 over triple-nested domains in Western Europe at three horizontal grid resolutions (0.5°, 0.125°, and 0.025°), and the effect of aerosol/meteorology interactions on meteorological predictions. Nine simulated meteorological variables (i.e., downward shortwave and longwave radiation fluxes (SWDOWN and LWDOWN), outgoing longwave radiation flux (OLR), temperature at 2 m (T2), specific humidity at 2 m (Q2), relative humidity at 2 m (RH2), wind speed at 10 m (WS10), wind direction at 10 m (WD10), and precipitation (Precip)) are evaluated using available observations in terms of spatial distribution, domainwide daily and site-specific hourly variations, and domainwide performance statistics. The vertical profiles of temperature, dew point, and wind speed/direction are also evaluated using sounding data. WRF demonstrates its capability in capturing diurnal/seasonal variations, spatial gradients, and vertical profiles of major meteorological variables. While the domainwide performance of LWDOWN, OLR, T2, Q2, and RH2 at all three grid resolutions is satisfactory overall, large positive or negative biases occur in SWDOWN, WS10, and Precip even at 0.125° or 0.025° in both months, and in WD10 in January. In addition, discrepancies between simulations and observations exist in T2, Q2, WS10, and Precip at mountain/high-altitude sites and large urban center sites in both months, in particular during snow events or thunderstorms.
These results indicate the model's difficulty in capturing meteorological variables in complex terrain and subgrid-scale meteorological phenomena, due to inaccuracies in model initialization (e.g., lack of soil temperature and moisture nudging), limitations in the physical parameterizations (e.g., shortwave radiation, cloud microphysics, cumulus parameterizations, and ice nucleation treatments), as well as limitations in surface heat and moisture budget parameterizations (e.g., snow-related processes, subgrid-scale surface roughness elements, and urban canopy/heat island treatments and CO2 domes). While the use of finer grid resolutions of 0.125° and 0.025° shows some improvements for WS10, WD10, Precip, and some mesoscale events (e.g., strong forced convection and heavy precipitation), it does not significantly improve the overall statistical performance for the meteorological variables except for Precip. The WRF/Chem simulations with and without aerosols show that aerosols lead to decreases in net shortwave radiation fluxes, 2 m temperature, 10 m wind speed, planetary boundary layer (PBL) height, and precipitation, and to increases in aerosol optical depth, cloud condensation nuclei, cloud optical depth, and cloud droplet number concentrations over most of the domain. These results indicate a need to further improve the model representations of the above parameterizations as well as aerosol–meteorology interactions at all scales.

Published by Copernicus Publications on behalf of the European Geosciences Union.


Introduction
Significant progress in Europe has been made in recent years in reducing air pollution and its harmful impact on public health through monitoring air pollutants, tightening air quality standards, controlling emissions of air pollutants, and communicating with various stakeholders and the general public on the preventive measures of reducing air pollution and exposure. Several studies showed a strong association between adverse effects on human health (e.g., daily mortality, lung and heart diseases, and diabetes) and elevated PM2.5 levels in European cities (e.g., Helsinki and Stockholm) (e.g., Timonen et al., 2004; Rosenthal et al., 2011; Aphekom, 2011; Meister et al., 2012). Coarse particles are also associated with increased morbidity and hospital admissions of people with respiratory diseases (Brunekreef and Forsberg, 2005; Pope and Dockery, 2006), despite their less detrimental health effects. Regulations for air quality in Europe focus on gaseous pollutants (e.g., O3, NO2, SO2) and particulate matter with aerodynamic diameters less than or equal to 2.5 and 10 µm (PM2.5 and PM10) (European Commission, 2008). Anthropogenic sources such as traffic, energy consumption, industry, domestic combustion, and agriculture are the major sources of these pollutants in continental Europe (EMEP, 2006a, b; WHO, 2006), although long-range transport also plays an important role in some regions (e.g., southern Europe, where PM10 concentrations may be enhanced by mineral dust particles transported from the Sahara desert) (Escudero et al., 2007; Stohl et al., 2007; Kallos et al., 2007, 2009; Jiménez-Guerrero et al., 2008; Spyrou et al., 2010). Air quality models (AQMs) are used to understand why high concentrations are sometimes observed and to assess the effects of proposed emission reductions on air quality standards in Europe. To establish confidence in these models, they are validated by comparison of model results with observations from ground networks, ground-based lidars, and satellites. For example, over Europe, different AQMs such as Polyphemus, CHIMERE, the European Monitoring and Evaluation Programme model (EMEP), the LOng Term Ozone Simulation (LOTOS), and the Community Multiscale Air Quality (CMAQ) modeling system have often been used to simulate past episodes or forecast European air quality (see references for each model in Solazzo, 2012a). Most of these models were intercompared during the Air Quality Model Evaluation International Initiative (AQMEII) project (Galmarini et al., 2010; Rao et al., 2011; Solazzo, 2012a, b). Regional AQMs have also been used in conjunction with urban/local traffic and/or dispersion models to assess the impact of European emission control on urban/local air quality (e.g., Giannouli et al., 2011).
Depending on the coupling between a meteorological model (MetM) and a chemical transport model (CTM), current three-dimensional (3-D) AQMs can be grouped into two types: offline and online. In the offline-coupled AQMs, a MetM is used first to generate the meteorological fields; a CTM is then used to generate chemical concentrations using outputs from the MetM. The chemical concentrations from the CTM are not fed back to the MetM. In the online-coupled AQMs, simulations using the MetM and CTM are performed in parallel, exchanging predicted meteorological and chemical fields at every time step. Such an online-coupled AQM may consist of two models with an interactive interface in between, such as the two-way coupled WRF/CMAQ (Yu et al., 2011; Wong et al., 2012) (also referred to as an online access model), or of one unified model system in which meteorology and air quality variables are simulated together in one time step without an interface between the two models, such as the Weather Research and Forecast model with Chemistry (WRF/Chem) (also referred to as an online integration model) (Grell et al., 2005; Fast et al., 2006; Zhang, 2008; Zhang et al., 2010a; Kukkonen et al., 2011; Baklanov et al., 2013). The model treatments of atmospheric processes for both chemical and meteorological variables are consistent in the online integration models but may be different in the online access models. These online models can therefore simulate not only pollutant concentrations but also the meteorology–chemistry feedbacks through various direct, semi-direct, and indirect feedback mechanisms. Both offline and online models have their own merits and are commonly used in current regional and global models. Offline AQMs are frequently used in ensembles and operational forecasting, inverse/adjoint modeling, and sensitivity simulations, whereas online-coupled AQMs are increasingly used worldwide for cases with important chemistry–meteorology feedbacks (e.g., climate change investigations) and
fast changes in the local-scale wind and circulation system (Zhang, 2008). The online-coupled AQMs have been applied over many regions, including North America (Jacobson et al., 1996, 1997; Grell et al., 2005; Zhang, 2008; Zhang et al., 2010a, b, 2012a), Asia (Tie et al., 2009; Wang et al., 2010; Zhang et al., 2012b; Jiang et al., 2012), and Europe (Baklanov et al., 2007, 2008; Zhang et al., 2011a; Forkel et al., 2012; Tuccella et al., 2012), as well as on a global scale (Roeckner et al., 2006) and from global through urban scales (Jacobson, 2001; Zhang et al., 2012c). The strengths and limitations of offline- and online-coupled models are summarized in several reviews (e.g., Grell et al., 2004; Zhang, 2008; Baklanov, 2010; Baklanov et al., 2011, 2013; Kukkonen et al., 2011), among which Zhang (2008) reviewed several online-coupled models used over North America, and Kukkonen et al. (2011) and Baklanov et al. (2013) provided comprehensive reviews of online-coupled models used over Europe. A comprehensive review of offline- and online-coupled AQMs for real-time air quality forecasting can be found in Zhang et al. (2012d, e).
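The offline/online distinction described above can be sketched with toy stand-ins for the MetM and CTM. All function names, state variables, and coefficients below are hypothetical, chosen only to illustrate the data flow; neither model's actual physics or chemistry is represented:

```python
# Illustrative sketch: offline coupling (MetM first, CTM second, no feedback)
# versus online coupling (both advance together, exchanging fields each step).

def met_step(met, chem=None):
    """Toy meteorology update; in an online model, `chem` can feed back
    (e.g., aerosols reducing downward shortwave radiation)."""
    feedback = 0.0 if chem is None else -0.01 * chem["aerosol"]
    return {"swdown": met["swdown"] + feedback, "t2": met["t2"]}

def chem_step(chem, met):
    """Toy chemistry/aerosol update driven by the current meteorology."""
    return {"aerosol": chem["aerosol"] + 0.001 * met["swdown"]}

def run_offline(n_steps, met0, chem0):
    # Offline: the MetM runs to completion first; the CTM then consumes the
    # archived meteorology, and chemistry never feeds back to the MetM.
    met_history = [met0]
    for _ in range(n_steps):
        met_history.append(met_step(met_history[-1]))
    chem = chem0
    for met in met_history[1:]:
        chem = chem_step(chem, met)
    return met_history[-1], chem

def run_online(n_steps, met0, chem0):
    # Online integration: meteorology and chemistry advance together,
    # exchanging predicted fields at every time step.
    met, chem = met0, chem0
    for _ in range(n_steps):
        met = met_step(met, chem)   # aerosols feed back into meteorology
        chem = chem_step(chem, met)
    return met, chem
```

In the offline run the simulated radiation is unaffected by aerosols, whereas in the online run the (toy) aerosol burden reduces it each step, mimicking the direct-effect feedback discussed in the text.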
The performances of offline- and online-coupled AQMs have been compared in several studies. For example, San José et al. (2009) compared the offline-coupled Fifth-Generation Penn State/NCAR Mesoscale Model (MM5)/CMAQ and the online-coupled WRF/Chem for a high particulate matter (PM) episode over Germany in winter and found that WRF/Chem gave better agreement with PM observations. Matsui et al. (2009) compared WRF/CMAQ v4.6 and WRF/Chem v2.2 over Beijing, China, and found that WRF/Chem systematically gave higher overpredictions of the surface concentrations of primary species such as carbon monoxide (CO), nitrogen oxides (NOx), and elemental carbon (EC) due to different treatments of mixing processes. Yu et al. (2011) and Wong et al. (2012) compared offline- and online-coupled WRF/CMAQ and reported improved model performance in surface shortwave and longwave radiation, 2 m temperature, shortwave and longwave cloud forcing, surface ozone (O3), and PM2.5 with the two-way coupled model.
This study aims at comparing two AQMs, an offline-coupled model (i.e., WRF/Polyphemus) and an online-coupled model (i.e., the WRF with chemistry and the Model of Aerosol Dynamics, Reaction, Ionization, and Dissolution (MADRID), referred to as WRF/Chem-MADRID), to assess their capabilities in simulating pollutant concentrations over Europe and the importance of including the feedbacks between aerosols and meteorology in air quality simulations. Compared with a previous application of Polyphemus over Europe in 2001 that used MM5 as the MetM (Sartelet et al., 2007), this study uses WRF as the MetM and an updated version of Polyphemus, includes a much more comprehensive model evaluation with a number of surface networks and satellites, and intercompares the predictions of WRF/Polyphemus and WRF/Chem-MADRID at different grid resolutions. Compared with recent applications of WRF/Chem over Europe, the aerosol module MADRID used in this work includes secondary organic aerosol (SOA) formation, which was not included in the study of San José et al. (2009) and which differs from the simpler SOA model SORGAM used by Tuccella et al. (2012) and Forkel et al. (2012). It also includes the aerosol-cloud-precipitation feedbacks that were not included in San José et al. (2009) and Tuccella et al. (2012). In addition, this work examines the sensitivity of predictions to horizontal grid resolution and biogenic emissions, which were not addressed in previous WRF/Chem applications over Europe.
The results from this study are presented in two parts. Part 1 describes the two modeling systems, WRF/Polyphemus and WRF/Chem-MADRID, their configurations and the simulation setup, the evaluation protocols and observational databases used, the evaluation of meteorological predictions and their sensitivity to horizontal grid resolution using WRF, and the effect of aerosol/meteorology interactions on meteorological fields using WRF/Chem-MADRID. Part 2 (Zhang et al., 2013) describes the evaluation of chemical concentrations and intercomparisons between chemical predictions from the two models, the sensitivity of the model predictions to horizontal grid resolution, and the effect of interactions between meteorology and aerosols predicted with WRF/Chem-MADRID on air pollutant concentrations.

WRF/Chem-MADRID and WRF/Polyphemus
Table 1 summarizes inputs and treatments of major atmospheric processes in the two AQMs used in this study. WRF/Chem-MADRID is based on the publicly released WRF/Chem version 3.0 and offers two additional gas-phase mechanisms (i.e., CB05 and SAPRC99) and one additional aerosol module (MADRID) as alternatives to the default gas-phase mechanisms and aerosol modules. A detailed description can be found in Zhang et al. (2010a, 2012a). WRF/Chem-MADRID has been applied to eastern Texas in the US to simulate PM and its interactions with meteorology with different gas/particle mass transfer approaches (Zhang et al., 2010a), to the eastern US to forecast real-time air quality (Chuang et al., 2011), and to the continental US (CONUS) to simulate surface O3 and PM concentrations and aerosol feedbacks using different gas-phase mechanisms (Zhang et al., 2012a) and different aerosol modules (Zhu et al., 2011). The air quality modeling platform Polyphemus with the CTM Polair3D has been widely used for modeling pollution buildup and transport on urban to continental scales (e.g., Sartelet et al., 2008, 2012; Royer et al., 2011), and specifically to simulate the year 2001 over Europe (Sartelet et al., 2007). A detailed description of the Polair3D/Polyphemus model setup is given by Sartelet et al. (2007). Compared with the study of Sartelet et al. (2007), major updates in the model treatments include the gas-phase chemistry, the calculation of photolysis rates, the treatment of organic aerosols, the number of vertical levels, and the land use cover.
To minimize differences in model predictions, the same or similar modules are chosen for both model simulations whenever possible. For example, both models use the same gas-phase chemical mechanism (CB05) (Yarwood et al., 2005), the same photolysis scheme (Fast-J) (Wild et al., 2000), and the same aqueous-phase chemical mechanism, which is based on the Carnegie Mellon University (CMU) aqueous-phase chemistry of Fahey and Pandis (2001). Although Polyphemus/Polair3D is an offline CTM, photolysis rates are computed online, and thus the influence of particles on photolysis rates is taken into account (Real and Sartelet, 2011). The major differences between the two AQMs lie in heterogeneous chemistry, dry and wet deposition of gaseous and aerosol species, aerosol treatments, and aerosol–cloud interactions. While the version of WRF/Chem-MADRID used in this study does not treat heterogeneous chemistry, Polyphemus includes heterogeneous reactions of HO2 and NO3 following Jacob (2000). Polyphemus uses the SIze REsolved Aerosol Model (SIREAM)-SuperSorgam aerosol module (Kim et al., 2011), and WRF/Chem-MADRID uses the MADRID aerosol module of Zhang et al. (2010a, 2012a).
Although both aerosol models use a sectional size representation with 8 sections between 0.0215 and 10 µm and simulate inorganic aerosol thermodynamics using ISORROPIA (Nenes et al., 1998), dynamic processes (nucleation, coagulation, and condensation/evaporation), and organic aerosol thermodynamics, they differ in several aspects. Both models include similar sets of SOA precursors (e.g., aromatics, long-chain alkanes, long-chain alkenes, isoprene, and terpenes) and use an absorptive approach for hydrophobic SOA (SuperSorgam of Kim et al. (2011) in Polyphemus). WRF/Chem-MADRID simulates nucleation using the parameterization of McMurry and Friedlander (1979), which accounts for the competition between nucleation and condensation. Although SIREAM in Polyphemus may be used with two different parameterizations for nucleation, nucleation is not taken into account in this work. For gas/particle mass transfer, both models use the bulk equilibrium approach in this work. In Polyphemus, for inorganic compounds, the weighting scheme used to redistribute the total particle equilibrium concentrations among the particles of different sizes (sections) depends on the condensation/evaporation kernel (Debry et al., 2007). In WRF/Chem-MADRID, the redistribution of transferred mass also depends on the condensational growth law (Zhang et al., 2004). A further difference between the two models lies in the sea-salt components (sodium and chloride), which are included in the equilibrium calculation in WRF/Chem-MADRID but not in Polyphemus, despite their inclusion in the PM composition.
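The shared 8-section size grid can be illustrated as follows, assuming logarithmically spaced section boundaries (a common choice for sectional aerosol models; the exact section placement used in MADRID and SIREAM may differ):

```python
import numpy as np

# Sketch: boundaries and representative (geometric-mean) diameters for the
# 8 size sections spanning 0.0215-10 µm used by both aerosol modules.
# Assumption: sections are logarithmically spaced across the size range.

d_min_um, d_max_um, n_sections = 0.0215, 10.0, 8

# n_sections + 1 section edges, equally spaced in log-diameter space (µm)
bounds = np.logspace(np.log10(d_min_um), np.log10(d_max_um), n_sections + 1)

# geometric mid-point diameter of each section (µm)
centers = np.sqrt(bounds[:-1] * bounds[1:])
```

With this spacing each section spans the same multiplicative factor in diameter, so the grid resolves the nucleation/Aitken range as finely (in relative terms) as the coarse range.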
The dry and wet deposition treatments used in the two models are different. WRF/Chem-MADRID calculates the dry deposition fluxes of gases based on the surface resistance of Wesely (1989), whereas Polyphemus uses a surface resistance parameterization that is similar to that of Wesely (1989) but with the updated treatments of Zhang et al. (2003), which consider non-stomatal resistance for all depositing gases. Zhang et al. (2003) compared dry deposition velocities of O3 and SO2 calculated with and without non-stomatal resistance (e.g., in-canopy aerodynamic, soil, and cuticle resistances) and found that the velocities calculated with non-stomatal resistance over wet canopies are higher by about a factor of two than those without it and agree better with observations, thus providing a more realistic treatment of cuticle and ground resistance. For dry deposition of PM, both models calculate particle dry deposition velocities as a function of particle size and density and relevant meteorological variables, but using different modules. WRF/Chem-MADRID uses the parameterization of Venkatram and Pleim (1999). Compared to the traditional approach based on an electrical analogy, this parameterization conserves mass because it accounts for the fact that the resistance component depends on a concentration gradient, whereas the sedimentation term does not. Polyphemus uses the parameterization of Zhang et al. (2001), which treats dry deposition processes such as turbulent transfer, Brownian diffusion, impaction, interception, gravitational settling, and particle rebound. Despite the use of different modules (Easter et al. (2004) for WRF/Chem-MADRID and Sportisse and Dubois (2002) for Polyphemus), both models include similar treatments for below-cloud scavenging of gases and use effective Henry's law constants for major water-soluble gases. WRF/Chem-MADRID considers additional in-cloud scavenging of gases that is not treated in Polyphemus. WRF/Chem-MADRID treats in- and below-cloud wet removal of PM based on the parameterization of Easter et al. (2004). Polyphemus only treats in-cloud scavenging of PM, based on the parameterization of Roselle and Binkowski (1999). For aerosol–cloud interactions, WRF/Chem-MADRID includes the aerosol activation parameterization of Abdul-Razzak and Ghan (A-R&G) (2002) and simulates aerosol direct, semi-direct, and indirect effects. Polyphemus allows the activation of particles if they exceed a critical diameter of 0.7 µm (Strader et al., 1998) but does not simulate aerosol direct, semi-direct, and indirect effects (other than the aerosol feedbacks on photolysis rates). These differences in model treatments, together with other differences (e.g., advection and chemical boundary conditions), will affect the chemical concentrations simulated with both models.
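The simple critical-diameter activation rule used in Polyphemus (Strader et al., 1998) lends itself to a short sketch; the section diameters and number concentrations below are made-up values for illustration only:

```python
import numpy as np

# Sketch of the Polyphemus activation rule: particles in sections whose
# diameter exceeds a critical diameter of 0.7 µm are activated into droplets.

D_CRIT_UM = 0.7  # critical activation diameter (µm)

def activated_number(diam_um, number_conc):
    """Sum the number concentration over sections exceeding the critical
    diameter. `diam_um` and `number_conc` are per-section arrays."""
    diam_um = np.asarray(diam_um, dtype=float)
    number_conc = np.asarray(number_conc, dtype=float)
    return number_conc[diam_um > D_CRIT_UM].sum()
```

This contrasts with the A-R&G (2002) scheme in WRF/Chem-MADRID, in which the activated fraction depends on supersaturation and aerosol composition rather than on a fixed size cut.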

Simulation design
Both models use the meteorological fields produced by WRF, with an online coupling for WRF/Chem and an offline coupling for Polyphemus. The physics options selected for the WRF simulations are summarized in Table 2, and the simulation domains are shown in Figure 1. [Table 1, which compares the treatments in WRF/Chem-MADRID and Polyphemus (biogenic and sea-salt emissions, gas-phase chemistry, photolysis, aerosol module, dry and wet deposition for gases and aerosols, aerosol activation, and aerosol effects), appears here in the original layout.] For WRF/Chem-MADRID, chemical initial and boundary conditions are extracted from outputs of the global-through-urban WRF/Chem (GU WRF/Chem) of Zhang et al.
(2012c). In Polyphemus, initial and boundary conditions are extracted from outputs of the global chemistry transport model MOZART-2 run over a typical year for gases, and from outputs of the Goddard Chemistry Aerosol Radiation and Transport model (GOCART; Chin et al., 2000) for the year 2001 for particulate sulfate, dust, and black and organic carbon (Sartelet et al., 2007). Anthropogenic emissions are based on the 2001 EMEP expert inventory (http://www.emep.int) for both models. However, as indicated in Mallet and Sportisse (2006), large uncertainty exists in the vertical distribution of the EMEP emissions. In Polyphemus, the surface and elevated sources are assumed to be released in the first model layer and in several upper layers, respectively, at the median height of each layer defined in WRF. They are distributed following a vertical profile in WRF/Chem. Given differences in the first model layer height and the thickness of each model layer between the two models, the vertical distributions of emissions are different, which will affect model predictions of chemical concentrations. Sea-salt emissions are simulated online based on Gong et al. (2002) in WRF/Chem-MADRID and offline based on Monahan et al. (1986) in Polyphemus. While Polyphemus does not simulate mineral dust emissions, WRF/Chem-MADRID uses a modified Shaw (2008) online module that generates emissions from soil surfaces (note that road dust emissions are not simulated), as described in Zhang et al. (2012c). The land types that can generate dust include grassland, shrubland, mixed shrubland/grassland, savanna, and barren or sparsely vegetated land. The biogenic emissions of Simpson et al. (1999) are used in the baseline simulations over D01 and D02 for both models. Sartelet et al. (2012) reported that the formation of O3 and SOA is sensitive to biogenic volatile organic compounds (BVOCs). To examine this sensitivity, two additional online biogenic emission inventories are used in the sensitivity simulations with WRF/Chem-MADRID: the Biogenic Emissions Inventory System version 3.13 (BEIS3.13), based on Guenther et al. (1993, 1999) and updates in Schwede et al. (2005), which was further modified to map terpene emissions to the terpenes treated in CB05 as described in Zhang et al. (2012c), and the Model of Emissions of Gases and Aerosols from Nature version 2.04 (MEGAN 2.04) of Guenther et al. (2006). BEIS3.13 and the Simpson emission scheme use leaf-scale emission factors, whereas MEGAN uses canopy-scale emission factors (Pouliot and Pierce, 2009; Sartelet et al., 2012). MEGAN was developed to replace BEIS3, although the canopy-scale emission factors in MEGAN are still primarily based on leaf- and branch-scale emission measurements that are extrapolated to the canopy scale using a canopy environment model (Guenther et al., 2006). Although terpene emissions are distributed among pinene, limonene, and sesquiterpenes with constant factors in both the Simpson and MEGAN schemes, different emission factors are used for several species in MEGAN (Sartelet et al., 2012). Differences among these emission schemes are discussed in several studies (e.g., Pouliot and Pierce, 2009; Steinbrecher et al., 2009; Sartelet et al., 2012).
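The layer assignment described above for EMEP emissions in Polyphemus (surface sources into the first model layer, elevated sources into the layer containing their release height, released at the layer's median height) can be sketched as follows. The layer interface heights below are hypothetical, not the actual WRF configuration used in this study:

```python
import bisect

# Sketch: mapping EMEP surface and elevated sources to model layers.
# Assumption: made-up layer interface (top) heights, in metres above ground.
layer_tops_m = [40.0, 120.0, 300.0, 800.0, 2000.0]

def layer_index(release_height_m):
    """0-based index of the model layer containing a given release height;
    layer 0 extends from the surface to the first interface."""
    return bisect.bisect_right(layer_tops_m, release_height_m)

def layer_median_height(k):
    """Median (mid-point) height of layer k, at which the emission mass
    assigned to that layer is released."""
    bottom = 0.0 if k == 0 else layer_tops_m[k - 1]
    return 0.5 * (bottom + layer_tops_m[k])
```

Because WRF/Chem uses different layer thicknesses and a different first-layer height, the same inventory yields different vertical emission distributions in the two models, as noted in the text.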
To estimate the effects of aerosols on model predictions through various feedback mechanisms, an additional simulation is performed using the online-coupled WRF/Chem-MADRID with the MEGAN2 BVOC module by turning off primary aerosol emissions and secondary aerosol formation. The differences in model predictions between this simulation and the simulation using WRF/Chem-MADRID with the MEGAN2 BVOC module that includes all primary aerosol emissions and secondary aerosol formation represent the effects of aerosols via various feedback mechanisms.

Observational data and evaluation protocol
Table 3 summarizes the surface and satellite datasets and variables used in the evaluation. The surface meteorological datasets include observations from the Baseline Surface Radiation Network (BSRN) and the European Climate Assessment & Dataset (ECA&D), reanalysis data from NCEP, and a combination of observations and interpolated data from the National Oceanic and Atmospheric Administration Climate Diagnostics Center (NOAA/CDC). The meteorological variables evaluated include downward shortwave and longwave radiation fluxes (SWDOWN and LWDOWN, respectively), outgoing longwave radiation fluxes (OLR), temperature, specific humidity, and relative humidity at 2 m (T2, Q2, and RH2, respectively), wind speed at 10 m (WS10), wind direction at 10 m (WD10), and total daily precipitation (Precip). Simulated vertical profiles of temperature (T), dew point (Td), and wind speed (WS) and wind direction (WD) are also evaluated using sounding observations from the NCAR DS353.4 ADP dataset. The chemical surface datasets include EMEP, the European air quality database (AirBase), and the Base de Données de la Qualité de l'Air (BDQA). Chemical variables evaluated include hourly and daily average NH3, SO2, and NO2, daily average HNO3, hourly, maximum 1 h, and maximum 8 h average O3, and hourly and daily average PM2.5, PM10, and PM10 composition (i.e., sulfate (SO4^2−), nitrate (NO3^−), ammonium (NH4^+), sodium (Na^+), and chloride (Cl^−)). EC and organic matter (OM) are not evaluated because of a lack of observations. EMEP contains data from the Convention on Long-Range Transboundary Air Pollution and provides hourly O3 data and daily data for other species at regional background sites, mostly in farmland, rural, and lightly forested areas (Torseth and Hov, 2003). The AirBase database contains observations from the European Air Quality monitoring network (EuroAirnet) provided by European Union Member States, European Environment Agency member countries, and cooperating countries. It contains hourly data for all species and additional daily data for PM2.5, PM10, and PM10 composition at various types of sites, such as rural background, rural, suburban, urban, traffic, and industrial sites. BDQA is the French air quality database, which covers France with hourly measurements at various types of sites. Because the grid resolutions used in this work are not commensurate with urban, traffic, and industrial sites, those sites from AirBase and BDQA are excluded from the model evaluation, except for urban background sites in AirBase. Large uncertainties exist in these observational data due to artifacts in the measurements and the impacts of local geographical conditions on the measurements (Schaap et al., 2004; Sartelet et al., 2007). The satellite datasets include the Total Ozone Mapping Spectrometer/Solar Backscatter UltraViolet (TOMS/SBUV), the Measurements Of Pollution In The Troposphere (MOPITT), the Global Ozone Monitoring Experiment (GOME), and the Moderate Resolution Imaging Spectroradiometer (MODIS). The variables evaluated include column concentrations of tropospheric CO and NO2, tropospheric O3 residual (TOR), and aerosol optical depth (AOD). For evaluation against MODIS, the monthly-mean AOD predictions are calculated as an average of column-integrated values during 15:00-20:00 UTC, when the Terra satellite passes over Europe, following Roy et al. (2007).
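The MODIS-comparable AOD averaging described above can be sketched as follows. The hourly AOD array is synthetic, and treating the 20:00 UTC hour as inclusive is an assumption about the window definition:

```python
import numpy as np

# Sketch: monthly-mean AOD comparable to MODIS/Terra, averaging hourly
# column-integrated AOD only over the 15:00-20:00 UTC overpass window.

def overpass_mean_aod(hourly_aod, start_utc=15, end_utc=20):
    """hourly_aod: array of shape (n_days, 24) of column AOD per UTC hour.
    Returns the mean over the overpass window (end hour inclusive)."""
    hourly_aod = np.asarray(hourly_aod, dtype=float)
    window = hourly_aod[:, start_utc:end_utc + 1]
    return window.mean()
```

Restricting the model average to the overpass window keeps the comparison consistent with the satellite sampling, rather than comparing a full-day model mean against a once-daily retrieval.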
The protocols for performance evaluation follow those used in Zhang et al. (2009, 2012a), including spatial distributions, temporal variations (daily values over the whole domain and hourly values at specific sites), and domainwide statistics. The statistics include the mean bias (MB), mean gross error (MGE), root mean squared error (RMSE), normalized mean bias (NMB), normalized mean error (NME), correlation coefficient (Corr), and index of agreement (IOA). The model performance is evaluated over the D01/D02/D03 domains for the WRF simulations, over the D01/D02 domains for the WRF/Polyphemus and WRF/Chem-MADRID simulations using offline BVOC emissions, and over the D01 domain for the sensitivity simulations using WRF/Chem-MADRID with the two online BVOC emission modules. In addition to domainwide statistics, performance statistics are also calculated at individual sites. Since WD10 is a vector, a difference between 0° and 360° in the wind rose plot actually indicates a zero bias (rather than a difference of 360°); treating it as a numerical value in the traditional statistics calculation may give misleading results (Zhang et al., 2006). Therefore, for situations in which the differences between observed and simulated WD10 are greater than 180°, the simulated WD10 is adjusted to account for the actual differences between it and the observed WD10 in the wind rose plot in the statistical calculations and plotting.
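A minimal sketch of several of the domainwide statistics and of the wind-direction adjustment follows. The formulas are the common definitions of MB, RMSE, NMB, NME, and IOA (Willmott's index); the paper's exact implementation may differ in detail:

```python
import numpy as np

def stats(obs, sim):
    """Common evaluation statistics for paired observed/simulated values."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    diff = sim - obs
    obs_mean = obs.mean()
    return {
        "MB":   diff.mean(),                                 # mean bias
        "RMSE": np.sqrt((diff ** 2).mean()),                 # root mean squared error
        "NMB":  diff.sum() / obs.sum() * 100.0,              # normalized mean bias (%)
        "NME":  np.abs(diff).sum() / obs.sum() * 100.0,      # normalized mean error (%)
        "IOA":  1.0 - (diff ** 2).sum()                      # index of agreement
                / (((np.abs(sim - obs_mean) + np.abs(obs - obs_mean)) ** 2).sum()),
    }

def adjust_wd(obs_wd, sim_wd):
    """Wrap simulated wind direction by ±360° when the raw difference exceeds
    180°, so that, e.g., 350° observed vs. 10° simulated counts as a 20°
    difference rather than 340°."""
    obs_wd = np.asarray(obs_wd, dtype=float)
    sim_wd = np.asarray(sim_wd, dtype=float)
    sim_adj = sim_wd.copy()
    sim_adj[sim_wd - obs_wd > 180.0] -= 360.0
    sim_adj[obs_wd - sim_wd > 180.0] += 360.0
    return sim_adj
```

Applying `adjust_wd` before computing the statistics prevents the spurious 360° penalty at the north-wind boundary discussed in the text.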

Spatial distribution and domainwide performance statistics
Table 4a shows domainwide performance statistics over D01 for the nine meteorological variables evaluated. Figures 3 and 4 show simulated spatial distributions of T2, RH2, WS10, WD10, and Precip overlaid with observations over D01 and their associated MBs in January and July, respectively. In January, SWDOWN is largely overpredicted by the simulation at a horizontal grid resolution of 0.5°, with an NMB of 66.3 %, whereas LWDOWN is slightly underpredicted, with an NMB of −1.8 %. The overprediction of SWDOWN is mainly attributed to the overestimation of heating rates resulting from the underestimation of cloud optical thickness (COT) by the Goddard shortwave radiation scheme, because the scheme in this version of WRF did not account for the contributions of snow, rain, and graupel to COT and thin-cloud radiative forcing (Shi et al., 2007; Zhang et al., 2012a). Uncertainties may also exist in the observations, as data are only available at six BSRN sites. OLR is overpredicted with an NMB of 13.2 %, likely because the RRTM longwave radiation scheme coupled with the Goddard shortwave radiation scheme tends to overpredict radiative heating (Shi et al., 2007). WRF reproduces the observed spatial gradients of temperature, with the coldest values in the northwest and the hottest in the south. The largest cold biases (−5 to −2 °C) occur in the Alps, one of the great mountain range systems in Europe, which stretches about 1200 km across eight countries from Austria and Slovenia in the east, through Switzerland, Liechtenstein, Germany, and France to the west, and Italy and Monaco to the south, indicating the model's difficulty in capturing temperature variations in mountainous regions. The cold biases are also large (−3 to −1 °C) in the eastern portion of the domain where temperatures are low, likely due to too cold soil temperatures, too much soil moisture, too many daytime clouds, and a poor treatment of snow-related processes, as reported in several mesoscale modeling studies (2010, 2011).

² NCEP data contain pressure, height, T2, dew point, WS10, and WD10. Q2 and RH2 were calculated using T2, dew point, and pressure.
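As noted above, Q2 and RH2 were derived from T2, dew point, and pressure. A minimal sketch of such a conversion, assuming the standard Magnus approximation for saturation vapor pressure (the text does not state which formula the authors used):

```python
import math

def saturation_vapor_pressure(t_c):
    """Magnus approximation: saturation vapor pressure (hPa) at temperature t_c (degC)."""
    return 6.112 * math.exp(17.67 * t_c / (t_c + 243.5))

def q2_from_dewpoint(td_c, p_hpa):
    """Specific humidity (g/kg) from dew point (degC) and station pressure (hPa)."""
    e = saturation_vapor_pressure(td_c)   # actual vapor pressure (hPa)
    w = 0.622 * e / (p_hpa - e)           # mixing ratio (kg/kg)
    return 1000.0 * w / (1.0 + w)         # specific humidity (g/kg)

def rh2_from_t_and_dewpoint(t_c, td_c):
    """Relative humidity (%) from temperature and dew point (degC)."""
    return 100.0 * saturation_vapor_pressure(td_c) / saturation_vapor_pressure(t_c)
```

For example, a dew point of 15 °C at 1013 hPa corresponds to a specific humidity of roughly 10 g kg⁻¹, and RH is 100 % when the dew point equals the temperature.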
Similar large positive biases in WS10 are found for applications of WRF in both winter and summer in the US (Yahya et al., 2012) and East Asia (Zhang and Zhang, 2012). WD10 predictions agree reasonably well with observations. The MBs for WD10 at most sites in January are within 30°, with larger MBs occurring over complex terrain such as the Alps and the Carpathian Mountains. The domain-average MB, NMB, and IOA are 14.4°, 7.1 %, and 0.7, respectively. Precip is underpredicted at many sites, with an MB of −1.7 mm day⁻¹ and an NMB of −54.8 %, particularly in the Alps and the coastal areas of Norway and Estonia where precipitation levels vary greatly (e.g., 1–10 mm day⁻¹ over the Alps), making accurate prediction at this grid resolution very challenging. Similar to January, SWDOWN in July is largely overpredicted, with an NMB of 69.3 % over D01, and LWDOWN is slightly underpredicted, with an NMB of −3.8 %. In addition to possible underpredictions in COT associated with the above limitations in the Goddard shortwave radiation scheme, the overpredictions of SWDOWN in summer are partly attributed to the fact that the effect of cumulus clouds on radiation and the contribution of convective clouds to cloud water content are not accounted for in this version of WRF (Zhang et al., 2012a, b). The overpredictions in SWDOWN in this work are consistent with those reported by Zhang et al. (2012a) from the July 2001 application of WRF/Chem over the continental US and the evaluation of SWDOWN against observations from the US networks (i.e., CASTNET and SEARCH) and those reported by Zhang et al.
(2012b) based on observations from BSRN, CASTNET, and SEARCH. OLR performs slightly better in July than in January, with an NMB of 6.9 %. Similar to January, the simulation at 0.5° in July captures well the spatial gradients of T2 and Q2, with the coldest/driest values in the northwest portion of the domain and the hottest/wettest in the southeast. The largest cold biases in T2 occur in the southern Alps, the Balkan and Rhodope mountains in Bulgaria, and the Pontic and Taurus mountains in Turkey, and the largest warm biases in T2 occur on the northern edge of the Alps, in eastern Austria, and in central Romania. These biases compensate each other, resulting in an overall good agreement with the observed T2, with an MB of −0.3 °C and an NMB of −1.5 %. Inaccurate predictions of clouds and land surface heat fluxes may largely explain the biases in T2. The driest biases in Q2 occur in the south where Q2 is highest, and the wettest biases occur over the northern edge of the Alps and Austria. The compensation of dry and wet biases results in a domainwide MB of −0.1 g kg⁻¹ and an NMB of −1.2 %. Compared with the results in January, WS10 is overpredicted at most sites but to a lesser extent, and is even underpredicted at many sites in the UK, Denmark, and France, with MBs of −4 to −0.8 m s⁻¹; their compensation leads to very good agreement, with a domainwide MB of −0.1 m s⁻¹ and an NMB of −2.3 %. Simulated WD10 in July agrees better with observations than in January, with MBs at most sites smaller than in January. The domainwide performance statistics over D02 and D03 are given in Table 4b and c, respectively. For a fair evaluation of the sensitivity of the model predictions to different grid resolutions, the performance statistics are also calculated over D02 using model predictions from D01 at a horizontal grid resolution of 0.5° (see Table 4b) and over D03 using model predictions from D01 and D02 at horizontal grid resolutions of 0.5° and 0.125°, respectively (see Table 4c).
In January, the simulated daily mean T2 and Q2 agree reasonably well with observations at all three grid resolutions. Compared to the simulation at a horizontal grid resolution of 0.5° over D02, the domainwide statistical performance at 0.125° is slightly improved for Q2 but slightly worse for T2 and RH2. Compared to the simulations at 0.5° and 0.125° over D03, the domainwide statistical performance at 0.025° is slightly improved for T2. For Q2 and RH2, it is slightly worse than at 0.5° but better than at 0.125°. WS10 is overpredicted on all days, with the smallest NMB over D03 among the three simulations at different grid resolutions over D01–D03 (i.e., 59.2 % vs. 52.5 % vs. 16.6 %). Using the same set of observations over D02, a moderate improvement in performance statistics is found at 0.125° compared to 0.5° (i.e., an NMB of 52.5 % vs. 63.1 %). Using the same set of observations over D03, a slight deterioration in model performance statistics is found at 0.025° compared to model results at 0.125° (i.e., an NMB of 16.6 % vs. 9.9 %), but a moderate improvement is found compared to that at 0.5° (i.e., an NMB of 16.6 % vs. 29.7 %). Similar overpredictions in WS10 have also been reported by Vautard et al.
(2012), who intercompared several meteorological models during AQMEII. The high WS10 bias in the model simulations at all resolutions, in particular at 0.5° over D01 and 0.125° over D02, is due to the fact that WRF does not resolve subgrid-scale roughness elements (e.g., the surface roughness length or the friction velocity at the surface) even at the grid resolutions of 0.125° and 0.025°. Using a corrected drag parameterization that accounts for topographic effects, Mass and Ovens (2011) showed large and consistent improvements in low and moderate low-level winds, and Jiménez and Dudhia (2011) showed reduced overpredictions in wind speeds over plains and valleys. The domainwide NMB for WD10 is slightly better at 0.125° (from 7.1 % over D01 to 5.7 % over D02). The model biases in daily WD10 predictions from the D01 and D02 simulations are similar. The NMBs of WD10 predictions are similar at 0.5° and 0.125° over D02, with slightly better performance from the D02 (0.125°) simulation, and similar at all three grid resolutions over D03, with slightly better performance from the D03 (0.025°) simulation. Daily mean Precip is significantly underpredicted in the D01 and D02 simulations in the first and last weeks of January. Both underpredictions and overpredictions occur on some days over D03, especially on days with observed intensive precipitation (e.g., 1, 4, 5, 21, 23, and 27 January). For domainwide statistical performance, the NMBs of Precip are −54.8 %, −50.0 %, and −13.2 % over D01, D02, and D03, respectively. Compared to the simulation at 0.5° over D02, the underprediction of Precip at 0.125° is greatly reduced (the NMB changes from −66.3 % to −50.0 %). Compared to the simulations at 0.5° and 0.125° over D03, the underprediction in Precip at 0.025° is also largely reduced, with NMBs changing from −47.6 % at 0.5° and −49.7 % at 0.125° to −13.2 % at 0.025°. The improvement in the Precip predictions over progressively nested domains demonstrates the benefits of using a fine grid resolution in simulating mesoscale events. Compared
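The role of unresolved roughness in the WS10 overprediction can be illustrated with the textbook neutral logarithmic wind profile: for a fixed friction velocity, a larger effective roughness length (as would result from accounting for unresolved hills and valleys) yields a lower diagnosed 10 m wind. The sketch below is a generic illustration of this mechanism, not the Mass and Ovens (2011) or Jiménez and Dudhia (2011) parameterization:

```python
import math

VON_KARMAN = 0.4  # von Karman constant

def wind_at_height(u_star, z, z0):
    """Neutral-stability logarithmic wind profile: u(z) = (u*/k) * ln(z/z0),
    with friction velocity u_star (m/s), height z (m), roughness length z0 (m)."""
    return (u_star / VON_KARMAN) * math.log(z / z0)

# For the same friction velocity, a larger effective roughness length
# (representing unresolved subgrid terrain) gives a lower diagnosed 10-m wind.
u10_smooth = wind_at_height(0.3, 10.0, 0.05)  # smooth grid cell
u10_rough = wind_at_height(0.3, 10.0, 0.5)    # terrain-enhanced effective roughness
```

With u* = 0.3 m s⁻¹, raising z0 from 0.05 m to 0.5 m reduces the diagnosed 10 m wind appreciably, which is the direction of the corrections cited above.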
with January, WRF performs much better for T2, Q2, WS10, WD10, and Precip over D01 in July in terms of both daily means and performance statistics. Compared with WRF performance at 0.5° over D02, the WRF simulation at 0.125° over D02 slightly improves the model performance for T2, Q2, RH2, and WD10 in terms of both daily predictions and domainwide performance statistics. Compared with the simulations at 0.5° over D01 and at 0.125° over D02, the simulation at 0.025° over D03 gives slightly higher T2, Q2, and RH2 on most days, resulting in improved performance, with NMBs of −0.8 %, 2.1 %, and 1.3 %, respectively. Compared with the performance at 0.5° over D01, WS10 predictions are slightly worse on most days at 0.125° over D02 and moderately worse at 0.025° over D03, which is also reflected in the domainwide performance statistics (NMBs of −2.3 %, −13.3 %, and −25.5 %, respectively). Compared with model predictions at 0.5° over D02, the model performance for WS10 is slightly worse at 0.125° over D02, with NMBs of −10.0 % and −13.3 %, respectively. Compared with model predictions at 0.5° and 0.125° over D03, the model performance for WS10 at 0.025° is slightly worse than that at 0.5° (an NMB of −25.5 % vs. −24.0 %) and better than that at 0.125° (an NMB of −25.5 % vs.
−30.8 %). The performance of WD10 over D02 is slightly improved on some days (e.g., 5–8 and 18 July) compared with that over D01, leading to a slight improvement at 0.125° compared to 0.5° (the NMB changing from 4.2 % to 2.9 %). The statistical performance of WD10 over D03 is slightly worse at 0.025° than at 0.5° and 0.125°, with NMBs of 4.7 %, 3.9 %, and 4.3 %, respectively. Compared with the D01 results at 0.5°, daily mean Precip is overpredicted or underpredicted to a larger extent on more days at 0.125° over D02. The overpredictions in Precip dominate, resulting in an NMB of 82.3 % over D02 at 0.125° (vs. 52.5 % at 0.5°). Compared with the D01 and D02 results, although large differences exist between simulated and observed Precip on some days (e.g., 5, 7, 10, 13, 14, 17, 18, and 26 July), the predicted Precip at 0.025° over D03 is improved on 4 and 6 July because WRF captures well some of the large convective precipitation events in France. The large positive and negative biases over D03 compensate each other, resulting in an improved performance, with an NMB of −33.7 % at 0.025° (compared to −47.0 % at 0.5° and −47.4 % at 0.125° over the same D03 domain).
As shown in Table 4b, compared with the results at 0.5° over D02, the use of a grid resolution of 0.125° over D02 slightly improves the performance of Q2 and WD10 in both months, WS10 in January, and T2 and RH2 in July, and greatly reduces the large underpredictions in Precip in January. As shown in Table 4c, compared with the D03 results at 0.125°, the use of a grid resolution of 0.025° slightly improves the performance of T2 and RH2 in both months, Q2 and WD10 in January, and WS10 in July, and greatly reduces the large underpredictions in Precip in both months. These improvements are due to better defined and more realistic representations of mesoscale topographic features and structures, as well as the corresponding atmospheric circulations, as the horizontal grid resolution increases; they are consistent with several studies. For example, Mass et al. (2002) found some improvement in mesoscale model performance for some events (e.g., strongly forced convection, diurnal circulations, and heavy precipitation events) and in 10 m wind, 2 m temperature, and sea-level pressure forecasts as grid spacing decreased from 36 to 12 km. While Mass et al.
(2002) showed that, despite more detailed and finer structure, further decreasing the grid spacing from 12 km to 4 km yields only small improvements in traditional model evaluation statistics (e.g., MB, RMSE, and threat scores), Misenis and Zhang (2010) showed that, as the grid resolution increases from 12 km to 4 km, the performance of WRF improves in terms of NMBs for RH2, WS10, and planetary boundary layer height. However, Table 4b and c also show that not all meteorological predictions are improved at finer grid resolutions. For example, comparing WRF at 0.125° with WRF at 0.5° over D02 (Table 4b), slight deteriorations occur for a few variables such as SWDOWN in both months, LWDOWN, T2, and RH2 in January, and OLR and WS10 in July, and much worse performance occurs for Precip. In addition to the horizontal grid resolution, the accuracy of the meteorological predictions depends on many other factors, including the accuracy of input data such as land use and boundary conditions, the accuracy of model algorithms for all major meteorological processes under all meteorological and topographical conditions, and the uncertainties in model configurations (e.g., vertical grid resolution, nesting options, and data assimilation options). The inaccuracies and limitations in those other factors at a fine grid resolution may offset the benefit of the fine grid resolution, resulting in little improvement, or even worse performance, for some meteorological variables.
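The domainwide statistics cited throughout this section (MB, MGE, NMB, NME, RMSE, and IOA) follow the standard definitions used in meteorological model evaluation; a minimal sketch, assuming paired model/observation series with no missing values:

```python
import math

def performance_stats(model, obs):
    """Standard evaluation statistics for paired model/observation series:
    mean bias (MB), mean gross error (MGE), normalized mean bias/error
    (NMB/NME, in %), root mean square error (RMSE), and index of agreement (IOA)."""
    n = len(obs)
    obs_mean = sum(obs) / n
    diffs = [m - o for m, o in zip(model, obs)]
    mb = sum(diffs) / n
    mge = sum(abs(d) for d in diffs) / n
    nmb = 100.0 * sum(diffs) / sum(obs)
    nme = 100.0 * sum(abs(d) for d in diffs) / sum(obs)
    rmse = math.sqrt(sum(d * d for d in diffs) / n)
    # Index of agreement (Willmott): 1 indicates perfect agreement
    denom = sum((abs(m - obs_mean) + abs(o - obs_mean)) ** 2
                for m, o in zip(model, obs))
    ioa = 1.0 - sum(d * d for d in diffs) / denom
    return {"MB": mb, "MGE": mge, "NMB": nmb, "NME": nme, "RMSE": rmse, "IOA": ioa}
```

For a perfect simulation the function returns MB = NMB = 0 and IOA = 1; a uniform positive offset yields a positive NMB, matching the sign convention for the overpredictions discussed above.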

Description of selected sites
The observations of precipitation and of other meteorological variables such as T2, Q2, WS10, and WD10 come from different databases (i.e., ECA&D and NCEP, respectively), and no observations of all these variables were available at the same sites. Ten sites and five co-located sites are selected for detailed temporal analysis of T2, Q2, and WS10. Eight of the ten sites are selected for evaluation of vertical profiles of T, Td, WS, and WD against sounding data. Eight sites are selected for detailed analysis of precipitation. Table 5 summarizes the major characteristics of the selected sites. Among the ten NCEP sites selected for analysis of T2, Q2, and WS10, three sites (Paris Charles de Gaulle airport, Paris Orly airport, and Melun, France) are in D03, two sites (Milan 1/Milan 2, Italy, and Bilbao, Spain) are in D02 but outside D03, and five sites (Stockholm 1, Sweden; London 1/London 2, UK; Düsseldorf 1/Düsseldorf 2, Germany; Liberec, Czech Republic; and Madrid 1/Madrid 2, Spain) are in D01 but outside D02 and D03. Among the eight sites selected for a detailed analysis of Precip, three sites (Pris-14E, Brétigny-sur-Orge, and Chartres-Champhol, France) are in D03, two sites (Milan, Italy, and San Sebastián-Igueldo, Spain) are in D02 but outside D03, and three sites (Stockholm, Sweden; Görlitz, Germany; and Düsseldorf, Germany) are in D01 but outside D02 and D03. If other monitoring sites are within a 30 km radius of these selected sites, they are considered to be co-located with the selected sites. The observations from co-located sites are also plotted, even though the simulated results at selected and co-located sites fall into the same grid cell in the respective simulation domain. Among the eight sites selected for precipitation analysis, six sites are co-located with the sites selected for analysis of T2, Q2, and WS10. Stockholm is co-located with Stockholm 1.
Düsseldorf is co-located with Düsseldorf 1/Düsseldorf 2, Germany. Görlitz is co-located with Liberec, Czech Republic. Pris-14E and Brétigny-sur-Orge are co-located with Paris Orly airport and Melun, France, respectively. Milan is co-located with Milan 1/Milan 2, Italy.
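The 30 km co-location rule above amounts to a great-circle distance test between station coordinates; a minimal sketch using the haversine formula (the site coordinates below are synthetic, not the actual station locations):

```python
import math

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance (km) between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2.0 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def is_co_located(site_a, site_b, radius_km=30.0):
    """Treat two (lat, lon) sites as co-located if separated by <= radius_km."""
    return haversine_km(*site_a, *site_b) <= radius_km
```

For example, two stations 0.1° of latitude apart (about 11 km) pass the test, while stations a full degree apart (about 111 km) do not.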

Simulations over D01 at a horizontal grid resolution of 0.5°

Compared with January, WRF performs much better at all sites in July, particularly at Milan 1/Milan 2, although relatively large discrepancies between predictions and observations remain at the two mountain sites, Bilbao and Liberec.
Observed summer temperatures at all sites exhibit a stronger diurnal variation than winter temperatures, particularly at Madrid 1/Madrid 2 and Bilbao, due to their high altitudes and/or dry climate. This strong diurnal variation is well reproduced by WRF at all sites. WRF gives lower T2 on all days at Bilbao but higher T2 on some days at Liberec, likely due to inaccurate predictions of shortwave radiation, cloud formation, and air–surface heat fluxes at those sites. The nighttime temperatures at Stockholm 1/Stockholm 2 are overpredicted compared to observations at Stockholm 2 but underpredicted compared to observations at Stockholm 1. The nighttime temperatures at other urban sites such as Madrid 1/Madrid 2, Milan 1/Milan 2, and Paris Orly are generally underpredicted due to a poor representation of the urban canopy and urban heat island in the default treatments of WRF. Using WRF coupled with a single-layer urban canopy model (UCM) for energy and momentum exchange between the urban surface and the atmosphere, several studies showed large improvements in simulated near-surface air temperature and relative humidity during nighttime (e.g., Chen et al., 2004; Shrestha et al., 2009; Kusaka et al., 2012; Kim et al., 2012). This is because the UCM can provide a more realistic energy balance of the urban region via parameterizations of street canyons, building walls/roofs, road surfaces, and anthropogenic heating. CO2 domes also increase surface and near-surface temperatures (Jacobson, 2010). In January, WRF underpredicts Q2 at Düsseldorf 1/Düsseldorf 2 and at Milan 1/Milan 2.
In particular, the underpredictions in Q2 at Düsseldorf 1/Düsseldorf 2 are associated with the snow events during 18–21 January 2001. In July, WRF gives relatively large underpredictions in Q2 at Milan 1/Milan 2 throughout the month, during which thunderstorms occurred on 14 days (i.e., 3, 7–10, 14–16, 18–20, 24, and 27–29 July). The thunderstorms resulted in very high humidity that could not be reproduced by the WRF simulations. These results illustrate the model's difficulty in simulating the water balance under humid subtropical summer climate conditions. At most sites, MBs from the WRF simulation over D01 are within ±0.6 g kg⁻¹, which indicates a very good performance. Larger biases occur at Milan 1 in both months for the reason mentioned above. Figures 12–13 show simulated and observed hourly WS10 at the ten sites. In January, WRF captures WS10 well in terms of both hourly variations and magnitudes at London 1/London 2, Paris Charles de Gaulle, Paris Orly, and Melun. At the remaining sites, WRF simulates the temporal variations well but tends to overpredict the magnitudes of WS10, particularly at Stockholm 1/Stockholm 2, Düsseldorf 1/Düsseldorf 2, and Milan 1/Milan 2, due mainly to WRF's inability to resolve subgrid-scale topography at these sites and to the light wind conditions at Stockholm 1/Stockholm 2 and Milan 1/Milan 2. The inability of WRF to simulate the stable boundary layer has also been found by Vautard et al.
(2012). In July, WRF reproduces WS10 well at Stockholm 1/Stockholm 2, Düsseldorf 1/Düsseldorf 2, Liberec, and Bilbao. Figures 16–17 show site-specific simulated and observed daily Precip at the eight sites. Observed Precip varies greatly among these sites and between January and July at the same site, particularly at the high altitude sites (e.g., Görlitz, San Sebastián-Igueldo). In January, WRF is able to capture precipitation events at all sites with exact or close time windows, but overpredicts Precip on some days at San Sebastián-Igueldo while underpredicting it at the remaining sites. In July, overpredictions or underpredictions occur during some hours/days at all sites except for Pris-14E and Brétigny-sur-Orge, where WRF at 0.5° tends to underpredict. The largest deviations between the hourly Precip predictions and observations are due in part to a bug in the cloud microphysics module that caused the overprediction of cloud ice, graupel, and surface rainfall (http://www.mmm.ucar.edu/wrf/users/wrfv3/known-prob.html) in WRF/Chem v3.0 and older. However, some uncertainties also exist in the Precip observations. For example, the very low or zero observed precipitation did not reflect the occurrence of thunderstorms recorded at Milan on some days in July (i.e., 3, 7–9, and 27–29 July).

Sensitivity to horizontal grid resolution
As shown in Figs. 8–9, at Paris Charles de Gaulle, Paris Orly, and Melun, the WRF predictions of T2 in January and July at grid resolutions of 0.125° over D02 and 0.025° over D03 are very similar; both give slightly higher maximum T2 and slightly lower minimum T2 on most days. The WRF predictions at 0.125° and 0.025° in both months give slightly better maximum and minimum T2 against observations compared with those at 0.5° in January. At Milan 1/Milan 2, WRF at a grid resolution of 0.125° gives slightly higher maximum T2 and slightly lower minimum T2 on most days in both January and July, but lower T2 throughout January and slightly higher maximum T2 and slightly lower minimum T2 in July at Bilbao. As shown in Figs. 16–17, at San Sebastián-Igueldo in January, while WRF at 0.5° tends to overpredict Precip, WRF at 0.125° reduces the wet bias (the MB changes from 2.82 mm day⁻¹ to −1.72 mm day⁻¹). At Milan, WRF at 0.5° is in slightly closer agreement with observations than at 0.125°. At Chartres-Champhol, the WRF results at 0.5° and 0.125° are overall similar, both underpredicting precipitation in January; the use of a finer grid resolution of 0.025° shows a worse performance at this site. At Pris-14E and Brétigny-sur-Orge, WRF at 0.025° gives the best predictions, although it still overpredicts or underpredicts the observed precipitation to some extent during rainy periods. In July, WRF at 0.125° gives larger dry biases than WRF at 0.5° at San Sebastián-Igueldo, but changes the dry bias at 0.5° (MB = 0.45 mm day⁻¹) to a wet bias (MB = 1.78 mm day⁻¹) at Milan. WRF at 0.025° gives the smallest wet bias at Pris-14E and Brétigny-sur-Orge but the worst overpredictions at Chartres-Champhol.
Figures 18–19 compare the simulated monthly mean vertical profiles of T, Td, WS, and WD at a horizontal grid resolution of 0.5° with the sounding observations at eight sites. WRF at all grid resolutions (figures not shown at 0.125° and 0.025°) reproduces very well the vertical profiles of T at all sites in both months, although it fails to reproduce the observed surface temperature inversions at Düsseldorf and Paris in January. It also captures well the vertical profiles of Td below 300 mb at most sites in both months. Larger deviations occur in the vertical profiles of Td above 300 mb, with higher Td from the model, indicating the model's inability to capture moisture aloft. Among the eight sites, the model shows difficulties in capturing the vertical profiles of Td at the London and Madrid sites, with larger deviations from observations than at the other sites in both months. Unlike WS10, which is overpredicted at most sites in January, the observed WSs aloft are underpredicted at all sites in January. Underpredictions of WS aloft also occur at all sites in July. WRF, however, is able to simulate higher WS at altitudes where the observed WS is also high in both months. The observed WDs at various altitudes are reproduced reasonably well, within 10–20°, in both months.

Summary

The overpredictions of SWDOWN are attributed to the underestimation of COT by the Goddard shortwave radiation scheme in both months and the neglect of the effect of cumulus clouds on radiation and the contribution of convective clouds to cloud water content in July. WRF reproduces the observed spatial gradients of temperature and specific humidity, with the coldest/driest values in the northwest and the hottest/wettest in the south in both January and July. In January, although the positive bias (an MB of 0.5 °C and an NMB of 19.2 %) in T2 dominates, the largest cold biases (−5 to −2 °C) occur in the Alps and the eastern portion of the domain, likely due to several limitations in model initialization and treatments (e.g., too cold soil temperatures, too much soil moisture, too many daytime clouds, and a poor treatment of snow-related processes). The compensation of the dry and wet biases results in a good agreement in Q2 (an MB of 0.1 g kg⁻¹ and an NMB of 3.0 %). Large overpredictions (an MB of 2.1 m s⁻¹ and an NMB of 59.2 %) occur in WS10, with the worst (>1.6 m s⁻¹) over low-lying coastal areas and the Alps and the Carpathian Mountains, due mainly to a poor representation of the surface drag exerted by unresolved topography (mountains, hills, and valleys) and other smaller scale terrain features in WRF. In contrast to the relatively poorer performance in January, WRF performs well in July, with slight underpredictions in T2, Q2, RH2, and WS10 and a small MB in WD10. These results indicate the model's difficulty in simulating winter temperatures at many sites and summer temperatures at some sites due to the model's limitations in representing shortwave radiation, cloud formation, land surface heat fluxes, as well as wind patterns and mesoscale circulation systems over mountainous, hilly, and high altitude regions. Precip is underpredicted at many sites, with a domainwide NMB of −54.8 %, particularly in the Alps and the coastal areas of Norway and Estonia, making accurate prediction at this grid resolution very challenging. Unlike in January, Precip in July is slightly overpredicted at most sites, with an NMB of 9.9 %, particularly over San Marino, Slovenia, and eastern Belarus. The underprediction in winter is likely due to underpredictions in ice clouds because of a lack of ice nucleation treatments in WRF. The overprediction in July may be due to too frequent afternoon convective rainfall and/or an overestimation of the intensity of the rainfall predicted by the cumulus parameterization and a bug in the cloud microphysics module. For site-specific temporal variations, in January, WRF over D01 captures T2 well in terms of both magnitudes and diurnal variations at many sites but significantly underpredicts T2 at mountain/high altitude and large urban center sites and during
snow events, due to limitations in the representations of snow melting, the surface energy balance, and urban heat island and CO2 dome effects in WRF. Larger discrepancies between simulated and observed Q2 also exist at mountain/high altitude and large urban center sites and during snow events in January. WRF captures WS10 well in terms of hourly variations at all sites but overpredicts the magnitudes of WS10 at some sites with complex topography and under light wind conditions. WRF generally reproduces the observed WD10 and its diurnal variation at most sites in both months, but large deviations occur at a few sites during some hours, leading to poor overall performance. WRF is able to capture precipitation events at all sites with exact or close time windows, but underpredicts precipitation in terms of amounts and durations, due to some limitations in the Purdue Lin cloud microphysics module. In July, WRF performs much better at all sites and captures very well the strong diurnal variations of T2, despite a similar difficulty (but to a lesser extent) in capturing the observed T2 at mountain sites. The underpredictions in nighttime temperatures at urban sites are attributed to an unrealistic representation of the urban canopy and urban heat island in the default treatments in WRF. WRF gives relatively large underpredictions in Q2 at urban sites where thunderstorms often occur, illustrating the model's difficulty in simulating the water balance under humid subtropical summer climate conditions. WRF reproduces well the diurnal variations and magnitudes of WS10, but significantly underpredicts WS10 at large urban center sites. However, it also gives larger deviations in WD10 in July than in January, due in part to the model's limitations in capturing the wind fields and heat balance over complex terrain and the influences of urban heat islands and CO2 domes. Underpredictions of precipitation occur at most of the selected sites. WRF reproduces well the vertical profiles of T and WD at all sites and of Td below 300 mb at most sites in both months, although it tends to overpredict Td above 300 mb and underpredict WS aloft at all sites in both months.
The sensitivity of model predictions to horizontal grid resolution is examined. In January over D02, the performance of WRF at 0.125° improves slightly to moderately for OLR, Q2, WS10, and WD10 in terms of correlation coefficient, MB, MGE, RMSE, NMB, and NME, and for Precip in terms of MB, RMSE, NMB, and IOA, demonstrating the benefits of using a fine grid resolution. It deteriorates slightly for SWDOWN, LWDOWN, T2, and RH2 in terms of NMB, but with reduced RMSE and NME for T2 and RH2. In July over D02, the use of a grid resolution of 0.125° slightly improves the model performance for all these variables except for SWDOWN, OLR, WS10, and Precip. When the grid resolution further increases from 0.125° to 0.025°, the model performance for all these variables is not always the best. In January over D03, the best model performance in terms of NMB is obtained for T2, WD10, and Precip at 0.025°, for WS10 at 0.125°, and for Q2 and RH2 at 0.5°. In July over D03, the best model performance in terms of NMB is obtained for T2, RH2, and Precip at 0.025°, for Q2 at 0.125°, and for WS10 and WD10 at 0.5°. The temporal variations of T2, Q2, and WD10 are relatively insensitive, but those of WS10 and Precip are moderately to highly sensitive, to the horizontal grid resolution at most sites. The predictions of T2, Q2, RH2, WS10, and WD10 at 0.125° and 0.025° are very similar; both differ from the predictions at 0.5°. Compared with the results at 0.5°, WRF at 0.125° and 0.025° gives slightly better T2, WS10, and WD10 at most sites in January, slightly better Q2 at all sites in July, and moderate to significant improvements in precipitation at most sites in January and July.
While the above results show reasonably good performance that is consistent with other mesoscale meteorological model applications, they also indicate a need to further improve model representations of mesoscale processes and phenomena such as shortwave radiation, snow-related processes, subgrid-scale surface roughness elements, urban canopy treatments, cloud microphysics, convective cloud processes, and ice nucleation treatments at small scales. These biases in the meteorological predictions may affect the accuracy of the chemical predictions of WRF/Chem-MADRID and WRF/Polyphemus, which are presented in Part 2.
Acknowledgements. This project is sponsored by the EPA STAR #R83337601, the NSF/USDA EaSM program AGS-1049200, the fellowship award #704389J, the Atmospheric Environment Center (CEREA)/École des Ponts ParisTech, France, through a visiting professorship of Y. Zhang at CEREA, the Joint Laboratory of École des Ponts ParisTech and EDF R&D, Paris, France, and COST ES1004. Thanks are due to former and current members of the air quality forecasting laboratory at NCSU, including Wei Wang, Shuai Zhu, Xu-Yan Liu, Changjie Cai, Xin Zhang, and Kai Wang, who obtained dataset information, processed data, and made plots.

Figure 1. Simulation domains: D01 over Western Europe, D02 over France, Germany, the Netherlands, Belgium, Switzerland, Luxembourg, Slovenia, most of Austria, and parts of the UK, Italy, the Czech Republic, Spain, Croatia, and Poland, and D03 over the greater Paris region in France.

Figure 2. Observational data from the three networks (AirBase, …) used for model evaluation.

Fig. 3. Simulated T2, RH2, WS10, WD10, and precipitation by WRF overlaid with observations in January 2001 (left column) and their associated mean biases (right column). Note that for WD10, 0° is equivalent to 360° in the wind rose plot.

Fig. 4. Simulated T2, RH2, WS10, WD10, and precipitation by WRF overlaid with observations in July 2001 (left column) and their associated mean biases (right column). Note that for WD10, 0° is equivalent to 360° in the wind rose plot.

Figure 5. Simulated changes in meteorological variables due to the direct, semi-direct, and indirect effects of aerosols by WRF/Chem-MADRID in July 2001 over D01.

Figure 8. Simulated and observed 2-m temperatures in January 2001 at selected sites.

Figure 9. Simulated and observed 2-m temperatures in July 2001 at selected sites.

Figure 11. Simulated and observed 2-m specific humidity in July 2001 at selected sites. Statistics are not available (NA) at a few sites where observations were not available.

Figure 12. Simulated and observed 10-m wind speed in January 2001 at selected sites. Statistics are not available (NA) at one site (Stockholm 1) where observations were not available.

Figure 13. Simulated and observed 10-m wind speed in July 2001 at selected sites. Statistics are not available (NA) at one site (Stockholm 1) where observations were not available.

Figure 15. Simulated and observed 10-m wind direction in July 2001 at selected sites. Statistics are not available (NA) at one site (Stockholm 1) where observations were not available.

Figure 16. Simulated and observed daily precipitation in January 2001 at selected sites.


Fig. 17. Simulated and observed daily precipitation in July 2001 at selected sites.

Figure 18. Monthly mean observed and simulated skew-T plots of temperature (red solid and black solid lines, respectively), dew point temperature (red dashed and black dashed lines, respectively), and wind speed and direction (red and black staffs with attached barbs, respectively; a triangle, a long barb, and a short barb perpendicular to the staff represent 50, 10, and 5 knots, respectively) at eight stations over D01 in January 2001.


Figure 19. Monthly mean observed and simulated skew-T plots of temperature (red solid and black solid lines, respectively), dew point temperature (red dashed and black dashed lines, respectively), and wind speed and direction (red and black staffs with attached barbs, respectively; a triangle, a long barb, and a short barb perpendicular to the staff represent 50, 10, and 5 knots, respectively) at eight stations over D01 in July 2001.

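The barb convention used in the skew-T plots (a triangle for 50 kt, a long barb for 10 kt, and a short barb for 5 kt) can be decoded mechanically. A minimal sketch, with an illustrative function name (not from the paper), assuming speeds are rounded to the nearest 5 kt as plotted:

```python
def barb_components(speed_kt):
    """Decompose a wind speed in knots into (triangles, long barbs,
    short barbs), after rounding to the nearest 5 kt."""
    s = int(round(speed_kt / 5.0)) * 5  # plotted speeds are multiples of 5 kt
    triangles, s = divmod(s, 50)
    long_barbs, s = divmod(s, 10)
    short_barbs = s // 5
    return triangles, long_barbs, short_barbs

print(barb_components(65))  # (1, 1, 1): one triangle, one long, one short barb
```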
for Polyphemus, an updated version of the MADRID 1 SOA module for WRF/Chem-MADRID). However, the Super-SORGAM SOA module in Polyphemus accounts for the NOx dependence of SOA formation from biogenic and aromatic compounds based on Ng et al. (2007), which is not treated in the MADRID SOA module. WRF/Chem-MADRID simulates the homogeneous binary nucleation of sulfuric acid (H2SO4) and water vapor (H2O) based on the approach of McMurry

Table 1. Model configurations and major atmospheric process treatments in WRF/Chem and Polyphemus.

Table 3. Parameters and associated observational databases included in the model evaluation.

Table 4a. Comparison of performance statistics of WRF in January and July 2001 1,2. Simulations over D01 at a horizontal grid resolution of 0.5° against observations over D01.

Table 4b. Comparison of performance statistics of WRF in January and July 2001 1,2.

Table 4c. Comparison of performance statistics of WRF in January and July 2001 1,2. Simulations over D01 and D02 at horizontal grid resolutions of 0.5° and 0.125°, respectively, against observations over D02.

Table 5. Characteristics of sites selected for temporal analysis. Liberec, the fifth-largest city in the Czech Republic, is surrounded by the Jizera hills and mountains. Historically known for its textile industry, the Liberec region now consists primarily of machinery, glass, and plastics industries. It has a humid continental climate with warm summers and no dry season; temperatures typically range from −6 °C to 24 °C and are rarely below −13 °C or above 31 °C. Gasteiz has a mild humid temperate climate with warm summers and no dry season; its annual summer high temperature is 26.7 °C, and its winter low temperature is 1.1 °C.
, which are not simulated in WRF. At most sites, MBs from the WRF simulation over D01 are within ±1 °C. Larger MBs occur at London 2, Liberec, Milan 1, and Bilbao for the reasons mentioned above. Figures 10–11 show simulated and observed hourly Q2 at the ten sites. Observed Q2 is generally well reproduced at most sites in both months (note that no observations were available at Stockholm 1/Stockholm 2, London 1/London 2, and Bilbao in January and at Stockholm 1/Stockholm 2 and Bilbao in July), with relatively poorer performance at several high-altitude sites (e.g., Liberec, Madrid 1/Madrid

[Time series (local standard time) at Bretigny-Sur-Orge (000764; 48.60° N, 2.33° E), co-located with Melun (07153; 48.62° N, 2.68° E) and Melun (FR04069; 48.54° N, 2.66° E), urban, France]
Compared to WRF at 0.5°, the use of 0.125° slightly improves T2 performance at Milan 1 and Milan 2 in both months and significantly improves T2 performance at Bilbao (reducing the MB from 1.84 °C to 0.46 °C) in January, but degrades it at Bilbao (increasing the MB from −1.84 °C to −2.48 °C) in July. Compared to WRF at 0.5° and 0.125°, the use of 0.025° slightly improves T2 performance at Paris Charles de Gaulle in both months (e.g., reducing the MB from 0.53 °C at 0.5° to 0.45 °C at 0.025° in January and from −0.49 °C at 0.5° to −0.35 °C at 0.025° in July) and at Melun in July, and largely improves T2 performance at Paris Orly in July (e.g., reducing the MB from −0.58 °C at 0.5° to 0.03 °C at 0.025°), but performs worse at Paris Orly and Melun in January. As shown in Figs. 10–11, WRF at 0.125° gives slightly higher Q2 at Milan 1/Milan 2 but lower Q2 at Bilbao in both January and July, showing better agreement with observations in terms of magnitude and MBs at Milan 1 in both months. At Paris Charles de Gaulle, Paris Orly, and Melun, the WRF predictions in January and July at 0.125° over D02 and at 0.025° over D03 are overall very similar, with the best agreement with observations at 0.025° in July (e.g., MBs are reduced from 0.51 g kg−1 to −0.05 g kg−1 at Paris Orly) but slightly worse performance at 0.025° in January. As shown in Figs. 12–13, in January at Paris Charles de Gaulle and Paris Orly, WRF at 0.125° and 0.025° gives very similar predictions; both are lower than the WS10 predicted at 0.5°. The differences in predicted WS10 among the three grid resolutions at Melun are smaller than those at Paris Charles de Gaulle and Paris Orly, although the predicted WS10 values at 0.025° remain the highest among the three simulations. Compared to observed WS10, WRF at all three grid resolutions captures the hourly variations of WS10 well, with better agreement at 0.025° and 0.125° than at 0.5° at Paris Charles de Gaulle, Paris Orly, and Melun. WRF at 0.025° gives the lowest MBs among all three simulations at Paris Orly and the second-lowest MBs at the other two sites in January. Compared with WRF at 0.5°, WRF at 0.125° gives slightly lower WS10 at Milan 1/Milan 2 but higher WS10 at Bilbao in January; it agrees slightly better with observations at Milan 1/Milan 2 but slightly worse at Bilbao in terms of magnitudes and MBs. In July, at both Paris Charles de Gaulle and Melun, WRF at all three grid resolutions underpredicts WS10 significantly, with the lowest MBs at 0.025°. For WD10 in January, the best agreement (lowest MBs) is obtained at Paris Charles de Gaulle, Paris Orly, and Melun. Compared to WRF at 0.5°, WRF at 0.125° also reduces the MB at Milan but slightly increases the MB at Bilbao. In July, the use of 0.025° degrades WD10 performance at Paris Charles de Gaulle, Paris Orly, and Melun. The use of 0.125° helps reduce the MB significantly at Milan 1, while WRF at 0.5° gives better agreement at Bilbao.
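The mean bias (MB) values compared throughout this discussion follow the standard definition MB = mean(simulated − observed) over paired values. A minimal sketch of MB and the companion root-mean-square error, assuming paired hourly series (function names and sample values are illustrative, not from the paper):

```python
import math

def mean_bias(sim, obs):
    """MB: average of (simulated - observed) over paired values."""
    return sum(s - o for s, o in zip(sim, obs)) / len(obs)

def rmse(sim, obs):
    """Root-mean-square error over paired values."""
    return math.sqrt(sum((s - o) ** 2 for s, o in zip(sim, obs)) / len(obs))

sim = [1.0, 2.0, 4.0]
obs = [1.5, 2.5, 3.0]
print(mean_bias(sim, obs))  # 0.0: the +1.0 error cancels the two -0.5 errors
```

Note that offsetting positive and negative errors can yield a small MB even when individual errors are large, which is why RMSE is usually reported alongside it.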

Zhang et al.: Application of WRF/Chem-MADRID and WRF/Polyphemus in Europe

° deviations at most sites in both months. Relatively large deviations from observed WS and WD aloft occur at Stockholm and Liberec in both months.