An assessment of atmospheric mercury in the Community Multiscale Air Quality (CMAQ) model at an urban site and a rural site in the Great Lakes Region of North America

Quantitative analysis of three atmospheric mercury species, namely gaseous elemental mercury (Hg0), reactive gaseous mercury (RGHg) and particulate mercury (PHg), has been limited to date by lack of ambient measurement data as well as by uncertainties in numerical models and emission inventories. This study employs the Community Multiscale Air Quality Model version 4.6 with mercury chemistry (CMAQ-Hg) to examine how local emissions, meteorology, atmospheric chemistry, and deposition affect mercury concentration and deposition in the Great Lakes Region (GLR), and at two sites in Wisconsin in particular: the rural Devil's Lake site and the urban Milwaukee site. Simulated ambient mercury exhibits significant biases at both sites. Hg0 is too low in CMAQ-Hg, with the model showing a 6 % low bias at the rural site and a 36 % low bias at the urban site. Reactive mercury (RHg = RGHg + PHg) is over-predicted by the model, with annual average biases > 250 %. Performance metrics for RHg are much worse than for mercury wet deposition, ozone (O3), nitrogen dioxide (NO2), or sulfur dioxide (SO2). Sensitivity simulations to isolate background inflow from regional emissions suggest that oxidation of imported Hg0 dominates model estimates of RHg at the rural study site (91 % of base case value), and contributes 55 % to the RHg at the urban site (local emissions contribute 45 %).
T. Holloway et al., Atmos. Chem. Phys., 12, 7117–7133, 2012, www.atmos-chem-phys.net/12/7117/2012/. Published by Copernicus Publications on behalf of the European Geosciences Union.

Fig. 1. Domains used for CMAQ-Hg simulations: large box shows the CONUS domain, small box shows the GLR domain. Open circles show locations of MDN monitoring sites; those within the smaller box were analyzed in Table 1. Stars show locations of DL and MKE monitoring sites.

Introduction
Although < 5 % of global mercury resides in reactive form, this fraction is subject to more rapid chemical reactions and faster deposition to the Earth's surface (Lindberg et al., 2007). While the chemical stability of different forms of particulate mercury is not fully understood (Amos et al., 2012), the sum of reactive gaseous mercury (RGHg) and particulate mercury (PHg) is often called reactive mercury (RHg) due to its relatively short lifetime in the atmosphere compared to gaseous elemental mercury (Hg0). The majority of mercury in the atmosphere exists as Hg0, which has been shown to be a major source of RHg (Lin and Pehkonen, 1999). However, the dominant chemical pathway for oxidation of Hg0 to RHg remains uncertain, as do many components of the mercury budget (Calvert and Lindberg, 2005; Hynes et al., 2009). In particular, recent model and observational studies have suggested that the bromine oxidation pathway may be important in the mid-latitudes (Holmes et al., 2010; Obrist et al., 2010).
Here we focus on the Upper Midwestern US, where ambient measurements of Hg0, RGHg and PHg were collected at two comparable sites, one rural and one urban, each affected by similar sources with measurements over nearly a full year. The two sites are rural Devil's Lake, Wisconsin (Manolopoulos et al., 2007) and urban Milwaukee, Wisconsin. We evaluate the skill of a widely used regional chemical transport model for studying mercury, the Community Multiscale Air Quality (CMAQ) model, developed by the US Environmental Protection Agency (EPA) and used in mercury policy development (Bullock and Braverman, 2007). Here, we use CMAQ v. 4.6 including mercury chemistry, hereafter referred to as CMAQ-Hg. Although we employ CMAQ-Hg, it is important to note that our study is not configured in the same way as policy-directed EPA simulations, as discussed in Sect. 2.2. In presenting model performance at the two sites, we also consider the sensitivity of results to varying boundary inflow.
For the most part, model performance in simulating mercury over North America has been evaluated against wet deposition measured in the US by the EPA Mercury Deposition Network (MDN). A number of studies have used these data to evaluate CMAQ-Hg (Bash, 2010; Bullock and Brehme, 2002; Bullock et al., 2008, 2009; Gbor et al., 2006, 2007; Lin et al., 2007; Lin and Tao, 2003; Pongprueksa et al., 2008; Sunderland et al., 2008; Vijayaraghavan et al., 2007), and other atmospheric chemistry models that include mercury (Cohen et al., 2004; Holmes et al., 2010; Sanei et al., 2010; Seigneur et al., 2003; Selin and Jacob, 2008; Vijayaraghavan et al., 2008). Until recently, most studies that evaluate model estimates of ambient mercury compare with Total Gaseous Mercury (TGM) and/or Hg0 (Gbor et al., 2006, 2007; Holmes et al., 2010; Lin and Tao, 2003; Lohman et al., 2008; Selin et al., 2007; Soerensen et al., 2010; Wen et al., 2011).
As the database of reactive species has expanded, newer studies have allowed for a more detailed evaluation of model chemical processes. Of particular relevance are the model evaluation results presented by Baker and Bash (2012) and Y. Zhang et al. (2012), both of which compare multiple regional chemical transport models with ground-based measurements of speciated ambient mercury in the context of wet and dry deposition. Both studies find that the regional models overestimate RHg relative to observations, and that treatment of mercury deposition is a major source of divergence among model simulations (Baker and Bash, 2012; L. Zhang et al., 2012). An 80 % overestimation of annual mean RGHg is found in a nested North American simulation in the global GEOS-Chem model (no clear PHg bias) (Y. Zhang et al., 2012). Amos et al. (2012) find that GEOS-Chem overestimates both components of RHg, with normalized mean biases of 117 % for RGHg and 18 % for PHg (210 % and 96 %, respectively, without in-plume oxidation). In comparing two weeks of ambient measurements over Europe, Ryaboshapko et al. (2007) find that RHg shows the largest model-observation discrepancies. In particular, CMAQ-Hg overestimates RGHg by 15–257 % and PHg by 82–380 % (Ryaboshapko et al., 2007). Aircraft measurements over the Eastern US for a 12-day period in June 2000 found that CMAQ-Hg overestimated RGHg near the surface, but underestimated concentrations aloft (Sillman et al., 2007). Over Asia, RHg simulated by CMAQ-Hg was too high at all sites, whereas Hg0 was under-predicted at urban sites (Lin et al., 2010). When annual average RGHg values were compared between the global GEOS-Chem model and observations, the model values exceeded observations at 10 of the 13 sites compared, averaged nearly 50 % higher (model average: 20.08 pg m−3; observation average: 13.58 pg m−3), and showed much less spatial variability (standard deviation of model values: 1.78 pg m−3; standard deviation of observed means: 8.02 pg m−3) (Selin et al., 2007). The global CTM-Hg also has been shown to calculate RGHg values exceeding observations by over a factor of four (Seigneur et al., 2004). The ROME plume model also tends to over-predict RGHg (Lohman et al., 2006). A trend among these studies is the over-prediction of reactive mercury. While it should be noted that measurements also have errors, especially ones leading to potential under-representation of RGHg (Lyman et al., 2010), the performance of atmospheric models to date suggests that key processes are not well captured. It is also possible that the adequate simulation of wet deposition (dominated by RHg) may in fact be due to compensating errors in deposition or other chemical processes.

Observations and model
We focus our analysis on two measurement sites: Devil's Lake (DL, located at 43.43°N, 89.68°W) and Milwaukee (MKE, located at 43.12°N, 87.88°W), shown in Fig. 1. The measurements were gathered for just under a full year at both sites: DL from 10 April 2003 to 19 March 2004; MKE from 29 June 2004 to 13 May 2005. As described in Manolopoulos et al. (2007), samples were taken every two hours using a Tekran ambient mercury analyzer (Landis et al., 2002; Lu et al., 1998; Lynam and Keeler, 2002). The DL site is in a rural area dominated by agriculture; the MKE site is an urban setting on the shore of Lake Michigan. Urban emissions around MKE are dominated by a few electricity-generating units (EGUs); DL has only one EGU and one non-EGU source within 100 km of the measurement site.
Because air quality at MKE is heavily influenced by local emissions, simulated values at the DL site were considered to better reflect model processes. Thus, the model run was conducted for the full year of 2003 to allow for direct comparison with DL observations. As will be discussed below, CMAQ-Hg performance for 2003 was much worse than expected for ambient RHg, leading our team to revise the research plan. As part of this process, we opted not to complete a 2004 simulation year planned for MKE evaluation. Instead we conducted sensitivity studies on the 2003 year, including perturbing the boundary conditions and testing a range of chemical mechanism experiments (the latter not shown here due to their inconclusive results). For MKE evaluation, we compared monthly mean values between the 2003 model year and 2004–2005 measurements. Although far from ideal, it is not uncommon to compare monthly mean values of simulations and observations for different years (e.g. Amos et al., 2012; Lin et al., 2010; Selin et al., 2007), and we do not expect the major conclusions of the MKE analysis to be sensitive to choice of year.
The operationally defined nature of the current measurement methods means that there may be some oxidized mercury species which are not included in the model but which were collected and measured. Conversely, some species may be collected and measured by the instrument with less than 100 % efficiency but are included in the model without accounting for this. Recent publications have provided empirical evidence to suggest that the Tekran Ambient Mercury Analyzer may be subject to measurement artifacts that diminish collection efficiencies: Lyman et al. (2010) show that RGHg may be under-measured during high ozone events by up to 55 %, while Rutter et al. (2008b) and Talbot et al. (2011) both suggest that PHg detected on filters was lost and not measured, probably due to the instrument being heated to 50 °C rather than being held at ambient temperature. The possible impact of these errors is included in the Discussion section.
We employed CMAQ-Hg version 4.6 (Bullock and Braverman, 2007; Byun and Schere, 2006), using a horizontal resolution of 36 km × 36 km for the Continental US (CONUS) domain and a horizontal resolution of 12 km × 12 km for the Great Lakes Region (GLR) domain, shown in Fig. 1. Both simulations used 15 model layers in the vertical with an average model top of 16 km altitude and an average surface layer thickness of 50 m.
CMAQ-Hg was run using the default boundary conditions for CONUS (Table A1), which assume constant boundary mixing ratios for all pollutants, with mercury species varying only with altitude. CONUS model output was used to provide hourly boundary conditions for simulations over the GLR. We employed these constant CMAQ-Hg default values as a starting point, given the wide variation among global models in simulating mercury inflow to the US (Bullock et al., 2008). We had initially planned to compare static boundary simulations with time-varying boundary simulations from a global model but, given the poor model performance discussed below, we modified our research plan to focus on other sensitivity tests. Among these, we present here evaluations with and without boundary inflow, which yielded the most conclusive results among our tested hypotheses.
Building on the base case with default boundary conditions, we conducted a sensitivity simulation to isolate the impacts of background mercury on the DL and MKE study sites. Here we define "background" as import to the GLR from the broader CONUS domain, which in turn includes global import via fixed boundary values (Table A1). A "zero background" (ZB) scenario was run, with no mercury species advecting into the GLR for July 2003 (including a 10-day spin-up period). All other boundary conditions between the CONUS and GLR domains were unaltered. The results from the ZB run reflect the impacts of emissions alone for the GLR. Subtracting these results from the base case (BC) results discussed above yields an estimate of mercury associated with background only, i.e. "zero emissions" (ZE). We compare the ZB and ZE results to quantify the relative influence of local emissions versus imported mercury and precursors, and how these results compare to observations.
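The subtraction that defines the ZE estimate can be illustrated with a short sketch. This is only a minimal illustration, not the processing code used in the study: the function name, the use of period means to form the background fraction, and the sample values are all assumptions made for the example.

```python
def attribute_background(base_case, zero_background):
    """Split base-case (BC) values into a background-only estimate.

    ZE = BC - ZB is computed element by element; the background
    fraction is the ratio of the ZE mean to the BC mean.
    """
    if len(base_case) != len(zero_background):
        raise ValueError("BC and ZB series must have the same length")
    zero_emissions = [bc - zb for bc, zb in zip(base_case, zero_background)]
    bc_mean = sum(base_case) / len(base_case)
    ze_mean = sum(zero_emissions) / len(zero_emissions)
    return zero_emissions, ze_mean / bc_mean


# Hypothetical daily RHg values in pg m-3 (illustrative, not model output).
bc = [50.0, 60.0, 40.0]
zb = [5.0, 6.0, 2.5]
ze, background_fraction = attribute_background(bc, zb)
print(ze, round(background_fraction, 2))
```

With these made-up numbers the background fraction is 0.91, meaning that 91 % of the base-case mean would be attributed to inflow and the remainder to emissions within the domain.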
The model was configured to use the Carbon Bond Five (CB05) lumped gas phase chemistry mechanism (Sarwar et al., 2008), the AERO4 aerosol mechanism (Binkowski and Roselle, 2003), the global mass-conserving Yamartino advection scheme, and the Asymmetrical Convective Model with mercury (ACM2), which controls cloud formation, vertical diffusion, and eddy diffusion. This version of CMAQ-Hg builds on earlier releases of CMAQ-Hg (Bullock and Brehme, 2002). The CB05 mechanism with mercury is the only gas phase chemistry module in CMAQ to include mercury chemistry and, since it was developed for the regional scale (Gery et al., 1989), the mechanism is well suited for our focus on the GLR. The core mechanism includes 56 chemical species (52 core species and 4 mercury species), 156 non-mercury gas-phase reactions, 4 gas-phase reactions involving mercury, 6 aqueous reactions involving mercury and 7 sorption/de-sorption mercury reactions, which are included in Table A2. CMAQ-Hg 4.6 reports modal bulk oxidized mercury species (RGM and PHg) using the operationally defined nomenclature for compounds collected and measured (Lu et al., 1998; Landis et al., 2002). Internally, CMAQ-Hg 4.6 follows the gas-phase and aqueous-phase chemistry of mercury using the true chemical species, such as HgO and HgCl2, then converts these species to RGM and PHg whenever they are present in either the gas phase or in modal aerosol particles which have not been activated into cloud droplets. Aqueous mercury chemistry considers activated accumulation mode aerosols. For full details of chemical reactions and speciation see Table A2 and the related discussion of a prior version of the mechanism in Bullock and Brehme (2002). We note that the mechanism omits bromine reactions (Lin et al., 2006), and may overestimate the importance of OH oxidation (Calvert and Lindberg, 2005).
Dry deposition is calculated in CMAQ-Hg in the meteorology preprocessor, MCIP v. 3.4 with the M3DRY scheme, which explicitly treats Hg0 and RGHg dry deposition based on the amount of vegetation cover, vegetation type and stomatal resistance (Lin et al., 2006; Pleim et al., 1999). These parameters are taken from the Pleim-Xiu land surface model in WRF, and the resultant range of dry deposition rates is reported in Table A3 (with a comparison of prior studies). Dry deposition of PHg is governed by the aerosol scheme AERO4, and is a function of particle size, always treated as either Aitken or accumulation mode (Binkowski and Shankar, 1995). Wet deposition of Hg0 and RGHg is calculated in a manner analogous to all other species with aqueous chemistry, and depends on cloud water concentration and the rate of precipitation during the cloud's lifetime. For particulates, including PHg, wet deposition depends on the particle size. Because accumulation- and coarse-mode particles are treated differently from Aitken-mode particles, any modeled precipitation will deposit all accumulation-mode PHg, but Aitken-mode PHg may remain in the atmosphere if the cloud lifetime is not sufficiently long.
The Advanced Research Weather Research and Forecasting Model (ARW-WRF) version 3.0, referred to here as WRF (Skamarock and Klemp, 2008), was used to generate continuous meteorology over the study regions, constrained with assimilated data from the 2003 North American Regional Reanalysis (NARR) dataset (Mesinger et al., 2006). The simulation compares well with NARR and observational data from the National Climatic Data Center (NCDC) for temperature and precipitation over the CONUS and GLR domains. The NCDC reports actual precipitation and surface temperature at hundreds of sites across the US and was used to complement the NARR dataset evaluation. Comparisons with the NCDC data include monthly total precipitation and monthly average temperatures as well as daily temperature and precipitation for a few episodes at both resolutions. At the 12 km × 12 km GLR resolution, modeled average monthly temperature fields were consistently within two degrees Celsius of NCDC measured temperatures, and temperature was rarely underestimated. At both resolutions, model precipitation is moderately under-predicted, with the best performance in winter, spring and late fall and weaker performance during the summer and early fall. To inform the MDN evaluation presented below, model precipitation is also evaluated against reported precipitation in the MDN database, and we find that precipitation likely accounts for ∼ 30 % of annual average wet deposition error.
All emissions are taken from the 2002 EPA National Emissions Inventory (NEI), and prepared for use in CMAQ with the Sparse Matrix Operator Kernel Emissions Model version 2.4 (SMOKE). At the time the study began, the 2002 NEI was the most up-to-date inventory of US emissions, and it includes Canadian and Mexican sources. Over the GLR, EGUs are the largest source of mercury emissions at 55 %, a higher fraction than the CONUS average EGU contribution of 43 %. The highest total emissions, for both the CONUS and GLR domains, are clustered around coal-fired power plants along the Ohio River and western Pennsylvania, where local coal is high in mercury (Toole-O'Neil et al., 1999), and over larger cities.
In the US, total anthropogenic mercury emissions are known with relative confidence due to governmental monitoring efforts such as the Toxic Release Inventory (TRI), although there remains significant uncertainty in mercury speciation (Lin et al., 2006). Recent work has suggested that the  (Amos et al., 2012). Beyond individual point sources, the spatial patterns in speciation of emissions, shown in Fig. A1, highlight differences in data collection and organizational methods leading to abrupt shifts in percentage Hg0 and RHg contribution at the US-Canada border and certain state borders (e.g. Illinois-Wisconsin). Additional error in our treatment of emissions may be incurred by omitting natural sources and re-emissions (Gbor et al., 2007; Lin et al., 2012).

Results
To compare model performance at the two Wisconsin sites with observations, Table 1 presents mean concentrations, coefficient of determination (R2), normalized mean bias, and normalized mean error.
Seasonal behavior at the two sites is also apparent in Figs. 2 and 3, which compare ambient concentrations at the two sites on a 14-day running average basis. The running average allows us to compare CMAQ-Hg simulations for 2003 with observations from both 2003–2004 (for DL, Fig. 2) and 2004–2005 (for MKE, Fig. 3). We show here only results from the 12 km × 12 km GLR simulations, and note that differences between the coarser 36 km × 36 km simulation and the finer 12 km × 12 km simulation were not qualitatively significant.
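The 14-day running average used for Figs. 2 and 3 can be sketched as follows. The text does not state whether the window is centered or trailing, nor how gaps in the record are handled, so this sketch assumes a trailing window over a gap-free series of daily means; the function name is hypothetical.

```python
def running_mean(daily_values, window=14):
    """Trailing running mean over a list of daily means.

    Returns None for positions where a full window is not yet
    available (an assumption made here for simplicity).
    """
    smoothed = []
    for i in range(len(daily_values)):
        if i + 1 < window:
            smoothed.append(None)
        else:
            chunk = daily_values[i + 1 - window : i + 1]
            smoothed.append(sum(chunk) / window)
    return smoothed


# Short illustration with a 2-day window on made-up values.
example = running_mean([1.0, 2.0, 3.0, 4.0], window=2)
print(example)
```

Smoothing of this kind suppresses day-to-day noise, which is what makes it reasonable to overlay a 2003 simulation on observations from adjacent years.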

Rural site (Devil's Lake)
Over the April 2003–December 2003 measurement period, Hg0 at DL averages 1.6 ng m−3, but shows considerable variability on both seasonal and synoptic scales. Average Hg0 concentrations and variability at DL are consistent with recent long-term (June 2007–November 2007) observations in rural central Wisconsin using a similar measurement approach (Kolker et al., 2010). As shown in Fig. 2, observed summer values regularly dip below 1.5 ng m−3 whereas winter concentrations approach 2 ng m−3. Simulated surface concentrations have a similar annual mean value, 1.5 ng m−3, but show much less variability than observed. Variability between measurements and models shows almost no correlation either on a daily basis, R2 = 0.01, or on a monthly mean basis, R2 = 0.04. Thus, while mean values agree, and daily mean values reflect low model bias (−6 %) and reasonable error (13 %), model performance with respect to ambient Hg0 at the rural site does not indicate any particular skill in capturing regional emissions, chemical processes, and/or transport processes. Rather, the model advects boundary values of Hg0 into the domain, which captures mean values but not observed variability.
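The bias, error and R2 values quoted above follow from paired model and observed values. A minimal sketch is given here, assuming the standard definitions: normalized mean bias as 100 % × Σ(M − O) / ΣO, normalized mean error as 100 % × Σ|M − O| / ΣO, and R2 as the squared Pearson correlation. These definitions and the function names are assumptions for illustration and may differ in detail from the calculation behind Table 1.

```python
def normalized_mean_bias(model, obs):
    """NMB in percent: 100 * sum(M - O) / sum(O)."""
    return 100.0 * sum(m - o for m, o in zip(model, obs)) / sum(obs)


def normalized_mean_error(model, obs):
    """NME in percent: 100 * sum(|M - O|) / sum(O)."""
    return 100.0 * sum(abs(m - o) for m, o in zip(model, obs)) / sum(obs)


def r_squared(model, obs):
    """Squared Pearson correlation between model and observations."""
    n = len(obs)
    m_mean = sum(model) / n
    o_mean = sum(obs) / n
    cov = sum((m - m_mean) * (o - o_mean) for m, o in zip(model, obs))
    var_m = sum((m - m_mean) ** 2 for m in model)
    var_o = sum((o - o_mean) ** 2 for o in obs)
    return cov * cov / (var_m * var_o)


# Made-up paired values (illustrative only).
obs = [1.0, 3.0, 4.0]
model = [2.0, 2.0, 5.0]
nmb = normalized_mean_bias(model, obs)
nme = normalized_mean_error(model, obs)
print(nmb, nme, r_squared(model, obs))
```

The example shows why a small bias can coexist with a larger error: positive and negative differences cancel in the bias but not in the error, which is the pattern reported for Hg0 at DL.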
At DL, observed RGHg averages 5.4 pg m−3 and PHg, 8.3 pg m−3, summing to 14.2 pg m−3 for RHg over the 2003 monitoring period (Table 1); both species are considerably higher at DL than in central Wisconsin observations in 2007 (Kolker et al., 2010). In contrast, CMAQ-Hg averages over three times higher over this same period, with simulated RGHg at 22.6 pg m−3, PHg at 29.2 pg m−3, and RHg at 52.6 pg m−3. Neither RGHg nor PHg shows agreement in variability between simulated and observed values, with daily R2 values of 0.04 and 0.05, respectively. Interestingly, the daily R2 of RHg (0.10) is twice as high as that of either RGHg or PHg, suggesting that the allocation of the reactive forms is a source of error, even if not the dominant source of error. Model performance is especially poor in summer months at DL, as shown in Fig. 2. In August, CMAQ-Hg overestimates RGHg by a factor of about 15 relative to observed values, and overestimates PHg by a factor of 5.
Only in April 2003 does RGHg show relative agreement between model and observations at DL (Fig. 2). At the rural site, local emissions are effectively zero, and measured RHg may be attributed to transport or to local chemical production. Observed RHg concentrations at DL were highest in April and the first half of May, and wind direction measurements taken at the site show that winds were predominantly easterly during that time, compared to prevailing westerly winds the rest of the year (Fig. 4). During the easterly wind events, the Devil's Lake measured RHg concentrations show a significant increase, most likely reflecting advection from high-emitting areas to the east (e.g. Milwaukee). Both observed and modeled winds show advection from the east in April and early May, versus from the west/southwest in June–December.
While CMAQ-Hg shows similar wind patterns to observations (Fig. 4) and a similar eastward concentration gradient in RGHg (Table 1), the model does not yield the higher springtime RGHg surface concentrations at DL seen in the observations (Fig. 2b). This evidence points to a modeled RGHg lifetime in CMAQ-Hg that is too short, impeding the transport of emitted RGHg to DL in spring 2003.
Figure 5 compares DL concentrations of Hg0 (Fig. 5a) and RHg (Fig. 5b) over July 2003 between measurements and simulations with varying background inflow. As noted, BC reflects the base case results discussed above, ZB reflects the impact of zeroing out mercury inflow to the GLR region, and ZE reflects the impact of mercury inflow alone (ZE = BC − ZB). Metrics for the comparisons among the simulations are presented in Table 2, calculated from daily averaged values at DL and monthly mean values from MKE.
At DL, the CMAQ-Hg BC results are very similar to the ZE case, suggesting that boundary inflow contributes the majority of simulated mercury at the rural site, in both elemental and reactive forms. Of the BC-simulated Hg0, 99 % is captured by the ZE scenario, and 91 % of BC-simulated RHg is captured. This attribution informs the errors discussed above, in that the major over-prediction of ground-level RHg at DL appears to be due to errors in chemistry and/or deposition affecting boundary inflow. Removing all inflow of mercury (ZB) leads to significantly improved model performance for RHg, with the normalized mean error dropping from 483 % to 68 %. Even so, the remaining error and the −44 % bias in the ZB case at DL still point to major model problems. Even with only regional emissions included, we find that R2 = 0.01, suggesting no relationship between variability in observations and the transport and processing of RHg from local emissions. In contrast, Manolopoulos et al. (2007) find that power plant plumes do impact RGHg (but not Hg0) at DL. These results suggest that erroneous processing of boundary inflow combines with additional errors, especially in regional emissions and/or deposition, such that the signature of nearby plumes is evident in the observations but not in the model.
Wind direction was aggregated such that all directions between 0° and 179° were considered from the east and all wind directions greater than or equal to 180° were labeled as from the west.
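The east/west aggregation of wind direction described above amounts to a simple threshold at 180°. The sketch below is illustrative only: the function names are hypothetical, and the treatment of values between 179° and 180° (classed as easterly here) and of 360° (wrapped to 0°) are assumptions, since the text gives only the two ranges.

```python
def wind_sector(direction_deg):
    """Label a wind direction (degrees from north) as 'east' or 'west'.

    Directions in [0, 180) count as from the east; directions in
    [180, 360) count as from the west. Values are wrapped modulo 360.
    """
    wrapped = direction_deg % 360.0
    return "east" if wrapped < 180.0 else "west"


def easterly_fraction(directions_deg):
    """Fraction of observations classed as coming from the east."""
    labels = [wind_sector(d) for d in directions_deg]
    return labels.count("east") / len(labels)


# Made-up directions (illustrative only).
print(wind_sector(90.0), wind_sector(180.0))
print(easterly_fraction([90.0, 270.0, 45.0, 200.0]))
```

A two-sector split like this discards most directional detail, but it is enough to separate flow from the Milwaukee side of the domain from the prevailing westerly flow.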

Urban site (Milwaukee)
As at DL, observed ambient Hg 0 at MKE exhibits much higher variability than do the simulated values, and observations show a significantly higher mean value as well (Fig. 3). Measured annual MKE Hg 0 averages 2.4 ng m −3, with peak values exceeding 4 ng m −3, whereas CMAQ-Hg estimates a value for MKE almost identical to DL at 1.6 ng m −3. This disagreement suggests that elevated urban emissions of Hg 0 and/or reduction of RHg in the urban environment are missing from CMAQ-Hg. Monthly mean values show no correlation between model and observations (R 2 = 0.01), further indicating that processes determining seasonal variability are missing from the model.
As at DL, CMAQ-Hg significantly over-predicts RHg concentrations at MKE, with mean biases in RGHg of 331 % and PHg of 215 %. Relative to DL, the model is better able to capture the seasonal cycle (variability of monthly means) at MKE, with R 2 values of 0.24 for RGHg and 0.11 for PHg (Table 1, Fig. 3). Both measurements and model suggest that annual RHg is about 50 % higher at MKE than at DL. However, the allocation between RGHg and PHg differs. Measurements show PHg to be about 50 % higher than RGHg at the rural site, and about 20 % higher at the urban site, whereas the model shows less of a difference at the rural site (30 %) and the opposite pattern (higher RGHg) at the urban site.
We compared observed ambient concentrations at MKE from July 2004 with model results from July 2003 to assess the impact of boundary inflow at the urban site (Fig. 6, Table 2). As at DL, ambient Hg 0 values are near zero when boundary inflow is removed (ZB). There is a clear difference, however, in the response of modeled RHg to the removal of inflow at the urban site. Whereas 91 % of DL RHg is captured by the ZE scenario, at MKE only 55 % of RHg is attributable to boundary inflow. The differing sensitivity of RHg at DL and MKE to the removal of boundary inflow indicates that local emissions affect simulated RHg at the urban site, but that the atmospheric lifetime of these RHg emissions is not long enough in the model to allow transport to DL.
Despite the evident shortcomings in CMAQ-Hg's ability to resolve emissions and processes controlling ambient Hg 0, the overall performance statistics are relatively good, with a normalized bias of −36 % even at MKE, and normalized mean error within or below the range seen for other common pollutants (Table A4).

Regional wet deposition
As noted above, RHg in CMAQ-Hg shows biases, error, and lack of correlation with observations far worse than Hg 0 or criteria pollutants (O 3, NO 2, and SO 2, shown in Table A4), with annual average biases ranging from 215 to 331 %. This level of error is broadly consistent with other studies in which models were evaluated against ambient reactive mercury, noted above in Sect. 1. Because RHg is the dominant contributor to total wet deposition, we evaluate how CMAQ-Hg performs for this widely used metric.
The Mercury Deposition Network (MDN) has reported total mercury wet deposition since 1995, and as of our analysis operated 110 monitoring sites across the US and Canada, with 31 sites across the GLR (Vermette et al., 1995). Simulated CMAQ-Hg wet deposition is compared with monthly totals from all 31 of these MDN sites (one of which is co-located with the DL sampler and one of which is in the same grid cell as the MKE sampler). CMAQ-Hg wet deposition values in grid cells containing an MDN site were summed according to the MDN start and end times associated with each monitor, rounded down to the nearest hour. Monthly totals were calculated using the end date of the MDN measurement period, and MDN reports of zero wet deposition were included. A site was excluded from the monthly totals only when 50 % or more of its reported sampling periods had missing or invalid data.
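The matching procedure described above can be sketched as follows. This is an illustrative outline rather than the processing code used in the study: the function names and data layouts are hypothetical, and applying the 50 % missing-data rule over a site's full list of sampling periods is an assumption, since the text does not state the window over which the fraction is evaluated.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def floor_to_hour(t):
    """Round a timestamp down to the start of its hour."""
    return t.replace(minute=0, second=0, microsecond=0)


def model_total(hourly, start, end):
    """Sum hourly model wet deposition over one MDN sampling period.

    hourly: dict mapping hour-start datetime -> deposition in that hour
    start, end: MDN sample start and end times (rounded down to the hour)
    """
    t, stop = floor_to_hour(start), floor_to_hour(end)
    total = 0.0
    while t < stop:
        total += hourly.get(t, 0.0)
        t += timedelta(hours=1)
    return total


def monthly_totals(samples):
    """Aggregate sampling periods into monthly totals by end date.

    samples: iterable of (end_time, value); value is None when the MDN
    record is missing or invalid. Zero deposition is a valid value.
    Returns None when 50 % or more of the periods are missing
    (assumed here to be evaluated over the whole list).
    """
    samples = list(samples)
    missing = sum(1 for _, value in samples if value is None)
    if not samples or missing / len(samples) >= 0.5:
        return None
    totals = defaultdict(float)
    for end, value in samples:
        if value is not None:
            totals[(end.year, end.month)] += value
    return dict(totals)
```

Assigning each sample to the month of its end date means that a sampling period straddling a month boundary contributes entirely to the later month, for both the model and the observations.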
Simulated wet deposition from CMAQ-Hg is compared with observations from the MDN on an annual and seasonal basis in Table 1; Fig. 1 shows the locations of the GLR measurement sites included in these calculations. For wet deposition R 2, bias, and error, Eqs. (1) and (2), D reflects the annual (or seasonal) mean mercury wet deposition value at each site, so the metrics reflect the agreement between model and observations in terms of spatial variability and spatial mean values.
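For readers who want to reproduce metrics of this kind, the sketch below computes normalized mean bias, normalized mean error, and R 2 from paired model and observed values. Eqs. (1) and (2) are not reproduced in this section, so the conventional definitions are assumed here (sums of differences normalized by the sum of observations, and the squared Pearson correlation); the function names are illustrative.

```python
import math


def normalized_mean_bias(model, obs):
    """NMB in percent: 100 * sum(D_m - D_o) / sum(D_o)."""
    return 100.0 * sum(m - o for m, o in zip(model, obs)) / sum(obs)


def normalized_mean_error(model, obs):
    """NME in percent: 100 * sum(|D_m - D_o|) / sum(D_o)."""
    return 100.0 * sum(abs(m - o) for m, o in zip(model, obs)) / sum(obs)


def r_squared(model, obs):
    """Squared Pearson correlation between model and observed values."""
    n = len(obs)
    mean_m = sum(model) / n
    mean_o = sum(obs) / n
    cov = sum((m - mean_m) * (o - mean_o) for m, o in zip(model, obs))
    var_m = sum((m - mean_m) ** 2 for m in model)
    var_o = sum((o - mean_o) ** 2 for o in obs)
    if var_m == 0.0 or var_o == 0.0:
        return math.nan  # correlation is undefined for a constant series
    return cov * cov / (var_m * var_o)


# Hypothetical site means: the model is exactly half of each observation.
model_vals = [1.0, 2.0, 4.0]
obs_vals = [2.0, 4.0, 8.0]
print(normalized_mean_bias(model_vals, obs_vals))
print(normalized_mean_error(model_vals, obs_vals))
print(r_squared(model_vals, obs_vals))
```

The example illustrates why the three metrics are reported together: a model that is uniformly half the observed value has a perfect correlation but a bias of −50 %.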
On an annual basis, CMAQ-Hg underestimates wet deposition by 21 %, and shows average errors of 55 %. The seasonal low bias ranges from 26-32 % in spring (March, April, May; MAM), summer (June, July, August; JJA), and autumn (September, October, November; SON). However, in winter (December, January, February; DJF), the model over-predicts total mercury deposition by 70 %. Winter appears to show the worst model performance in general, with errors about twice those of other seasons. Spring shows the lowest model error at 42 %, and the highest R 2 value at 0.54, two to three times higher than any other season.
These results show similar skill to previous studies comparing regional models to MDN results over the US, although biases differ among studies (even studies using CMAQ-Hg), and within single studies among sensitivity tests (Bullock et al., 2009; Lin et al., 2007; Seigneur et al., 2003). The most recent multi-model evaluation (Bullock et al., 2009) finds annual-average, regional-average high biases ranging from 22-97 % for eight of nine regional simulations examined (three regional models, including CMAQ-Hg, forced with monthly mean boundary conditions from three different global chemical transport models); only CMAQ-Hg with CTM boundary conditions shows no bias. Among prior studies, modeled mercury wet deposition has been found to be biased high, e.g. 26 % (spring) and 60 % (summer); 69 % (July) (Bash, 2010); 22 % (annual, higher resolution) (Seigneur et al., 2003); to be biased low, e.g. −9 % (annual, lower resolution) (Seigneur et al., 2003), −11 % (Seigneur et al., 2003b), and others (Cohen et al., 2004; Gbor et al., 2006); and to have little bias, e.g. Vijayaraghavan et al. (2008). Values of R 2 emerging from the literature vary widely, although our annual R 2 value of 0.27 is lower than the range of 0.50-0.69 found in Bullock et al. (2009). Our seasonal values are as follows, as compared to the range of models evaluated in Bullock et al. (2009): winter R 2 = 0.20 vs. 0.50-0.66; spring R 2 = 0.54 vs. 0.32-0.42; summer R 2 = 0.23 vs. 0.27-0.60; autumn R 2 = 0.13 vs. 0.47-0.63. The low bias that we find in wet deposition is consistent with the low bias of WRF-simulated precipitation against observed precipitation at the MDN sites, which we find accounts for about 30 % of the CMAQ-Hg error.

Discussion
The comparison of CMAQ-Hg performance with observations at a rural and urban site suggests that the model contains a number of significant errors in the treatment of atmospheric mercury. These errors are consistent with the 2-10 times overestimate in RHg found by Y. Zhang et al. (2012) in evaluating CMAQ (and the GRAHM model) over the Great Lakes region (with model data from 2002 and 2005, and measurements for 1 or 2 yr between 2003 and 2009), as well as with Baker and Bash (2012) in their qualitative comparison of 2005 CMAQ and CAMx model simulations with 2009 measurement data over the Eastern US. Overall, we find that RHg is much too high, whereas wet deposition shows a low bias, even after accounting for low precipitation (∼ 30 % of the wet deposition low bias is attributable to precipitation errors). Taken together, these two results are surprising given that RHg is the dominant contributor to total mercury wet deposition in our simulations and in past studies (Lin and Tao, 2003). These patterns might be explained by compensating errors in CMAQ-Hg.
Although measurements from our two study sites may also reflect inaccuracies in the measurement technique, these are not considered a viable explanation for the poor agreement between modeled and measured RHg. For example, even applying a correction factor to compensate for a potential upper-bound RGHg loss of 55 % (from Lyman et al., 2010), CMAQ-Hg would still show an annual positive bias of over 85 % at both sites. Other sources of potential error include known problems in the speciation of mercury emissions (e.g. Weiss-Penzias et al., 2011) and the impacts of in-plume mercury reduction (Amos et al., 2012). Both of these issues likely contribute to the over-estimate of ambient RHg found in our study. However, the sensitivity tests with respect to boundary inflow highlight the dominant role of non-regional emissions (i.e. the ZE results approximate the base case simulation for both species at DL, for Hg 0 at MKE, and contribute ∼ 50 % of the base case for RHg at MKE), so perturbing our treatment of regional source emissions would not be expected to significantly impact results. A final source of known error in our study is incurred through the use of static, default boundary conditions, rather than time-varying inflow with appropriate seasonal and spatial patterns. Given the sensitivity of model results to global boundary conditions, shown here and in Bullock et al. (2008), it is unclear whether coupling to a global model would actually improve agreement with observations, although such coupling would present a more physically realistic set of model assumptions.
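The effect of a measurement-loss correction on model bias can be illustrated with simple arithmetic. The sketch below assumes that a fraction f of the true concentration is lost during sampling, so that the corrected observation is the measured value divided by (1 − f), and that the bias is a normalized mean bias; it is not necessarily the calculation behind the 85 % figure quoted above. The function name is hypothetical, and the example applies the 55 % upper-bound loss to the 331 % RGHg bias reported for MKE.

```python
def bias_after_loss_correction(bias_pct, loss_fraction):
    """Normalized mean bias (percent) after scaling observations upward.

    bias_pct: model bias against uncorrected observations, in percent
    loss_fraction: assumed fraction of the true value lost in sampling
    Dividing every observation by (1 - loss_fraction) multiplies the
    model-to-observation ratio by (1 - loss_fraction).
    """
    if not 0.0 <= loss_fraction < 1.0:
        raise ValueError("loss_fraction must be in [0, 1)")
    ratio = 1.0 + bias_pct / 100.0
    return (ratio * (1.0 - loss_fraction) - 1.0) * 100.0


# RGHg at MKE: 331 % bias, corrected for an assumed 55 % sampling loss.
print(round(bias_after_loss_correction(331.0, 0.55), 2))  # 93.95
```

Under these assumptions the corrected RGHg bias at MKE remains close to 94 %, which is in line with the statement that the positive bias persists even after the upper-bound correction.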
Based on our analysis of an urban and rural site in Wisconsin, we have developed a set of hypotheses for further research. We posit the following: (1) production of RHg from Hg 0 is too high in the model; (2) emissions of RHg and Hg 0 in urban areas are incorrect; (3) mercury wet deposition rates are too low; and (4) the atmospheric lifetime of RHg in the domain is too short. These errors together could explain why ambient RHg is too high (even with zero regional emissions), yet wet deposition is too low; directly emitted RHg exhibits an unrealistically short lifetime in CMAQ-Hg, whereas boundary inflow contributes too much to surface RHg and wet deposition.
This suite of errors would suggest that CMAQ-Hg underestimates the degree to which regional emissions contribute to wet deposition, and overestimates the contribution of international sources to US mercury deposition. The finding that long-range transport of mercury dominates wet deposition has been advanced by global modeling studies suggesting that only 12-30 % of total US deposition is attributable to North American anthropogenic emissions (Seigneur et al., 2004; Selin and Jacob, 2008; Y. Zhang et al., 2012), and a regional modeling study over Asia also attributes 30 % of total deposition to regional emissions (Pan et al., 2010). The degree to which these important estimates are valid depends critically on each model's ability to capture key processes controlling mercury deposition. Given the dominant contribution of Hg 0 to total atmospheric mercury, even small changes in reaction rates, chemical cycling, and deposition could have a pronounced impact on source attribution.
Our choice of CMAQ-Hg as an analysis tool was motivated by its strong track record of development and analysis, and good performance against available measurements from the US EPA MDN. In embarking on this study, we hoped to find the model reasonable in its ambient concentration estimates, and relevant for further scientific experimentation. Unfortunately, our evaluation uncovered fundamental errors in modeled ambient concentrations, traced back to likely errors in chemistry, deposition, and emissions. Like other studies, we find reasonable agreement between model estimates of total wet deposition and those measured by the MDN. However, the favorable agreement appears to be due to compensating errors, in light of the over-prediction of surface RHg concentrations and potential under-prediction of RHg lifetime and/or under-prediction of wet deposition rates. In this sense, the chemical processes for atmospheric mercury remain uncertain and advanced research into the chemical kinetics would aid modeling efforts and characterization of even basic source-receptor questions.

2.179 × 10 −9 3.024 × 10 −9 4.167 × 10 −9 5.079 × 10 −9 6.204 × 10 −9 7.000 × 10 −9
PHg (µg m −3 ) 1.070 × 10 −5 1.025 × 10 −5 9.358 × 10 −6 7.293 × 10 −6 4.175 × 10 −6 1.620 × 10

Table A2. Chemical reactions and rates used in the CMAQ-Hg mechanism presented here.

Chemical Equation Rate
Gas-phase reactions for Hg
RG1 Hg 0

2012) report only gaseous oxidized mercury, which is expected to be the majority of total mercury deposition. Average, minimum, and maximum annual deposition estimates from CMAQ-Hg are compared with equivalent or scaled values from model and measurement estimates (Caldwell et al., 2006; Castro et al., 2012; EPA, 1997; Landis et al., 2004; Lyman et al., 2007; L. Zhang et al., 2012). Annual

Fig. 1. Domains used for CMAQ-Hg simulations: large box shows the CONUS domain, small box shows the GLR domain. Open circles show locations of MDN monitoring sites; those within the smaller box were analyzed in Table 1. Stars show locations of DL and MKE monitoring sites.

Fig. 2. Comparison of CMAQ-Hg simulated ambient mercury species against observed values at the rural DL site for Hg 0 (a, units ng m −3 ), RGHg (b, units pg m −3 ), and PHg (c, units pg m −3 ). Values reflect a 14-day running average. Observed data span 2003-2004; model data taken from a 2003 annual simulation.

Fig. 3. Comparison of CMAQ-Hg simulated ambient mercury species against observed values at the urban MKE site for Hg 0 (a, units ng m −3 ), RGHg (b, units pg m −3 ), and PHg (c, units pg m −3 ). Values reflect a 14-day running average. Observed data span 2004-2005; model data taken from a 2003 annual simulation.

Fig. 4. 2003 wind direction at DL from (a) measurements (Rutter et al., 2008a); (b) CMAQ-Hg (10-m wind speed generated by the WRF model and processed with MCIP v. 3.4). Wind direction was aggregated such that all directions between 0° and 179° were considered from the east and all wind directions greater than or equal to 180° were labeled as from the west.
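The east/west aggregation described in the Fig. 4 caption amounts to a single threshold test. The sketch below is an illustration only: the function name is hypothetical, and two details are assumptions not stated in the caption, namely that fractional directions between 179° and 180° are labeled east and that inputs outside 0-360° are wrapped into that range.

```python
def wind_sector(direction_deg):
    """Label a wind direction (degrees) as coming from the east or west.

    Directions below 180 degrees are labeled 'east'; directions of
    180 degrees or more are labeled 'west'.
    """
    wrapped = direction_deg % 360.0  # assumption: wrap out-of-range input
    return "east" if wrapped < 180.0 else "west"


print(wind_sector(90.0))   # east
print(wind_sector(270.0))  # west
```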

Fig. 5. Comparison of July 2003 CMAQ-Hg simulated ambient mercury species at DL in the base case (BC), zero boundary inflow (ZB), and zero emissions (ZE) cases against observed values for Hg 0 (a, units ng m −3 ) and RHg (b, units pg m −3 ).

Fig. A1. Mercury emissions used in CMAQ-Hg, taken from the 1999 EPA National Emissions Inventory (NEI) and processed through the SMOKE model. Total mercury emissions (a) are given in g km −2 , as well as percent (%) allocation to Hg 0 (b), RGHg (c) and PHg (d).
3 , PAN, and CO given as the min-max range; PHg in j-mode only.

Table 1. Mean values and evaluation metrics are presented to compare CMAQ-Hg with available measurements of Hg 0 (units ng m −3 ), RGHg, PHg, and RHg (= RGHg + PHg, units pg m −3 ) at DL and MKE. Equations (1) and (2) were used to calculate normalized mean bias and error. All model and measurement data are from 2003, except for ambient mercury species at MKE, as discussed in the text. Evaluation of mercury wet deposition across the GLR (units ng m −2 ) compares CMAQ-Hg with 31 MDN monitors.
(Weiss-Penzias et al., 2011)h in the 2002 NEI used here (Weiss-Penzias et al., 2011), although the apparent speciation error may actually reflect in-plume reduction of RHg to Hg 0.

D m represents the model value and D o represents the observed value. Mixing ratios (D) are evaluated both as daily means and as monthly means to support comparison of mercury at DL, where daily mean values may be compared between model and observations for 2003, and MKE, where monthly mean values are compared between model (2003) and observations (2004-2005). Table 1 includes annual performance metrics for ambient, speciated mercury. Values at DL are calculated from daily mean values for all days between 10 April 2003 (start of measurement period) and 31 December 2003 (end of modeling period) in which more than 25 % (6 h) of measurement data were available. Metrics at both MKE and DL are also calculated from monthly mean values (for months in which more

Table 2. Mean values and evaluation metrics are presented to compare sensitivity simulations (BC, ZB, and ZE) of CMAQ-Hg with available measurements of Hg 0 (units ng m −3 ), RGHg, PHg, and RHg (= RGHg + PHg, units pg m −3 ) at DL and MKE. Equations (1) and (2) were used to calculate normalized mean bias and error. All model and DL measurement data are from July 2003; measurement data at MKE are from July 2004.

Table A3. Castro et al. (2012) deposition on a per-area basis (µg m −2 ). All values are scaled to reflect annual equivalent values. All except Castro et al. (2012) report total deposition of all mercury species; Castro et al. (

Total Mercury Dry Deposition (µg m −2 )

Table A4. Mean values and evaluation metrics are presented to compare CMAQ-Hg with available measurements of O 3 , NO 2 , and SO 2 (at DL and MKE, units ppbv). Equations (1) and (2) were used to calculate normalized mean bias and error. Observations were obtained from the US EPA Air Quality System database (http://www.epa.gov/ttn/airs/aqsdatamart/). Daily average observations for 2003 were compared with daily average values from the lowest model layer of CMAQ-Hg in the corresponding grid cell. For DL, there was only one AQS monitor in proximity (WI site #7). For the MKE location, there were several AQS monitors within close proximity to the mercury/Tekran site, so we chose the monitor in the same model grid cell (WI site #26). Daily average measurements were available year-round for O 3 at DL, and for SO 2 at both sites. Ozone at MKE was available during the summer season (15 April to 16 October). Nitrogen dioxide was available at MKE from 23 May onward and at DL from 27 March onward, continuing at both sites through the end of the year.