Basin-scale wind transport during the MILAGRO field campaign and comparison to climatology using cluster analysis

The MILAGRO field campaign was a multiagency international collaborative project to evaluate the regional impacts of the Mexico City air pollution plume as a means of understanding urban impacts on the global climate. Mexico City lies on an elevated plateau with mountains on three sides and has complex mountain and surface-driven wind flows. This paper asks what the wind transport in the basin was during the field campaign and how representative it was of the climatology. Surface meteorology and air quality data, radiosondes and radar wind profiler data were collected at sites in the basin and its vicinity. Cluster analysis was used to identify the dominant wind patterns both during the campaign and within the past 10 years of operational data from the warm dry season. Our analysis shows that March 2006 was representative of typical flow patterns experienced in the basin. Six episode types were identified for the basin-scale circulation, providing a way of interpreting atmospheric chemistry and particulate data collected during the campaign. Decoupling between surface winds and those aloft had a strong influence in leading to convection and poor air quality episodes. Hourly characterisation of wind circulation during the MILAGRO, MCMA-2003 and IMADA field campaigns enables comparison of similar air pollution episodes and evaluation of the impact of wind transport on measurements of the atmospheric chemistry taking place in the basin.
Correspondence to: B. de Foy (bdefoy@slu.edu)


Introduction
By studying the regional impact of the Mexico City Metropolitan Area (MCMA) pollution plume, the MILAGRO field campaign seeks to improve the understanding of the global atmospheric impacts of megacities around the world (http://www.eol.ucar.edu/projects/milagro). The field campaign consisted of four components, MCMA-2006, MAX-MEX, MIRAGE and INTEX-B, ranging from the basin scale to the intercontinental scale. The MCMA-2006 field campaign, organised by the Molina Center for Energy and the Environment, focused on the Mexico City basin (http://mce2.org/fc06/fc06.html). Its main aim was to improve the understanding of urban emissions and boundary layer concentrations in the basin in order to assist policy makers. To this end, numerous research teams were deployed to characterise emission sources, measure pollutant transport, detect large point sources, describe vertical mixing processes and assess health impacts.

Basin-scale meteorology
The MCMA is located in a basin at 2240 m above mean sea level. It is surrounded by mountains on three sides, with an opening towards the Mexican Plateau in the north and a small gap in the basin rim in the southeast (see Fig. 1). At 19.5° N, the synoptic forcing is weak but the solar radiation intense, leading to mountain-valley and urban-induced wind patterns.
Published by Copernicus Publications on behalf of the European Geosciences Union.
Convergent drainage flows in the basin have been associated with high pollutant concentrations in the city (Jauregui, 1988). Modelling studies of the basin circulation during the MARI field campaign highlighted the importance of the interaction between the synoptic and the local, terrain-induced winds (Bossert, 1997). Thermal plumes, convective eddies, low-level jets and entrainment into the boundary layer were observed by LIDAR during these episodes (Cooper and Eichinger, 1994). The Azteca experiment measured the impact of up- and down-slope flows on pollutant transport into and out of the urban area (Raga et al., 1999).
Extensive meteorological measurements were made during the IMADA field campaign (Doran et al., 1998). These identified thermal gradients as the driving force of the gap flow through the southeast passage. Terrain amplification of solar heating led to rapid boundary layer growth followed by sudden collapse due to cooling by winds from the surrounding areas (Whiteman et al., 2000). These circulation patterns caused convergence zones in the basin with a significant impact on pollution dispersion, but with effective daily venting of the basin (Fast and Zhong, 1998).
Wind transport during the MCMA-2003 field campaign was classified into three episode types: "O3-South", "O3-North" and "Cold Surge" (Molina et al., 2007; de Foy et al., 2005). The strength of the gap flows during the campaign was found to be influenced by momentum down-mixing of winds aloft (de Foy et al., 2006a), leading to east-west convergence zones during O3-South events and north-south convergence zones during O3-North events. Basin venting was found to be rapid, with little influence of day-to-day carryover (de Foy et al., 2006b). Similar convergence zones were analysed during different time periods by Jazcilevich et al. (2005). Fast et al. (2007) provided a meteorological overview of the MILAGRO field campaign focusing on the regional scale. They split the campaign into three parts: an initial dry part, a middle part with three cold surges and mixed circulation in between, and a third part with convective rainfall. They also described the meteorological measurements made during the campaign and provided a review of existing meteorological research on Mexico City.
Three Cold Surge events took place during MILAGRO, on 14, 21 and 23 March. Before these, regional conditions were very dry, leading to elevated levels of dust and biomass burning. After the cold surges, conditions became moister and favourable for daily afternoon showers. Trajectories based on radar wind profiler data were used to identify days with potential transport of the urban plume past the outflow sites T1 and T2, see Fig. 1 (Doran et al., 2007).

Cluster analysis
Meteorological cluster analysis has been used to establish local climatology as well as to determine wind patterns associated with high air pollution episodes. On the meteorological side, Davis and Walker (1992) performed a climatology of the western United States using principal component analysis and a two-step clustering technique. Weber and Kaufmann (1995) and Kaufmann and Weber (1996) developed a two-step method consisting of a first pass with the complete linkage method followed by clustering with the k-means algorithm. This method has since been used and described for surface winds in Switzerland (Weber and Furger, 2001) and for wind pattern classification over the Grand Canyon (Kaufmann and Whiteman, 1999). Kastendeuch and Kaufmann (1997) applied the method to identify terrain-induced winds in valley environments. Kastendeuch and Najjar (2003) further extended it to upper-air wind profiles.
With regard to air pollution, Davis et al. (1998) performed meteorological cluster analysis on the synoptic scale to identify high ozone events in Houston. Cluster analysis was performed with the average linkage method alone, or with average linkage followed by k-means analysis; the two-step method was found to give improved results. Hart et al. (2006) used a similar method and identified one synoptic cluster out of 11 as responsible for most ozone exceedances in Sydney, Australia. Lu et al. (2006) compared hierarchical and nonhierarchical methods to classify PM10 monitoring stations into five air quality basins. Darby (2005) also considered ozone pollution in Houston, but performed cluster analysis on the local surface winds with a partitioning method. Of the 16 clusters, several were clearly associated with ozone exceedances.
Atmos. Chem. Phys., 8, 1209-1224, 2008 www.atmos-chem-phys.net/8/1209/2008/
Oanh et al. (2005) applied the method of Davis and Walker (1992) to synoptic conditions over Thailand in order to identify episodes of high ambient SO2 concentrations. Beaver and Palazoglu (2006) based their cluster analysis on the results of a principal component analysis for the surface winds in the San Francisco Bay Area, again identifying synoptic patterns associated with high ozone levels. Turias et al. (2006) used a neural network approach to classify surface winds near Gibraltar with a view to improving air pollution forecasts. Meteorological analysis of the Mexico City basin has been performed extensively on an episode-by-episode basis. An exception to this is Klaus et al. (2001), who carried out a principal component analysis of 11 months of air quality data and examined the corresponding wind fields. This identified four eigenvectors corresponding to north/south transport, east/west slope flows, centre/periphery drainage flows and northeast/southwest precipitation flows.

Outline
In this paper, existing cluster analysis methods are used to carry out a meteorological evaluation of the MCMA basin. The measurements used are described in Sect. 2. Air pollution levels during MILAGRO are compared to decade-long trends in Sect. 3 to examine whether March 2006 was representative of Mexico City pollution levels. The cluster analysis is then performed on the last 8 years of radiosonde data in Sect. 4 and the last 10 years of surface wind data in Sect. 5. Vertical wind profiles obtained at three sites during the campaign are analysed in Sect. 6. Prefixes are used to distinguish the three different types of clusters by name: "Raob" for radiosonde clusters, "Sfc" for surface clusters and "Rwp" for radar wind profiler clusters. The wind circulation patterns during MILAGRO are then classified into six different meteorological episode types in Sect. 7.
Figure 1 shows the location of the urban area within the basin along with the three supersites T0, T1 and T2, surrounding cities and physical features. Figure 2 shows the location of the stations where data used in this study were measured. The MCMA-2006 field campaign was based at the T0 supersite at the Mexican Petroleum Institute (IMP), in the northern part of the city south of the Sierra de Guadalupe hills. Cerro de Chiquihuite, site of a radio antenna station, is 4.8 km to the north, and Pico Tres Padres (PTP), a summit rising ∼800 m above the basin floor, is 12 km to the north. A detailed description of the meteorological data collected during the campaign can be found in Fast et al. (2007).

Measurements
Radar wind profilers were installed at T0, T1 and T2. These were 915 MHz models manufactured by Vaisala, operated in a 5-beam mode with nominal 192-m range gates. As described in Doran et al. (2007), the NCAR Improved Moment Algorithm was used to obtain 30-min average consensus winds. Radiosonde observations have been carried out at the headquarters of the Mexican National Weather Service (GSMN) at 00:00 UTC and 12:00 UTC since 1999. A network of automated surface meteorological stations (EMA) has been in operation since 2001, reporting standard parameters, including accumulated rainfall, at 10-min intervals. During MILAGRO, 5 stations were in operation in the basin: GSMN, ENCB, TEZO, CEMC and MADI. Meteorological stations were installed for the duration of the campaign at T0, T1 and T2, taking measurements at 1-min intervals. Rain intensity and accumulated rainfall at stations T0 and T1 were measured using Vaisala WXT150 Weather Transmitters equipped with Vaisala Raincap sensors. Hourly cloud cover observations at the airport were obtained from the US National Climatic Data Center.
Both surface criteria pollutant concentrations and meteorological parameters are measured throughout the city by the Ambient Air Monitoring Network (Red Automática de Monitoreo Atmosférico, RAMA). Hourly average data have been available online (http://www.sma.df.gob.mx/simat/) since 1986. Detailed information on all the stations, including location, description of surroundings and site photographs, is available at the same address under "Mapoteca". These stations are grouped into sectors based on their location in the basin, as shown in de Foy et al. (2005). For this analysis, a distinct "Periphery" (PR) sector was used, consisting of the stations VIF, CHA, CUA and TAH. Because of the continuity of the RAMA dataset, these surface winds are used for the cluster analysis. Mexico City is in the Central Standard Time zone (CST), which is 6 hours behind UTC. All times in the paper are reported as CST unless marked otherwise.

Air pollution trends
Before analysing the basin meteorology, we ask whether the urban air quality during March 2006 was representative of longer time periods. Daily maximum measurements of O3, CO, PM10 and PM2.5 were obtained over a 10-year period for all the stations in the MCMA with a continuous record. These include 18 stations for O3, 19 for CO, 6 for PM10 and 8 for PM2.5. Figure 3 shows the 5th-95th percentile range, the interquartile range and the median of the daily maximum by month.
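The monthly statistics summarised in Fig. 3 can be sketched as follows. This is a minimal illustration, not the authors' processing code: it assumes hourly records supplied as (station, timestamp, value) tuples, takes the domain-wide daily maximum as the maximum over all stations and hours of a calendar day, and uses linear interpolation between order statistics as the percentile convention, which the text does not specify. The function names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def percentile(sorted_vals, p):
    """Percentile p (0-100) of an ascending list, by linear interpolation."""
    if not sorted_vals:
        raise ValueError("empty sample")
    k = (len(sorted_vals) - 1) * p / 100.0
    lo = int(k)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (k - lo)


def monthly_stats(records):
    """Return {(year, month): (p5, p25, p50, p75, p95)} of the domain-wide daily maximum.

    records: iterable of (station, datetime, value) hourly observations;
    value may be None for a missing hour.
    """
    daily_max = {}
    for _station, t, value in records:
        if value is None:
            continue  # skip missing hours
        day = t.date()
        daily_max[day] = max(value, daily_max.get(day, value))

    by_month = defaultdict(list)
    for day, vmax in daily_max.items():
        by_month[(day.year, day.month)].append(vmax)

    return {
        ym: tuple(percentile(sorted(vals), p) for p in (5, 25, 50, 75, 95))
        for ym, vals in sorted(by_month.items())
    }


# Made-up ozone values (ppb) for two stations over three days.
example = [
    ("A", datetime(2006, 3, 1, 14), 100), ("B", datetime(2006, 3, 1, 15), 120),
    ("A", datetime(2006, 3, 2, 14), 80), ("B", datetime(2006, 3, 2, 13), 90),
    ("A", datetime(2006, 3, 3, 15), 150), ("B", datetime(2006, 3, 3, 16), None),
]
stats = monthly_stats(example)
print(stats[(2006, 3)])
```

Taking the maximum across stations before computing percentiles is what makes the statistic "domain-wide"; computing percentiles per station first would answer a different question.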
The downward trend of O3 and CO is clearly visible, with March 2006 well within the normal distribution. For O3, there is a slight annual pattern with higher values during March and April. This is because solar radiation is increasing but the wet season has not yet started. Note, however, that high concentrations occur throughout the entire year. For CO, the highest concentrations occur during January and February, which are part of the cold dry season when temperature inversions are strongest. The maximum takes place between 08:00 and 09:00 in the morning (data not shown), at the peak of rush hour but before the mixing layer has started to rise.
For PM10 there is much less of a long-term trend. The end of the dry season typically has the highest aerosol loadings, with a minimum during the wet season. While the median of the domain-wide maximum is normal for March, the 75th and 95th percentiles are considerably higher and attain levels not seen since 2001. The maximum loadings take place either towards the end of the morning (11:00) or towards the end of the afternoon (18:00) (data not shown). The PM2.5 loadings show less annual variation than PM10. Nevertheless, there are peaks in the 95th percentile corresponding to the warm dry seasons. In particular, the warm dry season of 2006 had the highest concentrations since measurements began in the summer of 2003. Unlike PM10, however, the timing of the maximum is around noon.
By comparing domain-wide maximum air pollution measurements over a 10-year period, we have shown that March 2006 was a representative month. Levels of O3 and CO continued their downward trends. Levels of PM2.5 and PM10 are mostly within their ordinary range, although the top quartile is on the high end of what is to be expected.

Radiosonde analysis
Having situated March 2006 in terms of its "chemical weather", we now analyse the radiosonde record to see if the synoptic conditions were climatologically representative. Cluster analysis was performed on radiosonde profiles from the warm dry season, defined in this study as 15 February to 15 May. In 1998, the release site was moved from the airport (AERO), on the basin floor, to the Mexican National Weather Service headquarters on the western edge of the basin (GSMN, see Fig. 2). The analysis is therefore restricted to the 8 years at the new location (1999-2006). Data profiles were obtained from the US National Oceanic and Atmospheric Administration's Earth System Research Laboratory (http://raob.fsl.noaa.gov/) at a reduced vertical resolution compared to the original data files. The analysis uses the 00 UTC sounding (18:00 local time on the previous day). In some cases it was possible to extrapolate valid data to the reduced height ranges used in the clustering. Allowing for this, data availability varied from 79% in 2004 to 96% in 2002.
For the radiosonde analysis, the clustering was performed using the k-means algorithm alone on an array containing potential temperature (K), humidity (g/kg), and meridional and zonal wind speed (m/s). The data were interpolated to heights every 500 m above ground level from 500 to 4500 m. Because all the variables vary over ranges of similar magnitude (about 20 units), and because sensitivity testing showed that normalisation did not affect the analysis, the data were not renormalised. The distance between two profiles was calculated as the root mean square difference over all the data points. The maximum distance between profiles within a cluster decreases rapidly as the number of clusters is first increased, but then decreases roughly linearly. For this reason, the analysis was performed with 6 clusters. Figure 4 shows the median of the profiles in each cluster. For ease of analysis, each cluster has been given a name based on its most distinctive feature. The "Raob Wet" cluster contains the most humid profiles, which are also among the warmest. Winds are weak and from the south within the surface layer and westerly aloft. The "Raob Hot" cluster, in contrast, has the hottest profiles, with average humidity aloft but a drier surface layer. Again, the winds are weak, but they are northerly with some veering to northeasterly aloft. The "Raob NCool" and "Raob WCool" clusters both have cool temperatures, average humidities and westerlies aloft. "Raob NCool", however, has the weakest winds aloft and a shift to northwesterly in the mixing layer. "Raob WCool", in contrast, has strong winds aloft and a shift to southerly in the surface layer. The "Raob SWarm" cluster contains slightly warmer profiles of average humidity and wind speed. Its wind direction, however, is from the south throughout most of the profile, with a slight shift to southeasterly in mid-levels.
The "Raob BasinFlush" cluster is the most distinctive, with strong, cold, dry winds blowing from the southwest. It has half as many members as the other clusters and usually leads to clean air in the basin.
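The radiosonde clustering described above (interpolation to fixed heights, root mean square distance, k-means without renormalisation) can be sketched in plain Python. This is a minimal illustration rather than the authors' code: the stacking order of the variables, the use of linear interpolation and the seeded random choice of initial centres are assumptions, and the helper names are hypothetical. Because the RMS distance is the Euclidean distance divided by the constant square root of the vector length, assigning profiles by RMS distance and updating centres as member means is ordinary k-means.

```python
import math
import random

# Heights above ground level (m) at which each profile is sampled.
LEVELS = list(range(500, 4501, 500))  # 500, 1000, ..., 4500


def interp(heights, values, z):
    """Linear interpolation of values(heights) at z; heights strictly increasing."""
    pairs = list(zip(heights, values))
    for (h0, v0), (h1, v1) in zip(pairs, pairs[1:]):
        if h0 <= z <= h1:
            return v0 + (v1 - v0) * (z - h0) / (h1 - h0)
    raise ValueError(f"height {z} m outside the sounding range")


def profile_vector(heights, theta, q, u, v):
    """Stack theta (K), humidity (g/kg), u and v (m/s) at LEVELS: 36 values."""
    vec = []
    for var in (theta, q, u, v):
        vec.extend(interp(heights, var, z) for z in LEVELS)
    return vec


def rms(a, b):
    """Root mean square difference over all data points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def kmeans(vectors, k, iters=100, seed=0):
    """Lloyd's k-means with RMS distance; returns (labels, centres)."""
    centres = [list(c) for c in random.Random(seed).sample(vectors, k)]
    labels = None
    for _ in range(iters):
        new = [min(range(k), key=lambda j: rms(x, centres[j])) for x in vectors]
        if new == labels:
            break  # assignments stable: converged
        labels = new
        for j in range(k):
            members = [x for x, lab in zip(vectors, labels) if lab == j]
            if members:  # keep the old centre if a cluster empties
                centres[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centres


labels, centres = kmeans([[0, 0], [0.1, 0], [10, 10], [10.1, 10]], 2)
print(labels)
```

Leaving the variables unscaled, as the text describes, means each variable contributes to the distance in its own units; this is only reasonable because their ranges are of similar size.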
The distribution of clusters for each year, as well as for the last two field campaigns, is shown in Fig. 5. The 2006 warm dry season had fewer occurrences of the Raob Wet and Raob WCool clusters. This relative paucity of humid clusters indicates that the warm dry season of 2006 was drier than usual, pointing to a meteorological cause for the high dust loadings and extensive biomass burning measured during the campaign prior to the third cold surge on 23 March. The month of March 2006 itself has a good representation of all the clusters, with a slight under-representation of the Raob Wet and Raob Hot clusters. The under-representation of the Raob Hot cluster is fortunate, as the best clusters for Lagrangian transport to the northeast are the Raob WCool and Raob SWarm clusters. In comparison, the analysis confirms the prior experience of April 2003 as an unusually moist period, with most of the Raob Wet cluster members for that season occurring in April alone.

Surface wind analysis
Because the radiosonde release site is in the foothills of the basin rim, it is more representative of local slope flows in the lower levels and of synoptic conditions aloft. In order to identify patterns in the basin wind circulation, surface observations from multiple sites around the urban area are required. Cluster analysis was performed on 10 years of hourly surface wind data from the RAMA network for the warm dry season as defined in Sect. 4. The stations XAL, TLA, EAC, TAC, PLA, PED, CES and MER were selected based on data availability. The percentage of valid data varied from 57% in 1998 to 99% in 2006. Overall, there were valid data for 79% of the hours in the periods selected, corresponding to 16 791 data fields out of a total of 21 168.
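One way to turn the hourly station winds into the vectors that are clustered is sketched below. It assumes, as a hypothetical reading of "valid data fields", that an hour counts as valid only when all 8 stations report, and that each station contributes its zonal and meteorological components (u, v) computed from speed and the direction the wind blows from. The helper names are illustrative only. For reference, 16 791 / 21 168 is about 0.793, matching the 79% quoted above.

```python
import math

# Stations used for the surface clustering, in a fixed order.
STATIONS = ["XAL", "TLA", "EAC", "TAC", "PLA", "PED", "CES", "MER"]


def uv(speed, direction_deg):
    """Convert speed and meteorological direction (blowing FROM) to (u, v)."""
    rad = math.radians(direction_deg)
    return -speed * math.sin(rad), -speed * math.cos(rad)


def wind_field(obs):
    """obs: {station: (speed m/s, direction deg) or None} for one hour.

    Returns [u_XAL, v_XAL, u_TLA, ...] (16 values), or None if any
    station is missing (complete-case rule, assumed here).
    """
    field = []
    for s in STATIONS:
        if obs.get(s) is None:
            return None
        field.extend(uv(*obs[s]))
    return field


def availability(hourly_obs):
    """Return the valid wind fields and the fraction of hours that are valid."""
    fields = [wind_field(o) for o in hourly_obs]
    valid = [f for f in fields if f is not None]
    return valid, (len(valid) / len(fields) if fields else 0.0)


# Overall figure quoted in the text: 16 791 valid fields out of 21 168 hours.
print(f"{16791 / 21168:.1%}")
```

Clustering on (u, v) rather than on direction avoids the discontinuity at 0/360 degrees, so two nearly identical northerly winds at 355 and 5 degrees stay close in distance.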
As described in Sect. 1.2, clusters were first created with the complete linkage hierarchical method. The resulting cluster medians were used to seed the k-means clustering algorithm. As for the radiosonde data, the distance between two wind fields for the k-means algorithm was calculated as the root mean square difference over all the data points. The number of clusters was chosen to be 8, as this coincided with a local minimum in the maximum distance within the clusters.
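The first, hierarchical pass of the two-step method can be sketched as follows: complete-linkage agglomeration down to k clusters, followed by component-wise medians that serve as the starting centres of a standard k-means run. This is a naive O(n^3) illustration under stated assumptions (RMS distance between fields, medians taken per component); it is not the implementation used in the study, and the function names are hypothetical.

```python
import math
import statistics


def rms(a, b):
    """Root mean square difference between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def complete_linkage(vectors, k):
    """Agglomerative clustering with complete linkage down to k clusters.

    The distance between two clusters is the largest RMS distance between
    their members; the closest pair is merged at each step. This naive
    version is O(n^3) and only meant for small illustrative inputs.
    """
    d = [[rms(a, b) for b in vectors] for a in vectors]
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                link = max(d[p][q] for p in clusters[i] for q in clusters[j])
                if best is None or link < best[0]:
                    best = (link, i, j)
        _, i, j = best
        clusters[i].extend(clusters.pop(j))  # j > i, so index i stays valid
    return clusters


def median_seeds(vectors, clusters):
    """Component-wise median of each cluster, used as initial k-means centres."""
    return [[statistics.median(col) for col in zip(*(vectors[i] for i in c))]
            for c in clusters]


# Toy 2-D "wind fields"; real fields would have 16 components.
toy = [[0, 0], [0, 1], [10, 10], [10, 11], [20, 0]]
groups = complete_linkage(toy, 3)
print(groups, median_seeds(toy, groups))
```

Seeding k-means from a hierarchical pass removes the dependence on random initial centres, which is one motivation given for two-step methods in the studies cited in Sect. 1.2.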
The clusters were separated into three drainage types: "Sfc Drain1", "Sfc Drain2" and "Sfc Drain3", three northerly to easterly types: "Sfc Northeast", "Sfc East" and "Sfc North", and two southerly types: "Sfc South" and "Sfc Southwest". Fig. 6 shows maps of wind roses for the 8 clusters during March 2006. For clarity of presentation, 5 representative locations were chosen from the 8 sites available. The wind roses are classified by time of day rather than wind speed so as to show the diurnal distribution of the clusters. The drainage clusters are characterised by down-slope flow into the basin centre, with similar flows for the stations on the basin rim (TLA, PED and CES). The clusters differ in the flow at MER, in the urban centre and closer to the basin centre, and at XAL, in the north of the old urban area and close to the Sierra de Guadalupe. These start off with northerly flow in Sfc Drain1, which then turns easterly, first at MER in Sfc Drain2 and then at XAL as well in Sfc Drain3, following a progression by time of day (see below).
During the day, flows are more spatially uniform. The Sfc Northeast cluster has northerly flows in the north of the basin and northeasterly flows further south. For the Sfc East cluster, the winds have turned and exhibit some divergence, with northward movement in the north and southward movement in the south. The Sfc North cluster has the least variance in wind direction and also has the strongest winds, blowing due south in the basin.
The Sfc South cluster has more variation both in time and in wind direction, with northward winds at all the stations. PED in the southwest, however, sometimes has a stronger westerly component associated with down-slope flows, and MER a stronger easterly component associated with gap flows from the southeast. Like Sfc South, the Sfc Southwest cluster has more variability. It contains sweeping flows from the southwest that flush the urban plume to the northeast. The westerly component is stronger at TLA and PED due to reinforcement from the down-slope winds.
The histogram of cluster distribution is shown in Fig. 7 for both the entire data set and the March 2006 subset. This shows that the clustering method automatically recognised the diurnal structure of the basin wind circulation, with a clear progression from Sfc Drain1 to Sfc Drain2 and then to Sfc Drain3. After this, the circulation moves to the Sfc East and Sfc Northeast clusters before being replaced by either the Sfc North or Sfc South cluster in the mid to late afternoon, with some Sfc Southwest clusters in the late afternoon.
Comparison between the two histograms shows that MILAGRO was representative of the warm dry season, with similar diurnal distributions and relative fractions of clusters. The main difference is the under-representation of the Sfc Southwest cluster. This has a comparable number of members to the Sfc South cluster over the 10-year period, but only one fifth as many members during MILAGRO. This is a notable difference from MCMA-2003, where it had been a feature of evening venting leading to high ozone levels in the north of the MCMA.
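The comparisons above reduce to simple counting. The sketch below, using only illustrative labels and hypothetical helper names, computes the relative frequency of each cluster, a diurnal histogram of cluster membership by hour of day, and the ratio of members between two clusters, which is the quantity behind the "one fifth as many members" statement.

```python
from collections import Counter
from datetime import datetime


def relative_frequency(labels):
    """Fraction of members in each cluster."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {c: n / total for c, n in counts.items()}


def hourly_histogram(times, labels):
    """Cluster counts for each hour of day (0-23), as in a diurnal histogram."""
    hist = {h: Counter() for h in range(24)}
    for t, lab in zip(times, labels):
        hist[t.hour][lab] += 1
    return hist


def count_ratio(labels, a, b):
    """Members of cluster a per member of cluster b within one set of labels."""
    counts = Counter(labels)
    if counts[b] == 0:
        raise ValueError(f"no members in cluster {b!r}")
    return counts[a] / counts[b]


# Made-up labels: the ratio printed first is 0.2, i.e. one fifth as many members.
subset = ["Sfc South"] * 10 + ["Sfc Southwest"] * 2
print(count_ratio(subset, "Sfc Southwest", "Sfc South"))
times = [datetime(2006, 3, 1, 5), datetime(2006, 3, 2, 5), datetime(2006, 3, 2, 17)]
print(hourly_histogram(times, ["Sfc Drain1", "Sfc Drain1", "Sfc South"])[5])
```

Comparing relative frequencies rather than raw counts is what allows a 1-month subset to be set against a 10-year record.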

Radar wind profilers
Radiosonde clusters provided a long-term analysis of the synoptic conditions influencing the MCMA, and RAMA surface wind clusters provided a description of the surface flow patterns. Together, they showed that March 2006 was representative of the climatology of the basin. The three radar wind profilers provided detailed information on the vertical structure of the wind circulation and its variation along the T0-T1-T2 axis during the field campaign. Cluster analysis was used to identify dominant flow types following the same method as for the surface analysis. Profiles of wind direction every 30 min were averaged to three height intervals: 500-1000 m, 1000-1500 m and 1500-2000 m. Because we are interested in the surface layer flows rather than the synoptic conditions aloft, all heights are in meters above ground level. Clustering was performed on the combination of the profiles from T0 and T1, but not T2. T2 was not included because each additional data source adds to the overall fraction of missing data, and in this case including the data did not change the clusters substantially but did decrease data availability. In this way, there were 830 times with valid profiles between 6 March 18:30 and 28 March 16:30, corresponding to valid profiles 79% of the time. The number of clusters was chosen to be 12, again basing the decision on the presence of a local minimum in the maximum distance within the clusters. There are more clusters from the profilers than from the surface winds because these clusters identify vertical features in the boundary layer that the surface winds cannot see.
[Figure: wind roses at T2, T1 and T0 (0-3 km agl) for the radar wind profiler clusters Northeast (55), North (32), North-Veering (30), Northwest (25), South-Veering (109), South (91), Southwest (40), H-Shear (12), East-Shear (49), V-Shear (51), West-Shear (12) and Nsfc-Sw (34).]
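Averaging wind directions over a height interval needs care, because a plain arithmetic mean of 350 and 10 degrees gives 180 degrees instead of north. The text does not state how the averaging was done; the sketch below assumes vector averaging, in which the u and v components of all valid range gates in each interval are averaged and converted back to speed and direction. The function name and the half-open bin convention are hypothetical.

```python
import math

# Height bins (m agl) used for the profiler clustering.
BINS = [(500, 1000), (1000, 1500), (1500, 2000)]


def bin_average(heights, speeds, directions):
    """Vector-average one wind profile into BINS.

    Averages the u and v components of all valid gates in each bin and
    converts back to (speed, direction); returns None for an empty bin.
    """
    out = []
    for lo, hi in BINS:
        us, vs = [], []
        for z, s, d in zip(heights, speeds, directions):
            if lo <= z < hi and s is not None and d is not None:
                r = math.radians(d)
                us.append(-s * math.sin(r))
                vs.append(-s * math.cos(r))
        if not us:
            out.append(None)
            continue
        u, v = sum(us) / len(us), sum(vs) / len(vs)
        direction = math.degrees(math.atan2(-u, -v)) % 360.0
        out.append((math.hypot(u, v), direction))
    return out


# Two gates at 350 and 10 degrees: the vector mean is northerly (about 0 degrees),
# whereas a naive arithmetic mean of the angles would give 180 degrees.
print(bin_average([600, 800], [5.0, 5.0], [350.0, 10.0]))
```

A side effect of vector averaging is that the mean speed is reduced when directions within a bin disagree, which itself carries information about shear inside the interval.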
Displaying the considerable amount of information contained in the profiles can be problematic, and the reader interested in specific episodes will find it most useful to look at the wind vectors for selected times. To summarise the information from the whole campaign however, Figs. 8, 9 and 10 show wind roses for the three sites for each of the 12 clusters. These are coloured by height ranges and are based on the original profiles rather than the reduced averages on which the clustering was performed. The roses for T0 and T1 show the profiles used in the clustering algorithm whereas the roses for T2 show the profile data for the corresponding times, when available.
The spread in direction for each height range gives an indication of the fuzziness of the clusters. A general comparison of the wind roses for T0, T1 and T2 for all the clusters shows that T0 has the most sharply defined clusters and T2 the fuzziest. For T0, this may be due to its location within the basin itself, where the surrounding mountains are higher and have a stronger impact on the circulation. For T2, on the other hand, it is due in part to the fact that T2 data were not included in the clustering algorithm and in part to its location on the northern edge of the plateau, on complex terrain of its own.
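The fuzziness discussed above is assessed visually from the wind roses. If a number were wanted, one standard option (not used in the text, and shown here only as an illustration) is the circular standard deviation, computed from the mean resultant length R of the unit direction vectors as sqrt(-2 ln R). It is zero for identical directions and grows as directions scatter. The function name is hypothetical.

```python
import math


def directional_spread(directions_deg):
    """Circular standard deviation (degrees) of wind directions.

    Uses the mean resultant length R of the unit vectors: sqrt(-2 ln R).
    Returns 0 for identical directions and grows as directions scatter.
    """
    n = len(directions_deg)
    c = sum(math.cos(math.radians(d)) for d in directions_deg) / n
    s = sum(math.sin(math.radians(d)) for d in directions_deg) / n
    r = min(math.hypot(c, s), 1.0)  # guard against rounding slightly above 1
    if r == 0:
        return float("inf")  # directions cancel out completely
    return math.degrees(math.sqrt(-2.0 * math.log(r)))


# A tight northerly pair scatters far less than a pair 120 degrees apart.
print(directional_spread([350.0, 10.0]), directional_spread([300.0, 60.0]))
```

Like the wind-profile averaging, this works on unit vectors, so it treats 350 and 10 degrees as 20 degrees apart rather than 340.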
In interpreting the profiler clusters, it is useful to link them with the surface wind clusters. Table 1 shows the correspondence between the two sets of clusters. While clear patterns emerge, there remains considerable scatter in the mapping between the two sets. As will be discussed below, this reflects periods of decoupling and wind shear between the surface and winds aloft. It also reflects spatial variations as T0 is at the northern edge of the domain used in the surface analysis and T1 is well outside of the domain.
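A correspondence table like Table 1 is a cross-tabulation of the two label series over common times. Because the profiler clusters are at 30-min resolution and the surface clusters are hourly, the times must first be matched. The sketch below assumes each profiler time is paired with the surface hour it falls in; whether RAMA hourly averages are stamped at the start or end of the hour would change this rule. The function names and labels are hypothetical.

```python
from collections import Counter
from datetime import datetime


def cross_tabulate(rwp, sfc):
    """Count co-occurring (profiler cluster, surface cluster) pairs.

    rwp: {datetime: profiler cluster} at 30-min resolution.
    sfc: {datetime: surface cluster} stamped on the hour.
    Each profiler time is paired with the surface hour it falls in
    (an assumed matching rule; hour-ending stamps would need a shift).
    """
    table = Counter()
    for t, rc in rwp.items():
        hour = t.replace(minute=0, second=0, microsecond=0)
        sc = sfc.get(hour)
        if sc is not None:  # skip times without a valid surface field
            table[(rc, sc)] += 1
    return table


def row_fractions(table):
    """Normalise each profiler-cluster row so its fractions sum to 1."""
    totals = Counter()
    for (rc, _sc), n in table.items():
        totals[rc] += n
    return {(rc, sc): n / totals[rc] for (rc, sc), n in table.items()}


# Made-up labels for three profiler times.
rwp = {
    datetime(2006, 3, 1, 15, 0): "Rwp Nsfc-Sw",
    datetime(2006, 3, 1, 15, 30): "Rwp Nsfc-Sw",
    datetime(2006, 3, 1, 16, 0): "Rwp South",
}
sfc = {datetime(2006, 3, 1, 15): "Sfc North", datetime(2006, 3, 1, 16): "Sfc South"}
print(row_fractions(cross_tabulate(rwp, sfc)))
```

Row fractions make the scatter in the mapping easy to read: a profiler cluster split across several surface clusters indicates decoupling between the surface and the winds aloft.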

Cluster description
The clusters were separated into three groups: North (1-4), South (5-7) and Shear (8-12). The North group contains flow mainly from the north going southward, with general transport from T2 to T0 and the MCMA. "Rwp Northeast" has northerly surface flow at T0 veering to northeasterly aloft. Flow at T1 is similar with a stronger easterly component. At T2 the winds veer so that winds aloft are nearly from the south. This suggests a channelling of flows from the direction of the Gulf southward into the basin. "Rwp North" contains straight northerly flow, coinciding with Sfc North and, to a lesser degree, Sfc Drain2. Note the predominantly southwesterly flow aloft at T2 highlighting the importance of terrain blocking at T1 and T0. "Rwp North-Veering" has northerly surface flow that is very similar for all 3 sites. This is most strongly associated with Sfc Drain2 indicating a decoupling between surface drainage flows entering the basin from the north and westerly winds aloft. "Rwp Northwest" contains northwesterly surface winds backing to westerly with height. These correspond to northeasterly surface winds in the basin.
The South group contains flow mainly from the south and west, with general transport from the MCMA towards T1 and T2. Clusters in this group are very similar at all three sites, suggesting a strong influence from the dominant westerlies and southwesterlies aloft. "Rwp South-Veering" contains southerly surface flow veering to southwesterly aloft. "Rwp South" has straightforward southerly flow at all the sites. "Rwp Southwest" has southwesterly surface flows with some veering to westerly aloft. These are all associated with Sfc South and with venting of the basin to the north and northeast.
The Shear group is the most interesting, with stronger variations between the stations and in the vertical. "Rwp H-Shear" is a small group with southwesterlies at T1 and T2 but surface easterlies at T0. These occur when the surface flows in the basin are from the south or east. In fair-weather cases, this suggests channelling around the Sierra de Guadalupe. "Rwp East-Shear" has northeasterly surface flows turning to southerly with height. This suggests decoupling between a surface drainage or northeast flow and southerlies aloft. "Rwp V-Shear" has southerly flow at all sites, but with a well-defined surface layer from the north at T0 that extends up to 750 m. These correspond to night-time drainage flows where terrain blocking prevents the prevailing winds aloft from affecting the surface.
Sfc Northeast flows suggesting that the surface layer is fairly uniform and extends over most of the MCMA.

Campaign classification
The combination of the surface wind vectors, the radar wind profilers, the radiosondes and the basin air quality data was used to classify the campaign days into six typical meteorological episode types. Fig. 11 shows both the surface wind clusters and the wind profiler clusters associated with each hour of the campaign. From this, it is possible to establish patterns in the evolution of the surface wind fields: blues are used for the drainage clusters taking place during the night, reddish colours (including yellow/brown) for transport to the south and greens for transport to the north. Similarly, for the radar wind profiler clusters, yellows and reds are used for southward flow, greens for northward transport and blues for the shear flow clusters. Figure 12 shows a conceptual diagram of salient wind patterns of the six episode types. Up to and including 7 March, the days are dominated by the Sfc Northeast cluster followed by the Sfc North cluster. This represents straightforward, uniform transport to the south, and these days were therefore labelled "South Venting". After this, on 8, 12 and 15-17 March, the Sfc Northeast cluster yields to the Sfc South cluster in the late afternoon. This corresponds to a wind shift, with transport initially to the south moving back to the north, and is labelled "O3-South" as ozone peaks in the south of the city on these days. For 9-11, 18-20 and 22 March, the northeast flow yields to the Sfc East cluster and then to the Sfc South and Sfc Southwest clusters during the afternoon. This causes some pollutant accumulation in the morning, which is then vented to the north of the basin, and has been labelled "O3-North". Cold Surges took place on 14, 21 and 23 March, as discussed in Fast et al. (2007). These are more variable than the other categories but do have a strong southward flushing of the basin late into the evening.
After the last Cold Surge of the campaign, the air remained considerably more humid and there were frequent afternoon rains. The winds were not as persistent as for the other categories, as indicated by numerous cluster types, especially in the afternoon. These "Convection" days were split into two sub-groups. The first three days, 24-26 March, along with 31 March, were classified as "Convection-South" as there was a stronger component of southward transport in the late afternoon. The next four days, 27-30 March, were classified as "Convection-North" as they had more northward transport in the late afternoon.
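The classification combined several data sets with judgement, so it is not reducible to a rule. Purely as a hypothetical illustration of the surface-cluster sequences described above, a toy labelling function might look like the following. The grouping of clusters into southward and northward transport, the split of the afternoon into halves, and the cold-surge and convection flags supplied from outside are all assumptions, not the authors' procedure.

```python
# Assumed grouping of surface clusters by transport direction (hypothetical).
SOUTHWARD = {"Sfc Northeast", "Sfc North", "Sfc East"}
NORTHWARD = {"Sfc South", "Sfc Southwest"}


def classify_day(afternoon, cold_surge=False, convective=False):
    """Toy episode label from the ordered afternoon surface clusters of one day."""
    if cold_surge:
        return "Cold Surge"
    if convective:
        late = afternoon[len(afternoon) // 2:]  # second half taken as "late afternoon"
        north = sum(c in NORTHWARD for c in late)
        south = sum(c in SOUTHWARD for c in late)
        return "Convection-North" if north > south else "Convection-South"
    first_north = next(
        (i for i, c in enumerate(afternoon) if c in NORTHWARD), None)
    if first_north is None:
        return "South Venting"  # no shift to northward transport
    if "Sfc East" in afternoon[:first_north]:
        return "O3-North"  # northeast -> east -> south/southwest
    return "O3-South"  # northeast -> south wind shift


print(classify_day(["Sfc Northeast", "Sfc East", "Sfc South", "Sfc Southwest"]))
```

Such a rule is only a convenient first pass; the distinction between O3-South and O3-North days discussed below depends on the actual plume transport rather than on cluster sequence alone.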
Of necessity, any classification scheme imposes sharp distinctions where in reality there are only fuzzy boundaries. Displaying the wind clusters hour by hour has an advantage: it gives a visual way of assessing the variability within each episode type and the distance or proximity between days of different classifications. For example, 18 and 19 March have a stronger, more persistent northward flow than the other O3-North days. In contrast, 12 and 15 March, although classified as O3-South, are not far removed from some of the O3-North days. In these cases, the distinction is a fine one, based on the actual plume transport in the basin and its importance for the interpretation of MILAGRO data. For example, 8 March exhibited a sharp wind shift in the late afternoon, which the Aerodyne mobile laboratory (Kolb et al., 2004) observed chemically on Pico Tres Padres.
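A simple, hypothetical way to quantify the "distance or proximity between days" is the fraction of hours on which two days fall in different surface clusters, which is a normalised Hamming distance. The sketch assumes equal-length hourly label sequences and treats every mismatch alike, even between similar clusters. It is not a metric used in this study.

```python
def hourly_label_distance(day_a, day_b):
    """Fraction of hours at which two days fall in different clusters (normalised Hamming distance)."""
    if len(day_a) != len(day_b):
        raise ValueError("both days need the same number of hourly labels")
    mismatches = sum(1 for a, b in zip(day_a, day_b) if a != b)
    return mismatches / len(day_a)

def nearest_day(target, candidates):
    """Name of the candidate day (dict: name -> hourly labels) closest to target."""
    return min(candidates, key=lambda name: hourly_label_distance(target, candidates[name]))

# Made-up label sequences; "day_A" and "day_B" are hypothetical names.
target = ["Sfc Drain1"] * 8 + ["Sfc Northeast"] * 8 + ["Sfc South"] * 8
candidates = {
    "day_A": ["Sfc Drain1"] * 8 + ["Sfc Northeast"] * 6 + ["Sfc South"] * 10,
    "day_B": ["Sfc Drain1"] * 8 + ["Sfc Northeast"] * 16,
}
print(nearest_day(target, candidates))  # day_A
```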
The meteorological classification just described is shown in Table 2, along with summarised cloud and rain observations and air quality measurements. The very clear skies of South Venting gave way to scattered cloudiness during O3-South and O3-North days. After the Cold Surges, the skies were mostly clear in the mornings but mostly overcast in the afternoons, with strong showers. Breaking with this pattern, an isolated thunderstorm took place at T1 on 16 March.

Vertical wind variations
The surface wind clusters have a clear diurnal cycle, and days differ from each other mainly in the presence and timing of the afternoon wind shift. In contrast, the radar wind profiler clusters do not have a clear diurnal pattern but vary from day to day. Fig. 11 shows that the South Venting and O3-South days have more profiler clusters from the North group, whereas O3-North and Convection days have clusters from the South group. Variations within the episode types can be seen, for example, on 18 and 19 March. These are O3-North days, but compared with the other days in their category they have more persistent southerly winds throughout the vertical profile, which contributed to very clean air. In contrast, 11 March had shallow northerly surface flows with southwesterly winds aloft in the mid-afternoon (Rwp Nsfc-Sw). This combination led to the maximum 1-h ozone concentration of the campaign (185 ppb).
The horizontal shear cluster, Rwp H-Shear, is found in cases of southeasterly flow in the basin in the late afternoon on O3-South and Convection days. This suggests that the Sierra de Guadalupe may help to maintain southeasterly flow at T0 after the winds have turned southwesterly at T1.
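Horizontal shear of this kind can be flagged by comparing wind directions at the two profiler sites at the same level and hour. The sketch below computes the smallest angle between two meteorological wind directions and flags cases above an angle threshold. The 60° and 1 m/s thresholds are illustrative assumptions, not values from this study. The speed check is there because wind directions are unreliable in near-calm conditions.

```python
def direction_difference(dir_a_deg, dir_b_deg):
    """Smallest angle in degrees (0-180) between two wind directions."""
    diff = abs(dir_a_deg - dir_b_deg) % 360.0
    return min(diff, 360.0 - diff)

def horizontal_shear(dir_t0, speed_t0, dir_t1, speed_t1, min_angle=60.0, min_speed=1.0):
    """Flag horizontal shear between two sites (e.g. the T0 and T1 profilers at one level).

    Directions are meteorological (blowing from, degrees clockwise from north).
    min_angle and min_speed are illustrative thresholds (assumptions).
    """
    if speed_t0 < min_speed or speed_t1 < min_speed:
        return False
    return direction_difference(dir_t0, dir_t1) >= min_angle

# Southeasterly (135 deg) at one site versus southwesterly (225 deg) at the other.
print(horizontal_shear(135.0, 3.0, 225.0, 4.0))  # True
```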
Rwp V-Shear, with its northerly surface layer, occurs in the last or first hours of the day, often before or after Rwp East-Shear. This highlights the strong vertical wind shear that can develop between the surface drainage flows and the prevailing winds aloft.
The decoupled flow of Rwp East-Shear, with northerly surface flows and southerlies aloft, is found on the first Cold Surge and also on the O3-South days. This leads to a situation analogous to 11 March, with pollutants accumulating in the lower levels and recirculating in the vertical. The result was the next three highest ozone levels of the campaign. Furthermore, Rwp East-Shear is associated with a shallow layer transported from the Gulf, which led to increased cloudiness (except on 12 March, which was too dry to begin with). The peak occurrences are in the early afternoon and are consistent with descriptions of a density current propagating into the basin (Bossert, 1997).
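The decoupling described here, with northerly flow at the surface and southerly or southwesterly flow aloft, amounts to opposing meridional wind components at the two levels. The sketch converts speed and meteorological direction to (u, v) and tests for opposite signs of v. The 1 m/s threshold and the choice of which profiler gates count as "surface" and "aloft" are assumptions left to the user.

```python
import math

def wind_components(speed, direction_deg):
    """(u, v) from speed and meteorological direction (blowing from, degrees clockwise from north)."""
    rad = math.radians(direction_deg)
    return -speed * math.sin(rad), -speed * math.cos(rad)

def is_decoupled(surface_wind, aloft_wind, min_v=1.0):
    """True when the surface flow is northerly (v < 0) and the flow aloft southerly (v > 0).

    surface_wind and aloft_wind are (speed in m/s, direction in degrees);
    min_v is an illustrative threshold on each meridional component.
    """
    _, v_surface = wind_components(*surface_wind)
    _, v_aloft = wind_components(*aloft_wind)
    return v_surface <= -min_v and v_aloft >= min_v

print(is_decoupled((3.0, 0.0), (5.0, 225.0)))  # True: northerly below, southwesterly above
print(is_decoupled((3.0, 0.0), (5.0, 10.0)))   # False: northerly at both levels
```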
Rwp Nsfc-Sw had a northerly surface layer with southwesterly winds aloft. Its main occurrences start with the first Cold Surge day and fall mostly in the mid- to late afternoon. After this, each occurrence of this cluster is associated with rain at T1, including the thunderstorm of 16 March. Overall, this suggests that this cluster can also be associated with propagating density currents, similar to Rwp East-Shear. Moist air from the Gulf is forced into the basin, where it meets winds blowing in the opposite direction, leading to convection.

Discussion
The O3-South, O3-North and Cold Surge episode types described for MCMA-2003 -8-1209-2008-supplement.zip). From these, episodes of similar transport can be identified across the field campaigns in order to obtain meaningful comparisons of air quality measurements. The surface wind clusters can also be compared to the winds associated with the ozone eigenvectors of Klaus et al. (2001). Although Klaus et al. (2001) analysed a different period (February-December 1995), there is a correspondence between the wind vectors associated with the high and low values of the eigenvectors and the clusters of the present work. The eigenvector for north/south transport corresponds to the Sfc North and Sfc South winds. The eigenvector for east/west slope flows corresponds to Sfc East and Sfc Southwest. The eigenvector for centre/periphery drainage flows matches Sfc Drain1 and Sfc Drain2, and the eigenvector for northeast/southwest precipitation flows matches Sfc Northeast and Sfc Drain3. This suggests that the identified clusters are robust features of the MCMA circulation, with a direct link to ozone patterns in the urban area.
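For readers less familiar with eigenvector analysis of monitoring-network data, the sketch below shows a generic empirical orthogonal function (EOF, or principal component) decomposition of a (time × station) array via the singular value decomposition. It is not a reproduction of the Klaus et al. (2001) procedure, whose details are not given here. The synthetic field and the station pattern are made up.

```python
import numpy as np

def eofs(data, n_modes=2):
    """EOF (principal component) analysis of a (time x station) array.

    Column means are removed and the anomaly matrix is decomposed by SVD.
    Returns spatial patterns (n_modes x station), principal-component time
    series (time x n_modes) and the fraction of variance of each mode.
    """
    anomalies = data - data.mean(axis=0)
    u, s, vt = np.linalg.svd(anomalies, full_matrices=False)
    variance_fraction = s**2 / np.sum(s**2)
    pcs = u[:, :n_modes] * s[:n_modes]
    return vt[:n_modes], pcs, variance_fraction[:n_modes]

# Synthetic "ozone" field: one spatial pattern modulated by a diurnal cycle.
hours = np.arange(96)
pattern = np.array([1.0, 0.5, -0.5, -1.0])   # hypothetical north-to-south gradient
data = 60.0 + np.outer(np.sin(2 * np.pi * hours / 24), 30.0 * pattern)
patterns, pcs, frac = eofs(data)
print(frac.round(3))  # the first mode carries essentially all the variance
```

Each spatial pattern has an arbitrary sign. Its high and low phases can be related to wind clusters by compositing the winds on hours when the corresponding principal component is large and positive or large and negative.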
The gap flow coming from the southeast of the basin and creating a convergence line is a determining factor for air quality in the MCMA (e.g. Jazcilevich et al., 2005; de Foy et al., 2006a). The surface clusters capture the impact of the gap flow from Chalco as a southeasterly signature at the stations closest to the gap. The profiler clusters, however, do not distinguish between a southeasterly gap flow and a more general southerly flow. To characterise the vertical structure of gap flows, additional profiler data closer to the gap would be needed, as was available during the IMADA campaign (Doran et al., 1998).
Automatic identification of the gap flow feature is an open problem due to the complexity of the basin winds. Extensive gap flow measurements made during the Mesoscale Alpine Programme (MAP) showed that gap flows occurred both with and without cross-barrier flows and capping inversions (Mayr et al., 2004). Results from a single episode were reported, but cluster analysis could help to determine patterns in gap flow structure over longer periods. Case et al. (2004) used a time-averaging filter to estimate the sea-breeze transition time at Cape Canaveral from both observations and simulations, which served to evaluate model performance in terms of objective flow features. Case et al. (2005) developed an objective method for a seven-year climatological study of sea breezes. de Foy et al. (2006a) used a metric based on the strength of the meridional flow in the basin to estimate the gap flow strength and the transition time. Fig. 11 indicates that cluster analysis could provide an alternative way of evaluating transition times in the basin. This approach has the advantage of accounting for the spatial complexity of the flow and of being robust to variations at individual stations. By using cluster analysis over a month-long period, this study identified periods of decoupled flow, propagating currents and horizontal wind variations that led to strong variations in air pollution levels in the basin. Kossmann et al. (2002) found similar layering of winds in the Lake Tekapo basin, New Zealand, where the interaction of valley and gap flows led to rapid cooling of the basin. Egger et al. (2005) found shallow inflow into the Bolivian Altiplano in winter that developed three to five hours after sunrise. These inflows were associated with strong up-valley flows as well as return flows aloft.
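The two ideas above, a time-averaging filter (Case et al., 2004) and a meridional-flow metric (de Foy et al., 2006a), can be combined into a simple transition-time estimate. The estimate averages the meridional wind over stations, smooths it with a centred running mean, and reports the first afternoon hour at which it turns from southward to northward. The sketch below is a hypothetical simplification and not the exact metric of either study. The 3-hour window, the 12:00 start and the (hours × stations) array layout are assumptions.

```python
import numpy as np

def running_mean(x, window=3):
    """Centred running mean; near the ends only the available points are averaged."""
    x = np.asarray(x, dtype=float)
    half = window // 2
    out = np.empty_like(x)
    for i in range(len(x)):
        lo, hi = max(0, i - half), min(len(x), i + half + 1)
        out[i] = x[lo:hi].mean()
    return out

def transition_hour(v_stations, start_hour=12, window=3):
    """First hour >= start_hour at which the smoothed basin-mean meridional wind
    turns from southward (v < 0) to northward (v >= 0); None if it never does.

    v_stations: array of shape (hours, stations) of meridional wind in m/s.
    """
    v_mean = np.asarray(v_stations, dtype=float).mean(axis=1)
    v_smooth = running_mean(v_mean, window)
    for hour in range(max(start_hour, 1), len(v_smooth)):
        if v_smooth[hour - 1] < 0.0 <= v_smooth[hour]:
            return hour
    return None

# Made-up day: v = -2 m/s (northerly) for hours 0-15, +2 m/s (southerly) from hour 16.
v = np.where(np.arange(24) < 16, -2.0, 2.0)
print(transition_hour(np.column_stack([v, v])))  # 16
```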
This is consistent with the flow patterns identified in Mexico, although the shallowness of the flow and the delay in the formation of the boundary layer are due to the difference in seasons. Regmi et al. (2003) studied inflows into the Kathmandu valley/basin. Converging gap flows led to vertical layering due to the temperature and humidity contrasts between the air masses, which suppressed vertical mixing and led to high air pollution episodes. This is comparable to the cold, humid surface inflow during a Cold Surge episode, which suppresses mixing and decouples the surface flow from the westerlies aloft. Banta et al. (2004) analysed the evolution of the nocturnal low-level jet in the Great Salt Lake basin during the Vertical Transport and Mixing (VTMX) field campaign. Radial wind velocities from a Doppler lidar revealed the interaction of local drainage flows with the basin jet during the night, which generated localised regions of convergence and divergence in the basin. This flow description is consistent with the differences between the surface drainage clusters in the MCMA. There, wind directions in the urban centre depend on the relative strength of the gap flow and the northerly plateau flow. Future analysis could determine whether this is a significant source of vertical mixing at night. Lemonsu et al. (2006) studied the structure of the urban boundary layer over Marseille during the ESCOMPTE field program. As in the VTMX study, radial velocities from a Doppler lidar were used, in this case to identify a shallow sea breeze superimposed on a deep sea breeze. As in the MCMA, complex surrounding orography led to splitting of the incoming flows and convergence patterns over the city centre.
We have analysed meteorological data from daily radiosondes for the last eight years, hourly observations at eight surface stations for the last ten years, and three radar wind profilers operated during the MILAGRO field campaign. Cluster analysis was used to classify the data into dominant wind patterns in order to interpret circulation patterns in the basin. Histograms of the cluster distributions were used to evaluate the climatological representativeness of the MILAGRO campaign and the diurnal structure of the wind flow. A linkage table relating the surface and profiler clusters served to combine the analysis of the surface winds with that of the vertical structure. Plots of the hourly evolution of the clusters were then used to identify meteorological episodes during MILAGRO.
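As a minimal illustration of clustering hourly wind fields, the sketch below applies plain k-means (Lloyd's algorithm) to feature vectors formed by concatenating (u, v) at several stations for each hour. The feature layout, the deterministic farthest-first seeding, the number of clusters and the synthetic data are all assumptions. The sketch does not claim to reproduce the configuration used in this study. Farthest-first seeding is reproducible but sensitive to outliers, and in practice the number of clusters would be chosen with some objective criterion.

```python
import numpy as np

def farthest_first_init(features, k):
    """Deterministic seeding: first sample, then repeatedly the sample farthest from those chosen."""
    chosen = [0]
    d2 = ((features - features[0]) ** 2).sum(axis=1)
    for _ in range(1, k):
        idx = int(d2.argmax())
        chosen.append(idx)
        d2 = np.minimum(d2, ((features - features[idx]) ** 2).sum(axis=1))
    return features[chosen].astype(float)

def kmeans(features, k, n_iter=100):
    """Lloyd's k-means on an (n_samples, n_features) array; returns (labels, centroids)."""
    features = np.asarray(features, dtype=float)
    centroids = farthest_first_init(features, k)
    labels = None
    for _ in range(n_iter):
        # squared Euclidean distance from every sample to every centroid
        d2 = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_labels = d2.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = features[labels == j]
            if len(members):  # an empty cluster keeps its previous centroid
                centroids[j] = members.mean(axis=0)
    return labels, centroids

# Each row is one hour: (u, v) at three hypothetical stations, concatenated.
rng = np.random.default_rng(42)
northerly = np.tile([0.0, -3.0], 3) + 0.1 * rng.standard_normal((10, 6))
southerly = np.tile([0.0, 3.0], 3) + 0.1 * rng.standard_normal((10, 6))
labels, centroids = kmeans(np.vstack([northerly, southerly]), k=2)
print(labels)
```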
Six daily weather types were identified during MILAGRO, three of which also occurred during MCMA-2003. South Venting days had strong, dry, southward winds leading to clear skies and low pollution levels. O3-South days had a gap flow through the southeast passage, causing an east-west convergence zone that moved northwards into the early evening. This was associated with high O3 in the south of the city. O3-North days had stronger southwesterly flow aloft, which led to winds coming over the western and southern rims of the basin. A north-south convergence zone formed, with high pollution levels in the north of the city. Three Cold Surges brought cold, humid air along the Gulf coast and into the basin. Humid conditions persisted after the last Cold Surge and led to days with afternoon convection and rainfall. While generally similar, these were split into days where the convection was more to the south and days where it was more to the north. Because the convection took place in the late afternoon, 1-h ozone concentrations still reached high levels on these days.
Analysis of the vertical structure found evidence of strong horizontal and vertical wind shear. Some flows showed fairly uniform transport to the south or north, with some turning aloft due to the prevailing southwesterlies. Horizontal wind shear between T0 and T1 suggested channelling around the Sierra de Guadalupe, associated with both clear-sky and convection events. Northerly surface winds associated with propagating density currents were found to be decoupled from the southwesterly winds aloft. These conditions led to high pollutant concentrations as surface emissions were transported in a shallow layer towards the urban area before being blown back to the north.
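One plausible form of a linkage table between surface and profiler clusters is a contingency table. It counts the hours in which each surface cluster co-occurs with each profiler cluster and normalises each row by the surface cluster's total. The sketch below assumes hour-aligned label lists with None for missing data. The exact layout of the table used in this study is not specified here.

```python
from collections import Counter

def linkage_table(surface_labels, profiler_labels):
    """Count hours in which each (surface cluster, profiler cluster) pair co-occurs.

    Hours with a missing label (None) in either series are skipped.
    """
    pairs = Counter()
    for sfc, rwp in zip(surface_labels, profiler_labels):
        if sfc is not None and rwp is not None:
            pairs[(sfc, rwp)] += 1
    return pairs

def row_fractions(pairs):
    """Fraction of each surface cluster's hours associated with each profiler cluster."""
    totals = Counter()
    for (sfc, _), count in pairs.items():
        totals[sfc] += count
    return {(sfc, rwp): count / totals[sfc] for (sfc, rwp), count in pairs.items()}

# Four made-up hours; the last one has no surface cluster.
surface = ["Sfc North", "Sfc North", "Sfc South", None]
profiler = ["Rwp East-Shear", "Rwp Nsfc-Sw", "Rwp Nsfc-Sw", "Rwp V-Shear"]
table = linkage_table(surface, profiler)
print(row_fractions(table))
```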
Long-term trends of domain-wide maximum pollutant concentrations showed that MILAGRO experienced normal levels of O3 and CO, allowing for the continuing downward trend. PM loadings, however, were higher than normal, with the 75th and 95th percentiles of the daily maxima among the highest measured.
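The comparison of percentiles of daily maxima against earlier years can be sketched as follows. Take the daily maximum of an hourly series, compute a percentile of those maxima for the campaign, and rank it against the same statistic for other seasons. The function names, the NaN handling and all numbers below are made up for illustration, and the sketch assumes the hourly series starts at midnight and covers whole days.

```python
import numpy as np

def daily_maxima(hourly, hours_per_day=24):
    """Daily maxima of a 1-D hourly series whose length is a multiple of hours_per_day.

    NaNs (missing hours) are ignored; an all-missing day gives NaN with a warning.
    """
    days = np.asarray(hourly, dtype=float).reshape(-1, hours_per_day)
    return np.nanmax(days, axis=1)

def percentile_rank(value, reference_values):
    """Percentage of reference values strictly below value."""
    reference = np.asarray(reference_values, dtype=float)
    return 100.0 * np.mean(reference < value)

# Hypothetical workflow: the 75th percentile of the campaign's daily maxima is
# ranked against the same statistic computed for each earlier season.
campaign_p75 = np.percentile(daily_maxima(np.arange(96.0)), 75)
earlier_p75 = [30.0, 45.0, 60.0, 90.0]  # made-up values for other years
print(percentile_rank(campaign_p75, earlier_p75))
```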
Comparison of the histograms of both radiosonde clusters and surface clusters showed that MILAGRO was representative of the warm dry season. For radiosondes, the only flow type not represented was the strong basin-flushing flow from the south (note, however, that the surface and wind profiler analysis found this type of flow on 18 and 19 March). Other flow types were well represented, especially those leading to northward transport. Over the whole season, 2006 was considerably drier, which provides an explanation for the high PM loadings observed.
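Assessing representativeness from histograms reduces to comparing relative cluster frequencies in the campaign sample with those in the multi-year sample. The sketch expresses the difference in percentage points. It assumes integer cluster labels 0 to n_clusters-1, and the label series are made up.

```python
import numpy as np

def cluster_frequencies(labels, n_clusters):
    """Relative frequency of integer cluster labels 0..n_clusters-1."""
    counts = np.bincount(np.asarray(labels, dtype=int), minlength=n_clusters)
    return counts / counts.sum()

def frequency_anomaly(campaign_labels, climatology_labels, n_clusters):
    """Campaign minus climatological cluster frequency, in percentage points."""
    return 100.0 * (cluster_frequencies(campaign_labels, n_clusters)
                    - cluster_frequencies(climatology_labels, n_clusters))

# Made-up label series: cluster 2 never occurs in the campaign sample.
campaign = [0, 0, 1, 1]
climatology = [0, 1, 1, 1, 2, 0, 1, 1]
print(frequency_anomaly(campaign, climatology, n_clusters=3))
```

A cluster with a large negative anomaly, such as one absent from the campaign, corresponds to the kind of missing flow type noted above for the radiosondes.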
The diurnal variation of surface wind patterns in March 2006 was very similar to that of the warm dry seasons of the last ten years. It consisted of very clearly defined drainage flows into the basin every morning, followed by northeasterly and easterly winds after sunrise and into the early afternoon. After that, days were distinguished by the onset of either northerly or southerly winds. The main difference during MILAGRO was an under-representation of the southwesterly flows compared to the southerly and northerly winds.
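The diurnal structure of the clusters can be summarised as the fraction of days assigned to each cluster at each hour of the day. Computing this table separately for the campaign and for the climatological period allows a direct comparison of the two. The sketch assumes a complete (days × hours) array of integer labels in 0 to n_clusters-1. Larger labels would make `np.bincount` return a longer array and fail on assignment.

```python
import numpy as np

def diurnal_cluster_frequency(labels_by_day, n_clusters):
    """Fraction of days in each cluster at each hour.

    labels_by_day: integer array (n_days, n_hours) with labels 0..n_clusters-1.
    Returns an (n_hours, n_clusters) array whose rows sum to 1.
    """
    labels = np.asarray(labels_by_day, dtype=int)
    n_days, n_hours = labels.shape
    freq = np.zeros((n_hours, n_clusters))
    for hour in range(n_hours):
        freq[hour] = np.bincount(labels[:, hour], minlength=n_clusters) / n_days
    return freq

# Two made-up days: same morning cluster (0), different afternoon clusters (1 and 2).
days = [[0] * 12 + [1] * 12, [0] * 12 + [2] * 12]
freq = diurnal_cluster_frequency(days, n_clusters=3)
print(freq[6], freq[18])
```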
The classification of the wind patterns will assist in the analysis and interpretation of the MILAGRO dataset by enabling the evaluation of the impact of wind transport on measurements of gas- and aerosol-phase chemistry. Similar transport episodes during IMADA and MCMA-2003, obtained from surface cluster analysis, can be used to obtain meaningful comparisons of measurements across the field campaigns, thereby increasing the value of each individual dataset. The meteorological classification will be used to identify episodes for intensive modelling studies. Model evaluation will then be able to assess the simulation of salient flow features in addition to standard statistical metrics.
Finally, this paper used only a portion of the meteorological data. Future studies will be able to build upon this work with, for example, detailed observations of the boundary layer structure, profiles and measurements aloft from airborne platforms, measurements from the mobile laboratory, other ground-based measurements and different kinds of balloon measurements. These measurements are much more specialised and have intermittent or irregular sampling intervals, so they need to be analysed differently from the operational measurements used in this paper. Cluster analysis to identify transport episodes during the field campaign serves to create a link between the two types of measurements and provides a climatological basis for episodic analysis of the MILAGRO dataset.