Articles | Volume 16, issue 11
Research article
06 Jun 2016

Cluster analysis of European surface ozone observations for evaluation of MACC reanalysis data

Olga Lyapina, Martin G. Schultz, and Andreas Hense

Abstract. The high density of European surface ozone monitoring sites provides unique opportunities for the investigation of regional ozone representativeness and for the evaluation of chemistry climate models. The regional representativeness of European ozone measurements is examined through a cluster analysis (CA) of 4 years of 3-hourly ozone data from 1492 European surface monitoring stations in the Airbase database; the time resolution corresponds to the output frequency of the model that is compared to the data in this study. K-means clustering is implemented for seasonal–diurnal variations (i) in absolute mixing ratio units and (ii) normalized by the overall mean ozone mixing ratio at each site. Statistical tests suggest that each CA can distinguish between four and five different ozone pollution regimes. The individual clusters reveal differences in seasonal–diurnal cycles, showing typical patterns of ozone behavior for more polluted stations or for more rural background stations. The robustness of the clustering was tested with a series of k-means runs in which either the size of the initial data set or the length of the time series was randomly reduced. Except for the Po Valley, the clustering does not provide a regional differentiation, as the member stations within each cluster are generally distributed all over Europe. The typical seasonal, diurnal, and weekly cycles of each cluster are compared to the output of the multi-year global reanalysis produced within the Monitoring of Atmospheric Composition and Climate (MACC) project. While the MACC reanalysis generally captures the shape of the diurnal cycles and the diurnal amplitudes, it does not reproduce the seasonal cycles very well, and it exhibits a high bias of up to 12 nmol mol−1. The bias decreases from more polluted clusters to cleaner ones. The seasonal and weekly cycles and the frequency distributions of ozone mixing ratios are also better described for clusters with relatively clean signatures.
Due to the relative sparsity of CO and NOx measurements, these were not included in the CA. However, simulated CO and NOx mixing ratios are consistent with the general classification into more polluted and more background sites. Mean CO mixing ratios are within 140–145 nmol mol−1 (CL1–CL3) and 130–135 nmol mol−1 (CL4 and CL5), and NOx mixing ratios are within 4–6 nmol mol−1 and 2–3 nmol mol−1, respectively. These results confirm that relatively coarse-scale global models are more suitable for simulation of regional background concentrations, which are less variable in space and time. We conclude that CA of surface ozone observations provides a powerful and robust way to stratify sets of stations, which makes the resulting station groups more suitable for model evaluation.
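The two clustering inputs described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the station profiles are synthetic rather than Airbase data, the helper names `synthetic_profiles` and `kmeans` are hypothetical, the k-means routine is a plain NumPy version of Lloyd's algorithm, and k = 2 is chosen only to keep the example small (the study reports four to five clusters). Each station is assumed to be represented by a 96-element vector of mean ozone per month and 3-hourly time slot (12 × 8).

```python
import numpy as np

rng = np.random.default_rng(42)
months = np.arange(12)          # 0 = January ... 11 = December
hours = np.arange(8) * 3        # 3-hourly slots: 0, 3, ..., 21 h


def synthetic_profiles(mean, seas_amp, diur_amp, n):
    """Hypothetical helper: n noisy seasonal-diurnal profiles (12 x 8 -> 96 values)."""
    seasonal = seas_amp * np.cos(2 * np.pi * (months - 5) / 12)  # summer maximum
    diurnal = diur_amp * np.cos(2 * np.pi * (hours - 15) / 24)   # afternoon maximum
    base = mean + seasonal[:, None] + diurnal[None, :]
    return base.ravel()[None, :] + rng.normal(0.0, 1.0, size=(n, 96))


def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm; returns (labels, centers)."""
    init = np.random.default_rng(seed).choice(len(X), size=k, replace=False)
    centers = X[init]
    for _ in range(n_iter):
        # Squared Euclidean distance of every profile to every center
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        # Recompute centers; keep the old center if a cluster became empty
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


# Assumed values in nmol mol-1: "background-like" and "polluted-like" stations
X_abs = np.vstack([
    synthetic_profiles(mean=35, seas_amp=8, diur_amp=4, n=30),
    synthetic_profiles(mean=25, seas_amp=10, diur_amp=12, n=30),
])

# Variant (i): absolute mixing ratios
labels_abs, centers_abs = kmeans(X_abs, k=2)

# Variant (ii): each profile divided by the station's overall mean
X_norm = X_abs / X_abs.mean(axis=1, keepdims=True)
labels_norm, centers_norm = kmeans(X_norm, k=2)

print(np.bincount(labels_abs, minlength=2), np.bincount(labels_norm, minlength=2))
```

In this sketch the normalized variant removes the difference in mean level between stations, so any grouping it produces rests on the shape of the seasonal and diurnal cycles alone, whereas the absolute variant also responds to the overall ozone level.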

Short summary
This study applies numerical clustering for the classification of about 1500 ozone data sets in Europe. We show the usefulness of cluster analysis (CA) for the quantitative evaluation of a global model: pre-selection of stations and validation of a global model in a phase-space produce clearer and more interpretable results. CA can be easily updated for new stations, different lengths of data, and other types of input properties, as well as other types of data (for example, meteorological).