Eastern China (27–41

We further evaluate how well nested GEOS-Chem v9-02 and WRF/CMAQ v5.0.1 simulations capture the spatiotemporal variability of pollutants. GEOS-Chem underestimates

Eastern China (EC, 25–41

Over Eastern China,

For PM

This study analyzes the spatiotemporal variability of

We further use the EOF–EEMD package to evaluate how well chemical transport models (CTMs) reproduce the observed pollution variability. Although widely used in air pollution diagnosis, forecasting and projection, and remote sensing (Geng et al., 2015; Lin et al., 2015), models are subject to errors in emissions, chemistry, transport, PBL mixing, and other processes (Lin et al., 2008, 2012; Zhang et al., 2016b). This study evaluates two representative models, GEOS-Chem and WRF/CMAQ; the same evaluation can be applied to other models.

The rest of the paper is organized as follows. Section 2 introduces in situ
measurements of

We focus on pollution over Eastern China
(25–41

Our study period is from 25 October to 25 December 2013, a total of 1488 h over 62 days. Most air pollution data for January and February 2014 are missing because of instrument or data-retrieval failures, and data before 25 October are not available.
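As a quick check of the period length, the short sketch below counts days and hours, assuming that both 25 October and 25 December are included in full:

```python
from datetime import date

# Study period, assumed inclusive of both end dates.
start, end = date(2013, 10, 25), date(2013, 12, 25)
n_days = (end - start).days + 1
n_hours = 24 * n_days
print(n_days, n_hours)  # 62 1488
```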

We retrieve hourly measurements of

At the monitoring sites,

Here we follow Lamsal et al. (2008) to correct for the interference by
introducing a correction factor (CF) based on GEOS-Chem-simulated nitrogen
species (
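The correction step can be sketched as follows, assuming a Lamsal-type ratio in which the CF is the modeled NO2 divided by modeled NO2 plus a weighted sum of the interfering nitrogen species, with the raw measurement then scaled by the CF. The species list and weights are truncated in this text, so the names (`PAN`, `HNO3`) and weights in the usage line are hypothetical placeholders, not the values used in this study or in Lamsal et al. (2008).

```python
import numpy as np

def correction_factor(no2_model, interferents, weights):
    """CF = NO2 / (NO2 + sum_i w_i * X_i), with all terms taken from model output.

    interferents: dict name -> modeled concentration (hypothetical species names)
    weights: dict name -> assumed fraction of X_i that the analyzer reports as NO2
    """
    no2_model = np.asarray(no2_model, dtype=float)
    denominator = no2_model.copy()
    for name, conc in interferents.items():
        denominator = denominator + weights[name] * np.asarray(conc, dtype=float)
    return no2_model / denominator

def correct_measurement(raw_no2, cf):
    """Scale the raw measurement by the model-derived correction factor."""
    return np.asarray(raw_no2, dtype=float) * cf

# Hypothetical example: 10 ppb modeled NO2, 2 ppb PAN (weight 1.0), 4 ppb HNO3 (weight 0.5).
cf = correction_factor(10.0, {"PAN": 2.0, "HNO3": 4.0}, {"PAN": 1.0, "HNO3": 0.5})
print(cf, correct_measurement(20.0, cf))
```

Because the CF is a ratio of modeled quantities, systematic model errors common to NO2 and the interferents partly cancel; the correction still inherits any error in the modeled partitioning among nitrogen species.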

Regional mean hourly time series of raw and “corrected”

Figure 2 compares the regional mean hourly time series of raw and
corrected

Prior to the EOF–EEMD analysis, we fill in missing values in the hourly pollution observations. If data are missing for more than 12 consecutive hours, we replace each missing value with the mean for that hour of the day averaged over all days, so that the diurnal cycle is preserved. Shorter gaps are filled by linear interpolation between adjacent valid data. This interpolation does not introduce significant artificial information into the spatiotemporal analysis, as verified by a sensitivity test with GEOS-Chem model data: the EOF–EEMD results based on the complete GEOS-Chem output (i.e., no missing values) are similar to those based on model data sampled at the times of valid observations, with missing values filled using the same technique as for the observations.
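A minimal sketch of this two-branch gap filling for one station's hourly series is given below, assuming the series starts at 00:00 local time and marks missing values as NaN. Gaps touching the start or end of the record have only one valid neighbor, so `np.interp` holds the nearest valid value there; how the original analysis treats such edges is not stated.

```python
import numpy as np

def fill_hourly_gaps(x, max_gap=12):
    """Fill NaNs in an hourly series that starts at 00:00 (assumption).

    Runs longer than max_gap hours: use the all-day mean for each hour of day.
    Shorter runs: linear interpolation between adjacent valid values.
    """
    x = np.asarray(x, dtype=float).copy()
    n = len(x)
    hour = np.arange(n) % 24
    missing = np.isnan(x)
    # Mean diurnal cycle computed from valid data only.
    diurnal = np.array([np.nanmean(x[hour == h]) for h in range(24)])
    valid_idx = np.flatnonzero(~missing)
    i = 0
    while i < n:
        if not missing[i]:
            i += 1
            continue
        j = i
        while j < n and missing[j]:
            j += 1
        # Missing run covers indices [i, j).
        if j - i > max_gap:
            x[i:j] = diurnal[hour[i:j]]
        else:
            x[i:j] = np.interp(np.arange(i, j), valid_idx, x[valid_idx])
        i = j
    return x
```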

Since different cities have different numbers of stations, we calculate city mean observations by averaging across all stations of each city. Compared to a station-based analysis, the city-based EOF–EEMD results reduce the spatial noise, leading to more distinctive temporal patterns. All analyses hereafter are based on city mean data. The longitude and latitude of each city center are used to identify the respective model grid cell.
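The city averaging and the grid-cell lookup can be sketched as follows. The station-to-city mapping is an input, and the grid origin and spacing are placeholder parameters for a regular longitude–latitude grid, not the actual GEOS-Chem or WRF/CMAQ domain definitions; a projected model grid would first require a map projection, which is not shown.

```python
import numpy as np
from collections import defaultdict

def city_means(station_series, station_city):
    """Average gap-filled hourly series over all stations in each city.

    station_series: dict station_id -> 1-D array; station_city: dict station_id -> city.
    """
    groups = defaultdict(list)
    for sid, series in station_series.items():
        groups[station_city[sid]].append(np.asarray(series, dtype=float))
    return {city: np.vstack(arrs).mean(axis=0) for city, arrs in groups.items()}

def grid_cell(lon, lat, lon_west, lat_south, dlon, dlat):
    """(i, j) index of the regular lon-lat cell containing a city center.
    Grid origin and spacing are placeholders, not the models' actual grids."""
    i = int(np.floor((lon - lon_west) / dlon))
    j = int(np.floor((lat - lat_south) / dlat))
    return i, j

# Hypothetical usage: two stations in city "X", one in city "Y".
means = city_means({"a": [1, 2], "b": [3, 4], "c": [10, 10]},
                   {"a": "X", "b": "X", "c": "Y"})
print(means["X"], grid_cell(116.4, 39.9, 100.0, 20.0, 0.5, 0.5))
```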

We use 3-hourly measurements of 2 m air temperature, 2 m relative humidity, and 10 m wind speed from meteorological stations archived by the National Oceanic and Atmospheric Administration National Centers for Environmental Information (NOAA NCEI). We do not additionally use surface pressure because it is highly correlated with air temperature and relative humidity on the day-to-day scale. The meteorological stations are not always collocated with the air pollution stations, so we select the 36 meteorological stations located within 10 km of an air pollution station (red hollow dots in Fig. 1). Despite the differences in number and location between the pollution and meteorological stations, comparing the regional temporal patterns of pollutants and meteorology is still informative (see Sect. 3.2).
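The 10 km matching criterion can be sketched with great-circle (haversine) distances. The haversine formula and a spherical Earth of radius 6371 km are assumptions here, since the distance metric is not stated.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (spherical-Earth assumption)

def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between points given in degrees."""
    lon1, lat1, lon2, lat2 = map(np.radians, (lon1, lat1, lon2, lat2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))

def select_met_stations(met_lonlat, aq_lonlat, max_km=10.0):
    """Indices of meteorological stations within max_km of any air quality station."""
    met = np.asarray(met_lonlat, dtype=float)   # (n_met, 2): lon, lat
    aq = np.asarray(aq_lonlat, dtype=float)     # (n_aq, 2): lon, lat
    # Pairwise distances, shape (n_met, n_aq), via broadcasting.
    d = haversine_km(met[:, None, 0], met[:, None, 1], aq[None, :, 0], aq[None, :, 1])
    return np.flatnonzero((d <= max_km).any(axis=1))

# Hypothetical stations: ~5.6 km and ~22 km north of one air quality site.
print(select_met_stations([[116.0, 40.05], [116.0, 40.2]], [[116.0, 40.0]]))
```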

To fill in missing values, we apply an interpolation that accounts for diurnal variability using information from an adjacent day. For example, if the temperature at 12:00 on 26 October is missing, we calculate the temperature differences between 09:00 and 12:00 and between 15:00 and 12:00 on the 25th. We then use these differences to adjust the temperatures at 09:00 and 15:00 on the 26th and take the mean of the two adjusted temperatures as the temperature at 12:00 on the 26th.
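The adjacent-day adjustment in this example can be written as a small function. Values are keyed by hour of day; the function and argument names are hypothetical, and gaps at 00:00 or 21:00 (whose 3-hourly neighbors fall on another day) are not handled in this sketch.

```python
def fill_from_adjacent_day(prev_day, curr_day, hour, step=3):
    """Estimate curr_day[hour] from its 3-hourly neighbors, shifted by the
    diurnal differences observed on the previous day.

    prev_day, curr_day: dicts hour -> value (hypothetical layout).
    """
    before, after = hour - step, hour + step
    # Adjust each neighbor by how much the previous day changed toward `hour`.
    est_from_before = curr_day[before] + (prev_day[hour] - prev_day[before])
    est_from_after = curr_day[after] + (prev_day[hour] - prev_day[after])
    return 0.5 * (est_from_before + est_from_after)

# Illustrative numbers: 25th = 10, 14, 13 at 09:00, 12:00, 15:00; 26th = 12 and 16 at 09:00, 15:00.
print(fill_from_adjacent_day({9: 10.0, 12: 14.0, 15: 13.0}, {9: 12.0, 15: 16.0}, 12))  # 16.5
```

With these illustrative values the two adjusted estimates are 12 + 4 = 16 and 16 + 1 = 17, so the filled value is 16.5.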

For consistency with the hourly pollution data, we linearly interpolate the 3-hourly meteorological measurements to each hour. This interpolation does not distort the EOF–EEMD analysis, as confirmed by comparing the analysis of hourly GEOS-FP meteorological parameters with that of 3-hourly GEOS-FP data. Note that GEOS-FP meteorology is used to drive GEOS-Chem.
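A minimal sketch of the 3-hourly to hourly interpolation, assuming samples at hours 0, 3, 6, ... of a continuous record with no remaining gaps; any offset between the station reporting times and local time is not handled here.

```python
import numpy as np

def three_hourly_to_hourly(values_3h):
    """Linearly interpolate samples at hours 0, 3, 6, ... to every hour up to the last sample."""
    values_3h = np.asarray(values_3h, dtype=float)
    t3 = 3 * np.arange(len(values_3h))   # sample times (h)
    t1 = np.arange(t3[-1] + 1)           # hourly target times (h)
    return np.interp(t1, t3, values_3h)

print(three_hourly_to_hourly([0.0, 3.0, 6.0]))  # [0. 1. 2. 3. 4. 5. 6.]
```

Hours after the last 3-hourly sample (e.g., 22:00 and 23:00 on the final day) are not produced by this sketch and would need the following 00:00 value or an extrapolation rule.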

We use the nested GEOS-Chem CTM version 9-02 (L. Zhang et
al., 2016) to simulate

Chinese anthropogenic emissions of

GEOS-Chem modeled PM

The nested model simulation runs from 15 October to 25 December 2013, allowing for a 10-day spin-up period. Its chemical lateral boundary conditions are updated every 3 h using results from a corresponding global simulation on a 2.5

We use the Weather Research and Forecasting (WRF) model v3.5.1
(

Chinese anthropogenic emissions are from MEIC (

The PM

The simulation runs from 15 October to 25 December 2013, allowing for a 10-day spin-up period. Initial and boundary conditions are from GEOS-Chem (Zheng et al., 2015). Modeled

The flowchart of the EOF–EEMD analysis visualization package. The red boxes represent the quantities visualized.

As shown in Fig. 3, our EOF–EEMD analysis visualization package consists, in order, of an EOF analysis (Lorenz, 1956), an EEMD analysis (Wu et al., 2009), a Hilbert transform (HT) with marginal spectrum analysis (MSA), and a visualization step to quantitatively depict the spatial–temporal scales of measurement or model data.

The basic purpose of our package is to quickly and simultaneously identify and visualize the spatial and temporal scales of interest in observational or model datasets. As shown by Feng et al. (2014) and Wu et al. (2016), combining EOF with EEMD to decompose a dataset is faster than multidimensional EEMD (MEEMD) by one to two orders of magnitude, because EEMD is applied only to the temporal components (i.e., the PCs) from the EOF analysis rather than to all dimensions. In addition, our EOF–EEMD package conducts an HT–MSA and visualizes all spatial and temporal scales of interest.

EOF analysis to decompose a two-dimensional dataset (time series at multiple locations) into spatial and temporal components.

Suppose there are
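A minimal EOF sketch via singular value decomposition is shown below, assuming the input is a (time × city) matrix from which each city's time mean is removed; whether the original analysis also standardizes or area-weights the data is not visible in this text. The sign of each EOF/PC pair is arbitrary.

```python
import numpy as np

def eof_analysis(data, n_modes=2):
    """EOF decomposition of a (n_time, n_locations) array via SVD.

    Returns spatial patterns (EOFs), temporal coefficients (PCs), and the
    fraction of total variance explained by each retained mode.
    """
    data = np.asarray(data, dtype=float)
    anomalies = data - data.mean(axis=0)       # remove each location's time mean
    u, s, vt = np.linalg.svd(anomalies, full_matrices=False)
    variance_fraction = s**2 / np.sum(s**2)
    eofs = vt[:n_modes]                        # (n_modes, n_locations)
    pcs = u[:, :n_modes] * s[:n_modes]         # (n_time, n_modes)
    return eofs, pcs, variance_fraction[:n_modes]

# Usage: 1488 hourly values at 3 hypothetical cities sharing one diurnal signal.
t = np.arange(1488)
signal = np.sin(2 * np.pi * t / 24)
rng = np.random.default_rng(0)
data = np.outer(signal, [1.0, 0.8, 1.2]) + 0.1 * rng.normal(size=(1488, 3))
eofs, pcs, frac = eof_analysis(data, n_modes=2)
print(frac)  # the first mode carries nearly all of the variance
```

Multiplying the PCs by the EOFs (`pcs @ eofs`) reconstructs the anomalies when all modes are kept, which is a convenient sanity check.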

EEMD analysis of each PC time series to obtain its “intrinsic mode functions” (IMFs) of descending frequencies.

Each PC mixes multiple scales and therefore requires further decomposition in the time domain. Unlike the fast Fourier transform (FFT) or wavelet transform (WT), EEMD does not require a priori basis functions, so it is suited to nonlinear and nonstationary time series such as our pollution data.

EEMD consists of an ensemble of empirical mode decomposition (EMD) performed
on each PC time series (denoted as

EMD applied to noisy real data may suffer from a “mode mixing” problem (Wu et al., 2009). EEMD solves this problem by performing an ensemble of hundreds of EMDs, each with white noise added to
Hilbert transform and marginal spectrum analysis of each IMF to reveal its representative frequency range.

The pollution and meteorological time series do not have discrete periods or frequencies. Correspondingly, each IMF has a continuous frequency range (rather than a constant frequency), which can be determined by HT–MSA. The HT reveals the energy–frequency–time distribution of an IMF (Huang et al., 1999), and the MSA further shows how the IMF variance (energy) is distributed across frequencies. The spectral peak marks the frequency contributing most to the total variance.
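The HT–MSA step can be sketched with `scipy.signal.hilbert`: the instantaneous frequency comes from the unwrapped phase of the analytic signal, and a marginal spectrum is obtained by accumulating squared amplitude in frequency bins. The binning, the use of squared amplitude as energy, and applying the edge trimming (the box-car selection of the internal 60 % described in this section) after the transform are assumptions of this sketch.

```python
import numpy as np
from scipy.signal import hilbert

def marginal_spectrum(imf, dt=1.0, n_bins=200, trim=0.2):
    """Hilbert marginal spectrum of one IMF (energy per frequency bin).

    dt: sampling interval (1.0 for hourly data, so frequencies are in cycles per hour).
    trim: fraction dropped at each end (0.2 keeps the internal 60 %).
    """
    imf = np.asarray(imf, dtype=float)
    analytic = hilbert(imf)                           # x + i * H[x]
    amplitude = np.abs(analytic)
    phase = np.unwrap(np.angle(analytic))
    inst_freq = np.diff(phase) / (2 * np.pi * dt)     # instantaneous frequency
    energy = amplitude[1:] ** 2                       # squared amplitude as energy
    n = len(inst_freq)
    keep = slice(int(trim * n), int((1 - trim) * n))  # internal part only
    f, e = inst_freq[keep], energy[keep]
    positive = f > 0                                  # drop negative-frequency artifacts
    hist, edges = np.histogram(f[positive], bins=n_bins, weights=e[positive])
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers, hist

# Usage: a pure 24 h oscillation over 1488 h should peak near 1/24 per hour.
t = np.arange(1488)
freqs, spec = marginal_spectrum(np.sin(2 * np.pi * t / 24))
print(freqs[np.argmax(spec)], 1 / 24)
```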

Spurious oscillations may occur near the edges of some IMF time series, resulting in inaccurate variance estimates from HT–MSA. We therefore apply a box-car filter (Gubbins, 2004) to select the internal 60 % of each IMF time series (from 20 % to 80 % of the 1488 h) for HT–MSA. Figure 4b shows an example of the visualized HT–MSA result, in which the horizontal axis is the number of occurrences within the whole period (frequency, in h

Based on HT–MSA, we determine a representative frequency range (RFR) that encompasses the peak frequency and within which the frequencies contribute 50 % of the total variance of an IMF; the frequencies below and above the RFR bounds each contribute 25 %. Before calculating the RFR, we smooth the marginal spectrum by connecting all its local maxima with a cubic spline.
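The RFR definition can be sketched as the 25th and 75th percentiles of the cumulative spectral variance, computed after the spline smoothing through local maxima. Assumptions of this sketch: frequency bins are uniformly spaced (so a cumulative sum approximates the integral), the original spectrum is kept outside the span of the outermost maxima, and the peak is returned so the caller can check that it lies inside the range, as the definition requires.

```python
import numpy as np
from scipy.interpolate import CubicSpline
from scipy.signal import argrelextrema

def representative_frequency_range(freqs, spectrum):
    """Return (f_low, f_high, f_peak): 25 % of the variance lies below f_low
    and 25 % above f_high, after smoothing the marginal spectrum with a cubic
    spline through its local maxima."""
    freqs = np.asarray(freqs, dtype=float)
    spectrum = np.asarray(spectrum, dtype=float)
    smooth = spectrum.copy()
    imax = argrelextrema(spectrum, np.greater)[0]
    if len(imax) >= 2:
        spline = CubicSpline(freqs[imax], spectrum[imax], extrapolate=False)
        inside = (freqs >= freqs[imax[0]]) & (freqs <= freqs[imax[-1]])
        smooth[inside] = spline(freqs[inside])   # keep the raw spectrum elsewhere
    smooth = np.clip(smooth, 0.0, None)          # spline overshoot must not add negative variance
    cum = np.cumsum(smooth) / smooth.sum()
    f_low = freqs[np.searchsorted(cum, 0.25)]
    f_high = freqs[np.searchsorted(cum, 0.75)]
    f_peak = freqs[np.argmax(smooth)]
    return f_low, f_high, f_peak

# Usage: a Gaussian-shaped spectrum centered on a 24 h period.
freqs = np.linspace(0.001, 0.1, 2000)
spec = np.exp(-0.5 * ((freqs - 1 / 24) / 0.005) ** 2)
f_low, f_high, f_peak = representative_frequency_range(freqs, spec)
print(1 / f_high, 1 / f_peak, 1 / f_low)  # periods (h) spanned by the RFR
```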

Visualization of the spatial and temporal scales in a two-dimensional plot.

Finally, we simultaneously visualize the spatial and temporal scales as well
as their contributions to the total variance of

EEMD–HT–MSA result for PC1 of observed

Observed (filled circles) and modeled (color maps)

The colored dots in Fig. 5a and b show the observed spatial distributions of
city mean

Figure 6a and b show the diurnal variations of

Figure 6c and d further show the time series of daily mean

Daily anomalies of observed meteorological parameters and
pollutant concentrations averaged over NEC and SEC, as well as their
correlations. All data are de-trended. Correlation coefficients with “*”
and “**” are statistically significant with

Figure 7 shows day-to-day anomalies of observed pollutant concentrations and
meteorological parameters over NEC and SEC. All data are de-trended. Over
NEC, wind speed is clearly anticorrelated with pollutant levels. The
correlation coefficient reaches

Over SEC (Fig. 7), the relationship between pollutant levels and
meteorological parameters is more complex. The correlation between daily
mean PM
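The daily-anomaly correlations can be sketched as follows: hourly regional means are averaged to daily values, linearly detrended, and correlated with `scipy.stats.pearsonr`. The significance thresholds behind the “*” and “**” markers are truncated in this text, so the sketch simply returns the p-value; note that `pearsonr` assumes independent samples, which autocorrelated daily data satisfy only approximately. The usage data are synthetic, not observations.

```python
import numpy as np
from scipy.signal import detrend
from scipy.stats import pearsonr

def daily_anomalies(hourly):
    """Daily means of an hourly series (assumed to start at 00:00), linearly detrended."""
    daily = np.asarray(hourly, dtype=float).reshape(-1, 24).mean(axis=1)
    return detrend(daily)            # subtracts the least-squares linear fit

def anomaly_correlation(hourly_a, hourly_b):
    """Pearson r and two-sided p-value between two detrended daily anomaly series."""
    r, p = pearsonr(daily_anomalies(hourly_a), daily_anomalies(hourly_b))
    return r, p

# Synthetic usage: pollution that falls as wind rises, plus a linear trend.
rng = np.random.default_rng(1)
wind_daily = rng.normal(size=62)
pm_daily = -2.0 * wind_daily + 0.1 * rng.normal(size=62) + 0.05 * np.arange(62)
diurnal = np.sin(2 * np.pi * np.arange(24) / 24)
wind = np.repeat(wind_daily, 24) + np.tile(diurnal, 62)
pm = np.repeat(pm_daily, 24) + np.tile(diurnal, 62)
print(anomaly_correlation(wind, pm))
```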

Although informative, the time series analyses of regional mean pollution in Sect. 3.1 do not provide adequate quantitative information on the spatiotemporal variability and its embedded scales. In fact, the separate discussion of NEC and SEC in Sect. 3.1 was largely motivated by the following EOF–EEMD analysis, which reveals distinct features in these two subregions. In this section, we use the EOF–EEMD package to distinguish and visualize the quantitative contributions of individual spatial and temporal modes to variations in the pollutant and meteorological data.

EOF–EEMD–HT–MSA results for the observed temperature, RH, wind
speed,

The columns in Fig. 8 show the EOF–EEMD results for the observed
temperature, RH, wind speed,

The fourth column in Fig. 8 for

We further investigate the physical meanings of PC1 and PC2 for

Correlation between PCs and regional mean values in terms of diurnal
and day-to-day variability for

** The correlation coefficient is statistically significant with the

The last column in Fig. 8 shows the EOF–EEMD result for PM

Correlation between PCs and regional mean values in terms of
diurnal and day-to-day variability for PM

** The correlation coefficient is statistically significant with the
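One plausible way to compute the diurnal part of the PC versus regional-mean correlations is to composite each hourly series into a mean 24 h cycle and correlate the two cycles; the compositing actually used for these tables is not visible in this text, so this is an assumption. The day-to-day part would use detrended daily means instead. The usage data are synthetic.

```python
import numpy as np
from scipy.stats import pearsonr

def diurnal_cycle(hourly):
    """Mean value at each hour of day for a series that starts at 00:00 (assumption)."""
    return np.asarray(hourly, dtype=float).reshape(-1, 24).mean(axis=0)

def diurnal_correlation(pc, regional_mean):
    """Pearson r and p-value between the mean diurnal cycles of two hourly series."""
    r, p = pearsonr(diurnal_cycle(pc), diurnal_cycle(regional_mean))
    return r, p

# Synthetic usage: a regional mean that is a linear function of the PC.
hours = np.arange(1488)
pc = np.sin(2 * np.pi * hours / 24) + 0.1 * np.random.default_rng(2).normal(size=1488)
regional = 3.0 * pc + 10.0
print(diurnal_correlation(pc, regional))
```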

For comparison, the first three columns in Fig. 8 show the EOF–EEMD results
for the observed temperature, RH, and wind speed. The EOF–EEMD result for
wind speed (the third column in Fig. 8) is closest to that for

The EOF–EEMD analysis for temperature (the first column in Fig. 8) shows that PC1 contributes 88 % of the total variance and is dominated by the IMF with a period of 24 h; the contribution of PC2 is negligible (4 %). For RH (the second column in Fig. 8), PC2 plays a minor role, and IMFs of PC1 with periods near 3 and 12 days contribute to the correlation between RH and PM

The color contours in Fig. 5a–d show the horizontal distributions of

Observed and simulated diurnal and day-to-day variations of

Figure 9 evaluates the regional mean diurnal and day-to-day variations of
modeled pollutant levels over NEC and SEC. Here model data are sampled from
days and locations with valid observations. All trends are negligible and
have been removed, consistent with the observational analysis. GEOS-Chem
underestimates the observations by about 17

Observed and simulated pollutants and their correlations.

Figure 9 also shows that WRF/CMAQ overestimates the nighttime observations
by about 30
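The model–observation comparison described here (model values sampled only where observations are valid, optionally restricted to certain hours of the day) can be sketched as a relative bias of the regional means. The array layout, the assumption that each series starts at 00:00 local time, and any choice of nighttime hours are hypothetical; the example data are synthetic and do not reproduce the biases reported above.

```python
import numpy as np

def relative_bias(model, obs, hours=None):
    """Relative bias (%) of model vs. observations, sampled where obs are valid.

    model, obs: arrays of shape (n_time, n_cities); NaN marks missing obs.
    hours: optional hours of day to keep (series assumed to start at 00:00).
    """
    model = np.asarray(model, dtype=float)
    obs = np.asarray(obs, dtype=float)
    valid = ~np.isnan(obs)                          # match model to valid obs only
    if hours is not None:
        hour_of_day = np.arange(obs.shape[0]) % 24
        valid &= np.isin(hour_of_day, hours)[:, None]
    m, o = model[valid].mean(), obs[valid].mean()
    return 100.0 * (m - o) / o

# Synthetic example: 20 % low at hours 0-19, 20 % high at hours 20-23.
obs = np.full((48, 2), 10.0)
obs[0, 0] = np.nan                                  # a missing observation
model = np.full((48, 2), 8.0)
model[np.arange(48) % 24 >= 20] = 12.0
print(relative_bias(model, obs, hours=[20, 21, 22, 23]))
```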

EOF–EEMD–HT–MSA results for observed, GEOS-Chem, and CMAQ

EOF–EEMD results for observed, GEOS-Chem, and CMAQ PM

Figures 10 and 11 evaluate the EOF–EEMD results for modeled

The first two rows in Fig. 10 show EOF1 and EOF2 of

The last three rows in Fig. 10 show that both models underestimate the
contribution of day-to-day variability to the total variance of

Figure 11 shows that both GEOS-Chem and CMAQ capture the synchronous pattern
(EOF1) and the NEC–SEC contrasting pattern (EOF2) of PM

WRF/CMAQ overestimates the diurnal variation of

GEOS-Chem (the first model layer) underestimates surface

Eastern China mean

GEOS-Chem (the first model layer) also underestimates the Eastern China
synchronous day-to-day variation of

The magnitude of emission differences between the two models plays an
insignificant role in the differences between their simulated

We further use CMAQ simulations to investigate whether the inclusion of SOA
affects our analysis of the spatiotemporal patterns of PM

This study uses a newly compiled EOF–EEMD analysis visualization package to
evaluate the spatiotemporal variations of hourly

An EOF–EEMD analysis of the observed PM

Further evaluation of GEOS-Chem and WRF/CMAQ simulations shows that both
models simulate the observed EOF1 and EOF2 patterns well. Both models
capture the day-to-day variability of PM

This study suggests that the EOF–EEMD package is a useful tool providing a simultaneous and quantitative view of the spatial and temporal (both stationary and nonstationary) scales embedded in a dataset. The package can be applied to other chemical, meteorological, or climatic variables and will be freely accessible to the public.

Air pollution observations are taken from the Ministry of
Environmental Protection (

The supplement related to this article is available online at:

ML, YW, and JL designed the research, constructed the EOF–EEMD package, and performed the research. ZW provided the EEMD code in MATLAB, and JC and ZF contributed to revising EEMD. YS provided pollution measurement data. JS and LZ provided the GEOS-Chem code; ML, LC, and YY conducted the GEOS-Chem simulations; and BZ, QZ, and YZ provided the CMAQ results. ML, YW, and JL analyzed the results and wrote the paper with input from all authors.

The authors declare that they have no conflict of interest.

This article is part of the special issue “Regional transport and transformation of air pollution in eastern China”. It is not associated with a conference.

This research is supported by the National Natural Science Foundation of China (41775115) and the 973 program (2014CB441303).

Edited by: Hang Su
Reviewed by: two anonymous referees