https://doi.org/10.5194/acp-2021-726

  03 Sep 2021


Review status: this preprint is currently under review for the journal ACP.

Estimation of the vertical distribution of particle matter (PM2.5) concentration and its transport flux from lidar measurements based on machine learning algorithms

Yingying Ma1, Yang Zhu2, Hui Li1, Shikuan Jin1, Yiqun Zhang1, Ruonan Fan1, Boming Liu1, and Wei Gong3
  • 1State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing (LIESMARS), Wuhan University, Wuhan, China
  • 2School of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan, China
  • 3School of Electronic Information, Wuhan University

Abstract. The vertical distribution of the aerosol extinction coefficient (EC) measured by lidar systems has been used to retrieve profiles of particulate matter with a diameter < 2.5 μm (PM2.5). However, the traditional linear model (LM) cannot sufficiently account for the influence of multiple meteorological variables, which leads to low inversion accuracy. Machine learning (ML) algorithms can take multiple input features, which may offer a new way to overcome this constraint. In this study, surface aerosol EC and meteorological data from January 2014 to December 2017 were used to explore the conversion of aerosol EC to PM2.5 concentrations. Four ML algorithms were used to train PM2.5 prediction models: Random Forest (RF), K-Nearest Neighbor (KNN), Support Vector Machine (SVM), and eXtreme Gradient Boosting Decision Tree (XGB). The mean absolute error (root mean square error) of the LM, RF, KNN, SVM, and XGB models was 11.66 (15.68), 5.35 (7.96), 7.95 (11.54), 6.96 (11.18), and 5.62 (8.27) μg/m3, respectively. These results show that the RF model is the most suitable for PM2.5 inversion from EC and meteorological data. A sensitivity analysis of the model input parameters was also conducted. All these results further indicate that the effect of meteorological variables must be considered when using EC to retrieve PM2.5 concentrations. Finally, the diurnal and seasonal variations of the transport flux (TF) and PM2.5 profiles were analyzed based on the lidar data. High PM2.5 concentrations occurred at approximately 13:00–17:00 local time (LT) at 0.2–0.8 km. The diurnal variation of the TF shows a clear conveyor belt at approximately 12:00–18:00 LT at 0.5–0.8 km. These results indicate that air pollutant transport over Wuhan mainly occurs at approximately 12:00–18:00 LT at 0.5–0.8 km. The TF near the ground is usually highest in winter (0.26 mg/m2 s), followed by autumn and summer (0.2 and 0.19 mg/m2 s, respectively), and lowest in spring (0.14 mg/m2 s). These findings provide important information on the atmospheric profile and support the application of lidar to air quality monitoring.
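
The abstract reports the TF in mg/m2 s but does not state how it is computed. The following is a minimal sketch, assuming the TF at a given height is the product of the PM2.5 mass concentration and the horizontal wind speed at that height. That formula is an assumption here, not something taken from the abstract, and the function names and sample values are hypothetical.

```python
def transport_flux(pm25_ug_m3, wind_speed_m_s):
    """Return a transport flux in mg/(m2 s).

    ASSUMPTION: flux = concentration * horizontal wind speed.
    The abstract does not give the authors' definition.
    """
    if pm25_ug_m3 < 0 or wind_speed_m_s < 0:
        raise ValueError("concentration and wind speed must be non-negative")
    # ug/m3 * m/s = ug/(m2 s); divide by 1000 to convert ug to mg.
    return pm25_ug_m3 * wind_speed_m_s / 1000.0


def flux_profile(heights_km, pm25_profile, wind_profile):
    """Pair each height with the flux computed from that level's values."""
    if not (len(heights_km) == len(pm25_profile) == len(wind_profile)):
        raise ValueError("all profiles must have the same number of levels")
    return [
        (h, transport_flux(c, w))
        for h, c, w in zip(heights_km, pm25_profile, wind_profile)
    ]


# Hypothetical profile values, for illustration only (not data from the study).
heights_km = [0.2, 0.5, 0.8]
pm25 = [60.0, 55.0, 40.0]   # ug/m3
wind = [3.0, 5.0, 6.0]      # m/s
profile = flux_profile(heights_km, pm25, wind)

for height, flux in profile:
    print(f"{height:.1f} km: {flux:.3f} mg/(m2 s)")
```

Under this assumption, a flux of the order reported for winter would correspond, for example, to 65 μg/m3 carried by a 4 m/s wind. That pairing is only a unit check, not a value from the study.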


Status: open (until 15 Oct 2021)




Latest update: 19 Sep 2021
Short summary
The vertical distribution of the aerosol extinction coefficient (EC) measured by lidar systems has been used to retrieve profiles of particulate matter with a diameter < 2.5 μm (PM2.5). However, the traditional linear model cannot sufficiently account for the influence of multiple meteorological variables, which leads to low inversion accuracy. In this study, machine learning algorithms, which can take multiple input features, are used to overcome this constraint.
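
To illustrate what "multiple input features" means in practice, here is a minimal sketch of K-Nearest Neighbor regression, one of the four algorithms named in the abstract, scored with the mean absolute error and root mean square error that the abstract reports. It is not the authors' implementation. The feature set (EC, relative humidity, wind speed), the min-max scaling, the helper names, and all numbers are assumptions made for illustration, and the data are synthetic.

```python
import math


def fit_minmax(rows):
    """Return (min, max) per feature column, learned from training rows."""
    return [(min(col), max(col)) for col in zip(*rows)]


def apply_minmax(row, bounds):
    """Scale one row to [0, 1] per feature so no unit dominates the distance."""
    return [(v - lo) / (hi - lo) if hi > lo else 0.0
            for v, (lo, hi) in zip(row, bounds)]


def knn_predict(train_X, train_y, query, k=3):
    """Predict the mean target of the k nearest training rows (Euclidean)."""
    if not 1 <= k <= len(train_X):
        raise ValueError("k must be between 1 and the number of training rows")
    ranked = sorted((math.dist(row, query), y)
                    for row, y in zip(train_X, train_y))
    return sum(y for _, y in ranked[:k]) / k


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(
        sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# SYNTHETIC rows: [EC (1/km), relative humidity (%), wind speed (m/s)].
train_X = [
    [0.20, 40.0, 3.0], [0.25, 45.0, 2.5],
    [0.60, 80.0, 1.0], [0.65, 85.0, 1.2],
    [0.40, 60.0, 2.0], [0.45, 65.0, 1.8],
]
train_y = [30.0, 35.0, 70.0, 75.0, 50.0, 55.0]  # synthetic PM2.5 (ug/m3)

bounds = fit_minmax(train_X)
scaled_train = [apply_minmax(row, bounds) for row in train_X]

# Synthetic held-out samples, scaled with the training bounds.
test_X = [[0.22, 42.0, 2.8], [0.62, 82.0, 1.1]]
test_y = [33.0, 72.0]
preds = [knn_predict(scaled_train, train_y, apply_minmax(row, bounds), k=2)
         for row in test_X]

print("MAE :", mae(test_y, preds))
print("RMSE:", rmse(test_y, preds))
```

Scaling matters here because EC, humidity, and wind speed have very different numeric ranges. Without it, the distance would be driven almost entirely by humidity.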