<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing with OASIS Tables v3.0 20080202//EN" "https://jats.nlm.nih.gov/nlm-dtd/publishing/3.0/journalpub-oasis3.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:oasis="http://docs.oasis-open.org/ns/oasis-exchange/table" xml:lang="en" dtd-version="3.0" article-type="research-article">
  <front>
    <journal-meta><journal-id journal-id-type="publisher">ACP</journal-id><journal-title-group>
    <journal-title>Atmospheric Chemistry and Physics</journal-title>
    <abbrev-journal-title abbrev-type="publisher">ACP</abbrev-journal-title><abbrev-journal-title abbrev-type="nlm-ta">Atmos. Chem. Phys.</abbrev-journal-title>
  </journal-title-group><issn pub-type="epub">1680-7324</issn><publisher>
    <publisher-name>Copernicus Publications</publisher-name>
    <publisher-loc>Göttingen, Germany</publisher-loc>
  </publisher></journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.5194/acp-26-6117-2026</article-id><title-group><article-title>Multi-machine-learning approaches to modeling small-scale source attribution of ozone formation</article-title><alt-title>Multi-ML approaches to modeling small-scale source attribution of O<sub>3</sub> formation</alt-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author" corresp="no" rid="aff1 aff2">
          <name><surname>Xiao</surname><given-names>Zheng</given-names></name>
          
        </contrib>
        <contrib contrib-type="author" corresp="no" rid="aff3">
          <name><surname>Lu</surname><given-names>Yifeng</given-names></name>
          
        </contrib>
        <contrib contrib-type="author" corresp="yes" rid="aff1 aff2">
          <name><surname>Xiu</surname><given-names>Guangli</given-names></name>
          <email>xiugl@ecust.edu.cn</email>
        <ext-link>https://orcid.org/0000-0002-1122-331X</ext-link></contrib>
        <aff id="aff1"><label>1</label><institution>State Environmental Protection Key Lab of Environmental Risk Assessment and Control on Chemical Processes, School of Resources &amp; Environmental Engineering, East China University of Science and Technology, Shanghai 200237, China</institution>
        </aff>
        <aff id="aff2"><label>2</label><institution>Shanghai Environmental Protection Key Laboratory for Environmental Standard and Risk Management Of Chemical Pollutants, School of Resources &amp; Environmental Engineering, East China University of Science and Technology, Shanghai 200237, China</institution>
        </aff>
        <aff id="aff3"><label>3</label><institution>Shanghai Chemical Industry Park Administration Committee, Shanghai 201507, China</institution>
        </aff>
      </contrib-group>
      <author-notes><corresp id="corr1">Guangli Xiu (xiugl@ecust.edu.cn)</corresp></author-notes><pub-date><day>8</day><month>May</month><year>2026</year></pub-date>
      
      <volume>26</volume>
      <issue>9</issue>
      <fpage>6117</fpage><lpage>6132</lpage>
      <history>
        <date date-type="received"><day>14</day><month>January</month><year>2025</year></date>
           <date date-type="rev-request"><day>5</day><month>March</month><year>2025</year></date>
           <date date-type="rev-recd"><day>28</day><month>June</month><year>2025</year></date>
           <date date-type="accepted"><day>9</day><month>July</month><year>2025</year></date>
      </history>
      <permissions>
        <copyright-statement>Copyright: © 2026 Zheng Xiao et al.</copyright-statement>
        <copyright-year>2026</copyright-year>
      <license license-type="open-access"><license-p>This work is licensed under the Creative Commons Attribution 4.0 International License. To view a copy of this licence, visit <ext-link ext-link-type="uri" xlink:href="https://creativecommons.org/licenses/by/4.0/">https://creativecommons.org/licenses/by/4.0/</ext-link></license-p></license></permissions><self-uri xlink:href="https://acp.copernicus.org/articles/26/6117/2026/acp-26-6117-2026.html">This article is available from https://acp.copernicus.org/articles/26/6117/2026/acp-26-6117-2026.html</self-uri><self-uri xlink:href="https://acp.copernicus.org/articles/26/6117/2026/acp-26-6117-2026.pdf">The full text article is available as a PDF file from https://acp.copernicus.org/articles/26/6117/2026/acp-26-6117-2026.pdf</self-uri>
      <abstract><title>Abstract</title>

      <p id="d2e121">Accurate source apportionment of ozone (O<sub>3</sub>) precursors is crucial for implementing scientific O<sub>3</sub> control strategies. While traditional approaches rely on complex calculations of volatile organic compounds (VOCs) and meteorological parameters, their applicability in real-time scenarios remains limited. Taking the Shanghai chemical industrial park as an example, we propose a novel two-step machine learning (ML) approach that integrates positive matrix factorization (PMF) with other ML methods to systematically quantify the spatiotemporal impacts of VOCs on O<sub>3</sub> formation. Analysis of high-frequency data from 12 VOC monitoring stations (2021–2023) using six ML models revealed XGBoost as the optimal predictor (<inline-formula><mml:math id="M5" display="inline"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup><mml:mo>=</mml:mo><mml:mn mathvariant="normal">0.644</mml:mn></mml:mrow></mml:math></inline-formula>) for local VOC emissions. By combining SHapley Additive exPlanations (SHAP) with ML modeling, we precisely evaluated VOC–O<sub>3</sub> relationships and located emission sources. Results identified solvent use (SU) and fuel evaporation (FE) as primary O<sub>3</sub> formation contributors, followed by combustion sources (CS) and vehicle emissions (VE). PMF analysis further distinguished six VOC sources: petrochemical processes (PP), FE, CS, SU, polymer fabrication (PF) and VE. Temporal analysis revealed seasonal variations, with CS and FE dominant in spring/summer, while PF prevailed in autumn. This innovative framework demonstrates exceptional capability for rapid source identification and precise contribution quantification, establishing a new paradigm for high-resolution O<sub>3</sub> source apportionment.</p>
  </abstract>
    
<funding-group>
<award-group id="gs1">
<funding-source>Ministry of Science and Technology of the People's Republic of China</funding-source>
<award-id>2022YFC3703501</award-id>
<award-id>2022YFC3703503</award-id>
</award-group>
</funding-group>
</article-meta>
  </front>
<body>
      

<sec id="Ch1.S1" sec-type="intro">
  <label>1</label><title>Introduction</title>
      <p id="d2e207">Ozone (O<sub>3</sub>) pollution has become a significant environmental issue, posing serious threats to human health and ecosystems worldwide (Long et al., 2023; Sharma et al., 2023; Masui et al., 2023; Sharma et al., 2024). In particular, industrial parks, which are characterized by high levels of anthropogenic emissions, serve as critical hotspots for the formation of ground-level O<sub>3</sub> due to the abundance of precursor pollutants such as volatile organic compounds (VOCs) and nitrogen oxides (NO<sub><italic>x</italic></sub>) (Pinthong et al., 2022; Kim et al., 2023; He et al., 2024). Moreover, the pollution characteristics and sources of VOCs in industrial parks are comparatively complex (Cao et al., 2024). Understanding the contribution of VOCs to O<sub>3</sub> formation is essential for developing effective mitigation strategies aimed at reducing air pollution and improving air quality.</p>
      <p id="d2e246">At present, VOC source identification technology mainly includes source emission inventories (Lu et al., 2025), chemical transport models (CTMs) (Choi et al., 2020; Wang et al., 2023) and receptor models (Wu et al., 2023). Traditionally, the quantification of VOC contributions to O<sub>3</sub> formation has relied heavily on CTMs, which require detailed knowledge of atmospheric chemistry and complex computational resources (Li et al., 2014). However, these models often suffer from uncertainties related to emission inventories and chemical mechanisms (Sharma et al., 2017; Baklanov and Korsholm, 2008). In addition, the detailed input data and computing power requirements of CTMs leave some areas for improvement (Nelson et al., 2023). The receptor model has the advantages of fewer hardware requirements, higher precision and easy configuration, among which the positive matrix factorization (PMF) model has been widely used to identify VOC emission sources (Tan et al., 2021; Yang et al., 2023). However, it should be emphasized that PMF and CTMs are methodologically distinct: CTMs simulate physico-chemical processes driven by emission inventories and meteorology, enabling dynamic regional-scale predictions, whereas PMF statistically decomposes sources based on covariance in observed species concentrations. While PMF demonstrates strong performance in settings with spatially dense monitoring, its accuracy is inherently constrained by input data representativeness and chemical stability of source profiles. Consequently, PMF is best positioned as a complementary tool for CTMs in integrated assessment frameworks. PMF models are typically combined with the O<sub>3</sub> formation potential (OFP) calculated by the maximum incremental reactivity (MIR) value of VOC species (Carter, 2010) to assess the relationship between VOCs and O<sub>3</sub> concentrations (Xiao et al., 2024). 
Because the MIR values used in this method are usually derived from data for 39 cities in the United States where O<sub>3</sub> exceeds the standard, it remains controversial whether they fully reflect the contribution of VOCs to O<sub>3</sub> under the complex atmospheric pollution conditions in China (Zhang et al., 2021). Furthermore, conventional methods like PMF, while effective for source categorization, face critical limitations: they cannot differentiate rapid O<sub>3</sub> formation via photostationary state perturbations (e.g., alkene depletion) from slower HO<sub><italic>x</italic></sub>-mediated pathways (e.g., alkane oxidation) (Sillman, 1999). This mechanistic gap introduces spatiotemporal biases when quantifying source contributions in chemically complex environments like coastal petrochemical zones. To address these challenges, our study integrates interpretable machine learning (ML) with PMF, explicitly resolving fast vs. slow O<sub>3</sub> production pathways while leveraging the study area's unique spatial and industrial characteristics.</p>
      <p id="d2e322">In recent years, ML techniques have emerged as powerful tools for analyzing complex datasets and predicting environmental phenomena (Salcedo-Sanz et al., 2024; Essamlali et al., 2024). However, the “black box” nature of ML models makes their results difficult to interpret and generalize to real scenarios (Guidotti et al., 2018). The SHapley Additive exPlanations (SHAP) algorithm has recently been applied to address this problem (Louhichi et al., 2023; Li et al., 2024; Lundberg and Lee, 2017). ML approaches can provide robust predictions with less reliance on mechanistic details, making them attractive alternatives or complements to traditional mechanistic models. To address these gaps, we integrate PMF with other advanced ML methods to develop a two-step ML framework. This approach improves the efficiency and accuracy of data analysis and addresses the limitations of traditional PMF methods under complex environmental conditions, providing a more robust solution for pollution source identification and contribution analysis.</p>
      <p id="d2e325">Researchers have used different ML models to study the transformation mechanism of O<sub>3</sub> and its precursors (Cheng et al., 2024; Kuo and Fu, 2023). More recently, ML-coupled receptor models have been used to better identify and quantify the drivers of pollutants. For example, Cheng et al. (2023) assessed the impact of emission sources on O<sub>3</sub> formation by combining PMF with four ML models. Chen et al. (2024) used a PMF model combined with the Category Boosting (CatBoost) model and the SHAP algorithm to quantify the influence of pollution sources and meteorological factors on VOCs. Ning et al. (2024) constructed a cross-stacked ensemble learning model (CSEM) to predict O<sub>3</sub> concentrations under different NO<sub><italic>x</italic></sub> and VOC emission reduction scenarios.</p>
      <p id="d2e365">Our study focuses on a coastal petrochemical industrial zone in Shanghai, where proximity to the ocean and dense aggregation of ethylene crackers and aromatics plants create a unique microenvironment. High humidity and solar irradiance amplify atmospheric oxidative capacity (Zhang et al., 2021), while emissions of short-lived unsaturated VOCs (e.g., propylene, 1,3-butadiene) drive rapid O<sub>3</sub> formation via alkene-NO<sub><italic>x</italic></sub> reactions within <inline-formula><mml:math id="M27" display="inline"><mml:mrow><mml:mo>&lt;</mml:mo><mml:mn mathvariant="normal">50</mml:mn></mml:mrow></mml:math></inline-formula> km under sea–land breezes (3–5 m s<sup>−1</sup>) (Zhang et al., 2025; Zhao et al., 2022; Wang et al., 2018). Additionally, our study pioneers high-resolution source attribution (1 km <inline-formula><mml:math id="M29" display="inline"><mml:mo>×</mml:mo></mml:math></inline-formula> 1 km) in a coastal petrochemical cluster in Shanghai, a methodology particularly suited to resolve localized ozone production under elevated regional backgrounds (<inline-formula><mml:math id="M30" display="inline"><mml:mrow><mml:mo>&gt;</mml:mo><mml:mn mathvariant="normal">80</mml:mn></mml:mrow></mml:math></inline-formula> ppb). By leveraging the unique microenvironment of dense ethylene crackers and aromatics plants adjacent to the ocean, we isolate rapid photochemical processes driven by short-lived, highly reactive VOCs (e.g., propylene, 1,3-butadiene; atmospheric lifetime <inline-formula><mml:math id="M31" display="inline"><mml:mrow><mml:mo>&lt;</mml:mo><mml:mn mathvariant="normal">6</mml:mn></mml:mrow></mml:math></inline-formula> h) that dominate O<sub>3</sub> formation within <inline-formula><mml:math id="M33" display="inline"><mml:mrow><mml:mo>&lt;</mml:mo><mml:mn mathvariant="normal">50</mml:mn></mml:mrow></mml:math></inline-formula> km.</p>
      <p id="d2e455">The diverse types and complex sources of VOCs in chemical industrial parks enhance model robustness and improve feature extraction, so this study selected a chemical industrial park in Shanghai as the case for model training. We present a small-scale approach that integrates multiple ML models to quantify the impact of VOCs on O<sub>3</sub> formation at high spatial resolution. The methodology uses ML algorithms to process high-frequency data from 12 VOC boundary monitoring stations, enabling rapid source identification at a spatial scale that is difficult to reach with conventional methods. The analytical framework consists of three components. First, we systematically evaluate several ML algorithms to identify the optimal model configuration and ensure robust predictive performance at the local scale. Second, we use the interpretable ML technique SHAP to quantitatively assess VOC–O<sub>3</sub> relationships and locate emission sources. Third, we develop a hybrid approach that combines PMF for source apportionment with ML-SHAP analysis to identify the key pollution sources contributing to O<sub>3</sub> formation at the facility level. This framework offers advantages in both efficiency and spatial precision: it enables swift identification of specific emission sources while quantifying their individual contributions to O<sub>3</sub> formation. 
The approach provides a tool for high-resolution source traceability that supports targeted O<sub>3</sub> control strategies at the facility or district level, and it helps bridge the gap between regional-scale analysis and facility-level source identification.</p>
</sec>
<sec id="Ch1.S2">
  <label>2</label><title>Material and methods</title>
<sec id="Ch1.S2.SS1">
  <label>2.1</label><title>Study area and data details</title>
      <p id="d2e518">The Jinshan District is located in the southwestern part of Shanghai, along the northern shore of Hangzhou Bay (HZB) at geographical coordinates 30°40<sup>′</sup>–30°58<sup>′</sup> N and 121–121°25<sup>′</sup> E, covering a total land area of 586.05 km<sup>2</sup> (Fig. 1). It is recognized as an important resource-based city. The Jinshan Industrial Zone, situated in the southeastern part of Jinshan District, serves as a fine chemical industrial park. Within and surrounding the industrial zone, 12 monitoring stations have been established to assess the environmental air quality in the area (as detailed in Table S1 in the Supplement). The distribution of these monitoring stations is illustrated in Fig. 1. Hourly O<sub>3</sub> concentration data utilized in this study were obtained from the Atmospheric Environmental Monitoring Routine Data Management System of the Shanghai Environmental Monitoring Station (<ext-link xlink:href="https://github.com/bobooob/Multi-machine-learning-approaches-to-modeling-small-scale-source-attribution-of-ozone-formation">https://github.com/bobooob/Multi-machine-learning-approaches</ext-link>, last access: 29 April 2026). The concentration data for VOCs were sourced from the 12 monitoring stations and can be accessed through the Intelligent Analysis System for VOC Emissions and Pollution Source Tracing in Key Industrial Parks of Shanghai (<ext-link xlink:href="https://github.com/bobooob/Multi-machine-learning-approaches-to-modeling-small-scale-source-attribution-of-ozone-formation">https://github.com/bobooob/Multi-machine-learning-approaches</ext-link>, last access: 29 April 2026). The dataset covers a collection period from 1 January 2021 to 31 December 2023, with measurements recorded on an hourly basis.</p>
      <p id="d2e573">Two types of VOC monitoring instruments were deployed at the boundary observation stations. The first type utilized gas chromatography–flame ionization detector (GC-FID) technology, featuring low-carbon (C<sub>2</sub>–C<sub>5</sub>) and high-carbon (C<sub>6</sub>–C<sub>12</sub>) analyzers (Synspec GC955-615/815, Juguang Technology Co., Ltd.; A11000/A21022, Chromatotec Inc., France; Spectra SYS GC3000-315L/H, Pu Yu Technology Development Co., Ltd.) and enabling automatic hourly measurements of 89 VOC species. The second type employed a combined GC-FID and mass spectrometry (GC-FID/MS) approach, which included FID detection for C<sub>2</sub>–C<sub>5</sub> aromatic hydrocarbons and mass spectrometry detection for other compounds. The instrument operated by sampling at 30 L min<sup>−1</sup> for the initial 10 min each hour, utilizing a cryogenic cold trap for sample preservation before separation and detection on specific chromatographic columns. Quality assurance and quality control (QA/QC) protocols adhered to the “Technical Specifications for Operation and Quality Control of Continuous Automatic Monitoring Systems for Gaseous Pollutants in Ambient Air” (HJ 818-2018), issued by China's Ministry of Ecology and Environment. Daily checks ensured data completeness and chromatogram integrity, with any detected abnormalities prompting immediate on-site maintenance. Routine data audits involved the removal of abnormal data, while measurement accuracy was verified biweekly, with calibration curves, method detection limits (MDLs) and instrument precision assessed quarterly. Standard gas accuracy checks showed relative errors below 20 %, and in blank tests, absolute errors were less than 0.3 ppbv. Calibration curves were established using five standard gases (1, 5, 10, 15 and 20 ppbv), yielding correlation coefficients greater than 0.995, and MDLs for PAMS and TO-14 species were maintained at or below 0.3 and 0.5 ppbv, respectively. 
Throughout the study period, a total of 26 280 sets of VOC data were collected. After data screening, 36 VOC species were retained, comprising 12 alkanes, 7 alkenes, 11 aromatics and 6 halogenated hydrocarbons.</p>

      <fig id="F1" specific-use="star"><label>Figure 1</label><caption><p id="d2e645">Study area and sampling sites. The triangles indicate the 12 sites utilized for the evaluation of the model.</p></caption>
          <graphic xlink:href="https://acp.copernicus.org/articles/26/6117/2026/acp-26-6117-2026-f01.png"/>

        </fig>

</sec>
<sec id="Ch1.S2.SS2">
  <label>2.2</label><title>Positive matrix factorization model (PMF)</title>
      <p id="d2e662">The PMF 5.0 software, developed by the United States Environmental Protection Agency (EPA), is widely utilized to assess and quantify the contributions from various sources to samples based on their chemical composition or unique fingerprints. Initially introduced by Paatero (1997) at the University of Helsinki, this model decomposes the sample matrix (which is non-negative) into two distinct matrices: the source contribution matrix (<inline-formula><mml:math id="M51" display="inline"><mml:mi mathvariant="bold">g</mml:mi></mml:math></inline-formula>) and the source component spectral matrix (<inline-formula><mml:math id="M52" display="inline"><mml:mi mathvariant="bold">f</mml:mi></mml:math></inline-formula>). The least-squares method is subsequently employed to estimate the contribution rates and identify major pollution sources, with the objective of minimizing the discrepancy between the calculated <inline-formula><mml:math id="M53" display="inline"><mml:mi>Q</mml:mi></mml:math></inline-formula> value and the theoretical <inline-formula><mml:math id="M54" display="inline"><mml:mi>Q</mml:mi></mml:math></inline-formula> value. The formulation of the PMF model is represented by Eq. (1):

            <disp-formula id="Ch1.E1" content-type="numbered"><label>1</label><mml:math id="M55" display="block"><mml:mrow><mml:msub><mml:mi>e</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>-</mml:mo><mml:munderover><mml:mo movablelimits="false">∑</mml:mo><mml:mrow><mml:mi>k</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow><mml:mi>p</mml:mi></mml:munderover><mml:msub><mml:mi>g</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:msub><mml:mi>f</mml:mi><mml:mrow><mml:mi>k</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>

          where <inline-formula><mml:math id="M56" display="inline"><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> represents the concentration of the <inline-formula><mml:math id="M57" display="inline"><mml:mi>j</mml:mi></mml:math></inline-formula>th species in the <inline-formula><mml:math id="M58" display="inline"><mml:mi>i</mml:mi></mml:math></inline-formula>th sample, <inline-formula><mml:math id="M59" display="inline"><mml:mrow><mml:msub><mml:mi>g</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> represents the contribution of the <inline-formula><mml:math id="M60" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>th source to the <inline-formula><mml:math id="M61" display="inline"><mml:mi>i</mml:mi></mml:math></inline-formula>th sample, <inline-formula><mml:math id="M62" display="inline"><mml:mrow><mml:msub><mml:mi>f</mml:mi><mml:mrow><mml:mi>k</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> represents the mass percentage of the <inline-formula><mml:math id="M63" display="inline"><mml:mi>j</mml:mi></mml:math></inline-formula>th species in the <inline-formula><mml:math id="M64" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>th source, <inline-formula><mml:math id="M65" display="inline"><mml:mrow><mml:msub><mml:mi>e</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> represents the residual for the <inline-formula><mml:math id="M66" display="inline"><mml:mi>j</mml:mi></mml:math></inline-formula>th species in the <inline-formula><mml:math id="M67" display="inline"><mml:mi>i</mml:mi></mml:math></inline-formula>th sample, and <italic>p</italic> is the number of sources.</p>
      <p id="d2e863">In the PMF model, factor contributions and source fingerprint spectra are derived by minimizing the objective function (<inline-formula><mml:math id="M68" display="inline"><mml:mi>Q</mml:mi></mml:math></inline-formula>), as described in Eq. (2):

            <disp-formula id="Ch1.E2" content-type="numbered"><label>2</label><mml:math id="M69" display="block"><mml:mrow><mml:mi>Q</mml:mi><mml:mo>=</mml:mo><mml:munderover><mml:mo movablelimits="false">∑</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow><mml:mi>n</mml:mi></mml:munderover><mml:munderover><mml:mo movablelimits="false">∑</mml:mo><mml:mrow><mml:mi>j</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow><mml:mi>m</mml:mi></mml:munderover><mml:mo>(</mml:mo><mml:mo>(</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>-</mml:mo><mml:munderover><mml:mo movablelimits="false">∑</mml:mo><mml:mrow><mml:mi>k</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow><mml:mi>p</mml:mi></mml:munderover><mml:msub><mml:mi>g</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:msub><mml:mi>f</mml:mi><mml:mrow><mml:mi>k</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo><mml:mo>/</mml:mo><mml:msub><mml:mi>u</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:msup><mml:mo>)</mml:mo><mml:mn mathvariant="normal">2</mml:mn></mml:msup><mml:mo>.</mml:mo></mml:mrow></mml:math></disp-formula>

          The uncertainty (<inline-formula><mml:math id="M70" display="inline"><mml:mrow><mml:msub><mml:mi>u</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula>) of the <inline-formula><mml:math id="M71" display="inline"><mml:mi>j</mml:mi></mml:math></inline-formula>th species in the <inline-formula><mml:math id="M72" display="inline"><mml:mi>i</mml:mi></mml:math></inline-formula>th sample is calculated from the method detection limit (MDL) and the error fraction of each species; <italic>n</italic> and <italic>m</italic> denote the numbers of samples and species, respectively.</p>
      <p id="d2e997">Crucially, PMF resolves source-specific contributions to VOC mass concentrations (<inline-formula><mml:math id="M73" display="inline"><mml:mi mathvariant="bold">g</mml:mi></mml:math></inline-formula> matrix), not directly to ozone formation. To attribute ozone impacts, we integrate the PMF-derived <inline-formula><mml:math id="M74" display="inline"><mml:mi mathvariant="bold">g</mml:mi></mml:math></inline-formula> matrix as inputs to machine learning models.</p>
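      <p>As a rough illustration of Eqs. (1) and (2), the residual matrix and the objective function can be evaluated for a given factorization with a few lines of NumPy. This is a minimal sketch on synthetic data, not the EPA PMF 5.0 algorithm, which additionally searches for the non-negative matrices that minimize <italic>Q</italic>. The function name <monospace>pmf_q</monospace>, the matrix sizes and the uncertainty model are assumptions made only for this example.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import numpy as np


def pmf_q(x, g, f, u):
    """Return the residual matrix e (Eq. 1) and the objective Q (Eq. 2)."""
    e = x - g @ f                      # e_ij = x_ij - sum_k g_ik * f_kj
    q = float(np.sum((e / u) ** 2))    # Q = sum_i sum_j (e_ij / u_ij)^2
    return e, q


# Hypothetical sizes: n samples, m species, p factors.
n, m, p = 100, 36, 6
rng = np.random.default_rng(42)
g = rng.uniform(0.0, 5.0, size=(n, p))   # synthetic source contributions
f = rng.uniform(0.0, 1.0, size=(p, m))   # synthetic source profiles
x_exact = g @ f                          # data reproduced exactly by g and f
u = 0.1 * x_exact + 0.05                 # assumed uncertainty, illustration only

# Perturb the data with Gaussian noise scaled by the assumed uncertainty.
x_noisy = x_exact + rng.normal(0.0, 1.0, size=(n, m)) * u

e_exact, q_exact = pmf_q(x_exact, g, f, u)
e_noisy, q_noisy = pmf_q(x_noisy, g, f, u)
print(f"Q without noise: {q_exact:.3f}")
print(f"Q with noise:    {q_noisy:.1f} (n*m = {n * m})")
```
]]></preformat>
      <p>In this synthetic case, residuals that are comparable in size to the assumed uncertainties give a <italic>Q</italic> value of the order of the number of data points, which is the intuition behind comparing the calculated <italic>Q</italic> with a theoretical value.</p>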
</sec>
<sec id="Ch1.S2.SS3">
  <label>2.3</label><title>Machine learning models</title>
      <p id="d2e1022">To select the optimal ML model, we trained and evaluated the ML models on data from the Jinshan Industrial Zone covering January 2021 to December 2023. Hourly O<sub>3</sub> and total volatile organic compound (TVOC) concentrations from the 12 monitoring stations were input into the ML models. The dataset was partitioned chronologically into training (70 %) and testing (30 %) subsets to preserve temporal integrity, and model robustness was further validated via 10-fold cross-validation on the training set. The ML models were then coupled with the SHAP (SHapley Additive exPlanations) framework (see Sect. S1.2 for details) to obtain SHAP values for the 12 stations, quantifying their contributions to predicted O<sub>3</sub>. SHAP values quantify the relative influence of input features (e.g., site-specific TVOC concentrations) on ML-predicted O<sub>3</sub> levels, not absolute physicochemical contributions to ozone formation. Higher absolute SHAP magnitudes indicate stronger feature importance in the model's predictions, enabling spatial prioritization of emission hotspots. The stations contributing the most to O<sub>3</sub> were identified, and the sources of their characteristic VOCs were resolved using the PMF model. The PMF results, together with hourly O<sub>3</sub> concentration data, were then input into the ML models for SHAP analysis, from which the optimal ML model and the most severe pollution sources were identified. Specifically, the source contribution matrix (<inline-formula><mml:math id="M80" display="inline"><mml:mi mathvariant="bold">g</mml:mi></mml:math></inline-formula> matrix) derived from PMF analysis, representing the hourly contribution rates (%) of the six resolved sources, served as input features alongside concurrent O<sub>3</sub> concentration data for training the machine learning models. 
This approach transformed source apportionment results into predictive variables for O<sub>3</sub> formation modeling. Subsequently, the SHAP algorithm was applied to quantify the contribution of each PMF-resolved source to O<sub>3</sub> predictions generated by the optimized ML model, enabling high-resolution attribution of emission sources to ozone formation dynamics.</p>
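      <p>The distinction between relative influence and absolute contribution can be made concrete with a brute-force Shapley calculation. The sketch below is an illustration under simplifying assumptions, not the SHAP implementation used in this study: absent features are replaced by their background mean, the prediction function <monospace>toy_o3_model</monospace> is a hypothetical stand-in for a trained model, and the four input features are invented values loosely playing the role of PMF source contributions.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from itertools import combinations
from math import factorial

import numpy as np


def exact_shapley(predict, x, baseline):
    """Exact Shapley values for one sample x.

    Features outside a coalition are set to `baseline` (assumed here to be
    the background mean), which is a simplification of what SHAP does.
    """
    n = len(x)

    def value(subset):
        z = baseline.copy()
        for j in subset:
            z[j] = x[j]
        return predict(z)

    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                phi[i] += weight * (value(subset + (i,)) - value(subset))
    return phi, value(())


def toy_o3_model(z):
    """Hypothetical prediction function; feature 3 is deliberately unused."""
    return 30.0 + 2.0 * z[0] + 0.5 * z[1] * z[2]


rng = np.random.default_rng(0)
background = rng.uniform(0.0, 10.0, size=(200, 4))  # invented background data
baseline = background.mean(axis=0)
x = np.array([3.0, 1.0, 2.0, 5.0])                  # invented sample

phi, base_value = exact_shapley(toy_o3_model, x, baseline)
print("Shapley values:", np.round(phi, 3))
print("base value + sum of Shapley values:", round(base_value + phi.sum(), 3))
print("model prediction:", round(toy_o3_model(x), 3))
```
]]></preformat>
      <p>The Shapley values sum to the difference between the prediction for the sample and the baseline prediction, so they apportion the model output among features rather than measuring chemical O<sub>3</sub> production; a feature the model ignores receives a value of zero.</p>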
      <p id="d2e1105">In this study, we implemented six ML models, including Decision Tree Regression (DTR), Random Forest Regression (RF), Support Vector Regression (SVR), XGBoost Model, CatBoost Model and LightGBM Model (refer to Sect. S1.1 for model details). Compared to complex deep learning architectures (e.g., CNNs, RNNs), these models offer distinct advantages for high-dimensional spatiotemporal datasets: (1) native support for feature importance metrics (e.g., Gini, permutation importance) enables direct interpretation of predictor contributions without post hoc explainers, (2) computational efficiency facilitates rigorous hyperparameter optimization with Bayesian methods and (3) robustness to collinear features common in atmospheric chemistry (Pichler and Hartig, 2023; Kaur et al., 2020). Bayesian optimization (Robin et al., 2021) was then applied to determine optimal hyperparameters across more than 120 000 hourly samples. Detailed implementation procedures of Bayesian optimization, including model-specific hyperparameter spaces, convergence criteria and computational configurations, are comprehensively documented in Sect. S1.3 of the Supplement. 
Ultimately, through iterative optimization of training configurations (<inline-formula><mml:math id="M84" display="inline"><mml:mrow><mml:mn mathvariant="normal">70</mml:mn><mml:mo>:</mml:mo><mml:mn mathvariant="normal">30</mml:mn></mml:mrow></mml:math></inline-formula> train–test partitioning, 10-fold cross-validation) and six repetitions of randomized initialization experiments with distinct random seeds (Seed <inline-formula><mml:math id="M85" display="inline"><mml:mo>=</mml:mo></mml:math></inline-formula> 42, 87, 124, 256, 512, 1024), we selected the model with the highest <inline-formula><mml:math id="M86" display="inline"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula> from the six ML models, along with its corresponding parameter combinations, as the optimal ML model. Additionally, to assess the robustness and stability of the models, we employed mean absolute error (MAE) and root mean squared error (RMSE) as evaluation metrics.</p>
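      <p>The 70 : 30 chronological partition and the 10-fold cross-validation on the training set can be sketched as index operations. This is an illustrative sketch only: the helper names are hypothetical, the 26 280 hourly records are taken as an example length, and the use of contiguous, unshuffled folds is an assumption, since the fold construction is not specified here.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import numpy as np


def chronological_split(n_samples, train_fraction=0.7):
    """Split time-ordered sample indices without shuffling."""
    cut = round(n_samples * train_fraction)
    idx = np.arange(n_samples)
    return idx[:cut], idx[cut:]


def contiguous_folds(train_idx, n_folds=10):
    """Cut the training indices into consecutive blocks (assumed fold layout)."""
    return np.array_split(train_idx, n_folds)


n_hours = 26280                      # example: three years of hourly records
train_idx, test_idx = chronological_split(n_hours)
folds = contiguous_folds(train_idx)

for k, val_idx in enumerate(folds):
    # Every training index outside the current fold is used for fitting.
    fit_idx = np.setdiff1d(train_idx, val_idx)
    print(f"fold {k}: fit on {fit_idx.size} h, validate on {val_idx.size} h")

print(f"held-out test period: {test_idx.size} h")
```
]]></preformat>
      <p>Because the indices are never shuffled, every test sample is later in time than every training sample, which is the property that chronological splitting is meant to preserve.</p>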
      <p id="d2e1138"><inline-formula><mml:math id="M87" display="inline"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula> is an indicator that measures the overall goodness of fit of a regression model, and its calculation is shown in Eq. (3):

            <disp-formula id="Ch1.E3" content-type="numbered"><label>3</label><mml:math id="M88" display="block"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>-</mml:mo><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mrow><mml:munderover><mml:mo movablelimits="false">∑</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow><mml:mi>n</mml:mi></mml:munderover><mml:msup><mml:mfenced open="(" close=")"><mml:mrow><mml:msub><mml:mi>y</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>-</mml:mo><mml:msub><mml:mover accent="true"><mml:mi>y</mml:mi><mml:mo mathvariant="normal" stretchy="false">^</mml:mo></mml:mover><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:mfenced><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow><mml:mrow><mml:munderover><mml:mo movablelimits="false">∑</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow><mml:mi>n</mml:mi></mml:munderover><mml:msup><mml:mfenced close=")" open="("><mml:mrow><mml:msub><mml:mi>y</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>-</mml:mo><mml:mover accent="true"><mml:mi>y</mml:mi><mml:mo mathvariant="normal">¯</mml:mo></mml:mover></mml:mrow></mml:mfenced><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:mfrac></mml:mstyle><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>

          where <inline-formula><mml:math id="M89" display="inline"><mml:mrow><mml:msub><mml:mi>y</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> represents the actual observed values, <inline-formula><mml:math id="M90" display="inline"><mml:mrow><mml:msub><mml:mover accent="true"><mml:mi>y</mml:mi><mml:mo mathvariant="normal" stretchy="false">^</mml:mo></mml:mover><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> denotes the predicted values, <inline-formula><mml:math id="M91" display="inline"><mml:mover accent="true"><mml:mi>y</mml:mi><mml:mo mathvariant="normal">¯</mml:mo></mml:mover></mml:math></inline-formula> is the mean of the actual observed values, and <inline-formula><mml:math id="M92" display="inline"><mml:mi>n</mml:mi></mml:math></inline-formula> indicates the total number of observations.</p>
      <p id="d2e1277">MAE is the average of the absolute differences between the predicted values and the actual values, as calculated in Eq. (4):

            <disp-formula id="Ch1.E4" content-type="numbered"><label>4</label><mml:math id="M93" display="block"><mml:mrow><mml:mtext>MAE</mml:mtext><mml:mo>=</mml:mo><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mn mathvariant="normal">1</mml:mn><mml:mi>n</mml:mi></mml:mfrac></mml:mstyle><mml:munderover><mml:mo movablelimits="false">∑</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow><mml:mi>n</mml:mi></mml:munderover><mml:mfenced close="|" open="|"><mml:mrow><mml:msub><mml:mi>y</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>-</mml:mo><mml:msub><mml:mover accent="true"><mml:mi>y</mml:mi><mml:mo mathvariant="normal" stretchy="false">^</mml:mo></mml:mover><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:mfenced><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>

          where <inline-formula><mml:math id="M94" display="inline"><mml:mrow><mml:msub><mml:mi>y</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> is the actual observed value, <inline-formula><mml:math id="M95" display="inline"><mml:mrow><mml:msub><mml:mover accent="true"><mml:mi>y</mml:mi><mml:mo stretchy="false" mathvariant="normal">^</mml:mo></mml:mover><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> is the predicted value and <inline-formula><mml:math id="M96" display="inline"><mml:mi>n</mml:mi></mml:math></inline-formula> is the total number of observations.</p>
      <p id="d2e1360">RMSE is the square root of the average of the squared differences between the predicted values and the actual values, as shown in Eq. (5):

            <disp-formula id="Ch1.E5" content-type="numbered"><label>5</label><mml:math id="M97" display="block"><mml:mrow><mml:mtext>RMSE</mml:mtext><mml:mo>=</mml:mo><mml:msqrt><mml:mrow><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mn mathvariant="normal">1</mml:mn><mml:mi>n</mml:mi></mml:mfrac></mml:mstyle><mml:munderover><mml:mo movablelimits="false">∑</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow><mml:mi>n</mml:mi></mml:munderover><mml:msup><mml:mfenced open="(" close=")"><mml:mrow><mml:msub><mml:mi>y</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>-</mml:mo><mml:msub><mml:mover accent="true"><mml:mi>y</mml:mi><mml:mo stretchy="false" mathvariant="normal">^</mml:mo></mml:mover><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:mfenced><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:msqrt><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>

          where <inline-formula><mml:math id="M98" display="inline"><mml:mrow><mml:msub><mml:mi>y</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> represents the actual observed values, <inline-formula><mml:math id="M99" display="inline"><mml:mrow><mml:msub><mml:mover accent="true"><mml:mi>y</mml:mi><mml:mo mathvariant="normal" stretchy="false">^</mml:mo></mml:mover><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> denotes the predicted values and <inline-formula><mml:math id="M100" display="inline"><mml:mi>n</mml:mi></mml:math></inline-formula> is the total number of observations. This metric provides insight into the model's accuracy by giving a higher weight to larger errors, making it sensitive to outliers.</p>
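      <p>Equations (3)–(5) translate directly into code. The sketch below is a plain-Python illustration; the function names and the four sample values are hypothetical and are not taken from the study's dataset.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import math


def r_squared(y_true, y_pred):
    """Eq. (3): one minus residual sum of squares over total sum of squares."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot


def mae(y_true, y_pred):
    """Eq. (4): mean of the absolute residuals."""
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Eq. (5): square root of the mean squared residual."""
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true))


# Hypothetical observed and predicted O3 values, for illustration only.
observed = [60.0, 70.0, 80.0, 90.0]
predicted = [62.0, 68.0, 83.0, 87.0]
print(r_squared(observed, predicted), mae(observed, predicted), rmse(observed, predicted))
```
]]></preformat>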
      <p id="d2e1447">To ensure data integrity, the original observational data were checked for missing values before being input into the machine learning models. Missing values were then imputed using the RF model, and the imputation was validated against the available observational data. The resulting dataset was transformed into a supervised learning dataset with time-dependent features. To keep the comparison consistent across models, 70 % of the dataset was used as the training set and 30 % as the test set for every ML model. Hyperparameter tuning was performed using Bayesian optimization. Furthermore, to mitigate the influence of model randomness on the test set results, all ML models underwent 10-fold cross-validation.</p>
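      <p>One common way to turn a time series into a supervised learning dataset is to use lagged values as features. The sketch below illustrates that idea only; the function name and the choice of three lags are hypothetical, since the lag structure used in the study is not specified in this section.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def make_supervised(series, n_lags=3):
    """Build (features, target) rows from a series using the previous n_lags values.

    n_lags=3 is an illustrative assumption, not a value taken from the study.
    """
    rows = []
    for t in range(n_lags, len(series)):
        features = series[t - n_lags:t]  # values at t-n_lags, ..., t-1
        target = series[t]               # value to be predicted at time t
        rows.append((features, target))
    return rows
```
]]></preformat>
      <p>A series of length <italic>n</italic> yields <italic>n</italic> minus <monospace>n_lags</monospace> rows, because the first <monospace>n_lags</monospace> time steps have no complete history.</p>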
      <p id="d2e1450">In this study, the ML models and the SHAP algorithm were primarily implemented using Python 3.6 and Anaconda 4.5 platforms. The research methodology framework used in this study is shown in Fig. 2.</p>

      <fig id="F2"><label>Figure 2</label><caption><p id="d2e1455">Schematic workflow of the integrated ML–PMF framework for ozone source attribution (see Sect. 2.3 for methodological details).</p></caption>
          <graphic xlink:href="https://acp.copernicus.org/articles/26/6117/2026/acp-26-6117-2026-f02.png"/>

        </fig>

</sec>
</sec>
<sec id="Ch1.S3">
  <label>3</label><title>Results and discussions</title>
<sec id="Ch1.S3.SS1">
  <label>3.1</label><title>Spatiotemporal characterization of ozone and VOC concentrations: a multi-scale analysis for enhanced ML performance</title>
      <p id="d2e1480">A comprehensive understanding of the spatiotemporal characteristics of the input dataset is fundamental for optimizing machine learning model interpretability and performance. Our analysis of VOC and O<sub>3</sub> concentrations across multiple monitoring sites from 2021 to 2023 reveals distinct patterns that inform our ML-based source attribution approach. The temporal evolution of O<sub>3</sub> concentrations shows a notable progression, with mean values of 64.45, 68.79 and 68.90 <inline-formula><mml:math id="M103" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">µ</mml:mi><mml:mi mathvariant="normal">g</mml:mi><mml:mspace width="0.125em" linebreak="nobreak"/><mml:msup><mml:mi mathvariant="normal">m</mml:mi><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">3</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula> recorded in 2021, 2022 and 2023, respectively (Fig. 3a). The relatively lower O<sub>3</sub> levels observed in 2021 coincide with the COVID-19 pandemic period, when reduced anthropogenic activities, particularly decreased vehicular traffic and industrial operations, significantly altered urban emission patterns in Shanghai (Lu et al., 2023). A marked reduction in VOC concentrations was observed across all monitoring sites in 2022, primarily attributed to the implementation of stringent emission control measures by local regulatory authorities (Xiao et al., 2024). The seasonal analysis reveals distinctive O<sub>3</sub> formation patterns in the study area (Fig. 3b). O<sub>3</sub> concentrations exhibit a rapid acceleration during spring, reaching maximum levels in May, followed by a notable decline during the summer months (June–August). This pattern differs from typical urban environments, as Shanghai's unique high-pressure meteorological conditions drive peak daytime O<sub>3</sub> levels in late May with unprecedented rates of increase (Chang et al., 2021). 
Notably, elevated VOC concentrations coincide with the June industrial maintenance period, providing critical temporal markers for our ML-based source attribution.</p>
      <p id="d2e1557">The diurnal profile analysis (Fig. 3c) reveals sophisticated photochemical patterns that enhance our ML model's temporal resolution. O<sub>3</sub> concentrations follow a characteristic curve, initiating a rapid increase at 08:00 (UTC<inline-formula><mml:math id="M109" display="inline"><mml:mo>+</mml:mo></mml:math></inline-formula>8), sustaining growth until reaching peak levels at 15:00, followed by a gradual decline. This pattern aligns with the intensification of photochemical reactions driven by increasing solar radiation. The observed dual-peak pattern in VOC concentrations (06:00–08:00 and 19:00–22:00) corresponds to industrial operational cycles, providing essential precursor availability for daytime O<sub>3</sub> formation. This rich temporal diversity in our dataset significantly enhances the ML model's capability to capture complex source–receptor relationships.</p>
      <p id="d2e1585">The correlation analysis provides crucial insights for optimizing our ML framework's feature selection and interpretation capabilities (Fig. 3d). The heterogeneous correlations between VOCs and O<sub>3</sub> across monitoring sites reveal complex source–receptor relationships: while most sites exhibit negative correlations, Sites D and K show positive associations. Notably, Sites C (<inline-formula><mml:math id="M112" display="inline"><mml:mrow><mml:mi>r</mml:mi><mml:mo>=</mml:mo><mml:mo>-</mml:mo><mml:mn mathvariant="normal">0.12</mml:mn></mml:mrow></mml:math></inline-formula>) and L (<inline-formula><mml:math id="M113" display="inline"><mml:mrow><mml:mi>r</mml:mi><mml:mo>=</mml:mo><mml:mo>-</mml:mo><mml:mn mathvariant="normal">0.081</mml:mn></mml:mrow></mml:math></inline-formula>) demonstrate the strongest negative correlations, marking them as critical locations for detailed feature importance analysis in our ML framework. These diverse correlation patterns enhance our model's ability to capture local-scale emission–concentration relationships, crucial for precise source attribution.</p>
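      <p>The site-level correlations discussed above are Pearson coefficients. As a reference for how such a coefficient is computed, a minimal plain-Python sketch follows; the function name is hypothetical and the inputs would be paired hourly VOC and O<sub>3</sub> series for one site.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)
```
]]></preformat>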

      <fig id="F3" specific-use="star"><label>Figure 3</label><caption><p id="d2e1628">Temporal and spatial dynamics of VOCs and O<sub>3</sub>, along with their correlations, as observed at 12 monitoring stations. <bold>(a)</bold> Mean concentration variations of O<sub>3</sub> and VOCs from 2021 to 2023. <bold>(b)</bold> Monthly profiles of O<sub>3</sub> and VOC concentrations. <bold>(c)</bold> Daily profiles of O<sub>3</sub> and VOC concentrations. <bold>(d)</bold> The correlation matrix between O<sub>3</sub> and VOC measurements at the 12 monitoring sites.</p></caption>
          <graphic xlink:href="https://acp.copernicus.org/articles/26/6117/2026/acp-26-6117-2026-f03.png"/>

        </fig>

</sec>
<sec id="Ch1.S3.SS2">
  <label>3.2</label><title>Comparison of ML models</title>
<sec id="Ch1.S3.SS2.SSS1">
  <label>3.2.1</label><title>Model performance evaluation and selection of optimal ML algorithm</title>
      <p id="d2e1710">The selection and evaluation of appropriate machine learning algorithms are fundamental to ensuring robust and reliable analytical outcomes, particularly when dealing with complex environmental datasets (Liu et al., 2022). In this investigation, we implemented and systematically evaluated six state-of-the-art machine learning algorithms: DT, RF, SVM, XGBoost, CatBoost and LightGBM. These models were rigorously trained and tested using spatially distributed VOC and O<sub>3</sub> monitoring data from multiple sampling sites.</p>
      <p id="d2e1722">The comparative performance metrics of these models are presented in Table 1. <inline-formula><mml:math id="M120" display="inline"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula> values obtained through 10-fold cross-validation were broadly consistent with those derived from the independent test dataset. This concordance suggests that the models generalize well and are not substantially overfitted. Among the evaluated algorithms, the XGBoost model showed the best predictive performance on the test set. Specifically, it achieved the lowest MAE of 13.828, mean absolute percentage error (MAPE) of 0.445 and RMSE of 11.620, coupled with the highest test-set <inline-formula><mml:math id="M121" display="inline"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula> value of 0.644 (0.637 under 10-fold cross-validation).</p>
      <p id="d2e1747">Further validation through scatter plot analysis (Fig. S2) revealed that the XGBoost model exhibited significantly smaller deviations between predicted and observed values compared to other algorithms, confirming its enhanced predictive accuracy. Based on these comprehensive evaluation results, the XGBoost model was selected as the optimal algorithm for O<sub>3</sub> concentration prediction. Consequently, the subsequent SHAP analysis was conducted exclusively on the XGBoost model to ensure the highest level of interpretability and reliability of source attribution results.</p>

<table-wrap id="T1" specific-use="star"><label>Table 1</label><caption><p id="d2e1763">Comparison of six ML models based on different evaluation metrics.</p></caption><oasis:table frame="topbot"><oasis:tgroup cols="6">
     <oasis:colspec colnum="1" colname="col1" align="left"/>
     <oasis:colspec colnum="2" colname="col2" align="right"/>
     <oasis:colspec colnum="3" colname="col3" align="right"/>
     <oasis:colspec colnum="4" colname="col4" align="right"/>
     <oasis:colspec colnum="5" colname="col5" align="right"/>
     <oasis:colspec colnum="6" colname="col6" align="right"/>
     <oasis:thead>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">Model</oasis:entry>
         <oasis:entry colname="col2">10-fold cross-validation (<inline-formula><mml:math id="M123" display="inline"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula>)</oasis:entry>
         <oasis:entry colname="col3"><inline-formula><mml:math id="M124" display="inline"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col4">MAE</oasis:entry>
         <oasis:entry colname="col5">MAPE</oasis:entry>
         <oasis:entry colname="col6">RMSE</oasis:entry>
       </oasis:row>
     </oasis:thead>
     <oasis:tbody>
       <oasis:row>
         <oasis:entry colname="col1">DT</oasis:entry>
         <oasis:entry colname="col2">0.167</oasis:entry>
         <oasis:entry colname="col3">0.187</oasis:entry>
         <oasis:entry colname="col4">23.142</oasis:entry>
         <oasis:entry colname="col5">0.589</oasis:entry>
         <oasis:entry colname="col6">29.980</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">RF</oasis:entry>
         <oasis:entry colname="col2">0.345</oasis:entry>
         <oasis:entry colname="col3">0.383</oasis:entry>
         <oasis:entry colname="col4">21.579</oasis:entry>
         <oasis:entry colname="col5">2.918</oasis:entry>
         <oasis:entry colname="col6">28.155</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">SVM</oasis:entry>
         <oasis:entry colname="col2">0.330</oasis:entry>
         <oasis:entry colname="col3">0.328</oasis:entry>
         <oasis:entry colname="col4">25.359</oasis:entry>
         <oasis:entry colname="col5">0.618</oasis:entry>
         <oasis:entry colname="col6">34.309</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">XGBoost</oasis:entry>
         <oasis:entry colname="col2">0.637</oasis:entry>
         <oasis:entry colname="col3">0.644</oasis:entry>
         <oasis:entry colname="col4">13.828</oasis:entry>
         <oasis:entry colname="col5">0.445</oasis:entry>
         <oasis:entry colname="col6">11.620</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">CatBoost</oasis:entry>
         <oasis:entry colname="col2">0.689</oasis:entry>
         <oasis:entry colname="col3">0.630</oasis:entry>
         <oasis:entry colname="col4">15.601</oasis:entry>
         <oasis:entry colname="col5">0.495</oasis:entry>
         <oasis:entry colname="col6">17.216</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">LightGBM</oasis:entry>
         <oasis:entry colname="col2">0.675</oasis:entry>
         <oasis:entry colname="col3">0.624</oasis:entry>
         <oasis:entry colname="col4">14.603</oasis:entry>
         <oasis:entry colname="col5">0.493</oasis:entry>
         <oasis:entry colname="col6">17.352</oasis:entry>
       </oasis:row>
     </oasis:tbody>
   </oasis:tgroup></oasis:table><table-wrap-foot><p id="d2e1766">MAE: mean absolute error, MAPE: mean absolute percentage error, RMSE: root mean squared error, DT: decision tree, RF: Random Forest, SVM: Support Vector Machine, XGBoost: Extreme Gradient Boosting, CatBoost: Category Boosting, LightGBM: Light Gradient Boosting Machine</p></table-wrap-foot></table-wrap>

</sec>
<sec id="Ch1.S3.SS2.SSS2">
  <label>3.2.2</label><title>Spatial distribution analysis of TVOC contributions using SHAP interpretation</title>
      <p id="d2e1976">To quantitatively assess the spatial heterogeneity of TVOC contributions to O<sub>3</sub> formation, we conducted a comprehensive SHAP analysis across all monitoring sites within the established XGBoost regression model (Fig. 4). The analysis revealed distinct patterns of TVOC influence, with Site C demonstrating notably dispersed sample distribution patterns, indicating its predominant influence on O<sub>3</sub> formation dynamics. The analysis identified a significant negative correlation between TVOC concentrations at Site C and model predictions, evidenced by the high-magnitude SHAP values concentrated in the negative region. This SHAP-negative TVOC signal reflects the model's recognition of Site C's unique emission mix, where VOC-rich plumes initially suppress ozone via NO titration under proximate NO<sub><italic>x</italic></sub>-rich conditions but subsequently fuel downwind ozone formation as air masses age and transition to VOC-limited chemical regimes. This duality aligns with observational studies in petrochemical clusters, where proximal VOC hotspots exhibit transient O<sub>3</sub> suppression due to localized precursor interactions while driving net regional ozone increases through secondary photochemical pathways (Guo et al., 2022; Ren et al., 2024). Conversely, Site K exhibited a strong positive correlation, characterized by elevated SHAP values consistently distributed in the positive region, corroborating the correlation analysis findings presented in Sect. 3.1.</p>
      <p id="d2e2015">The pronounced influence of Site C can be attributed to its strategic location within the southeastern sector of the Jinshan Chemical Zone, where it is surrounded by diverse industrial activities. The site's immediate vicinity encompasses plastic manufacturing facilities to the east, a petroleum products transportation port to the south, chemical production facilities to the west and a public transportation hub featuring five gas stations to the north. This complex industrial landscape surrounding Site C facilitates intensive O<sub>3</sub> formation while presenting significant challenges for precise source attribution of TVOC emissions.</p>
      <p id="d2e2027">To validate the robustness of these findings, we extended the SHAP analysis to encompass five additional ML models (Fig. S3). The results consistently identified Site C as the most influential monitoring location across all model analyses, substantiating its critical role in local O<sub>3</sub> formation processes. This convergence of results across multiple ML platforms reinforces the reliability of our spatial analysis and highlights the importance of Site C in understanding regional O<sub>3</sub> pollution patterns. Importantly, SHAP values here reflect the relative importance of each site's TVOC concentrations to XGBoost-predicted O<sub>3</sub> – not direct source contributions. This approach identifies locations where VOC variations most strongly perturb O<sub>3</sub> predictions (e.g., Site C's dominant role), guiding subsequent PMF-based source apportionment.</p>
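      <p>To make concrete what a SHAP value measures, the sketch below computes exact Shapley values for a small toy predictor by enumerating feature coalitions. It is an illustration under stated assumptions: the toy model, its coefficients and the input values are hypothetical, features outside a coalition are replaced by a single baseline value, and this brute-force enumeration is not the tree-specific algorithm a SHAP library would apply to XGBoost.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline).

    Features outside a coalition are set to their baseline value.
    """
    n = len(x)

    def value(coalition):
        mixed = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(mixed)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Hypothetical toy predictor of O3 from TVOC at three sites (not the fitted model).
def toy_model(tvoc):
    return 70.0 - 0.5 * tvoc[0] + 0.2 * tvoc[1] + 0.01 * tvoc[0] * tvoc[2]


phi = shapley_values(toy_model, x=[40.0, 20.0, 10.0], baseline=[20.0, 20.0, 5.0])
print(phi)
```
]]></preformat>
      <p>The values sum to the difference between the prediction and the baseline prediction, which is why they describe how each input moves the model output rather than how much ozone a source physically produces.</p>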
      <p id="d2e2066">In addition, to ensure robustness of feature importance interpretations, we employed three complementary attribution schemes beyond SHAP: (1) SAGE (Shapley Additive Global importance) for global feature relevance; (2) Gini importance for intrinsic tree-based rankings and (3) permutation importance evaluating prediction degradation under feature shuffling. Figure S6 compares normalized importance scores across SHAP, SAGE, Gini and permutation methods for the sites. For Site C, SHAP, SAGE and permutation importance uniformly assign it the highest score (normalized score <inline-formula><mml:math id="M134" display="inline"><mml:mrow><mml:mo>≈</mml:mo><mml:mn mathvariant="normal">1.0</mml:mn></mml:mrow></mml:math></inline-formula>), confirming its dominance in prediction-sensitivity-based frameworks. In contrast, Gini importance ranks Site C third (normalized score <inline-formula><mml:math id="M135" display="inline"><mml:mo>≈</mml:mo></mml:math></inline-formula> 0.6). To further quantify the consistency of feature importance rankings across methodologies, we computed pairwise Pearson correlation coefficients between SHAP, SAGE, Gini and permutation importance scores (Fig. S7). The correlation matrix reveals near-perfect agreement between SHAP, SAGE and permutation importance (Pearson <inline-formula><mml:math id="M136" display="inline"><mml:mrow><mml:mi>r</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">0.98</mml:mn></mml:mrow></mml:math></inline-formula> for all inter-method pairs), confirming these techniques capture overlapping dimensions of feature relevance tied to global prediction sensitivity. In contrast, Gini importance exhibits weak correlation with the other three schemes (<inline-formula><mml:math id="M137" display="inline"><mml:mrow><mml:mi>r</mml:mi><mml:mo>≤</mml:mo><mml:mn mathvariant="normal">0.12</mml:mn></mml:mrow></mml:math></inline-formula>). 
This divergence arises because Gini importance prioritizes local decision-tree split purity (e.g., maximizing variance reduction at individual nodes), which can overemphasize features with high within-tree variability but low global relevance to ozone photochemistry. Conversely, SHAP, SAGE and permutation importance – all grounded in global prediction sensitivity – consistently identify Site C as the primary driver of O<sub>3</sub> formation, and their concordance (<inline-formula><mml:math id="M139" display="inline"><mml:mrow><mml:mi>r</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">0.98</mml:mn></mml:mrow></mml:math></inline-formula>) reflects shared sensitivity to chemically meaningful VOC–O<sub>3</sub> interactions. Thus, the agreement among SHAP, SAGE and permutation importance, together with Gini's weak sensitivity to global photochemical dynamics, strongly supports Site C's preeminent role in regional ozone formation.</p>
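      <p>Permutation importance, one of the schemes compared above, can be sketched in a few lines: shuffle one feature column, measure how much the prediction error grows, and normalize the scores so that the largest equals 1. The code below is a plain-Python illustration; the toy model, the synthetic data and the choice of mean squared error as the loss are assumptions, not the study's configuration.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import random


def mse(model, rows, targets):
    """Mean squared error of model predictions over all rows."""
    return sum((model(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)


def permutation_importance(model, rows, targets, n_repeats=20, seed=42):
    """Mean increase in MSE after shuffling each feature column in turn."""
    rng = random.Random(seed)
    base = mse(model, rows, targets)
    scores = []
    for j in range(len(rows[0])):
        increase = 0.0
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)
            shuffled = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            increase += mse(model, shuffled, targets) - base
        scores.append(increase / n_repeats)
    return scores


def normalize(scores):
    """Scale scores so that the most important feature has a score of 1."""
    top = max(scores)
    return [s / top for s in scores]


# Synthetic example: the toy model ignores the third feature entirely.
data_rng = random.Random(0)
rows = [[data_rng.random(), data_rng.random(), data_rng.random()] for _ in range(50)]
toy = lambda r: 3.0 * r[0] + 0.5 * r[1]
targets = [toy(r) for r in rows]
scores = permutation_importance(toy, rows, targets)
print(normalize(scores))
```
]]></preformat>
      <p>A feature the model never uses receives a score of zero, because shuffling it leaves every prediction unchanged.</p>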

      <fig id="F4"><label>Figure 4</label><caption><p id="d2e2144">Feature importance of total volatile organic compound (TVOC) drivers obtained by XGBoost model. Blue points indicate negative contributions to the prediction, while red points represent positive contributions.</p></caption>
            <graphic xlink:href="https://acp.copernicus.org/articles/26/6117/2026/acp-26-6117-2026-f04.png"/>

          </fig>


</sec>
</sec>
<sec id="Ch1.S3.SS3">
  <label>3.3</label><title>Source apportionment and temporal–spatial characterization of VOCs based on the PMF model</title>
<sec id="Ch1.S3.SS3.SSS1">
  <label>3.3.1</label><title>Analytical framework and results of the PMF model</title>
      <p id="d2e2171">Inter-species correlation analysis revealed distinct clustering patterns among VOCs (Fig. S4). Strong positive correlations (<inline-formula><mml:math id="M141" display="inline"><mml:mrow><mml:mi>r</mml:mi><mml:mo>&gt;</mml:mo><mml:mn mathvariant="normal">0.7</mml:mn></mml:mrow></mml:math></inline-formula>) were observed among alkane homologues such as ethane (C<sub>2</sub>), propane (C<sub>3</sub>), <inline-formula><mml:math id="M144" display="inline"><mml:mi>n</mml:mi></mml:math></inline-formula>-butane (C<sub>4</sub>) and isopentane (C<sub>5</sub>) (<inline-formula><mml:math id="M147" display="inline"><mml:mrow><mml:mi>r</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">0.71</mml:mn></mml:mrow></mml:math></inline-formula>–0.92), indicative of co-emission from fuel evaporation and petrochemical activities (Yang et al., 2024). Strong positive correlations were observed between propylene and 1,3-butadiene (<inline-formula><mml:math id="M148" display="inline"><mml:mrow><mml:mi>r</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">0.82</mml:mn></mml:mrow></mml:math></inline-formula>), indicative of shared emission pathways from petrochemical cracking processes (Zhou et al., 2021; White, 2007). In addition, aromatic compounds including toluene, <inline-formula><mml:math id="M149" display="inline"><mml:mi>m</mml:mi></mml:math></inline-formula>-/<inline-formula><mml:math id="M150" display="inline"><mml:mi>p</mml:mi></mml:math></inline-formula>-xylene and ethylbenzene formed a highly correlated group (<inline-formula><mml:math id="M151" display="inline"><mml:mrow><mml:mi>r</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">0.75</mml:mn></mml:mrow></mml:math></inline-formula>–0.95), frequently used in solvent applications and industrial coatings (Weiss, 1997). 
Chlorinated VOCs such as 1,2-dichloroethane and monochlorobenzene exhibit moderate correlations (<inline-formula><mml:math id="M152" display="inline"><mml:mrow><mml:mi>r</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">0.44</mml:mn></mml:mrow></mml:math></inline-formula>), suggesting mixed contributions from polymer manufacturing and solvent use (Huang et al., 2014). Conversely, propane displayed weak correlations with aromatic species (<inline-formula><mml:math id="M153" display="inline"><mml:mrow><mml:mi>r</mml:mi><mml:mo>&lt;</mml:mo><mml:mn mathvariant="normal">0.2</mml:mn></mml:mrow></mml:math></inline-formula>), aligning with its dominance in combustion sources. The observed clustering can provide a basis for PMF-derived source profiles, as the species of interest are predominantly aligned with common emission processes.</p>
      <p id="d2e2305">The application of the PMF model for the analysis of VOCs at Site C offers valuable insights into source apportionment and pollution characteristics. The species selection process adhered to specific criteria (Liu et al., 2016; Hui et al., 2019): (1) species with missing sample rates exceeding 25 % or concentrations below 35 % of the method detection limits (MDLs) were excluded, and (2) highly reactive compounds were omitted unless serving as specific tracers for particular sources. Ultimately, 36 VOC species were selected for model input. Based on their signal-to-noise ratio (<inline-formula><mml:math id="M154" display="inline"><mml:mrow><mml:mi>S</mml:mi><mml:mo>/</mml:mo><mml:mi>N</mml:mi></mml:mrow></mml:math></inline-formula>) and model fit, species were classified as “strong”, “weak” or “bad” in terms of their computational significance. Specifically, species with <inline-formula><mml:math id="M155" display="inline"><mml:mrow><mml:mi>S</mml:mi><mml:mo>/</mml:mo><mml:mi>N</mml:mi><mml:mo>≤</mml:mo><mml:mn mathvariant="normal">0.2</mml:mn></mml:mrow></mml:math></inline-formula> were labeled as “bad”, those with <inline-formula><mml:math id="M156" display="inline"><mml:mrow><mml:mn mathvariant="normal">0.2</mml:mn><mml:mo>&lt;</mml:mo><mml:mi>S</mml:mi><mml:mo>/</mml:mo><mml:mi>N</mml:mi><mml:mo>≤</mml:mo><mml:mn mathvariant="normal">0.6</mml:mn></mml:mrow></mml:math></inline-formula> or poor fit as “weak”, and <inline-formula><mml:math id="M157" display="inline"><mml:mrow><mml:mi>S</mml:mi><mml:mo>/</mml:mo><mml:mi>N</mml:mi><mml:mo>&gt;</mml:mo><mml:mn mathvariant="normal">0.6</mml:mn></mml:mrow></mml:math></inline-formula> as “strong”. In total, 30 species were deemed “strong”, 2 “weak” and 4 “bad”. The “bad” species were excluded from model calculations due to their concentration uncertainty, while the uncertainty for “weak” species was tripled to reduce their influence. Ultimately, 32 pollutants were included in the model.</p>
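      <p>The species screening rule above can be written as a small classifier. This is a sketch using the thresholds stated in the text; the function names are hypothetical, and giving the “bad” threshold precedence over a poor fit is an assumption about how the two criteria interact.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def classify_species(signal_to_noise, poor_fit=False):
    """Label a species 'bad', 'weak' or 'strong' from its S/N ratio and fit."""
    if signal_to_noise <= 0.2:
        return "bad"
    if signal_to_noise <= 0.6 or poor_fit:
        return "weak"
    return "strong"


def adjust_uncertainty(uncertainty, category):
    """Triple the uncertainty of 'weak' species; return None for excluded 'bad' ones."""
    if category == "bad":
        return None
    if category == "weak":
        return 3.0 * uncertainty
    return uncertainty
```
]]></preformat>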
      <p id="d2e2372">To determine the optimal number of sources, 20 runs with different random seeds were conducted to evaluate the stability of the <inline-formula><mml:math id="M158" display="inline"><mml:mi>Q</mml:mi></mml:math></inline-formula> value across solutions ranging from 3 to 10 factors. A 6-factor solution was selected because the rate of decrease in the <inline-formula><mml:math id="M159" display="inline"><mml:mi>Q</mml:mi></mml:math></inline-formula> value diminished markedly beyond this point while the model results remained interpretable. The <inline-formula><mml:math id="M160" display="inline"><mml:mi>Q</mml:mi></mml:math></inline-formula> (true) <inline-formula><mml:math id="M161" display="inline"><mml:mo>/</mml:mo></mml:math></inline-formula> <inline-formula><mml:math id="M162" display="inline"><mml:mi>Q</mml:mi></mml:math></inline-formula> (robust) value of 1.23 fell within the acceptable range of <inline-formula><mml:math id="M163" display="inline"><mml:mrow><mml:mo>&lt;</mml:mo><mml:mn mathvariant="normal">1.5</mml:mn></mml:mrow></mml:math></inline-formula> (Hui et al., 2020), while the <inline-formula><mml:math id="M164" display="inline"><mml:mi>Q</mml:mi></mml:math></inline-formula> (robust) <inline-formula><mml:math id="M165" display="inline"><mml:mo>/</mml:mo></mml:math></inline-formula> <inline-formula><mml:math id="M166" display="inline"><mml:mi>Q</mml:mi></mml:math></inline-formula> (theoretical) value of 1.1 was close to 1. The standardized residuals for each factor ranged from <inline-formula><mml:math id="M167" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">3</mml:mn></mml:mrow></mml:math></inline-formula> to 3, demonstrating a satisfactory fit between the model predictions and observed data. The six sources identified using the PMF model are as follows: petrochemical processes (PP), fuel evaporation (FE), combustion sources (CS), solvent use (SU), polymer fabrication (PF) and vehicle emissions (VE). The source profiles are illustrated in Fig. 5.</p>
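      <p>As a minimal sketch of the fit diagnostics quoted above, the Python snippet below computes the three <italic>Q</italic> quantities from a matrix of uncertainty-scaled residuals. It assumes the convention commonly used with EPA PMF, which this paper does not state explicitly: Q(true) sums all squared scaled residuals, Q(robust) excludes points whose absolute scaled residual exceeds 4, and Q(theoretical) is approximated as nm − p(n + m) for n samples, m species and p factors. The function name, the threshold argument and the residual values are hypothetical.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def q_diagnostics(scaled_residuals, n_factors, outlier_threshold=4.0):
    """Return Q(true), Q(robust) and an approximate Q(theoretical).

    scaled_residuals: rows are samples, columns are species; each entry is
    a residual divided by its measurement uncertainty.
    """
    n_samples = len(scaled_residuals)
    n_species = len(scaled_residuals[0])
    flat = [r for row in scaled_residuals for r in row]

    # Q(true): every data point contributes.
    q_true = sum(r * r for r in flat)
    # Q(robust): points with |scaled residual| above the threshold are excluded.
    q_robust = sum(r * r for r in flat if abs(r) <= outlier_threshold)
    # Q(theoretical): data points minus fitted parameters (assumed convention).
    q_theoretical = n_samples * n_species - n_factors * (n_samples + n_species)

    return {
        "q_true": q_true,
        "q_robust": q_robust,
        "q_theoretical": q_theoretical,
        "true_over_robust": q_true / q_robust,
        "robust_over_theoretical": q_robust / q_theoretical,
    }


# Hypothetical scaled residuals: 4 samples x 3 species, fitted with 1 factor.
residuals = [
    [1.0, -1.0, 0.5],
    [0.0, 2.0, -0.5],
    [5.0, 1.0, 0.0],  # 5.0 is excluded from Q(robust)
    [-1.0, 0.0, 1.0],
]
diagnostics = q_diagnostics(residuals, n_factors=1)
```
]]></preformat>
      <p>Repeating such a calculation for each candidate factor number and each random seed would give the stability curve from which a solution is chosen. The toy residuals here are deliberately poorly fitted and do not reproduce the ratios reported for the 6-factor solution.</p>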
      <p id="d2e2452">Factor 1 exhibited the highest concentration of propylene (60.60 %), suggesting that its predominant source is local petrochemical processes. This finding aligns with previous studies that link propylene emissions to petrochemical activities in industrial zones (Washenfelder et al., 2010; Ragothaman and Anderson, 2017). Therefore, Factor 1 was identified as PP. Factor 2 was characterized mainly by C<sub>2</sub>–C<sub>5</sub> alkanes (47 %) and toluene (13.68 %), which are typical components of gasoline and diesel (Mu et al., 2023). Thus, Factor 2 was identified as FE.</p>
      <p id="d2e2474">Factor 3 had high percentages of propane (69.36 %), ethene (12.17 %) and ethane (5.12 %), consistent with the emission characteristics of CS (Song et al., 2021; Chen et al., 2024), and was therefore identified as CS. The relative contribution of aromatic hydrocarbons in Factor 4 reached 69.12 %. Because aromatic hydrocarbons are widely used as solvents in industrial production (Zhang et al., 2021; Mukhamatdinov et al., 2020), Factor 4 was identified as SU.</p>
      <p id="d2e2477">Factor 5 was primarily characterized by ethene (92.59 %), with a concentration far exceeding that of other species. This indicates that the source is closely related to the polymer manufacturing processes commonly found in nearby production facilities. Previous studies have identified ethene as a major pollutant associated with such industrial activities (Burdett and Eisinger, 2017). Therefore, Factor 5 was identified as PF. Factor 6 was characterized by relatively high proportions of C<sub>4</sub>–C<sub>6</sub> alkanes (37.63 %) and 1,2-dichloroethane (12.01 %), which are important indicators of VE (Song et al., 2021; Chen et al., 2024). Factor 6 was therefore identified as VE.</p>
      <p id="d2e2498">The PMF analysis revealed distinct temporal patterns in source contributions to VOC emissions throughout the study period (Fig. 6). On an annual basis, CS and FE were marginally the largest contributors, accounting for 16.94 % and 16.89 % of total VOC emissions, respectively. The remaining sources exhibited comparable contributions: SU (16.56 %), PF (16.54 %), VE (16.54 %) and PP (16.52 %). This indicates a relatively balanced distribution of emission sources in the industrial park.</p>

      <fig id="F5" specific-use="star"><label>Figure 5</label><caption><p id="d2e2503">Source profiles calculated using the PMF model at Site C.</p></caption>
            <graphic xlink:href="https://acp.copernicus.org/articles/26/6117/2026/acp-26-6117-2026-f05.png"/>

          </fig>

</sec>
<sec id="Ch1.S3.SS3.SSS2">
  <label>3.3.2</label><title>Seasonal variation and contributing factors of VOC source distribution</title>
      <p id="d2e2520">Seasonal analysis revealed significant temporal variations in source contributions (Fig. 6). During spring, CS dominated the VOC emissions with a 22.39 % contribution, followed by PP (19.59 %), PF (16.07 %) and VE (14.76 %), while FE showed a lower contribution of 12.83 %. This pattern shifted notably in summer, when FE became the predominant source (21.68 %), accompanied by substantial contributions from SU (18.21 %) and VE (17.59 %). In autumn, PF emerged as the primary contributor (22.47 %), with CS maintaining a significant presence (16.95 %). Winter emissions were primarily attributed to VE (18.64 %), SU (18.61 %) and FE (18.02 %).</p>
      <p id="d2e2523">The observed seasonal variations align with regional industrial activities and meteorological conditions. The pronounced CS contributions during spring coincide with increased biomass burning activities across industrial parks in China, corroborating findings from previous studies (Yang et al., 2023; Yao et al., 2021; Chen et al., 2022). The elevated FE contributions in summer can be attributed to enhanced fuel volatilization under high temperatures characteristic of the Yangtze River Delta region, subsequently promoting O<sub>3</sub> formation (Xu et al., 2023). The autumn dominance of PF emissions reflects the operational patterns of polymer manufacturing facilities, while the significant winter contribution from VE aligns with previously documented patterns of vehicular emissions in Shanghai's urban areas (Liu et al., 2021; Wang et al., 2022).</p>

      <fig id="F6"><label>Figure 6</label><caption><p id="d2e2537">Influence of various sources on atmospheric VOCs across different seasons.</p></caption>
            <graphic xlink:href="https://acp.copernicus.org/articles/26/6117/2026/acp-26-6117-2026-f06.png"/>

          </fig>

</sec>
</sec>
<sec id="Ch1.S3.SS4">
  <label>3.4</label><title>Quantitative assessment of source-specific contributions to O<sub>3</sub> formation</title>
      <p id="d2e2565">The integrated PMF-SHAP framework was employed to quantitatively evaluate the source-specific contributions to O<sub>3</sub> formation in the Jinshan industrial complex during 2021–2023. Note that the SHAP-based source contributions are computed from the PMF-resolved source contributions (<inline-formula><mml:math id="M175" display="inline"><mml:mi mathvariant="bold">g</mml:mi></mml:math></inline-formula> matrix), not from site-level TVOC data. This integrated approach directly links emission sources to O<sub>3</sub> impacts. Using the SHAP values of the XGBoost model, we systematically assessed the relative importance of the emission sources in driving O<sub>3</sub> pollution dynamics (Fig. 7). The analysis revealed that SU and FE were the predominant contributors to O<sub>3</sub> formation, with mean SHAP values of 3.00 and 2.77, respectively. This finding underscores the critical role of solvent usage and fuel-related emissions in industrial processes as primary drivers of O<sub>3</sub> generation. CS demonstrated moderate influence with a mean SHAP value of 2.19, followed by VE (1.92) and PP (0.55), while PF exhibited the lowest impact (0.16). These results emphasize the particular significance of SU and FE in O<sub>3</sub> pollution control strategies within industrial contexts.</p>
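      <p>To illustrate how a mean SHAP ranking of this kind arises, the sketch below computes exact Shapley values by coalition enumeration and averages their absolute values per source. It is not the XGBoost-SHAP pipeline used in this study: the function <monospace>toy_o3_model</monospace> is a hypothetical stand-in for a trained regressor, only three sources are included, the source contribution values are invented, and absent features are replaced by their sample mean, which is a simplifying assumption.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline).

    Features outside a coalition are set to their baseline value.
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                without_i = [x[j] if j in subset else baseline[j] for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]
                phi[i] += weight * (model(with_i) - model(without_i))
    return phi


def toy_o3_model(g):
    """Hypothetical O3 response to SU, FE and CS contributions (not fitted)."""
    su, fe, cs = g
    return 3.0 * su + 2.0 * fe + 1.0 * cs + 0.5 * su * fe


sources = ["SU", "FE", "CS"]
# Hypothetical PMF source contributions, one row per hourly sample.
samples = [
    [2.0, 1.0, 0.0],
    [0.5, 2.0, 1.5],
    [1.5, 0.0, 3.0],
    [0.0, 1.0, 1.5],
]
baseline = [sum(col) / len(samples) for col in zip(*samples)]

all_phi = [shapley_values(toy_o3_model, x, baseline) for x in samples]
# Global importance: mean absolute Shapley value per source.
mean_abs = {
    name: sum(abs(phi[k]) for phi in all_phi) / len(all_phi)
    for k, name in enumerate(sources)
}
ranking = sorted(sources, key=lambda name: mean_abs[name], reverse=True)
```
]]></preformat>
      <p>For every sample the Shapley values sum to the difference between the model output and the baseline output, which is the additivity property that makes per-source averages comparable.</p>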
      <p id="d2e2630">To quantify the contribution of VOC sources identified by PMF to O<sub>3</sub> formation, we integrated the mass contributions of VOCs from specific sources with their respective maximum incremental reactivity (MIR) values (Carter, 2010; Venecek et al., 2018). Subsequently, we employed ozone formation potential (OFP) to assess the relative contributions of different VOCs to O<sub>3</sub> generation. The OFP quantification of PMF-resolved sources (Fig. S5) aligns robustly with SHAP-derived source prioritization, validating the scientific coherence of the PMF framework. SU exhibits the highest OFP contribution (32.20 <inline-formula><mml:math id="M183" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">µ</mml:mi><mml:mi mathvariant="normal">g</mml:mi><mml:mspace linebreak="nobreak" width="0.125em"/><mml:msup><mml:mi mathvariant="normal">m</mml:mi><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">3</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula>), driven by its dominant aromatic constituents (<inline-formula><mml:math id="M184" display="inline"><mml:mi>m</mml:mi></mml:math></inline-formula>-/<inline-formula><mml:math id="M185" display="inline"><mml:mi>p</mml:mi></mml:math></inline-formula>-xylene: 14.55 %, MIR <inline-formula><mml:math id="M186" display="inline"><mml:mo>=</mml:mo></mml:math></inline-formula> 5.47 gO<sub>3</sub>/gVOC; ethyl benzene: 10.57 %, MIR <inline-formula><mml:math id="M188" display="inline"><mml:mo>=</mml:mo></mml:math></inline-formula> 3.11 gO<sub>3</sub>/gVOC; <inline-formula><mml:math id="M190" display="inline"><mml:mi>o</mml:mi></mml:math></inline-formula>-xylene: 9.28 %, MIR <inline-formula><mml:math id="M191" display="inline"><mml:mo>=</mml:mo></mml:math></inline-formula> 7.17 gO<sub>3</sub>/gVOC) whose combined reactivity (MIR-weighted OFP <inline-formula><mml:math id="M193" display="inline"><mml:mo>=</mml:mo></mml:math></inline-formula> 14.85 <inline-formula><mml:math 
id="M194" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">µ</mml:mi><mml:mi mathvariant="normal">g</mml:mi><mml:mspace linebreak="nobreak" width="0.125em"/><mml:msup><mml:mi mathvariant="normal">m</mml:mi><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">3</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula>) amplifies its ozone-driving potential despite moderate mass abundance (34.40 % of total VOCs). In contrast, FE (26.3 <inline-formula><mml:math id="M195" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">µ</mml:mi><mml:mi mathvariant="normal">g</mml:mi><mml:mspace linebreak="nobreak" width="0.125em"/><mml:msup><mml:mi mathvariant="normal">m</mml:mi><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">3</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula>) yielded a lower OFP despite the substantial mass proportion of C<sub>2</sub>–C<sub>5</sub> hydrocarbons (68.15 %) and their high mean concentration (11.81 <inline-formula><mml:math id="M198" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">µ</mml:mi><mml:mi mathvariant="normal">g</mml:mi><mml:mspace linebreak="nobreak" width="0.125em"/><mml:msup><mml:mi mathvariant="normal">m</mml:mi><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">3</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula>). The SHAP value of SU (3.00) confirmed its disproportionate influence on O<sub>3</sub> formation relative to its mass contribution. 
Additionally, VE (10.10 <inline-formula><mml:math id="M200" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">µ</mml:mi><mml:mi mathvariant="normal">g</mml:mi><mml:mspace linebreak="nobreak" width="0.125em"/><mml:msup><mml:mi mathvariant="normal">m</mml:mi><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">3</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula>) exhibited the lowest OFP, primarily due to the low reactivity of 1,2-dichloroethane (12.50 % by mass, MIR <inline-formula><mml:math id="M201" display="inline"><mml:mo>=</mml:mo></mml:math></inline-formula> 0.23). Its SHAP value (1.92) reflected its limited photochemical impact. To place the ML-derived findings in the context of established regional photochemistry, we compared our SHAP-based source rankings (SU <inline-formula><mml:math id="M202" display="inline"><mml:mo>&gt;</mml:mo></mml:math></inline-formula> FE <inline-formula><mml:math id="M203" display="inline"><mml:mo>&gt;</mml:mo></mml:math></inline-formula> CS) with empirical kinetic modeling approach (EKMA) studies specific to Shanghai. The dominance of solvent use (SU) and fuel evaporation (FE) sources aligns with EKMA analyses demonstrating VOC-limited regimes in Shanghai's industrial corridors (Zhang et al., 2024), where reactive aromatics (<inline-formula><mml:math id="M204" display="inline"><mml:mi>m</mml:mi></mml:math></inline-formula>-/<inline-formula><mml:math id="M205" display="inline"><mml:mi>p</mml:mi></mml:math></inline-formula>-xylene) and short-chain alkenes (propylene, 1,3-butadiene) drive <inline-formula><mml:math id="M206" display="inline"><mml:mrow><mml:mo>&gt;</mml:mo><mml:mn mathvariant="normal">70</mml:mn></mml:mrow></mml:math></inline-formula> % of incremental reactivity.</p>
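      <p>The OFP calculation described in this paragraph can be sketched as a concentration-weighted sum of MIR values. The snippet below is an illustration only: the MIR values are the three quoted in the text for SU aromatics, whereas the species concentrations and the function name are hypothetical and are not the PMF-resolved values behind Fig. S5.</p>
      <preformat preformat-type="code"><![CDATA[
```python
# MIR values (g O3 per g VOC) as quoted in the text for three SU aromatics.
MIR = {"m/p-xylene": 5.47, "ethylbenzene": 3.11, "o-xylene": 7.17}


def ozone_formation_potential(concentrations, mir=MIR):
    """Return per-species OFP (ug m-3) as concentration times MIR."""
    return {species: conc * mir[species] for species, conc in concentrations.items()}


# Hypothetical source-resolved concentrations (ug m-3), for illustration only.
su_concentrations = {"m/p-xylene": 1.2, "ethylbenzene": 0.8, "o-xylene": 0.5}

su_ofp_by_species = ozone_formation_potential(su_concentrations)
# The source-level OFP is the sum over the species attributed to that source.
su_ofp_total = sum(su_ofp_by_species.values())
```
]]></preformat>
      <p>Because each term is a product of mass and reactivity, a source with a moderate mass share but high-MIR species can exceed the OFP of a source dominated by abundant low-MIR alkanes.</p>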
      <p id="d2e2898">The seasonal decomposition of source contributions through SHAP analysis demonstrated strong concordance with PMF-derived temporal patterns (Fig. 8). CS emerged as the dominant contributor during spring (30.51 %), while summer was characterized by substantial contributions from FE (27.25 %) and SU (24.48 %). The autumn period was dominated by PF emissions (27.83 %), whereas SU showed peak influence during winter (31.23 %). This temporal heterogeneity in source contributions, particularly the significant TVOC emissions associated with Site C, provides crucial insights for targeted O<sub>3</sub> management strategies.</p>
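      <p>One plausible way to obtain seasonal percentages of this kind is to normalize the mean absolute SHAP value of each source within a season so that the six shares sum to 100 %. The sketch below assumes that reading; the function name and all input values are hypothetical and do not reproduce the data behind Fig. 8.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def seasonal_shares(mean_abs_shap):
    """Convert per-season mean |SHAP| per source into percentage shares."""
    shares = {}
    for season, values in mean_abs_shap.items():
        total = sum(values.values())
        shares[season] = {src: 100.0 * v / total for src, v in values.items()}
    return shares


# Hypothetical mean |SHAP| values per source and season.
mean_abs_shap = {
    "spring": {"CS": 3.0, "SU": 2.5, "FE": 2.0, "VE": 1.5, "PP": 0.7, "PF": 0.3},
    "summer": {"FE": 3.0, "SU": 2.5, "CS": 2.0, "VE": 1.5, "PP": 0.5, "PF": 0.5},
}

shares = seasonal_shares(mean_abs_shap)
# Source with the largest share in each season.
dominant = {season: max(vals, key=vals.get) for season, vals in shares.items()}
```
]]></preformat>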
      <p id="d2e2911">Contrary to conventional CTM-based studies emphasizing combustion sources (CS) as primary O<sub>3</sub> drivers, our ML–PMF–SHAP integration identifies solvent use (SU, SHAP <inline-formula><mml:math id="M209" display="inline"><mml:mo>=</mml:mo></mml:math></inline-formula> 3.00) and fuel evaporation (FE, SHAP <inline-formula><mml:math id="M210" display="inline"><mml:mo>=</mml:mo></mml:math></inline-formula> 2.77) as dominant contributors. This shift highlights industrial process-specific emissions (e.g., aromatic solvents, light alkanes) as critical O<sub>3</sub> precursors in petrochemical zones, challenging broad regional assumptions. Site C exhibits a unique negative SHAP-O<sub>3</sub> correlation despite high VOC loads. We attribute this to rapid NO titration from proximate NO<sub><italic>x</italic></sub>-rich plumes (e.g., gas stations, port activities), suppressing local O<sub>3</sub> while fueling downwind formation (<inline-formula><mml:math id="M215" display="inline"><mml:mrow><mml:mo>&gt;</mml:mo><mml:mn mathvariant="normal">5</mml:mn></mml:mrow></mml:math></inline-formula> km) as air masses age into VOC-limited regimes – a micro-scale dynamic unresolvable by traditional PMF-OFP methods. SHAP quantifies disproportionate O<sub>3</sub> impacts unconstrained by VOC mass. 
For example, SU contributes only 34.40 % of total VOCs but drives 32.20 <inline-formula><mml:math id="M217" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">µ</mml:mi><mml:mi mathvariant="normal">g</mml:mi><mml:mspace width="0.125em" linebreak="nobreak"/><mml:msup><mml:mi mathvariant="normal">m</mml:mi><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">3</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula> OFP due to high-reactivity aromatics (e.g., <inline-formula><mml:math id="M218" display="inline"><mml:mi>o</mml:mi></mml:math></inline-formula>-xylene, MIR <inline-formula><mml:math id="M219" display="inline"><mml:mo>=</mml:mo></mml:math></inline-formula> 7.17 gO<sub>3</sub> <inline-formula><mml:math id="M221" display="inline"><mml:mo>/</mml:mo></mml:math></inline-formula> gVOC), whereas FE's higher mass (68.15 % alkanes) yields lower OFP (26.3 <inline-formula><mml:math id="M222" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">µ</mml:mi><mml:mi mathvariant="normal">g</mml:mi><mml:mspace width="0.125em" linebreak="nobreak"/><mml:msup><mml:mi mathvariant="normal">m</mml:mi><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">3</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula>) but amplified SHAP influence via spatial persistence. These insights establish a paradigm for facility-scale O<sub>3</sub> control, prioritizing SU/FE reductions over combustion sources in coastal industrial clusters. These findings have significant implications for policy development and implementation. The results suggest that regulatory attention should prioritize emission controls around Site C, with particular emphasis on seasonal variation in source contributions. Implementation of season-specific control strategies, especially targeting predominant sources during high-O<sub>3</sub> periods, would optimize the effectiveness of O<sub>3</sub> pollution mitigation efforts in industrial areas.</p>

      <fig id="F7"><label>Figure 7</label><caption><p id="d2e3092">Mean SHAP values for six pollution sources affecting ozone levels.</p></caption>
          <graphic xlink:href="https://acp.copernicus.org/articles/26/6117/2026/acp-26-6117-2026-f07.png"/>

        </fig>

      <fig id="F8"><label>Figure 8</label><caption><p id="d2e3103">Proportional contributions of six pollution sources to ozone levels, represented by SHAP values across different seasons.</p></caption>
          <graphic xlink:href="https://acp.copernicus.org/articles/26/6117/2026/acp-26-6117-2026-f08.png"/>

        </fig>

</sec>
</sec>
<sec id="Ch1.S4" sec-type="conclusions">
  <label>4</label><title>Conclusions</title>
      <p id="d2e3121">This study presents a novel methodological framework for quantifying VOC contributions to O<sub>3</sub> formation in industrial park environments, integrating advanced ML techniques with traditional source apportionment methods. Through the synergistic combination of ML algorithms, SHAP interpretation and PMF modeling, we have developed a robust analytical approach that provides unprecedented spatial and temporal resolution in source identification and contribution assessment.</p>
      <p id="d2e3133">The investigation revealed distinct patterns in VOC–O<sub>3</sub> relationships, with solvent use and fuel evaporation emerging as primary drivers of O<sub>3</sub> formation in the industrial complex. The XGBoost model achieved the best predictive performance (<inline-formula><mml:math id="M229" display="inline"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup><mml:mo>=</mml:mo><mml:mn mathvariant="normal">0.644</mml:mn></mml:mrow></mml:math></inline-formula>) among the evaluated ML algorithms, while SHAP analysis enabled quantification of source-specific contributions. The PMF analysis further delineated six distinct emission sources, exhibiting pronounced seasonal variations in their relative contributions to O<sub>3</sub> formation.</p>
      <p id="d2e3178">Notably, combustion sources made the largest SHAP-based contribution to O<sub>3</sub> in spring (30.51 %), while fuel evaporation (27.25 %) and solvent use (24.48 %) were predominant in summer. This temporal heterogeneity in source contributions underscores the necessity for season-specific control strategies tailored to industrial operational patterns and meteorological conditions. The significant influence of Site C, characterized by diverse industrial activities, highlights the importance of targeted emission controls in areas with complex source profiles.</p>
      <p id="d2e3181">These findings provide crucial insights for evidence-based policy development in industrial air quality management. The methodology established herein offers a powerful tool for rapid source identification and precise contribution quantification, enabling the implementation of targeted control strategies at facility-level resolution. While our integrated PMF-ML-SHAP framework advances high-resolution source attribution for industrial O<sub>3</sub> formation, three key limitations warrant consideration. First, the 1 km <inline-formula><mml:math id="M232" display="inline"><mml:mo>×</mml:mo></mml:math></inline-formula> 1 km spatial resolution, though unprecedented for facility-scale analysis, may not resolve sub-facility emission hotspots (e.g., individual storage tanks or pipeline leaks), necessitating future integration of drone-based hyperspectral sensors or stack-level monitors for hyperlocal validation. Second, reliance on hourly GC-FID measurements could underestimate fast-reacting alkenes (e.g., propylene, isoprene) with atmospheric lifetimes <inline-formula><mml:math id="M233" display="inline"><mml:mrow><mml:mo>&lt;</mml:mo><mml:mn mathvariant="normal">2</mml:mn></mml:mrow></mml:math></inline-formula> h; real-time proton-transfer-reaction mass spectrometry (PTR-MS) would enhance temporal resolution for such compounds. Third, the framework was optimized for Shanghai's coastal petrochemical environment, which is characterized by high humidity (<inline-formula><mml:math id="M234" display="inline"><mml:mrow><mml:mo>&gt;</mml:mo><mml:mn mathvariant="normal">75</mml:mn></mml:mrow></mml:math></inline-formula> %), sea–land breezes (3–5 m s<sup>−1</sup>) and reactive aromatic–alkene mixtures; its efficacy requires validation in inland/arid industrial regions with distinct meteorology (e.g., lower humidity, weaker advection) and emission profiles (e.g., solvent-dominated manufacturing clusters). 
These constraints highlight inherent trade-offs between resolution and practicality but do not invalidate the framework's utility for targeted O<sub>3</sub> management in complex industrial zones. Future research directions should focus on expanding the temporal and spatial coverage of monitoring networks and exploring the application of this framework across diverse industrial settings to enhance its generalizability and predictive capabilities.</p>
</sec>

      
      </body>
    <back><app-group>

<app id="App1.Ch1.S1">
  <label>Appendix A</label><title>Acronym glossary</title>
      <p id="d2e3254"><table-wrap position="anchor"><oasis:table><oasis:tgroup cols="2">
     <oasis:colspec colnum="1" colname="col1" align="left"/>
     <oasis:colspec colnum="2" colname="col2" align="justify" colwidth="6cm"/>
     <oasis:tbody>
       <oasis:row>
         <oasis:entry colname="col1">CatBoost</oasis:entry>
         <oasis:entry colname="col2">Category Boosting</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">CS</oasis:entry>
         <oasis:entry colname="col2">combustion sources</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">CSEM</oasis:entry>
         <oasis:entry colname="col2">cross-stacked ensemble learning model</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">CTMs</oasis:entry>
         <oasis:entry colname="col2">chemical transport models</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">DTR</oasis:entry>
         <oasis:entry colname="col2">Decision Tree Regression</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">EF</oasis:entry>
         <oasis:entry colname="col2">error score</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">FE</oasis:entry>
         <oasis:entry colname="col2">fuel evaporation</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">GC-FID</oasis:entry>
         <oasis:entry colname="col2">gas chromatography–flame ionization detector</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">GC-FID/MS</oasis:entry>
         <oasis:entry colname="col2">gas chromatography–flame ionization detector and mass spectrometry</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">LightGBM</oasis:entry>
         <oasis:entry colname="col2">Light Gradient Boosting Machine</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">MAE</oasis:entry>
         <oasis:entry colname="col2">mean absolute error</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">MDL</oasis:entry>
         <oasis:entry colname="col2">minimum detection limit</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">MDLs</oasis:entry>
         <oasis:entry colname="col2">method detection limits</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">MIR</oasis:entry>
         <oasis:entry colname="col2">maximum incremental reactivity</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">ML</oasis:entry>
         <oasis:entry colname="col2">machine learning</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">NO<sub><italic>x</italic></sub></oasis:entry>
         <oasis:entry colname="col2">nitrogen oxides</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">O<sub>3</sub></oasis:entry>
         <oasis:entry colname="col2">ozone</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">OFP</oasis:entry>
         <oasis:entry colname="col2">ozone formation potential</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">PF</oasis:entry>
         <oasis:entry colname="col2">polymer fabrication</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">PMF</oasis:entry>
         <oasis:entry colname="col2">positive matrix factorization</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">PP</oasis:entry>
         <oasis:entry colname="col2">petrochemical processes</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">RF</oasis:entry>
         <oasis:entry colname="col2">Random Forest</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">RMSE</oasis:entry>
         <oasis:entry colname="col2">root mean squared error</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">SHAP</oasis:entry>
         <oasis:entry colname="col2">SHapley Additive exPlanations</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">SU</oasis:entry>
         <oasis:entry colname="col2">solvent use</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">SVM</oasis:entry>
         <oasis:entry colname="col2">Support Vector Machine</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">SVR</oasis:entry>
         <oasis:entry colname="col2">Support Vector Regression</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">TVOCs</oasis:entry>
         <oasis:entry colname="col2">total volatile organic compounds</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">VE</oasis:entry>
         <oasis:entry colname="col2">vehicle emissions</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">VOCs</oasis:entry>
         <oasis:entry colname="col2">volatile organic compounds</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">XGBoost</oasis:entry>
         <oasis:entry colname="col2">Extreme Gradient Boosting</oasis:entry>
       </oasis:row>
     </oasis:tbody>
   </oasis:tgroup></oasis:table></table-wrap></p>
</app>
  </app-group><notes notes-type="codeavailability"><title>Code availability</title>

      <p id="d2e3571">Code related to this paper may be requested from the authors.</p>
  </notes><notes notes-type="dataavailability"><title>Data availability</title>

      <p id="d2e3577">The observational data from Jinshan District, Shanghai, from 2021 to 2023, are confidential. Hourly O<sub>3</sub> concentration data utilized in this study were obtained from the Atmospheric Environmental Monitoring Routine Data Management System of the Shanghai Environmental Monitoring Station: <ext-link xlink:href="https://github.com/bobooob/Multi-machine-learning-approaches-to-modeling-small-scale-source-attribution-of-ozone-formation">https://github.com/bobooob/Multi-machine-learning-approaches</ext-link> (last access: 29 April 2026). The concentration data for VOCs were sourced from the 12 monitoring stations and can be accessed through the Intelligent Analysis System for VOC Emissions and Pollution Source Tracing in Key Industrial Parks of Shanghai: <ext-link xlink:href="https://github.com/bobooob/Multi-machine-learning-approaches-to-modeling-small-scale-source-attribution-of-ozone-formation">https://github.com/bobooob/Multi-machine-learning-approaches</ext-link> (last access: 29 April 2026).</p>
  </notes><app-group>
        <supplementary-material position="anchor"><p id="d2e3596">The supplement related to this article is available online at <inline-supplementary-material xlink:href="https://doi.org/10.5194/acp-26-6117-2026-supplement" xlink:title="pdf">https://doi.org/10.5194/acp-26-6117-2026-supplement</inline-supplementary-material>.</p></supplementary-material>
        </app-group><notes notes-type="authorcontribution"><title>Author contributions</title>

      <p id="d2e3605">ZX, GX and YL conceived and supervised the study. ZX analyzed the data. ZX wrote the paper with input from GX and YL. GX reviewed and commented on the paper. All authors contributed to discussing the results and revising the draft.</p>
  </notes><notes notes-type="competinginterests"><title>Competing interests</title>

      <p id="d2e3611">The contact author has declared that none of the authors has any competing interests.</p>
  </notes><notes notes-type="disclaimer"><title>Disclaimer</title>

      <p id="d2e3617">Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors.</p>
  </notes><ack><title>Acknowledgements</title><p id="d2e3623">The authors greatly appreciate the observational support provided by the Shanghai Environmental Monitoring Center and the Shanghai Chemical Industry Park Administration Committee.</p></ack><notes notes-type="financialsupport"><title>Financial support</title>

      <p id="d2e3628">The work is financially supported by grants from the National Ministry of Science and Technology (grant nos. 2022YFC3703501 and 2022YFC3703503) and the Shanghai Jinshan Municipal Bureau of Ecology and Environment (grant no. Huhuanke-2025-10).</p>
  </notes><notes notes-type="reviewstatement"><title>Review statement</title>

      <p id="d2e3634">This paper was edited by Rob MacKenzie and reviewed by two anonymous referees.</p>
  </notes><ref-list>
    <title>References</title>

      <ref id="bib1.bib1"><label>1</label><mixed-citation>Baklanov, A. and Korsholm, U.: On-line integrated meteorological and chemical transport modelling: advantages and prospectives, Air Pollution Modeling and Its Application XIX, Springer Netherlands, 3–17, <ext-link xlink:href="https://doi.org/10.1007/978-1-4020-8453-9_1" ext-link-type="DOI">10.1007/978-1-4020-8453-9_1</ext-link>, 2008.</mixed-citation></ref>
      <ref id="bib1.bib2"><label>2</label><mixed-citation>Burdett, I. D. and Eisinger, R. S.: Ethylene polymerization processes and manufacture of polyethylene, Handbook of Industrial Polyethylene and Technology: Definitive Guide to Manufacturing, Properties, Processing, Applications and Markets, Wiley, 61–103, <ext-link xlink:href="https://doi.org/10.1002/9781119159797" ext-link-type="DOI">10.1002/9781119159797</ext-link>, 2017.</mixed-citation></ref>
      <ref id="bib1.bib3"><label>3</label><mixed-citation>Cao, X., Yi, J., Li, Y., Zhao, M., Duan, Y., Zhang, F., and Duan, L.: Characteristics and Source Apportionment of Volatile Organic Compounds in an Industrial Area at the Zhejiang–Shanghai Boundary, China, Atmosphere, 15, 237, <ext-link xlink:href="https://doi.org/10.3390/atmos15020237" ext-link-type="DOI">10.3390/atmos15020237</ext-link>, 2024.</mixed-citation></ref>
      <ref id="bib1.bib4"><label>4</label><mixed-citation>Carter, W. P.: Development of the SAPRC-07 chemical mechanism, Atmos. Environ., 44, 5324–5335, <ext-link xlink:href="https://doi.org/10.1016/j.atmosenv.2010.01.026" ext-link-type="DOI">10.1016/j.atmosenv.2010.01.026</ext-link>, 2010.</mixed-citation></ref>
      <ref id="bib1.bib5"><label>5</label><mixed-citation>Chang, L., He, F., Tie, X., Xu, J., and Gao, W.: Meteorology driving the highest ozone level occurred during mid-spring to early summer in Shanghai, China, Sci. Total Environ., 785, 147253, <ext-link xlink:href="https://doi.org/10.1016/j.scitotenv.2021.147253" ext-link-type="DOI">10.1016/j.scitotenv.2021.147253</ext-link>, 2021.</mixed-citation></ref>
      <ref id="bib1.bib6"><label>6</label><mixed-citation>Chen, D., Zhou, L., Wang, C., Liu, H., Qiu, Y., Shi, G., Song, D., Tan, Q., and Yang, F.: Characteristics of ambient volatile organic compounds during spring O<sub>3</sub> pollution episode in Chengdu, China, J. Environ. Sci., 114, 115–125, <ext-link xlink:href="https://doi.org/10.1016/j.jes.2021.08.014" ext-link-type="DOI">10.1016/j.jes.2021.08.014</ext-link>, 2022.</mixed-citation></ref>
      <ref id="bib1.bib7"><label>7</label><mixed-citation>Chen, W., Xu, X., and Liu, W.: Combined PMF modelling and machine learning to identify sources and meteorological influencers of volatile organic compound pollution in an industrial city in eastern China, Atmos. Environ., 334, 120714, <ext-link xlink:href="https://doi.org/10.1016/j.atmosenv.2024.120714" ext-link-type="DOI">10.1016/j.atmosenv.2024.120714</ext-link>, 2024.</mixed-citation></ref>
      <ref id="bib1.bib8"><label>8</label><mixed-citation>Cheng, N., Jing, D., Gu, Z., Cai, X., Shi, Z., Li, S., Chen, L., Li, W., and Wang, Q.: Observation-Based Ozone Formation Rules by Gradient Boosting Decision Trees Model in Typical Chemical Industrial Parks, Atmosphere, 15, 600, <ext-link xlink:href="https://doi.org/10.3390/atmos15050600" ext-link-type="DOI">10.3390/atmos15050600</ext-link>, 2024.</mixed-citation></ref>
      <ref id="bib1.bib9"><label>9</label><mixed-citation>Cheng, Y., Huang, X.-F., Peng, Y., Tang, M.-X., Zhu, B., Xia, S.-Y., and He, L.-Y.: A novel machine learning method for evaluating the impact of emission sources on ozone formation, Environ. Pollut., 316, 120685, <ext-link xlink:href="https://doi.org/10.1016/j.envpol.2022.120685" ext-link-type="DOI">10.1016/j.envpol.2022.120685</ext-link>, 2023.</mixed-citation></ref>
      <ref id="bib1.bib10"><label>10</label><mixed-citation>Choi, M. S., Qiu, X., Zhang, J., Wang, S., Li, X., Sun, Y., Chen, J., and Ying, Q.: Study of secondary organic aerosol formation from chlorine radical-initiated oxidation of volatile organic compounds in a polluted atmosphere using a 3D chemical transport model, Environ. Sci. Technol., 54, 13409–13418, <ext-link xlink:href="https://doi.org/10.1021/acs.est.0c02958" ext-link-type="DOI">10.1021/acs.est.0c02958</ext-link>, 2020.</mixed-citation></ref>
      <ref id="bib1.bib11"><label>11</label><mixed-citation>Essamlali, I., Nhaila, H., and El Khaili, M.: Supervised Machine Learning Approaches for Predicting Key Pollutants and for the Sustainable Enhancement of Urban Air Quality: A Systematic Review, Sustainability, 16, 976, <ext-link xlink:href="https://doi.org/10.3390/su16030976" ext-link-type="DOI">10.3390/su16030976</ext-link>, 2024.</mixed-citation></ref>
      <ref id="bib1.bib12"><label>12</label><mixed-citation>Guidotti, R., Monreale, A., Ruggieri, S., Turini, F., Giannotti, F., and Pedreschi, D.: A survey of methods for explaining black box models, ACM Comput. Surv., 51, 1–42, <ext-link xlink:href="https://doi.org/10.1145/3236009" ext-link-type="DOI">10.1145/3236009</ext-link>, 2018.</mixed-citation></ref>
      <ref id="bib1.bib13"><label>13</label><mixed-citation>Guo, W., Yang, Y., Chen, Q., Zhu, Y., Zhang, Y., Zhang, Y., Liu, Y., Li, G., Sun, W., and She, J.: Chemical reactivity of volatile organic compounds and their effects on ozone formation in a petrochemical industrial area of Lanzhou, Western China, Sci. Total Environ., 839, 155901, <ext-link xlink:href="https://doi.org/10.1016/j.scitotenv.2022.155901" ext-link-type="DOI">10.1016/j.scitotenv.2022.155901</ext-link>, 2022.</mixed-citation></ref>
      <ref id="bib1.bib14"><label>14</label><mixed-citation>He, L., Duan, Y., Zhang, Y., Yu, Q., Huo, J., Chen, J., Cui, H., Li, Y., and Ma, W.: Effects of VOC emissions from chemical industrial parks on regional O<sub>3</sub>-PM<sub>2.5</sub> compound pollution in the Yangtze River Delta, Sci. Total Environ., 906, 167503, <ext-link xlink:href="https://doi.org/10.1016/j.scitotenv.2023.167503" ext-link-type="DOI">10.1016/j.scitotenv.2023.167503</ext-link>, 2024.</mixed-citation></ref>
      <ref id="bib1.bib15"><label>15</label><mixed-citation>Huang, B., Lei, C., Wei, C., and Zeng, G.: Chlorinated volatile organic compounds (Cl-VOCs) in environment – sources, potential human health impacts, and current remediation technologies, Environ. Int., 71, 118–138, <ext-link xlink:href="https://doi.org/10.1016/j.envint.2014.06.013" ext-link-type="DOI">10.1016/j.envint.2014.06.013</ext-link>, 2014.</mixed-citation></ref>
      <ref id="bib1.bib16"><label>16</label><mixed-citation>Hui, L., Liu, X., Tan, Q., Feng, M., An, J., Qu, Y., Zhang, Y., and Cheng, N.: VOC characteristics, sources and contributions to SOA formation during haze events in Wuhan, Central China, Sci. Total Environ., 650, 2624–2639, <ext-link xlink:href="https://doi.org/10.1016/j.scitotenv.2018.10.029" ext-link-type="DOI">10.1016/j.scitotenv.2018.10.029</ext-link>, 2019.</mixed-citation></ref>
      <ref id="bib1.bib17"><label>17</label><mixed-citation>Hui, L., Liu, X., Tan, Q., Feng, M., An, J., Qu, Y., Zhang, Y., Deng, Y., Zhai, R., and Wang, Z.: VOC characteristics, chemical reactivity and sources in urban Wuhan, central China, Atmos. Environ., 224, 117340, <ext-link xlink:href="https://doi.org/10.1016/j.atmosenv.2020.117340" ext-link-type="DOI">10.1016/j.atmosenv.2020.117340</ext-link>, 2020.</mixed-citation></ref>
      <ref id="bib1.bib18"><label>18</label><mixed-citation>Kaur, H., Nori, H., Jenkins, S., Caruana, R., Wallach, H., and Wortman Vaughan, J.: Interpreting interpretability: understanding data scientists' use of interpretability tools for machine learning, Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems, Honolulu, HI, USA, 1–14, <ext-link xlink:href="https://doi.org/10.1145/3313831.3376219" ext-link-type="DOI">10.1145/3313831.3376219</ext-link>, 2020.</mixed-citation></ref>
      <ref id="bib1.bib19"><label>19</label><mixed-citation>Kim, S.-J., Lee, H.-Y., Lee, S.-J., and Choi, S.-D.: Passive air sampling of VOCs, O<sub>3</sub>, NO<sub>2</sub>, and SO<sub>2</sub> in the large industrial city of Ulsan, South Korea: spatial–temporal variations, source identification, and ozone formation potential, Environ. Sci. Pollut. Res., 30, 125478–125491, <ext-link xlink:href="https://doi.org/10.1007/s11356-023-31109-z" ext-link-type="DOI">10.1007/s11356-023-31109-z</ext-link>, 2023.</mixed-citation></ref>
      <ref id="bib1.bib20"><label>20</label><mixed-citation>Kuo, C.-P. and Fu, J. S.: Ozone response modeling to NO<sub><italic>x</italic></sub> and VOC emissions: Examining machine learning models, Environ. Int., 176, 107969, <ext-link xlink:href="https://doi.org/10.1016/j.envint.2023.107969" ext-link-type="DOI">10.1016/j.envint.2023.107969</ext-link>, 2023.</mixed-citation></ref>
      <ref id="bib1.bib21"><label>21</label><mixed-citation>Li, M., Zhang, Q., Streets, D. G., He, K. B., Cheng, Y. F., Emmons, L. K., Huo, H., Kang, S. C., Lu, Z., Shao, M., Su, H., Yu, X., and Zhang, Y.: Mapping Asian anthropogenic emissions of non-methane volatile organic compounds to multiple chemical mechanisms, Atmos. Chem. Phys., 14, 5617–5638, <ext-link xlink:href="https://doi.org/10.5194/acp-14-5617-2014" ext-link-type="DOI">10.5194/acp-14-5617-2014</ext-link>, 2014.</mixed-citation></ref>
      <ref id="bib1.bib22"><label>22</label><mixed-citation>Li, M., Sun, H., Huang, Y., and Chen, H.: Shapley value: from cooperative game to explainable artificial intelligence, Auton. Intell. Syst., 4, 2, <ext-link xlink:href="https://doi.org/10.1007/s43684-023-00060-8" ext-link-type="DOI">10.1007/s43684-023-00060-8</ext-link>, 2024.</mixed-citation></ref>
      <ref id="bib1.bib23"><label>23</label><mixed-citation>Liu, B., Liang, D., Yang, J., Dai, Q., Bi, X., Feng, Y., Yuan, J., Xiao, Z., Zhang, Y., and Xu, H.: Characterization and source apportionment of volatile organic compounds based on 1-year of observational data in Tianjin, China, Environ. Pollut., 218, 757–769, <ext-link xlink:href="https://doi.org/10.1016/j.envpol.2016.07.072" ext-link-type="DOI">10.1016/j.envpol.2016.07.072</ext-link>, 2016.</mixed-citation></ref>
      <ref id="bib1.bib24"><label>24</label><mixed-citation>Liu, X., Lu, D., Zhang, A., Liu, Q., and Jiang, G.: Data-driven machine learning in environmental pollution: gains and problems, Environ. Sci. Technol., 56, 2124–2133, <ext-link xlink:href="https://doi.org/10.1021/acs.est.1c06157" ext-link-type="DOI">10.1021/acs.est.1c06157</ext-link>, 2022.</mixed-citation></ref>
      <ref id="bib1.bib25"><label>25</label><mixed-citation>Liu, Y., Wang, H., Jing, S., Peng, Y., Gao, Y., Yan, R., Wang, Q., Lou, S., Cheng, T., and Huang, C.: Strong regional transport of volatile organic compounds (VOCs) during wintertime in Shanghai megacity of China, Atmos. Environ., 244, 117940, <ext-link xlink:href="https://doi.org/10.1016/j.atmosenv.2020.117940" ext-link-type="DOI">10.1016/j.atmosenv.2020.117940</ext-link>, 2021.</mixed-citation></ref>
      <ref id="bib1.bib26"><label>26</label><mixed-citation>Long, Y., Wu, Y., Xie, Y., Huang, L., Wang, W., Liu, X., Zhou, Z., Zhang, Y., Hanaoka, T., and Ju, Y.: PM<sub>2.5</sub> and ozone pollution-related health challenges in Japan with regards to climate change, Global Environ. Change, 79, 102640, <ext-link xlink:href="https://doi.org/10.1016/j.gloenvcha.2023.102640" ext-link-type="DOI">10.1016/j.gloenvcha.2023.102640</ext-link>, 2023.</mixed-citation></ref>
      <ref id="bib1.bib27"><label>27</label><mixed-citation>Louhichi, M., Nesmaoui, R., Mbarek, M., and Lazaar, M.: Shapley values for explaining the black box nature of machine learning model clustering, Procedia Comput. Sci., 220, 806–811, <ext-link xlink:href="https://doi.org/10.1016/j.procs.2023.03.107" ext-link-type="DOI">10.1016/j.procs.2023.03.107</ext-link>, 2023.</mixed-citation></ref>
      <ref id="bib1.bib28"><label>28</label><mixed-citation>Lu, B., Zhang, Z., Jiang, J., Meng, X., Liu, C., Herrmann, H., Chen, J., Xue, L., and Li, X.: Unraveling the O<sub>3</sub>-NO<sub><italic>x</italic></sub>-VOCs relationships induced by anomalous ozone in industrial regions during COVID-19 in Shanghai, Atmos. Environ., 308, 119864, <ext-link xlink:href="https://doi.org/10.1016/j.atmosenv.2023.119864" ext-link-type="DOI">10.1016/j.atmosenv.2023.119864</ext-link>, 2023.</mixed-citation></ref>
      <ref id="bib1.bib29"><label>29</label><mixed-citation>Lu, X., Zhang, D., Wang, L., Wang, S., Zhang, X., Liu, Y., Chen, K., Song, X., Yin, S., and Zhang, R.: Establishment and verification of anthropogenic speciated VOCs emission inventory of Central China, J. Environ. Sci., 149, 406–418, <ext-link xlink:href="https://doi.org/10.1016/j.jes.2024.01.033" ext-link-type="DOI">10.1016/j.jes.2024.01.033</ext-link>, 2025.</mixed-citation></ref>
      <ref id="bib1.bib30"><label>30</label><mixed-citation>Lundberg, S. M. and Lee, S.-I.: A unified approach to interpreting model predictions, arXiv [preprint], <ext-link xlink:href="https://doi.org/10.48550/arXiv.1705.07874" ext-link-type="DOI">10.48550/arXiv.1705.07874</ext-link>, 2017.</mixed-citation></ref>
      <ref id="bib1.bib31"><label>31</label><mixed-citation>Masui, N., Shiojiri, K., Agathokleous, E., Tani, A., and Koike, T.: Elevated O<sub>3</sub> threatens biological communications mediated by plant volatiles: A review focusing on the urban environment, Crit. Rev. Environ. Sci. Technol., 53, 1982–2001, <ext-link xlink:href="https://doi.org/10.1080/10643389.2023.2202105" ext-link-type="DOI">10.1080/10643389.2023.2202105</ext-link>, 2023.</mixed-citation></ref>
      <ref id="bib1.bib32"><label>32</label><mixed-citation>Mu, J., Zhang, Y., Xia, Z., Fan, G., Zhao, M., Sun, X., Liu, Y., Chen, T., Shen, H., Zhang, Z., Zhang, H., Pan, G., Wang, W., and Xue, L.: Two-year online measurements of volatile organic compounds (VOCs) at four sites in a Chinese city: Significant impact of petrochemical industry, Sci. Total Environ., 858, 159951, <ext-link xlink:href="https://doi.org/10.1016/j.scitotenv.2022.159951" ext-link-type="DOI">10.1016/j.scitotenv.2022.159951</ext-link>, 2023.</mixed-citation></ref>
      <ref id="bib1.bib33"><label>33</label><mixed-citation>Mukhamatdinov, I. I., Salih, I. S., Khelkhal, M. A., and Vakhin, A. V.: Application of aromatic and industrial solvents for enhancing heavy oil recovery from the Ashalcha field, Energy Fuels, 35, 374–385, <ext-link xlink:href="https://doi.org/10.1021/acs.energyfuels.0c03090" ext-link-type="DOI">10.1021/acs.energyfuels.0c03090</ext-link>, 2020.</mixed-citation></ref>
      <ref id="bib1.bib34"><label>34</label><mixed-citation>Nelson, D., Choi, Y., Sadeghi, B., Yeganeh, A. K., Ghahremanloo, M., and Park, J.: A comprehensive approach combining positive matrix factorization modeling, meteorology, and machine learning for source apportionment of surface ozone precursors: Underlying factors contributing to ozone formation in Houston, Texas, Environ. Pollut., 334, 122223, <ext-link xlink:href="https://doi.org/10.1016/j.envpol.2023.122223" ext-link-type="DOI">10.1016/j.envpol.2023.122223</ext-link>, 2023.</mixed-citation></ref>
      <ref id="bib1.bib35"><label>35</label><mixed-citation>Ning, Z., Gao, S., Gu, Z., Ni, C., Fang, F., Nie, Y., Jiao, Z., and Wang, C.: Prediction and explanation for ozone variability using cross-stacked ensemble learning model, Sci. Total Environ., 935, 173382, <ext-link xlink:href="https://doi.org/10.1016/j.scitotenv.2024.173382" ext-link-type="DOI">10.1016/j.scitotenv.2024.173382</ext-link>, 2024.</mixed-citation></ref>
      <ref id="bib1.bib36"><label>36</label><mixed-citation>Paatero, P.: Least squares formulation of robust non-negative factor analysis, Chemometrics Intell. Lab. Syst., 37, 1, <ext-link xlink:href="https://doi.org/10.1016/S0169-7439(96)00044-5" ext-link-type="DOI">10.1016/S0169-7439(96)00044-5</ext-link>, 1997.</mixed-citation></ref>
      <ref id="bib1.bib37"><label>37</label><mixed-citation>Pichler, M. and Hartig, F.: Machine learning and deep learning – A review for ecologists, Methods Ecol. Evol., 14, 994–1016, <ext-link xlink:href="https://doi.org/10.1111/2041-210X.14061" ext-link-type="DOI">10.1111/2041-210X.14061</ext-link>, 2023.</mixed-citation></ref>
      <ref id="bib1.bib38"><label>38</label><mixed-citation>Pinthong, N., Thepanondh, S., Kultan, V., and Keawboonchu, J.: Characteristics and impact of VOCs on ozone formation potential in a petrochemical industrial area, Thailand, Atmosphere, 13, 732, <ext-link xlink:href="https://doi.org/10.3390/atmos13050732" ext-link-type="DOI">10.3390/atmos13050732</ext-link>, 2022.</mixed-citation></ref>
      <ref id="bib1.bib39"><label>39</label><mixed-citation>Ragothaman, A. and Anderson, W. A.: Air quality impacts of petroleum refining and petrochemical industries, Environments, 4, 66, <ext-link xlink:href="https://doi.org/10.3390/environments4030066" ext-link-type="DOI">10.3390/environments4030066</ext-link>, 2017.</mixed-citation></ref>
      <ref id="bib1.bib40"><label>40</label><mixed-citation>Ren, H., Xia, Z., Yao, L., Qin, G., Zhang, Y., Xu, H., Wang, Z., and Cheng, J.: Investigation on ozone formation mechanism and control strategy of VOCs in petrochemical region: insights from chemical reactivity and photochemical loss, Sci. Total Environ., 914, 169891, <ext-link xlink:href="https://doi.org/10.1016/j.scitotenv.2024.169891" ext-link-type="DOI">10.1016/j.scitotenv.2024.169891</ext-link>, 2024.</mixed-citation></ref>
      <ref id="bib1.bib41"><label>41</label><mixed-citation>Robin, Y., Amann, J., Baur, T., Goodarzi, P., Schultealbert, C., Schneider, T., and Schütze, A.: High-performance VOC quantification for IAQ monitoring using advanced sensor systems and deep learning, Atmosphere, 12, 1487, <ext-link xlink:href="https://doi.org/10.3390/atmos12111487" ext-link-type="DOI">10.3390/atmos12111487</ext-link>, 2021.</mixed-citation></ref>
      <ref id="bib1.bib42"><label>42</label><mixed-citation>Salcedo-Sanz, S., Pérez-Aracil, J., Ascenso, G., Del Ser, J., Casillas-Pérez, D., Kadow, C., Fister, D., Barriopedro, D., García-Herrera, R., and Giuliani, M.: Analysis, characterization, prediction, and attribution of extreme atmospheric events with machine learning and deep learning techniques: a review, Theor. Appl. Climatol., 155, 1–44, <ext-link xlink:href="https://doi.org/10.1007/s00704-023-04571-5" ext-link-type="DOI">10.1007/s00704-023-04571-5</ext-link>, 2024.</mixed-citation></ref>
      <ref id="bib1.bib43"><label>43</label><mixed-citation>Sharma, A. K., Sharma, M., Sharma, A. K., and Sharma, M.: Mapping the impact of environmental pollutants on human health and environment: A systematic review and meta-analysis, J. Geochem. Explor., 255, 107325, <ext-link xlink:href="https://doi.org/10.1016/j.gexplo.2023.107325" ext-link-type="DOI">10.1016/j.gexplo.2023.107325</ext-link>, 2023.</mixed-citation></ref>
      <ref id="bib1.bib44"><label>44</label><mixed-citation>Sharma, S., Sharma, P., and Khare, M.: Photo-chemical transport modelling of tropospheric ozone: A review, Atmos. Environ., 159, 34–54, <ext-link xlink:href="https://doi.org/10.1016/j.atmosenv.2017.03.047" ext-link-type="DOI">10.1016/j.atmosenv.2017.03.047</ext-link>, 2017.</mixed-citation></ref>
      <ref id="bib1.bib45"><label>45</label><mixed-citation>Sharma, S., Singhal, A., Venkatramanan, V., Verma, P., and Pandey, M.: Variability in air quality, ozone formation potential by VOCs, and associated air pollution attributable health risks for Delhi's inhabitants, Environ. Sci.-Atmos., 4, 897–910, <ext-link xlink:href="https://doi.org/10.1039/d4ea00064a" ext-link-type="DOI">10.1039/d4ea00064a</ext-link>, 2024.</mixed-citation></ref>
      <ref id="bib1.bib46"><label>46</label><mixed-citation>Sillman, S.: The relation between ozone, NO<sub><italic>x</italic></sub> and hydrocarbons in urban and polluted rural environments, Atmos. Environ., 33, 1821–1845, <ext-link xlink:href="https://doi.org/10.1016/S1352-2310(98)00345-8" ext-link-type="DOI">10.1016/S1352-2310(98)00345-8</ext-link>, 1999.</mixed-citation></ref>
      <ref id="bib1.bib47"><label>47</label><mixed-citation>Song, M., Li, X., Yang, S., Yu, X., Zhou, S., Yang, Y., Chen, S., Dong, H., Liao, K., Chen, Q., Lu, K., Zhang, N., Cao, J., Zeng, L., and Zhang, Y.: Spatiotemporal variation, sources, and secondary transformation potential of volatile organic compounds in Xi'an, China, Atmos. Chem. Phys., 21, 4939–4958, <ext-link xlink:href="https://doi.org/10.5194/acp-21-4939-2021" ext-link-type="DOI">10.5194/acp-21-4939-2021</ext-link>, 2021.</mixed-citation></ref>
      <ref id="bib1.bib48"><label>48</label><mixed-citation>Tan, Y., Han, S., Chen, Y., Zhang, Z., Li, H., Li, W., Yuan, Q., Li, X., Wang, T., and Lee, S.-C.: Characteristics and source apportionment of volatile organic compounds (VOCs) at a coastal site in Hong Kong, Sci. Total Environ., 777, 146241, <ext-link xlink:href="https://doi.org/10.1016/j.scitotenv.2021.146241" ext-link-type="DOI">10.1016/j.scitotenv.2021.146241</ext-link>, 2021.</mixed-citation></ref>
      <ref id="bib1.bib49"><label>49</label><mixed-citation>Venecek, M. A., Carter, W. P., and Kleeman, M. J.: Updating the SAPRC Maximum Incremental Reactivity (MIR) scale for the United States from 1988 to 2010, J. Air Waste Manage. Assoc., 68, 1301–1316, <ext-link xlink:href="https://doi.org/10.1080/10962247.2018.1498410" ext-link-type="DOI">10.1080/10962247.2018.1498410</ext-link>, 2018.</mixed-citation></ref>
      <ref id="bib1.bib50"><label>50</label><mixed-citation>Wang, H., Lyu, X., Guo, H., Wang, Y., Zou, S., Ling, Z., Wang, X., Jiang, F., Zeren, Y., Pan, W., Huang, X., and Shen, J.: Ozone pollution around a coastal region of South China Sea: interaction between marine and continental air, Atmos. Chem. Phys., 18, 4277–4295, <ext-link xlink:href="https://doi.org/10.5194/acp-18-4277-2018" ext-link-type="DOI">10.5194/acp-18-4277-2018</ext-link>, 2018.</mixed-citation></ref>
      <ref id="bib1.bib51"><label>51</label><mixed-citation>Wang, S., Zhao, Y., Han, Y., Li, R., Fu, H., Gao, S., Duan, Y., Zhang, L., and Chen, J.: Spatiotemporal variation, source and secondary transformation potential of volatile organic compounds (VOCs) during the winter days in Shanghai, China, Atmos. Environ., 286, 119203, <ext-link xlink:href="https://doi.org/10.1016/j.atmosenv.2022.119203" ext-link-type="DOI">10.1016/j.atmosenv.2022.119203</ext-link>, 2022.</mixed-citation></ref>
      <ref id="bib1.bib52"><label>52</label><mixed-citation>Wang, Y., Jiang, S., Huang, L., Lu, G., Kasemsan, M., Yaluk, E. A., Liu, H., Liao, J., Bian, J., and Zhang, K.: Differences between VOCs and NO<sub><italic>x</italic></sub> transport contributions, their impacts on O<sub>3</sub>, and implications for O<sub>3</sub> pollution mitigation based on CMAQ simulation over the Yangtze River Delta, China, Sci. Total Environ., 872, 162118, <ext-link xlink:href="https://doi.org/10.1016/j.scitotenv.2023.162118" ext-link-type="DOI">10.1016/j.scitotenv.2023.162118</ext-link>, 2023.</mixed-citation></ref>
      <ref id="bib1.bib53"><label>53</label><mixed-citation>Washenfelder, R., Trainer, M., Frost, G., Ryerson, T., Atlas, E., De Gouw, J., Flocke, F., Fried, A., Holloway, J., and Parrish, D.: Characterization of NO<sub><italic>x</italic></sub>, SO<sub>2</sub>, ethene, and propene from industrial emission sources in Houston, Texas, J. Geophys. Res.-Atmos., 115, D16311, <ext-link xlink:href="https://doi.org/10.1029/2009JD013645" ext-link-type="DOI">10.1029/2009JD013645</ext-link>, 2010.</mixed-citation></ref>
      <ref id="bib1.bib54"><label>54</label><mixed-citation>Weiss, K. D.: Paint and coatings: A mature industry in transition, Prog. Polym. Sci., 22, 203–245, <ext-link xlink:href="https://doi.org/10.1016/S0079-6700(96)00019-6" ext-link-type="DOI">10.1016/S0079-6700(96)00019-6</ext-link>, 1997.</mixed-citation></ref>
      <ref id="bib1.bib55"><label>55</label><mixed-citation>White, W. C.: Butadiene production process overview, Chem. Biol. Interact., 166, 10–14, <ext-link xlink:href="https://doi.org/10.1016/j.cbi.2007.01.009" ext-link-type="DOI">10.1016/j.cbi.2007.01.009</ext-link>, 2007.</mixed-citation></ref>
      <ref id="bib1.bib56"><label>56</label><mixed-citation>Wu, Y., Fan, X., Liu, Y., Zhang, J., Wang, H., Sun, L., Fang, T., Mao, H., Hu, J., and Wu, L.: Source apportionment of VOCs based on photochemical loss in summer at a suburban site in Beijing, Atmos. Environ., 293, 119459, <ext-link xlink:href="https://doi.org/10.1016/j.atmosenv.2022.119459" ext-link-type="DOI">10.1016/j.atmosenv.2022.119459</ext-link>, 2023.</mixed-citation></ref>
      <ref id="bib1.bib57"><label>57</label><mixed-citation>Xiao, Z., Yang, X., Gu, H., Hu, J., Zhang, T., Chen, J., Pan, X., Xiu, G., Zhang, W., and Lin, M.: Characterization and sources of volatile organic compounds (VOCs) during 2022 summer ozone pollution control in Shanghai, China, Atmos. Environ., 327, 120464, <ext-link xlink:href="https://doi.org/10.1016/j.atmosenv.2024.120464" ext-link-type="DOI">10.1016/j.atmosenv.2024.120464</ext-link>, 2024.</mixed-citation></ref>
      <ref id="bib1.bib58"><label>58</label><mixed-citation>Xu, Z., Zou, Q., Jin, L., Shen, Y., Shen, J., Xu, B., Qu, F., Zhang, F., Xu, J., and Pei, X.: Characteristics and sources of ambient Volatile Organic Compounds (VOCs) at a regional background site, YRD region, China: Significant influence of solvent evaporation during hot months, Sci. Total Environ., 857, 159674, <ext-link xlink:href="https://doi.org/10.1016/j.scitotenv.2022.159674" ext-link-type="DOI">10.1016/j.scitotenv.2022.159674</ext-link>, 2023.</mixed-citation></ref>
      <ref id="bib1.bib59"><label>59</label><mixed-citation>Yang, M., Li, F., Huang, C., Tong, L., Dai, X., and Xiao, H.: VOC characteristics and their source apportionment in a coastal industrial area in the Yangtze River Delta, China, J. Environ. Sci., 127, 483–494, <ext-link xlink:href="https://doi.org/10.1016/j.jes.2022.05.041" ext-link-type="DOI">10.1016/j.jes.2022.05.041</ext-link>, 2023.</mixed-citation></ref>
      <ref id="bib1.bib60"><label>60</label><mixed-citation>Yang, Y., Meng, X., Chen, Q., Xue, Q., Wang, L., Sun, J., Guo, W., Tao, H., Yang, L., and Chen, F.: Characteristics of volatile organic compounds under different operating conditions in a petrochemical industrial zone and their effects on ozone formation, Environ. Pollut., 363, 125254, <ext-link xlink:href="https://doi.org/10.1016/j.envpol.2024.125254" ext-link-type="DOI">10.1016/j.envpol.2024.125254</ext-link>, 2024.</mixed-citation></ref>
      <ref id="bib1.bib61"><label>61</label><mixed-citation>Yao, D., Tang, G., Wang, Y., Yang, Y., Wang, L., Chen, T., He, H., and Wang, Y.: Significant contribution of spring northwest transport to volatile organic compounds in Beijing, J. Environ. Sci., 104, 169–181, <ext-link xlink:href="https://doi.org/10.1016/j.jes.2020.11.023" ext-link-type="DOI">10.1016/j.jes.2020.11.023</ext-link>, 2021.</mixed-citation></ref>
      <ref id="bib1.bib62"><label>62</label><mixed-citation>Zhang, M., Liu, Y., Xu, X., He, J., Ji, D., Qu, K., Xu, Y., Cong, C., and Wang, Y.: A Systematic Review on Atmospheric Ozone Pollution in a Typical Peninsula Region of North China: Formation Mechanism, Spatiotemporal Distribution, Source Apportionment, and Health and Ecological Effects, Curr. Pollution Rep., 11, 9, <ext-link xlink:href="https://doi.org/10.1007/s40726-024-00338-2" ext-link-type="DOI">10.1007/s40726-024-00338-2</ext-link>, 2025.</mixed-citation></ref>
      <ref id="bib1.bib63"><label>63</label><mixed-citation>Zhang, Y., Xue, L., Carter, W. P. L., Pei, C., Chen, T., Mu, J., Wang, Y., Zhang, Q., and Wang, W.: Development of ozone reactivity scales for volatile organic compounds in a Chinese megacity, Atmos. Chem. Phys., 21, 11053–11068, <ext-link xlink:href="https://doi.org/10.5194/acp-21-11053-2021" ext-link-type="DOI">10.5194/acp-21-11053-2021</ext-link>, 2021.</mixed-citation></ref>
      <ref id="bib1.bib64"><label>64</label><mixed-citation>Zhang, Y., Fu, Q., Wang, T., Huo, J., Cui, H., Mu, J., Tan, Y., Chen, T., Shen, H., and Li, Q.: A quantitative analysis of causes for increasing ozone pollution in Shanghai during the 2022 lockdown and implications for control policy, Atmos. Environ., 326, 120469, <ext-link xlink:href="https://doi.org/10.1016/j.atmosenv.2024.120469" ext-link-type="DOI">10.1016/j.atmosenv.2024.120469</ext-link>, 2024.</mixed-citation></ref>
      <ref id="bib1.bib65"><label>65</label><mixed-citation>Zhang, Z., Xu, J., Ye, T., Chen, L., Chen, H., and Yao, J.: Distributions and temporal changes of benzene, toluene, ethylbenzene, and xylene concentrations in newly decorated rooms in southeastern China, and the health risks posed, Atmos. Environ., 246, 118071, <ext-link xlink:href="https://doi.org/10.1016/j.atmosenv.2020.118071" ext-link-type="DOI">10.1016/j.atmosenv.2020.118071</ext-link>, 2021.</mixed-citation></ref>
      <ref id="bib1.bib66"><label>66</label><mixed-citation>Zhao, D., Xin, J., Wang, W., Jia, D., Wang, Z., Xiao, H., Liu, C., Zhou, J., Tong, L., and Ma, Y.: Effects of the sea-land breeze on coastal ozone pollution in the Yangtze River Delta, China, Sci. Total Environ., 807, 150306, <ext-link xlink:href="https://doi.org/10.1016/j.scitotenv.2021.150306" ext-link-type="DOI">10.1016/j.scitotenv.2021.150306</ext-link>, 2022.</mixed-citation></ref>
      <ref id="bib1.bib67"><label>67</label><mixed-citation>Zhou, X., Sun, Z., Yan, H., Feng, X., Zhao, H., Liu, Y., Chen, X., and Yang, C.: Produce petrochemicals directly from crude oil catalytic cracking, a techno-economic analysis and life cycle society-environment assessment, J. Cleaner Prod., 308, 127283, <ext-link xlink:href="https://doi.org/10.1016/j.jclepro.2021.127283" ext-link-type="DOI">10.1016/j.jclepro.2021.127283</ext-link>, 2021.</mixed-citation></ref>

  </ref-list></back>
    <!--<article-title-html>Multi-machine-learning approaches to modeling small-scale source attribution of ozone formation</article-title-html>
<abstract-html/>
<ref-html id="bib1.bib1"><label>1</label><mixed-citation>
      
Baklanov, A. and Korsholm, U.: On-line integrated meteorological and
chemical transport modelling: advantages and prospectives, Air Pollution
Modeling and Its Application XIX, Springer Netherlands, 3–17,
<a href="https://doi.org/10.1007/978-1-4020-8453-9_1" target="_blank">https://doi.org/10.1007/978-1-4020-8453-9_1</a>, 2008.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib2"><label>2</label><mixed-citation>
      
Burdett, I. D. and Eisinger, R. S.: Ethylene polymerization processes and
manufacture of polyethylene, Handbook of Industrial Polyethylene and
Technology: Definitive Guide to Manufacturing, Properties, Processing,
Applications and Markets, Wiley, 61–103,
<a href="https://doi.org/10.1002/9781119159797" target="_blank">https://doi.org/10.1002/9781119159797</a>,
2017.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib3"><label>3</label><mixed-citation>
      
Cao, X., Yi, J., Li, Y., Zhao, M., Duan, Y., Zhang, F., and Duan, L.:
Characteristics and Source Apportionment of Volatile Organic Compounds in an
Industrial Area at the Zhejiang–Shanghai Boundary, China, Atmosphere, 15,
237, <a href="https://doi.org/10.3390/atmos15020237" target="_blank">https://doi.org/10.3390/atmos15020237</a>, 2024.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib4"><label>4</label><mixed-citation>
      
Carter, W. P.: Development of the SAPRC-07 chemical mechanism, Atmos.
Environ., 44, 5324–5335, <a href="https://doi.org/10.1016/j.atmosenv.2010.01.026" target="_blank">https://doi.org/10.1016/j.atmosenv.2010.01.026</a>,
2010.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib5"><label>5</label><mixed-citation>
      
Chang, L., He, F., Tie, X., Xu, J., and Gao, W.: Meteorology driving the
highest ozone level occurred during mid-spring to early summer in Shanghai,
China, Sci. Total Environ., 785, 147253,
<a href="https://doi.org/10.1016/j.scitotenv.2021.147253" target="_blank">https://doi.org/10.1016/j.scitotenv.2021.147253</a>, 2021.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib6"><label>6</label><mixed-citation>
      
Chen, D., Zhou, L., Wang, C., Liu, H., Qiu, Y., Shi, G., Song, D., Tan, Q.,
and Yang, F.: Characteristics of ambient volatile organic compounds during
spring O<sub>3</sub> pollution episode in Chengdu, China, J. Environ. Sci., 114,
115–125, <a href="https://doi.org/10.1016/j.jes.2021.08.014" target="_blank">https://doi.org/10.1016/j.jes.2021.08.014</a>, 2022.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib7"><label>7</label><mixed-citation>
      
Chen, W., Xu, X., and Liu, W.: Combined PMF modelling and machine learning
to identify sources and meteorological influencers of volatile organic
compound pollution in an industrial city in eastern China, Atmos. Environ.,
334, 120714, <a href="https://doi.org/10.1016/j.atmosenv.2024.120714" target="_blank">https://doi.org/10.1016/j.atmosenv.2024.120714</a>, 2024.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib8"><label>8</label><mixed-citation>
      
Cheng, N., Jing, D., Gu, Z., Cai, X., Shi, Z., Li, S., Chen, L., Li, W., and
Wang, Q.: Observation-Based Ozone Formation Rules by Gradient Boosting
Decision Trees Model in Typical Chemical Industrial Parks, Atmosphere, 15,
600, <a href="https://doi.org/10.3390/atmos15050600" target="_blank">https://doi.org/10.3390/atmos15050600</a>, 2024.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib9"><label>9</label><mixed-citation>
      
Cheng, Y., Huang, X.-F., Peng, Y., Tang, M.-X., Zhu, B., Xia, S.-Y., and He,
L.-Y.: A novel machine learning method for evaluating the impact of emission
sources on ozone formation, Environ. Pollut., 316, 120685,
<a href="https://doi.org/10.1016/j.envpol.2022.120685" target="_blank">https://doi.org/10.1016/j.envpol.2022.120685</a>, 2023.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib10"><label>10</label><mixed-citation>
      
Choi, M. S., Qiu, X., Zhang, J., Wang, S., Li, X., Sun, Y., Chen, J., and
Ying, Q.: Study of secondary organic aerosol formation from chlorine
radical-initiated oxidation of volatile organic compounds in a polluted
atmosphere using a 3D chemical transport model, Environ. Sci. Technol., 54,
13409–13418, <a href="https://doi.org/10.1021/acs.est.0c02958" target="_blank">https://doi.org/10.1021/acs.est.0c02958</a>, 2020.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib11"><label>11</label><mixed-citation>
      
Essamlali, I., Nhaila, H., and El Khaili, M.: Supervised Machine Learning
Approaches for Predicting Key Pollutants and for the Sustainable Enhancement
of Urban Air Quality: A Systematic Review, Sustainability, 16, 976,
<a href="https://doi.org/10.3390/su16030976" target="_blank">https://doi.org/10.3390/su16030976</a>, 2024.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib12"><label>12</label><mixed-citation>
      
Guidotti, R., Monreale, A., Ruggieri, S., Turini, F., Giannotti, F., and
Pedreschi, D.: A survey of methods for explaining black box models, ACM
Comput. Surv., 51, 1–42, <a href="https://doi.org/10.1145/3236009" target="_blank">https://doi.org/10.1145/3236009</a>, 2018.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib13"><label>13</label><mixed-citation>
      
Guo, W., Yang, Y., Chen, Q., Zhu, Y., Zhang, Y., Zhang, Y., Liu, Y., Li, G.,
Sun, W., and She, J.: Chemical reactivity of volatile organic compounds and
their effects on ozone formation in a petrochemical industrial area of
Lanzhou, Western China, Sci. Total Environ., 839, 155901,
<a href="https://doi.org/10.1016/j.scitotenv.2022.155901" target="_blank">https://doi.org/10.1016/j.scitotenv.2022.155901</a>, 2022.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib14"><label>14</label><mixed-citation>
      
He, L., Duan, Y., Zhang, Y., Yu, Q., Huo, J., Chen, J., Cui, H., Li, Y., and
Ma, W.: Effects of VOC emissions from chemical industrial parks on regional
O<sub>3</sub>-PM<sub>2.5</sub> compound pollution in the Yangtze River Delta, Sci. Total
Environ., 906, 167503, <a href="https://doi.org/10.1016/j.scitotenv.2023.167503" target="_blank">https://doi.org/10.1016/j.scitotenv.2023.167503</a>,
2024.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib15"><label>15</label><mixed-citation>
      
Huang, B., Lei, C., Wei, C., and Zeng, G.: Chlorinated volatile organic
compounds (Cl-VOCs) in environment – sources, potential human health
impacts, and current remediation technologies, Environ. Int., 71, 118–138,
<a href="https://doi.org/10.1016/j.envint.2014.06.013" target="_blank">https://doi.org/10.1016/j.envint.2014.06.013</a>, 2014.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib16"><label>16</label><mixed-citation>
      
Hui, L., Liu, X., Tan, Q., Feng, M., An, J., Qu, Y., Zhang, Y., and Cheng,
N.: VOC characteristics, sources and contributions to SOA formation during
haze events in Wuhan, Central China, Sci. Total Environ., 650, 2624–2639,
<a href="https://doi.org/10.1016/j.scitotenv.2018.10.029" target="_blank">https://doi.org/10.1016/j.scitotenv.2018.10.029</a>, 2019.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib17"><label>17</label><mixed-citation>
      
Hui, L., Liu, X., Tan, Q., Feng, M., An, J., Qu, Y., Zhang, Y., Deng, Y.,
Zhai, R., and Wang, Z.: VOC characteristics, chemical reactivity and sources
in urban Wuhan, central China, Atmos. Environ., 224, 117340,
<a href="https://doi.org/10.1016/j.atmosenv.2020.117340" target="_blank">https://doi.org/10.1016/j.atmosenv.2020.117340</a>, 2020.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib18"><label>18</label><mixed-citation>
      
Kaur, H., Nori, H., Jenkins, S., Caruana, R., Wallach, H., and Wortman
Vaughan, J.: Interpreting interpretability: understanding data scientists'
use of interpretability tools for machine learning, Proceedings of the 2020
CHI Conference on Human Factors in Computing Systems, Honolulu, HI, USA, 1–14,
<a href="https://doi.org/10.1145/3313831.3376219" target="_blank">https://doi.org/10.1145/3313831.3376219</a>, 2020.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib19"><label>19</label><mixed-citation>
      
Kim, S.-J., Lee, H.-Y., Lee, S.-J., and Choi, S.-D.: Passive air sampling of
VOCs, O<sub>3</sub>, NO<sub>2</sub>, and SO<sub>2</sub> in the large industrial city of Ulsan, South Korea:
spatial–temporal variations, source identification, and ozone formation
potential, Environ. Sci. Pollut. Res., 30, 125478–125491,
<a href="https://doi.org/10.1007/s11356-023-31109-z" target="_blank">https://doi.org/10.1007/s11356-023-31109-z</a>, 2023.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib20"><label>20</label><mixed-citation>
      
Kuo, C.-P. and Fu, J. S.: Ozone response modeling to NO<sub><i>x</i></sub> and VOC emissions:
Examining machine learning models, Environ. Int., 176, 107969,
<a href="https://doi.org/10.1016/j.envint.2023.107969" target="_blank">https://doi.org/10.1016/j.envint.2023.107969</a>, 2023.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib21"><label>21</label><mixed-citation>
      
Li, M., Zhang, Q., Streets, D. G., He, K. B., Cheng, Y. F., Emmons, L. K., Huo, H., Kang, S. C., Lu, Z., Shao, M., Su, H., Yu, X., and Zhang, Y.: Mapping Asian anthropogenic emissions of non-methane volatile organic compounds to multiple chemical mechanisms, Atmos. Chem. Phys., 14, 5617–5638, <a href="https://doi.org/10.5194/acp-14-5617-2014" target="_blank">https://doi.org/10.5194/acp-14-5617-2014</a>, 2014.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib22"><label>22</label><mixed-citation>
      
Li, M., Sun, H., Huang, Y., and Chen, H.: Shapley value: from cooperative
game to explainable artificial intelligence, Auton. Intell. Syst., 4, 2,
<a href="https://doi.org/10.1007/s43684-023-00060-8" target="_blank">https://doi.org/10.1007/s43684-023-00060-8</a>, 2024.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib23"><label>23</label><mixed-citation>
      
Liu, B., Liang, D., Yang, J., Dai, Q., Bi, X., Feng, Y., Yuan, J., Xiao, Z.,
Zhang, Y., and Xu, H.: Characterization and source apportionment of volatile
organic compounds based on 1-year of observational data in Tianjin, China,
Environ. Pollut., 218, 757–769, <a href="https://doi.org/10.1016/j.envpol.2016.07.072" target="_blank">https://doi.org/10.1016/j.envpol.2016.07.072</a>,
2016.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib24"><label>24</label><mixed-citation>
      
Liu, X., Lu, D., Zhang, A., Liu, Q., and Jiang, G.: Data-driven machine
learning in environmental pollution: gains and problems, Environ. Sci.
Technol., 56, 2124–2133, <a href="https://doi.org/10.1021/acs.est.1c06157" target="_blank">https://doi.org/10.1021/acs.est.1c06157</a>, 2022.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib25"><label>25</label><mixed-citation>
      
Liu, Y., Wang, H., Jing, S., Peng, Y., Gao, Y., Yan, R., Wang, Q., Lou, S.,
Cheng, T., and Huang, C.: Strong regional transport of volatile organic
compounds (VOCs) during wintertime in Shanghai megacity of China, Atmos.
Environ., 244, 117940, <a href="https://doi.org/10.1016/j.atmosenv.2020.117940" target="_blank">https://doi.org/10.1016/j.atmosenv.2020.117940</a>, 2021.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib26"><label>26</label><mixed-citation>
      
Long, Y., Wu, Y., Xie, Y., Huang, L., Wang, W., Liu, X., Zhou, Z., Zhang,
Y., Hanaoka, T., and Ju, Y.: PM<sub>2.5</sub> and ozone pollution-related health
challenges in Japan with regards to climate change, Global Environ. Change,
79, 102640, <a href="https://doi.org/10.1016/j.gloenvcha.2023.102640" target="_blank">https://doi.org/10.1016/j.gloenvcha.2023.102640</a>, 2023.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib27"><label>27</label><mixed-citation>
      
Louhichi, M., Nesmaoui, R., Mbarek, M., and Lazaar, M.: Shapley values for
explaining the black box nature of machine learning model clustering,
Procedia Comput. Sci., 220, 806–811,
<a href="https://doi.org/10.1016/j.procs.2023.03.107" target="_blank">https://doi.org/10.1016/j.procs.2023.03.107</a>, 2023.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib28"><label>28</label><mixed-citation>
      
Lu, B., Zhang, Z., Jiang, J., Meng, X., Liu, C., Herrmann, H., Chen, J.,
Xue, L., and Li, X.: Unraveling the O<sub>3</sub>-NO<sub><i>x</i></sub>-VOCs relationships induced by
anomalous ozone in industrial regions during COVID-19 in Shanghai, Atmos.
Environ., 308, 119864, <a href="https://doi.org/10.1016/j.atmosenv.2023.119864" target="_blank">https://doi.org/10.1016/j.atmosenv.2023.119864</a>, 2023.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib29"><label>29</label><mixed-citation>
      
Lu, X., Zhang, D., Wang, L., Wang, S., Zhang, X., Liu, Y., Chen, K., Song,
X., Yin, S., and Zhang, R.: Establishment and verification of anthropogenic
speciated VOCs emission inventory of Central China, J. Environ. Sci., 149,
406–418, <a href="https://doi.org/10.1016/j.jes.2024.01.033" target="_blank">https://doi.org/10.1016/j.jes.2024.01.033</a>, 2025.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib30"><label>30</label><mixed-citation>
      
Lundberg, S. M. and Lee, S.-I.: A unified approach to interpreting model
predictions, arXiv [preprint], 30,
<a href="https://doi.org/10.48550/arXiv.1705.07874" target="_blank">https://doi.org/10.48550/arXiv.1705.07874</a>, 2017.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib31"><label>31</label><mixed-citation>
      
Masui, N., Shiojiri, K., Agathokleous, E., Tani, A., and Koike, T.: Elevated
O<sub>3</sub> threatens biological communications mediated by plant volatiles: A review
focusing on the urban environment, Crit. Rev. Environ. Sci. Technol., 53,
1982–2001, <a href="https://doi.org/10.1080/10643389.2023.2202105" target="_blank">https://doi.org/10.1080/10643389.2023.2202105</a>, 2023.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib32"><label>32</label><mixed-citation>
      
Mu, J., Zhang, Y., Xia, Z., Fan, G., Zhao, M., Sun, X., Liu, Y., Chen, T.,
Shen, H., Zhang, Z., Zhang, H., Pan, G., Wang, W., and Xue, L.: Two-year
online measurements of volatile organic compounds (VOCs) at four sites in a
Chinese city: Significant impact of petrochemical industry, Sci. Total
Environ., 858, 159951, <a href="https://doi.org/10.1016/j.scitotenv.2022.159951" target="_blank">https://doi.org/10.1016/j.scitotenv.2022.159951</a>, 2023.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib33"><label>33</label><mixed-citation>
      
Mukhamatdinov, I. I., Salih, I. S., Khelkhal, M. A., and Vakhin, A. V.:
Application of aromatic and industrial solvents for enhancing heavy oil
recovery from the Ashalcha field, Energy Fuels, 35, 374–385,
<a href="https://doi.org/10.1021/acs.energyfuels.0c03090" target="_blank">https://doi.org/10.1021/acs.energyfuels.0c03090</a>, 2020.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib34"><label>34</label><mixed-citation>
      
Nelson, D., Choi, Y., Sadeghi, B., Yeganeh, A. K., Ghahremanloo, M., and
Park, J.: A comprehensive approach combining positive matrix factorization
modeling, meteorology, and machine learning for source apportionment of
surface ozone precursors: Underlying factors contributing to ozone formation
in Houston, Texas, Environ. Pollut., 334, 122223,
<a href="https://doi.org/10.1016/j.envpol.2023.122223" target="_blank">https://doi.org/10.1016/j.envpol.2023.122223</a>, 2023.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib35"><label>35</label><mixed-citation>
      
Ning, Z., Gao, S., Gu, Z., Ni, C., Fang, F., Nie, Y., Jiao, Z., and Wang,
C.: Prediction and explanation for ozone variability using cross-stacked
ensemble learning model, Sci. Total Environ., 935, 173382,
<a href="https://doi.org/10.1016/j.scitotenv.2024.173382" target="_blank">https://doi.org/10.1016/j.scitotenv.2024.173382</a>, 2024.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib36"><label>36</label><mixed-citation>
      
Paatero, P.: Least squares formulation of robust non-negative factor
analysis, Chemometrics Intell. Lab. Syst., 37, 1,
<a href="https://doi.org/10.1016/S0169-7439(96)00044-5" target="_blank">https://doi.org/10.1016/S0169-7439(96)00044-5</a>, 1997.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib37"><label>37</label><mixed-citation>
      
Pichler, M. and Hartig, F.: Machine learning and deep learning – A review
for ecologists, Methods Ecol. Evol., 14, 994–1016,
<a href="https://doi.org/10.1111/2041-210X.14061" target="_blank">https://doi.org/10.1111/2041-210X.14061</a>, 2023.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib38"><label>38</label><mixed-citation>
      
Pinthong, N., Thepanondh, S., Kultan, V., and Keawboonchu, J.:
Characteristics and impact of VOCs on ozone formation potential in a
petrochemical industrial area, Thailand, Atmosphere, 13, 732,
<a href="https://doi.org/10.3390/atmos13050732" target="_blank">https://doi.org/10.3390/atmos13050732</a>, 2022.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib39"><label>39</label><mixed-citation>
      
Ragothaman, A. and Anderson, W. A.: Air quality impacts of petroleum
refining and petrochemical industries, Environments, 4, 66,
<a href="https://doi.org/10.3390/environments4030066" target="_blank">https://doi.org/10.3390/environments4030066</a>, 2017.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib40"><label>40</label><mixed-citation>
      
Ren, H., Xia, Z., Yao, L., Qin, G., Zhang, Y., Xu, H., Wang, Z., and Cheng,
J.: Investigation on ozone formation mechanism and control strategy of VOCs
in petrochemical region: insights from chemical reactivity and photochemical
loss, Sci. Total Environ., 914, 169891,
<a href="https://doi.org/10.1016/j.scitotenv.2024.169891" target="_blank">https://doi.org/10.1016/j.scitotenv.2024.169891</a>, 2024.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib41"><label>41</label><mixed-citation>
      
Robin, Y., Amann, J., Baur, T., Goodarzi, P., Schultealbert, C., Schneider,
T., and Schütze, A.: High-performance VOC quantification for IAQ
monitoring using advanced sensor systems and deep learning, Atmosphere, 12,
1487, <a href="https://doi.org/10.3390/atmos12111487" target="_blank">https://doi.org/10.3390/atmos12111487</a>, 2021.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib42"><label>42</label><mixed-citation>
      
Salcedo-Sanz, S., Pérez-Aracil, J., Ascenso, G., Del Ser, J.,
Casillas-Pérez, D., Kadow, C., Fister, D., Barriopedro, D.,
García-Herrera, R., and Giuliani, M.: Analysis, characterization,
prediction, and attribution of extreme atmospheric events with machine
learning and deep learning techniques: a review, Theor. Appl. Climatol.,
155, 1–44, <a href="https://doi.org/10.1007/s00704-023-04571-5" target="_blank">https://doi.org/10.1007/s00704-023-04571-5</a>, 2024.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib43"><label>43</label><mixed-citation>
      
Sharma, A. K., Sharma, M., Sharma, A. K., and Sharma, M.: Mapping the impact
of environmental pollutants on human health and environment: A systematic
review and meta-analysis, J. Geochem. Explor., 255, 107325,
<a href="https://doi.org/10.1016/j.gexplo.2023.107325" target="_blank">https://doi.org/10.1016/j.gexplo.2023.107325</a>, 2023.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib44"><label>44</label><mixed-citation>
      
Sharma, S., Sharma, P., and Khare, M.: Photo-chemical transport modelling of
tropospheric ozone: A review, Atmos. Environ., 159, 34–54,
<a href="https://doi.org/10.1016/j.atmosenv.2017.03.047" target="_blank">https://doi.org/10.1016/j.atmosenv.2017.03.047</a>, 2017.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib45"><label>45</label><mixed-citation>
      
Sharma, S., Singhal, A., Venkatramanan, V., Verma, P., and Pandey, M.:
Variability in air quality, ozone formation potential by VOCs, and
associated air pollution attributable health risks for Delhi's inhabitants,
Environ. Sci.-Atmos., 4, 897–910, <a href="https://doi.org/10.1039/d4ea00064a" target="_blank">https://doi.org/10.1039/d4ea00064a</a>, 2024.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib46"><label>46</label><mixed-citation>
      
Sillman, S.: The relation between ozone, NO<sub><i>x</i></sub> and hydrocarbons in urban and
polluted rural environments, Atmos. Environ., 33, 1821–1845,
<a href="https://doi.org/10.1016/S1352-2310(98)00345-8" target="_blank">https://doi.org/10.1016/S1352-2310(98)00345-8</a>, 1999.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib47"><label>47</label><mixed-citation>
      
Song, M., Li, X., Yang, S., Yu, X., Zhou, S., Yang, Y., Chen, S., Dong, H., Liao, K., Chen, Q., Lu, K., Zhang, N., Cao, J., Zeng, L., and Zhang, Y.: Spatiotemporal variation, sources, and secondary transformation potential of volatile organic compounds in Xi'an, China, Atmos. Chem. Phys., 21, 4939–4958, <a href="https://doi.org/10.5194/acp-21-4939-2021" target="_blank">https://doi.org/10.5194/acp-21-4939-2021</a>, 2021.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib48"><label>48</label><mixed-citation>
      
Tan, Y., Han, S., Chen, Y., Zhang, Z., Li, H., Li, W., Yuan, Q., Li, X.,
Wang, T., and Lee, S.-C.: Characteristics and source apportionment of
volatile organic compounds (VOCs) at a coastal site in Hong Kong, Sci. Total
Environ., 777, 146241, <a href="https://doi.org/10.1016/j.scitotenv.2021.146241" target="_blank">https://doi.org/10.1016/j.scitotenv.2021.146241</a>,
2021.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib49"><label>49</label><mixed-citation>
      
Venecek, M. A., Carter, W. P., and Kleeman, M. J.: Updating the SAPRC
Maximum Incremental Reactivity (MIR) scale for the United States from 1988
to 2010, J. Air Waste Manage. Assoc., 68, 1301–1316,
<a href="https://doi.org/10.1080/10962247.2018.1498410" target="_blank">https://doi.org/10.1080/10962247.2018.1498410</a>, 2018.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib50"><label>50</label><mixed-citation>
      
Wang, H., Lyu, X., Guo, H., Wang, Y., Zou, S., Ling, Z., Wang, X., Jiang, F., Zeren, Y., Pan, W., Huang, X., and Shen, J.: Ozone pollution around a coastal region of South China Sea: interaction between marine and continental air, Atmos. Chem. Phys., 18, 4277–4295, <a href="https://doi.org/10.5194/acp-18-4277-2018" target="_blank">https://doi.org/10.5194/acp-18-4277-2018</a>, 2018.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib51"><label>51</label><mixed-citation>
      
Wang, S., Zhao, Y., Han, Y., Li, R., Fu, H., Gao, S., Duan, Y., Zhang, L.,
and Chen, J.: Spatiotemporal variation, source and secondary transformation
potential of volatile organic compounds (VOCs) during the winter days in
Shanghai, China, Atmos. Environ., 286, 119203,
<a href="https://doi.org/10.1016/j.atmosenv.2022.119203" target="_blank">https://doi.org/10.1016/j.atmosenv.2022.119203</a>, 2022.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib52"><label>52</label><mixed-citation>
      
Wang, Y., Jiang, S., Huang, L., Lu, G., Kasemsan, M., Yaluk, E. A., Liu, H.,
Liao, J., Bian, J., and Zhang, K.: Differences between VOCs and NO<sub><i>x</i></sub>
transport contributions, their impacts on O<sub>3</sub>, and implications for O<sub>3</sub>
pollution mitigation based on CMAQ simulation over the Yangtze River Delta,
China, Sci. Total Environ., 872, 162118,
<a href="https://doi.org/10.1016/j.scitotenv.2023.162118" target="_blank">https://doi.org/10.1016/j.scitotenv.2023.162118</a>, 2023.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib53"><label>53</label><mixed-citation>
      
Washenfelder, R., Trainer, M., Frost, G., Ryerson, T., Atlas, E., De Gouw,
J., Flocke, F., Fried, A., Holloway, J., and Parrish, D.: Characterization
of NO<sub><i>x</i></sub>, SO<sub>2</sub>, ethene, and propene from industrial emission sources in
Houston, Texas, J. Geophys. Res.-Atmos., 115, D16311,
<a href="https://doi.org/10.1029/2009JD013645" target="_blank">https://doi.org/10.1029/2009JD013645</a>, 2010.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib54"><label>54</label><mixed-citation>
      
Weiss, K. D.: Paint and coatings: A mature industry in transition, Prog.
Polym. Sci., 22, 203–245, <a href="https://doi.org/10.1016/S0079-6700(96)00019-6" target="_blank">https://doi.org/10.1016/S0079-6700(96)00019-6</a>,
1997.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib55"><label>55</label><mixed-citation>
      
White, W. C.: Butadiene production process overview, Chem. Biol. Interact.,
166, 10–14, <a href="https://doi.org/10.1016/j.cbi.2007.01.009" target="_blank">https://doi.org/10.1016/j.cbi.2007.01.009</a>, 2007.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib56"><label>56</label><mixed-citation>
      
Wu, Y., Fan, X., Liu, Y., Zhang, J., Wang, H., Sun, L., Fang, T., Mao, H.,
Hu, J., and Wu, L.: Source apportionment of VOCs based on photochemical loss
in summer at a suburban site in Beijing, Atmos. Environ., 293, 119459,
<a href="https://doi.org/10.1016/j.atmosenv.2022.119459" target="_blank">https://doi.org/10.1016/j.atmosenv.2022.119459</a>, 2023.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib57"><label>57</label><mixed-citation>
      
Xiao, Z., Yang, X., Gu, H., Hu, J., Zhang, T., Chen, J., Pan, X., Xiu, G.,
Zhang, W., and Lin, M.: Characterization and sources of volatile organic
compounds (VOCs) during 2022 summer ozone pollution control in Shanghai,
China, Atmos. Environ., 327, 120464,
<a href="https://doi.org/10.1016/j.atmosenv.2024.120464" target="_blank">https://doi.org/10.1016/j.atmosenv.2024.120464</a>, 2024.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib58"><label>58</label><mixed-citation>
      
Xu, Z., Zou, Q., Jin, L., Shen, Y., Shen, J., Xu, B., Qu, F., Zhang, F., Xu,
J., and Pei, X.: Characteristics and sources of ambient Volatile Organic
Compounds (VOCs) at a regional background site, YRD region, China:
Significant influence of solvent evaporation during hot months, Sci. Total
Environ., 857, 159674, <a href="https://doi.org/10.1016/j.scitotenv.2022.159674" target="_blank">https://doi.org/10.1016/j.scitotenv.2022.159674</a>,
2023.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib59"><label>59</label><mixed-citation>
      
Yang, M., Li, F., Huang, C., Tong, L., Dai, X., and Xiao, H.: VOC
characteristics and their source apportionment in a coastal industrial area
in the Yangtze River Delta, China, J. Environ. Sci., 127, 483–494,
<a href="https://doi.org/10.1016/j.jes.2022.05.041" target="_blank">https://doi.org/10.1016/j.jes.2022.05.041</a>, 2023.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib60"><label>60</label><mixed-citation>
      
Yang, Y., Meng, X., Chen, Q., Xue, Q., Wang, L., Sun, J., Guo, W., Tao, H.,
Yang, L., and Chen, F.: Characteristics of volatile organic compounds under
different operating conditions in a petrochemical industrial zone and their
effects on ozone formation, Environ. Pollut., 363, 125254,
<a href="https://doi.org/10.1016/j.envpol.2024.125254" target="_blank">https://doi.org/10.1016/j.envpol.2024.125254</a>, 2024.


    </mixed-citation></ref-html>
<ref-html id="bib1.bib61"><label>61</label><mixed-citation>
      
Yao, D., Tang, G., Wang, Y., Yang, Y., Wang, L., Chen, T., He, H., and Wang,
Y.: Significant contribution of spring northwest transport to volatile
organic compounds in Beijing, J. Environ. Sci., 104, 169–181,
<a href="https://doi.org/10.1016/j.jes.2020.11.023" target="_blank">https://doi.org/10.1016/j.jes.2020.11.023</a>, 2021.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib62"><label>62</label><mixed-citation>
      
Zhang, M., Liu, Y., Xu, X., He, J., Ji, D., Qu, K., Xu, Y., Cong, C., and
Wang, Y.: A Systematic Review on Atmospheric Ozone Pollution in a Typical
Peninsula Region of North China: Formation Mechanism, Spatiotemporal
Distribution, Source Apportionment, and Health and Ecological Effects, Curr.
Pollut. Rep., 11, 9, <a href="https://doi.org/10.1007/s40726-024-00338-2" target="_blank">https://doi.org/10.1007/s40726-024-00338-2</a>, 2025.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib63"><label>63</label><mixed-citation>
      
Zhang, Y., Xue, L., Carter, W. P. L., Pei, C., Chen, T., Mu, J., Wang, Y., Zhang, Q., and Wang, W.: Development of ozone reactivity scales for volatile organic compounds in a Chinese megacity, Atmos. Chem. Phys., 21, 11053–11068, <a href="https://doi.org/10.5194/acp-21-11053-2021" target="_blank">https://doi.org/10.5194/acp-21-11053-2021</a>, 2021.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib64"><label>64</label><mixed-citation>
      
Zhang, Y., Fu, Q., Wang, T., Huo, J., Cui, H., Mu, J., Tan, Y., Chen, T.,
Shen, H., and Li, Q.: A quantitative analysis of causes for increasing ozone
pollution in Shanghai during the 2022 lockdown and implications for control
policy, Atmos. Environ., 326, 120469,
<a href="https://doi.org/10.1016/j.atmosenv.2024.120469" target="_blank">https://doi.org/10.1016/j.atmosenv.2024.120469</a>, 2024.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib65"><label>65</label><mixed-citation>
      
Zhang, Z., Xu, J., Ye, T., Chen, L., Chen, H., and Yao, J.: Distributions
and temporal changes of benzene, toluene, ethylbenzene, and xylene
concentrations in newly decorated rooms in southeastern China, and the
health risks posed, Atmos. Environ., 246, 118071,
<a href="https://doi.org/10.1016/j.atmosenv.2020.118071" target="_blank">https://doi.org/10.1016/j.atmosenv.2020.118071</a>, 2021.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib66"><label>66</label><mixed-citation>
      
Zhao, D., Xin, J., Wang, W., Jia, D., Wang, Z., Xiao, H., Liu, C., Zhou, J.,
Tong, L., and Ma, Y.: Effects of the sea-land breeze on coastal ozone
pollution in the Yangtze River Delta, China, Sci. Total Environ., 807,
150306, <a href="https://doi.org/10.1016/j.scitotenv.2021.150306" target="_blank">https://doi.org/10.1016/j.scitotenv.2021.150306</a>, 2022.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib67"><label>67</label><mixed-citation>
      
Zhou, X., Sun, Z., Yan, H., Feng, X., Zhao, H., Liu, Y., Chen, X., and Yang,
C.: Produce petrochemicals directly from crude oil catalytic cracking, a
techno-economic analysis and life cycle society-environment assessment, J.
Cleaner Prod., 308, 127283, <a href="https://doi.org/10.1016/j.jclepro.2021.127283" target="_blank">https://doi.org/10.1016/j.jclepro.2021.127283</a>,
2021.

    </mixed-citation></ref-html>--></article>
