Advancing operational global aerosol forecasting with machine learning

TL;DR

AI-GAMFS, a machine learning system combining a vision transformer and U-Net, provides fast and accurate 5-day global aerosol forecasts. It outperforms CAMS in aerosol optical depth and dust forecasts, and it has a lower global aerosol optical depth error than GEOS-FP with comparable dust skill, at a significantly reduced computational cost.

Key Takeaways

• AI-GAMFS combines a vision transformer with a U-Net-style architecture to model complex aerosol–meteorology interactions efficiently. It delivers 5-day forecasts in about 1 minute (approximately 39 s on a single GPU), roughly 360 times faster than conventional GEOS-FP forecasts.
• The system forecasts aerosol optical depth (AOD) and dust components more accurately than CAMS and regional dust models. Compared with GEOS-FP, it has a lower global AOD error, comparable dust skill and improved surface aerosol component forecasts over the USA and China.
• A relay forecasting strategy using four base models (3-h, 6-h, 9-h and 12-h lead times) reduces error accumulation and improves accuracy beyond 24 h for nearly all aerosol variables.
• AI-GAMFS provides reliable forecasts for key aerosol components (e.g., sulfate, black carbon) and surface concentrations, supporting air-quality management and climate change mitigation.
• The model performs robustly in real-world operational settings, as validated by independent ground-based observations from networks such as AERONET and CARSNET.

Abstract

Aerosol forecasting is important for air-quality management, health risk assessment and climate change mitigation^1,2. However, it is more complex than weather forecasting, owing to the interactions between aerosol physicochemical processes and atmospheric dynamics, resulting in high uncertainty and computational costs^3,4. Here we develop a machine-learning-driven Global Aerosol–Meteorology Forecasting System (AI-GAMFS), which provides reliable 5-day, 3-hourly forecasts of aerosol optical components and surface concentrations. AI-GAMFS combines a vision transformer and U-Net in a backbone network, robustly capturing the complex aerosol–meteorology interactions via global attention and spatiotemporal encoding. Trained on 42 years of aerosol reanalysis data and initialized with Global Earth Observing System Forward Processing (GEOS-FP) analyses, AI-GAMFS delivers operational 5-day forecasts in 1 minute. Evaluation with independent ground-based observations suggests improved performance compared with the Copernicus Atmosphere Monitoring Service⁵ and regional dust models^6,7,8,9 in forecasting aerosol optical depth and dust components. Compared with GEOS-FP¹⁰, it has a lower root-mean-square error for global aerosol optical depth, with comparable dust forecasting skill and improved surface aerosol component forecasts over the USA and China. Our results provide a step forward in leveraging machine learning to refine aerosol forecasting and may help warn against aerosol pollution events such as dust storms and wildfires.

Main

Atmospheric aerosols have a critical role in Earth’s climate system, affecting radiative forcing, cloud microphysics and atmospheric chemistry^1,2. Owing to their diverse optical and microphysical properties, combined with complex chemical compositions, aerosols influence weather and climate in various ways^11,12. Key components, such as black carbon (BC) and dust, show considerable variability in terms of radiative forcing, making aerosols a major source of uncertainty in climate change assessments^13,14. In addition, the complex chemical reactivity and wide particle size ranges of aerosols can degrade air quality^15,16, posing health risks that include respiratory, cardiovascular and neurological diseases¹⁷. Accurate forecasting of aerosol distributions and compositions is therefore essential for improving air-quality management, protecting public health and mitigating climate change.

However, aerosol forecasting is markedly more complex and costly than weather forecasting. It must account for diverse aerosol sources and types, intricate chemical reactions, physical processes, and multiscale interactions with weather systems^3,4. These complexities make the processes of aerosol generation, transport, transformation and removal nonlinear and highly variable, which contributes substantially to forecast uncertainty¹⁸.

For short- to medium-range aerosol forecasting, traditional physics-based systems couple numerical weather prediction (NWP) models with atmospheric chemical transport models. Examples include the Copernicus Atmosphere Monitoring Service (CAMS) of the European Centre for Medium-Range Weather Forecasts⁵ and NASA’s Global Earth Observing System Forward Processing (GEOS-FP)¹⁰. These systems must resolve atmospheric dynamics while also computing thousands of aerosol-related chemical reactions and microphysical interactions, which further increases the already high computational cost of NWP^19,20.

Recent advances in machine learning have led researchers to explore neural networks as complementary tools for NWP^21,22,23,24,25 and its downstream tasks, such as forecasting oceanic variables^26,27. These models have shown considerable promise for improving both computational efficiency and accuracy in weather forecasting. However, machine-learning research specifically targeting global aerosol forecasting remains notably underdeveloped. Recent studies have begun applying deep learning to aerosol forecasting on global and regional scales^28,29, but these efforts depend largely on NWP inputs and are often restricted to single aerosol metrics such as total aerosol optical depth (AOD). Operational machine-learning models that forecast global aerosol components and meteorology simultaneously are still lacking. The main obstacles are representing coupled aerosol–weather processes, generalizing across diverse aerosol types and meeting computational constraints.

Here we present a machine-learning-driven Global Aerosol–Meteorology Forecasting System (AI-GAMFS), designed to rapidly simulate complex aerosol–meteorology interactions across spatial and temporal scales. AI-GAMFS was trained on 42 years of Modern-Era Retrospective Analysis for Research and Applications, version 2 (MERRA-2)³⁰ atmospheric reanalysis data. Evaluation with global Aerosol Robotic Network (AERONET)³¹ and Chinese Aerosol Remote Sensing Network (CARSNET)³² observations suggests improved performance of the operational AI-GAMFS, compared with the state-of-the-art CAMS and regional dust models, in forecasting AOD and dust components. Compared with GEOS-FP, it also provides improved global AOD forecasts, comparable dust forecasting skill and improved key surface aerosol component forecasts over the USA and China, with an order-of-magnitude reduction in computational cost.

AI-GAMFS

AI-GAMFS is designed to provide global 5-day aerosol–meteorology forecasts at approximately 50-km spatial resolution and 3-hourly intervals (01:30, 04:30, 07:30, …, 22:30 UTC). It forecasts three groups of variables:

• AOD;
• the optical properties and surface concentrations of key aerosol components, namely sulfate, dust, BC, organic carbon (OC) and sea salt (SS);
• the surface and upper-level meteorological variables that govern aerosol lifecycle dynamics (Supplementary Table 1).

Its architecture comprises three core modules (Fig. 1a):

(1) cube embedding, which extracts three-dimensional spatiotemporal features from the input feature matrix;
(2) a vision transformer, which uses multiheaded self-attention to model complex relationships among these features;
(3) cube unembedding, which reconstructs the high-dimensional features at the original spatial resolution using deconvolution and upsampling.

Skip connections, as in a U-Net, are incorporated to preserve the accuracy and fidelity of the forecasts. Together, these modules forecast the spatial fields of aerosol and meteorological states at the next time step from the state at the previous time step.
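To make the data flow through the three modules concrete, here is a toy sketch in Python/NumPy. It is not the AI-GAMFS code. The cube sizes, model width `d`, the single attention head and single layer, and every function name are hypothetical. The unembedding step is a per-token linear projection followed by un-patchifying, which is equivalent to a transposed convolution with stride equal to the cube size. It stands in for the paper's deconvolution and upsampling. The skip connection is modelled, as an assumption, as a U-Net-style concatenation of embedding-stage features with transformer output. The real network is far larger (approximately 1.2 billion parameters).

```python
import numpy as np


def cube_embed(x, pt, ph, pw):
    """Split a (T, C, H, W) array into non-overlapping pt x ph x pw cubes and
    flatten each cube (all channels included) into one token vector."""
    T, C, H, W = x.shape
    assert T % pt == 0 and H % ph == 0 and W % pw == 0
    x = x.reshape(T // pt, pt, C, H // ph, ph, W // pw, pw)
    x = x.transpose(0, 3, 5, 1, 2, 4, 6)  # token-grid axes first, within-cube axes last
    return x.reshape(-1, pt * C * ph * pw)


def cube_unembed(tokens, shape, pt, ph, pw):
    """Inverse of cube_embed: scatter token vectors back onto a (T, C, H, W) grid."""
    T, C, H, W = shape
    x = tokens.reshape(T // pt, H // ph, W // pw, pt, C, ph, pw)
    x = x.transpose(0, 3, 4, 1, 5, 2, 6)
    return x.reshape(T, C, H, W)


def self_attention(h, wq, wk, wv):
    """Single-head global self-attention: every token attends to every other token."""
    q, k, v = h @ wq, h @ wk, h @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability for exp
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v


def forward_step(x, params, pt=1, ph=2, pw=2):
    """Map the state at time t, shape (1, C, H, W), to a toy forecast for t + dt."""
    e = cube_embed(x, pt, ph, pw) @ params["embed"]  # cube embedding + linear projection
    # One attention layer with a residual connection (no MLP or normalization here).
    z = e + self_attention(e, params["wq"], params["wk"], params["wv"])
    h = np.concatenate([e, z], axis=-1)  # U-Net-style skip: reuse embedding-stage features
    return cube_unembed(h @ params["unembed"], x.shape, pt, ph, pw)  # cube unembedding


def init_params(c, ph, pw, pt=1, d=16, seed=0):
    """Random weights for the toy model (hypothetical sizes)."""
    rng = np.random.default_rng(seed)
    cube = pt * c * ph * pw
    return {
        "embed": rng.normal(scale=cube ** -0.5, size=(cube, d)),
        "wq": rng.normal(scale=d ** -0.5, size=(d, d)),
        "wk": rng.normal(scale=d ** -0.5, size=(d, d)),
        "wv": rng.normal(scale=d ** -0.5, size=(d, d)),
        "unembed": rng.normal(scale=(2 * d) ** -0.5, size=(2 * d, cube)),
    }


state = np.random.default_rng(1).normal(size=(1, 3, 8, 8))  # 3 toy variables on an 8 x 8 grid
params = init_params(c=3, ph=2, pw=2)
forecast = forward_step(state, params)
print(forecast.shape)  # (1, 3, 8, 8)
```

Because the output has the same shape as the input, the step can be applied repeatedly. That is what the relay strategy described next exploits.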

We trained four base models separately, with forecast lead times of 3 h, 6 h, 9 h and 12 h. Each base model contains approximately 1.2 billion parameters. Each was trained for 80 epochs with the same framework and settings, which took 10 days on 8 L40 graphics processing units (GPUs). Iterating a single model many times over a long forecast accumulates error. To mitigate this, a temporal aggregation strategy was used to perform relay forecasting with the four base models (Fig. 1b). Once pretraining and relay connection are complete, the final AI-GAMFS model generates 5-day operational forecasts in approximately 39 s on a single L40 GPU, using real-time GEOS-FP analysis fields as input. This is approximately 360 times faster than conventional GEOS-FP forecasts, which require approximately 4–6 h.
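A quick check of the quoted figures, using only numbers stated above. The text does not say which pair of times underlies "approximately 360", so the comparison below is ours. The ratios compare wall-clock times and do not account for the hardware each system runs on. The GPU-day total assumes, as an illustration, that the four base models were trained separately with no shared compute.

```latex
\begin{aligned}
\frac{4\ \mathrm{h}}{39\ \mathrm{s}} &= \frac{14\,400\ \mathrm{s}}{39\ \mathrm{s}} \approx 369,
\qquad
\frac{6\ \mathrm{h}}{39\ \mathrm{s}} = \frac{21\,600\ \mathrm{s}}{39\ \mathrm{s}} \approx 554,\\
\frac{6\ \mathrm{h}}{1\ \mathrm{min}} &= \frac{360\ \mathrm{min}}{1\ \mathrm{min}} = 360,\\
4\ \text{base models} \times 8\ \text{GPUs} \times 10\ \text{days} &= 320\ \text{GPU-days} \approx 7\,680\ \text{GPU-hours}.
\end{aligned}
```

"Approximately 360 times" therefore sits at the conservative end of the 4–6 h range.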

Relay forecasting reduces accumulation errors

Building on this relay architecture designed to curb error growth, we systematically evaluated and optimized its configuration. We used the 4 pretrained base models (lead times of 3 h, 6 h, 9 h and 12 h) to build 4 progressive forecasting schemes and identify the optimal relay strategy:

• the 3-h single model;
• the 3- and 6-h relay;
• the 3-h, 6-h and 9-h relay;
• the 3-h, 6-h, 9-h and 12-h relay.

Extended Data Fig. 1a illustrates how often each base model is invoked under these 4 schemes. When a scheme includes two or more base models, the models with longer lead times are used first to reach a given forecast lead time. Each forecast is then fed back as the input for the next step, which minimizes the number of iterations.
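The lead-time decomposition just described can be sketched as a greedy plan that applies the longest base model that still fits. This is a minimal sketch in the spirit of hierarchical temporal aggregation, not the AI-GAMFS code. The function names are hypothetical. `invocation_counts` assumes each lead time is planned independently, with no reuse of intermediate states. The text does not say whether AI-GAMFS reuses them, so these counts need not match Extended Data Fig. 1a.

```python
from collections import Counter


def relay_plan(lead_h, base_models):
    """Greedy relay: repeatedly apply the longest-lead base model that still fits
    until lead_h hours are covered. Returns the ordered base-model leads used."""
    steps, remaining = [], lead_h
    for m in sorted(base_models, reverse=True):
        while remaining >= m:
            steps.append(m)
            remaining -= m
    if remaining:
        raise ValueError(f"{lead_h} h cannot be reached with {base_models}")
    return steps


def invocation_counts(base_models, max_lead_h=120, step_h=3):
    """Count base-model calls needed for every 3-hourly output up to max_lead_h,
    planning each lead time independently (hypothetical; no state caching)."""
    counts = Counter()
    for lead in range(step_h, max_lead_h + 1, step_h):
        counts.update(relay_plan(lead, base_models))
    return counts


schemes = {
    "3-h single": [3],
    "3/6-h relay": [3, 6],
    "3/6/9-h relay": [3, 6, 9],
    "3/6/9/12-h relay": [3, 6, 9, 12],
}
for name, models in schemes.items():
    print(f"{name:>17}: {len(relay_plan(120, models))} iterations to reach 120 h")
```

Reaching 120 h takes 40, 20, 14 and 10 iterations under the four schemes, respectively. For these sets of lead times, taking the longest model first also yields the minimum possible number of iterations.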

We compared the global 5-day forecasting accuracy of AI-GAMFS under the different relay strategies for all 12 aerosol variables. AI-GAMFS was initialized with MERRA-2 reanalysis and run daily at 22:30 UTC, and the 2022 MERRA-2 data (test set) served as the reference. The aerosol variables are:

• AOD;
• total scattering AOD (TSAOD);
• sulfate, dust, BC, OC and SS AOD (SUAOD, DUAOD, BCAOD, OCAOD and SSAOD, respectively);
• sulfate, dust, BC, OC and SS surface mass concentration (SUSMC, DUSMC, BCSMC, OCSMC and SSSMC, respectively).

Extended Data Fig. 1b,c shows time series of the global spatial correlation coefficient (R) and the latitude-weighted root-mean-square error (RMSE), respectively, for these variables. Within a 24-h forecast horizon, the 3-h, 6-h, 9-h and 12-h relay model performs similarly to the 3-h single model and the other relay models. For lead times beyond 24 h, however, it shows superior accuracy for nearly all aerosol variables, in terms of both R and RMSE. For example, at a 120-h lead time, its RMSE averaged over all aerosol variables is lower than that of the other schemes:

• 15.1% lower than the 3-h single model;
• 5.6% lower than the 3- and 6-h relay model;
• 3.2% lower than the 3-h, 6-h and 9-h relay model.

This advantage is also evident in global forecasts of meteorological variables (Extended Data Fig. 2). Relaying all 4 base models generally yields results comparable to or slightly better than those of the 3- and 6-h and the 3-h, 6-h and 9-h relays, and substantially better than those of the 3-h single model. We note, however, that the improvement tends to plateau as the number of base models increases, even though the aggregation strategy alleviates short- to medium-range error accumulation. We therefore selected the 3-h, 6-h, 9-h and 12-h relay strategy for the final AI-GAMFS model, which was used in all subsequent evaluations and analyses.
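The two scores used throughout are not defined in this section. The sketch below assumes the convention common in machine-learning weather forecasting (for example, WeatherBench). Each latitude row is weighted by cos(latitude), normalized to a mean weight of one, and R is an unweighted Pearson correlation pooled over all grid cells. The function names and the toy grid are hypothetical.

```python
import math


def lat_weighted_rmse(forecast, truth, lats_deg):
    """RMSE over a lat-lon grid, weighting each row by cos(latitude) normalized
    to mean 1. forecast and truth are lists of rows (one row per latitude)."""
    cosines = [math.cos(math.radians(lat)) for lat in lats_deg]
    mean_cos = sum(cosines) / len(cosines)
    total, n = 0.0, 0
    for f_row, t_row, c in zip(forecast, truth, cosines):
        w = c / mean_cos  # polar rows cover less area, so they count less
        for f, t in zip(f_row, t_row):
            total += w * (f - t) ** 2
            n += 1
    return math.sqrt(total / n)


def spatial_r(forecast, truth):
    """Pearson correlation between two fields, pooling all grid cells (unweighted)."""
    xs = [v for row in forecast for v in row]
    ys = [v for row in truth for v in row]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Toy 3 x 2 grid (illustrative values only, not data from the paper)
lats = [-45.0, 0.0, 45.0]
truth = [[0.10, 0.20], [0.30, 0.40], [0.15, 0.25]]
forecast = [[0.12, 0.18], [0.35, 0.38], [0.15, 0.30]]
print(f"lat-weighted RMSE = {lat_weighted_rmse(forecast, truth, lats):.4f}")
print(f"spatial R = {spatial_r(forecast, truth):.3f}")
```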

Enhanced global aerosol forecasts

Globally, AOD is one of the most widely observed atmospheric aerosol parameters and is used extensively in climate change research, air-quality monitoring and environmental assessment. As a key component of AOD, DUAOD is an essential metric for monitoring the global dust cycle and its impacts. We comprehensively evaluated the 5-day, 3-hourly global AOD and DUAOD forecasts generated by AI-GAMFS, initialized daily at 22:30 UTC, against MERRA-2 data from 2023. Figure 2a compares operational AI-GAMFS, initialized with GEOS-FP analyses, with CAMS (run daily at 00:00 UTC), one of the leading global aerosol forecast models. Forecasts from AI-GAMFS initialized with MERRA-2 reanalysis are also shown to assess the impact of initial conditions. Over the 0–120-h forecast period, operational AI-GAMFS consistently outperforms CAMS in forecasting both AOD and DUAOD, in terms of both R and RMSE. The advantage is clearest during the first 2 days. Relative to CAMS, AI-GAMFS improves the average R by 11.5% for AOD and 13.8% for DUAOD, and reduces the average RMSE by 22.3% and 37.3%, respectively. The advantage diminishes as lead time increases. Even at a 120-h lead time, however, operational AI-GAMFS still has a lower RMSE than CAMS, by approximately 11.3% for AOD and 25.2% for DUAOD. Our analysis also shows that the impact of initial conditions on AI-GAMFS is most pronounced within the first 48 h, with little to no effect thereafter.
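The relative improvements above are averages over a lead-time window, but the averaging order is not spelled out here. This sketch assumes that scores are first averaged over the window's lead times and then compared as a fraction of the reference value. The function names and sample numbers are hypothetical.

```python
def window_mean(scores_by_lead, start_h, end_h):
    """Mean score over lead times start_h <= lead <= end_h (keys are lead times in hours)."""
    vals = [s for lead, s in scores_by_lead.items() if start_h <= lead <= end_h]
    return sum(vals) / len(vals)


def relative_change(ref, new):
    """Signed change of new relative to ref, as a fraction of ref (0.2 means +20%)."""
    return (new - ref) / ref


# Illustrative numbers only (not values from the paper)
rmse_cams = {0: 0.20, 24: 0.22, 48: 0.24, 72: 0.26}
rmse_ai = {0: 0.14, 24: 0.17, 48: 0.20, 72: 0.24}
ref = window_mean(rmse_cams, 0, 48)
new = window_mean(rmse_ai, 0, 48)
print(f"RMSE reduction over 0-48 h: {-relative_change(ref, new):.1%}")
```

Under the same definition, the 199.7% improvement in R reported below for KMA-ADAM3 over East Asia would mean that the AI-GAMFS R is about 3.0 times that of the reference model.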

**Fig. 2: Superior performance of AI-GAMFS in global AOD and DUAOD forecasting throughout 2023.**

AI-GAMFS and CAMS use different initial conditions, so using MERRA-2 as the reference could favour AI-GAMFS. For a fairer comparison, we also evaluated the 5-day, 3-hourly AOD and DUAOD forecasts of both operational AI-GAMFS and CAMS against Level-2.0 instantaneous global aerosol observations from AERONET in 2023. DUAOD was evaluated against the AERONET coarse-mode AOD (AODc). Figure 2b presents time series of R and RMSE at each forecast lead time, calculated from all matched global samples for 2023. Operational AI-GAMFS shows high forecasting skill against AERONET observations, with the expected degradation at longer lead times. Throughout the forecast period (days 1–5), the mean AOD R ranges from 0.57 to 0.78 (RMSE 0.12 to 0.15) and the mean DUAOD R ranges from 0.65 to 0.73 (RMSE 0.04 to 0.06). Consistent with the MERRA-2-based evaluation, operational AI-GAMFS also provides more accurate AOD and DUAOD forecasts than CAMS. Across all 40 forecast steps (3-h intervals), it outperforms CAMS at the following number of steps:

• for AOD, at 31 steps for R and 37 steps for RMSE;
• for DUAOD, at 36 steps for R and all 40 steps for RMSE.

This consistency between the MERRA-2-driven and operational (GEOS-FP-driven) configurations supports the reliability and effectiveness of AI-GAMFS in a real-world operational environment.
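Evaluating against AERONET requires pairing gridded forecasts with point observations. This section does not describe the collocation procedure, so the sketch below is hypothetical throughout. It assumes a MERRA-2-like grid of 0.5° latitude × 0.625° longitude, which is consistent with the approximately 50-km resolution quoted earlier. It samples the nearest grid cell and averages the site's observations within ±30 min of each forecast valid time. The names `nearest_cell` and `match_site` are illustrative.

```python
import math

# Assumed MERRA-2-like global grid: latitudes -90..90, longitudes -180..179.375
DLAT, DLON = 0.5, 0.625
NLAT, NLON = 361, 576


def nearest_cell(lat, lon):
    """Index (i, j) of the grid cell centre nearest to a site, wrapping in longitude."""
    i = int(math.floor((lat + 90.0) / DLAT + 0.5))
    j = int(math.floor((lon + 180.0) / DLON + 0.5)) % NLON
    return min(max(i, 0), NLAT - 1), j


def match_site(field_series, valid_times, obs, lat, lon, window_min=30):
    """Pair each forecast valid time with the mean of site observations within
    +/- window_min minutes. field_series[k][i][j] is the forecast at valid_times[k]
    (minutes); obs is a list of (time_min, value). Returns (forecast, observed) pairs."""
    i, j = nearest_cell(lat, lon)
    pairs = []
    for field, t in zip(field_series, valid_times):
        near = [v for (t_obs, v) in obs if abs(t_obs - t) <= window_min]
        if near:  # skip forecast times with no observation in the window
            pairs.append((field[i][j], sum(near) / len(near)))
    return pairs
```

The matched pairs from all sites can then be pooled at each lead time and scored with R and RMSE.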

Figure 2c further illustrates the spatial distribution of the RMSE of operational AI-GAMFS at each AERONET site, averaged over the 40 steps of the 5-day AOD and DUAOD forecasts. It also shows the RMSE difference between CAMS and operational AI-GAMFS. For AOD, operational AI-GAMFS has a lower RMSE than CAMS at 61.6% of AERONET sites, located primarily in the USA, Europe, Africa and Southeast Asia. China has considerable aerosol loading but sparse AERONET coverage. We therefore conducted a complementary evaluation using continuous AOD observations from 26 CARSNET sites in 2023 (Extended Data Fig. 3a,b). Operational AI-GAMFS shows acceptable forecasting skill over China throughout the forecast period (days 1–5), with the mean R ranging from 0.44 to 0.65 and the mean RMSE from 0.26 to 0.34. This evaluation also confirms the advantage of operational AI-GAMFS over CAMS: it has a higher R at 63.3% of the forecast steps and a lower RMSE at 61.5% of the sites. For global DUAOD forecasting, the advantage over CAMS is clearer, with a lower RMSE at 86.0% of the sites worldwide (Fig. 2c). Together, these results demonstrate the superior performance of operational AI-GAMFS compared with CAMS in global AOD and DUAOD forecasting.
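The "steps won" and "share of sites" statistics above reduce to simple pairwise comparisons. This is a minimal sketch with hypothetical names. Ties count as no win. For example, 16 of 26 CARSNET sites would correspond to the 61.5% quoted above.

```python
def steps_won(scores_a, scores_b, higher_is_better):
    """Number of forecast steps at which model A beats model B (ties are not wins)."""
    if higher_is_better:  # e.g. R
        return sum(a > b for a, b in zip(scores_a, scores_b))
    return sum(a < b for a, b in zip(scores_a, scores_b))  # e.g. RMSE


def share_of_sites_better(rmse_a_by_site, rmse_b_by_site):
    """Fraction of sites (present in both dicts) where A has the lower mean RMSE."""
    common = rmse_a_by_site.keys() & rmse_b_by_site.keys()
    wins = sum(rmse_a_by_site[s] < rmse_b_by_site[s] for s in common)
    return wins / len(common)


# Illustrative per-step RMSE for two models (not values from the paper)
print(steps_won([0.12, 0.13, 0.15], [0.14, 0.13, 0.16], higher_is_better=False))  # prints 2
```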

Regional dust storms

East Asia is one of the regions most severely affected by dust storms, so accurate forecasts of such events are critical there. Because operational AI-GAMFS forecasts both DUAOD and DUSMC, its performance can be assessed against several well-established physics-based dust forecasting models. For this evaluation, we used 2023 East Asia dust forecast products from CAMS and from four physics-based dust forecasting models. These four models are deployed at the Sand and Dust Storm Warning Advisory and Assessment System (SDS-WAS) Asian Regional Centre:

• SILAM from the Finnish Meteorological Institute (FMI-SILAM)⁶;
• CUACE/Dust from the China Meteorological Administration (CMA-CUACE/Dust)⁷;
• MASINGAR from the Japan Meteorological Agency (JMA-MASINGAR)⁸;
• ADAM3 from the Korea Meteorological Administration (KMA-ADAM3)⁹.

We evaluated the 5-day DUAOD and DUSMC forecasts against MERRA-2 data for 2023. Operational AI-GAMFS was initialized daily at 22:30 UTC, and CAMS, FMI-SILAM, CMA-CUACE/Dust and KMA-ADAM3 were initialized daily at 00:00 UTC. JMA-MASINGAR is initialized daily at 12:00 UTC and provides 3-day forecasts, so for that comparison we initialized AI-GAMFS at 10:30 UTC. The models also differ in forecast coverage area and temporal resolution. We therefore restricted the evaluation to the overlapping East Asia region (Extended Data Fig. 4a) and interpolated the model outputs in time so that forecast times could be compared.
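The interpolation method is not specified here. The following is a minimal sketch assuming simple linear interpolation in time. A hypothetical model initialized at 00:00 UTC with 6-hourly output is placed onto AI-GAMFS-style valid times at 01:30, 04:30, … UTC, with all times in hours since 00:00 UTC. Targets outside the model's output range are returned as `None` rather than extrapolated. The function name and the sample values are illustrative.

```python
from bisect import bisect_left


def interp_series(times, values, targets):
    """Linearly interpolate a time series (times ascending, in hours) onto target
    valid times; targets outside [times[0], times[-1]] yield None."""
    out = []
    for t in targets:
        if t < times[0] or t > times[-1]:
            out.append(None)  # no extrapolation
            continue
        k = bisect_left(times, t)  # first index with times[k] >= t
        if times[k] == t:
            out.append(values[k])
            continue
        t0, t1 = times[k - 1], times[k]
        w = (t - t0) / (t1 - t0)
        out.append((1 - w) * values[k - 1] + w * values[k])
    return out


# Hypothetical 6-hourly DUAOD from a model initialized at 00:00 UTC (illustrative values)
model_times = [0.0, 6.0, 12.0, 18.0]
model_duaod = [0.10, 0.16, 0.22, 0.19]
# AI-GAMFS-style 3-hourly valid times: 01:30, 04:30, ... UTC
targets = [1.5, 4.5, 7.5, 10.5, 13.5, 16.5, 19.5]
print(interp_series(model_times, model_duaod, targets))
```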

Extended Data Fig. 4b,c shows the time series of spatial R and latitude-weighted RMSE for these models. Consistent with its global performance, operational AI-GAMFS notably outperforms the 5 physics-based dust forecast models over East Asia at all lead times. The comparison runs up to 72 h for JMA-MASINGAR (Extended Data Fig. 4c) and up to 120 h for the other 4 models (Extended Data Fig. 4b). For DUAOD, the spatial R of operational AI-GAMFS is higher than that of each model by the following margins:

• at a 72-h lead time, by 12.0%, 21.4%, 34.2%, 105.1% and 199.7% relative to FMI-SILAM, CAMS, JMA-MASINGAR, CMA-CUACE/Dust and KMA-ADAM3, respectively;
• at the 120-h (5-day) lead time, by 4.9%, 16.9%, 90.4% and 133.5% relative to FMI-SILAM, CAMS, CMA-CUACE/Dust and KMA-ADAM3, respectively.

For DUSMC, operational AI-GAMFS has a latitude-weighted RMSE of 82.5 μg m⁻³ at a 72-h lead time. This is approximately 34.4%, 42.7% and 60.3% lower than that of FMI-SILAM, KMA-ADAM3 and CMA-CUACE/Dust, respectively, and approximately 74.1% lower than that of JMA-MASINGAR. This regional advantage over physics-based dust models is further confirmed by 1-year observation records. These are AODc from the AERONET Beijing-CAMS site and AOD from four CARSNET sites in the desert region of northwestern China (Supplementary Figs. 1 and 2a).

Taking the mega dust storm in northern China in April 2023 as an example, operational AI-GAMFS reliably reproduces the entire dust transport process, including the affected areas and the intensity (Extended Data Fig. 5). Its statistical metrics for this event are also better than those of the other models. More importantly, operational AI-GAMFS not only forecasts dust transport paths within 1–2 days but also forecasts enhanced dust emissions in the Gobi Desert up to 3–4 days in advance. Regional dust forecasting models typically struggle to capture such features.

Aerosol component forecasting

In addition to AOD and dust-related properties, AI-GAMFS simultaneously forecasts TSAOD, the optical properties of the other aerosol components (sulfate, BC, OC and SS) and their surface concentrations. These component forecasts enable component-specific assessments of impacts on climate, air quality and public health. We used conventional GEOS-FP as the reference baseline for two reasons: it represents the state of the art in atmospheric aerosol component forecasting, and its output configuration is fully consistent with that of operational AI-GAMFS. Using MERRA-2 data for July–August 2024 as the reference, we evaluated accuracy with spatial R and latitude-weighted RMSE (Fig. 3a,b). Additional metrics for surface and upper-level meteorological variables are provided in Extended Data Fig. 6.

**Fig. 3: Comparison of global aerosol component forecast accuracy between operational AI-GAMFS and GEOS-FP.**

The scorecards indicate that operational AI-GAMFS performs strongly across the 12 aerosol variables. For the first 1–3 days, it outperforms GEOS-FP for all variables at all lead times, except for BCSMC and OCSMC at specific times (based on R). At longer lead times, AI-GAMFS consistently outperforms GEOS-FP, except for two SS-related variables: SSAOD and SSSMC. Aerosol component forecasts are highly sensitive to the accuracy of weather forecasts. Operational AI-GAMFS does not surpass GEOS-FP for certain meteorological variables, such as wind speed, sea-level pressure and temperature. However, it forecasts more accurately some key variables that influence aerosol emissions, transformation and deposition, such as specific humidity and precipitation, which enables it to improve its aerosol simulations (Extended Data Fig. 6). Its wind speed forecast accuracy declines beyond 2 days, which degrades its SS aerosol forecasts.
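The scorecard normalization is not given in this section. A common choice, assumed here, is a signed relative difference for each variable and lead time, oriented so that a positive value always means AI-GAMFS is better. Higher is better for R, and lower is better for RMSE. The function names and the sample scores below are hypothetical.

```python
def scorecard_cell(metric, ai_score, ref_score):
    """Signed relative skill difference for one variable at one lead time.
    Positive means AI-GAMFS is better than the reference, for either metric."""
    if metric == "R":  # higher correlation is better
        return (ai_score - ref_score) / abs(ref_score)
    if metric == "RMSE":  # lower error is better
        return (ref_score - ai_score) / ref_score
    raise ValueError(f"unknown metric: {metric}")


def scorecard(ai, ref, metric):
    """ai/ref map variable -> list of scores by lead time; returns variable -> cells."""
    return {var: [scorecard_cell(metric, a, r) for a, r in zip(ai[var], ref[var])] for var in ai}


# Illustrative scores only (not values from the paper), at two lead times
ai_rmse = {"SUAOD": [0.020, 0.030], "SSAOD": [0.010, 0.016]}
ref_rmse = {"SUAOD": [0.025, 0.034], "SSAOD": [0.011, 0.014]}
for var, cells in scorecard(ai_rmse, ref_rmse, "RMSE").items():
    # "+" marks AI-GAMFS better, "-" worse, "=" tied
    print(var, ["+" if c > 0 else ("=" if c == 0 else "-") for c in cells])
```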

Independent evaluation beyond MERRA-2 was conducted using global aerosol observations from July–August 2024. Using ground-based observations of AOD and AODc from AERONET, AOD from CARSNET, and BCSMC, OCSMC and SUSMC from the Interagency Monitoring of Protected Visual Environments (IMPROVE) network