A new machine-learning-based analysis for improving satellite-retrieved...
Despite recent progress, satellite retrievals of anthropogenic SO2 still suffer from relatively low signal-to-noise ratios. In this study, we demonstrate a new machine learning data analysis method to improve the quality of satellite SO2 products. In the absence of large ground-truth datasets for SO2, we start from SO2 slant column densities (SCDs) retrieved from the Ozone Monitoring Instrument (OMI) using a data-driven, physically based algorithm and calculate the ratio between the SCD and the root mean square (rms) of the fitting residuals for each pixel. To build the training data, we select presumably clean pixels with small SCD / rms ratios (SRRs) and set their target SCDs to zero. For polluted pixels with relatively large SRRs, we set the target to the original retrieved SCDs. We then train neural networks (NNs) to reproduce the target SCDs using predictors including SRRs for individual pixels, solar zenith, viewing zenith and phase angles, scene reflectivity, and O3 column amounts, as well as the monthly mean SRRs. For data analysis, we employ two NNs: (1) one trained daily to produce analyzed SO2 SCDs for polluted pixels each day and (2) the other trained once every month to produce analyzed SCDs for less polluted pixels for the entire month. Test results for 2005 show that our method can significantly reduce noise and artifacts over background regions. Over polluted areas, the monthly mean NN-analyzed and original SCDs generally agree to within ±15 %, indicating that our method can retain SO2 signals in the original retrievals except for large volcanic eruptions. This is further confirmed by running both the NN-analyzed and original SCDs through a top-down emission algorithm to estimate the annual SO2 emissions for ∼ 500 anthropogenic sources, with the two datasets yielding similar results. We also explore two alternative approaches to the NN-based analysis method. In one, we employ a simple linear interpolation model to analyze the original SCD retrievals.
In the other, we develop a PCA–NN algorithm that uses OMI-measured radiances, transformed and dimension-reduced with a principal component analysis (PCA) technique, as inputs to NNs for SO2 SCD retrievals. While the linear model and the PCA–NN algorithm can reduce retrieval noise, they both underestimate SO2 over polluted areas. Overall, the results presented here demonstrate that our new data analysis method can significantly improve the quality of existing OMI SO2 retrievals. The method can potentially be adapted for other sensors and/or species and enhance the value of satellite data in air quality research and applications.
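As an illustration only, the construction of training targets from the SCD / rms ratio described above might be sketched as follows. The threshold values, the function names, and the handling of pixels that fall between the two thresholds (excluded here) are assumptions made for this sketch; the abstract does not specify them, and this is not the authors' implementation.

```python
from typing import List, Optional, Sequence

# Hypothetical thresholds: the abstract does not state the values used.
CLEAN_SRR_MAX = 1.0      # assumed upper bound for "presumably clean" pixels
POLLUTED_SRR_MIN = 3.0   # assumed lower bound for "polluted" pixels


def scd_rms_ratio(scd: float, rms: float) -> float:
    """SRR: slant column density divided by the rms of the fitting residuals."""
    if rms <= 0:
        raise ValueError("rms of the fitting residuals must be positive")
    return scd / rms


def build_targets(
    scds: Sequence[float],
    rmss: Sequence[float],
    clean_max: float = CLEAN_SRR_MAX,
    polluted_min: float = POLLUTED_SRR_MIN,
) -> List[Optional[float]]:
    """Assign a training target to each pixel based on its SRR.

    Small SRR  -> target 0.0 (presumably clean pixel).
    Large SRR  -> target is the original retrieved SCD (polluted pixel).
    Otherwise  -> None, i.e. left out of the training set (an assumption).
    """
    targets: List[Optional[float]] = []
    for scd, rms in zip(scds, rmss):
        srr = scd_rms_ratio(scd, rms)
        if srr <= clean_max:
            targets.append(0.0)
        elif srr >= polluted_min:
            targets.append(scd)
        else:
            targets.append(None)
    return targets
```

For example, `build_targets([0.1, 0.5, 2.0], [0.5, 0.25, 0.5])` gives SRRs of 0.2, 2.0, and 4.0 and therefore returns `[0.0, None, 2.0]`. The resulting targets would then be paired with predictors such as the per-pixel SRR, viewing geometry, scene reflectivity, and O3 column amount to train the NNs.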