Deep learning (DL) has become a mainstream method for hyperspectral image (HSI) classification. Many DL-based methods exploit spatial-spectral features to achieve better classification results. However, due to the complex backgrounds in HSIs, existing methods usually perform poorly on pixels located in land-cover boundary areas. This is largely because the network is susceptible to interference from irrelevant information around the target pixel during training, resulting in inaccurate feature extraction. In this article, a new multibranch transformer architecture (spectral spatial transformer (SST)-M) that combines spatial attention with spectral feature extraction is proposed to address this problem. The transformer model has a global receptive field and can therefore integrate global spatial position information across the HSI cube. Meanwhile, a spatial sequence attention model is designed to enhance useful spatial location features and suppress invalid information. Because HSIs contain considerable spectral information, a spectral feature extraction model is designed to extract discriminative spectral features, replacing the widely used principal component analysis (PCA) method and achieving better classification results. Finally, inspired by semantic segmentation, a mask prediction model is designed to classify all of the pixels in the HSI cube; this guides the neural network to learn precise pixel characteristics and spatial distributions. To verify the effectiveness of the proposed algorithm (SST-M), quantitative experiments were conducted on three well-known datasets, namely, Indian Pines (IP), University of Pavia (PU), and Kennedy Space Center (KSC). The experimental results demonstrate that the proposed model outperforms other state-of-the-art methods.
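The abstract's claim that a transformer has a global receptive field over the HSI cube can be illustrated with a minimal sketch: each pixel of a spatial patch becomes one token whose feature vector is its spectral signature, and scaled dot-product attention lets every token mix information from all other positions in one step. This is a generic single-head attention illustration with random projections, not the paper's SST-M architecture; the patch size, band count, and embedding dimension below are arbitrary assumptions.

```python
import numpy as np

def scaled_dot_product_attention(q, k, v):
    """q, k, v: (num_tokens, d) arrays; returns attended values."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                    # (N, N) token-pair similarities
    scores -= scores.max(axis=-1, keepdims=True)     # stabilize softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # each row sums to 1
    return weights @ v                               # every token sees all N positions

rng = np.random.default_rng(0)
H, W, B, d = 7, 7, 30, 16        # hypothetical patch size, band count, embed dim
cube = rng.normal(size=(H, W, B))
tokens = cube.reshape(H * W, B)  # one token per pixel, spectrum as its feature vector

# Random stand-ins for learned query/key/value projections
Wq, Wk, Wv = (rng.normal(size=(B, d)) for _ in range(3))
out = scaled_dot_product_attention(tokens @ Wq, tokens @ Wk, tokens @ Wv)
print(out.shape)  # (49, 16): each of the 49 pixel tokens attends to all 49 positions
```

Because the attention matrix is dense over all H*W tokens, spatial context is global rather than limited to a convolutional kernel's neighborhood, which is the property the abstract relies on.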
Number of pages: 17
Journal: IEEE Transactions on Geoscience and Remote Sensing
Publication status: Published - Aug 2022