Advancing Hyperspectral and Multispectral Image Fusion: An Information-Aware Transformer-Based Unfolding Network

Background Introduction

Hyperspectral images (HSIs) play a crucial role in remote sensing applications such as material identification, image classification, target detection, and environmental monitoring, thanks to the rich spectral information they capture across many narrow bands. However, sensor hardware limitations impose a trade-off between spatial and spectral resolution in practical imaging. An imaging system can typically provide either an image with rich spectral information but low spatial resolution (low-resolution hyperspectral image, LR-HSI) or an image with high spatial resolution but fewer spectral bands (high-resolution multispectral image, HR-MSI). To obtain a high-resolution HSI (HR-HSI), researchers fuse an LR-HSI with an HR-MSI, a task known as MSI-HSI fusion, which has attracted extensive attention in remote sensing image processing.

[Figure: Structure of ITU-Net]

Paper Source

The paper “Advancing Hyperspectral and Multispectral Image Fusion: An Information-aware Transformer-based Unfolding Network” was published in IEEE Transactions on Neural Networks and Learning Systems. It was authored by Jianqiao Sun, Bo Chen, Ruiying Lu, Ziheng Cheng, and Chunhui Qu of Xidian University, and Xin Yuan of Westlake University. The paper was received on June 20, 2023, revised on January 18, 2024, and accepted on May 1, 2024.

Research Procedure

Detailed Research Procedure

In hyperspectral image processing, deep unfolding methods based on convolutional neural networks (CNNs) have achieved strong performance. However, the limited receptive field of CNNs makes it difficult to capture long-range spatial dependencies, and the fixed form of the input and output images at each unfolding stage restricts how much feature information can flow between stages, limiting overall performance. To address these issues, this paper proposes a novel information-aware Transformer-based unfolding network (ITU-Net) that models long-range dependencies and transmits richer information at each stage. Specifically, ITU-Net employs customized Transformer blocks that learn representations from both the spatial and frequency domains while avoiding the quadratic complexity of standard attention with respect to input length. For spatial features, the paper develops information transmission-guided linear attention (ITLA), which passes high-throughput information between adjacent stages and extracts contextual features along the spatial dimension at linear complexity. In addition, the paper introduces frequency-domain learning into the feed-forward network (FFN) to capture token variations in images and reduce frequency gaps.
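
The core idea behind linear attention can be illustrated with a generic kernel-based sketch (this is not the paper's exact ITLA module, whose information-transmission guidance is more involved): replacing the softmax with a positive feature map lets keys and values be aggregated once, reducing the cost from quadratic to linear in sequence length.

```python
import numpy as np

def linear_attention(Q, K, V, eps=1e-6):
    """Kernel-based linear attention: a positive feature map phi replaces
    softmax, so attention costs O(n) in sequence length n instead of the
    O(n^2) of standard attention. Generic sketch, not the paper's ITLA."""
    phi = lambda x: np.maximum(x, 0) + eps   # simple positive feature map
    Qp, Kp = phi(Q), phi(K)                  # (n, d)
    KV = Kp.T @ V                            # (d, d): keys/values aggregated once
    Z = Qp @ Kp.sum(axis=0)                  # (n,): per-query normalizer
    return (Qp @ KV) / Z[:, None]            # (n, d)

# toy example: 5 tokens with 4-dimensional heads
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((5, 4)) for _ in range(3))
out = linear_attention(Q, K, V)
print(out.shape)  # (5, 4)
```

Because the attention weights are non-negative and normalized, each output row is a convex combination of the value rows, just as in softmax attention, but the sequence-length-squared attention matrix is never materialized.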

Experimental Design

The research subjects include synthetic and real hyperspectral datasets. The experiments include the following steps:

  1. Dataset Selection and Preprocessing: Select three synthetic datasets (CAVE, Chikusei, and Harvard) and two real datasets (Worldview-3 and Worldview-2). Preprocess the synthetic data to generate training and validation samples.

  2. Feature Extraction and Model Training: Use a lightweight network to extract spatial-spectral features from the LR-HSI and HR-MSI as inputs to each unfolding stage. Replace the conventional FFN with a frequency-domain learning module that introduces the discrete Fourier transform (DFT) and discrete cosine transform (DCT) to strengthen nonlinear mapping. Formulate fusion as an optimization problem and use a variable-splitting algorithm to unfold the iterative reconstruction into a network structure, progressively refining the HR-HSI.

  3. Performance Evaluation: Conduct extensive quantitative and qualitative evaluations of the proposed model on synthetic and real datasets and compare it with 17 state-of-the-art methods.
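
The variable-splitting unfolding in step 2 follows a formulation standard in this line of work; the sketch below uses generic notation (B, S, R, mu, phi are conventional symbols, not necessarily the paper's): the LR-HSI Y is modeled as the blurred, downsampled HR-HSI X, and the HR-MSI Z as its spectral projection.

```latex
% Observation model (generic notation):
%   Y = X B S   (LR-HSI: spatial blur B, then downsampling S)
%   Z = R X     (HR-MSI: spectral response R)
\min_{X}\ \|Y - XBS\|_F^2 + \|Z - RX\|_F^2 + \lambda\,\phi(X)

% Half-quadratic splitting: introduce an auxiliary V = X with penalty
% weight mu, then alternate a data step and a prior (proximal) step:
X^{(k+1)} = \arg\min_{X}\ \|Y - XBS\|_F^2 + \|Z - RX\|_F^2
            + \mu\,\|X - V^{(k)}\|_F^2
\qquad
V^{(k+1)} = \operatorname{prox}_{\lambda\phi/\mu}\!\big(X^{(k+1)}\big)
```

Unfolding maps each iteration to a network stage: the data step becomes a differentiable layer, and the proximal step, which encodes the image prior, is replaced by a learned module — here, the paper's Transformer blocks.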

Research Results

Compared with other methods, the proposed model shows outstanding performance on multiple datasets, as follows:

  1. Synthetic Datasets: On the CAVE and Chikusei datasets, the method performs best or competitively across scale factors of 4, 8, 16, and 32. On the Harvard dataset, the model generalizes well when the model trained on CAVE is applied directly without fine-tuning.

  2. Real Datasets: On the Worldview-3 and Worldview-2 datasets, ITU-Net also outperforms other leading methods.

Main Findings and Conclusions

Research Conclusions

The proposed information-aware Transformer-based unfolding network (ITU-Net) effectively addresses the long-range dependency modeling and feature transmission issues in HSI and MSI fusion by extracting features from both spatial and frequency domains and transmitting high-throughput information at each stage. The experimental validation shows that the proposed method achieves superior quantitative and qualitative performance on both synthetic and real datasets. The research results indicate that the Transformer-based unfolding framework not only demonstrates excellent performance in hyperspectral and multispectral image fusion but also provides meaningful technical support for practical remote sensing applications.

Research Value

The method proposed in this paper not only excels in hyperspectral and multispectral image fusion tasks but also demonstrates remarkable generalization ability. Notably, the paper leveraged Transformer blocks within an unfolding framework, significantly enhancing feature extraction and information transmission efficiency and accuracy. This novel approach offers new perspectives and methods for remote sensing image processing, possessing significant scientific and practical value.

Highlights

  1. Novelty: Introduces an information-aware linear attention mechanism that preserves the modeling capability of standard attention while substantially improving computational efficiency.
  2. Practicality: Delivers outstanding performance on both synthetic and real datasets, demonstrating the method’s broad adaptability and generality.
  3. Technical Innovation: Enhances the FFN with frequency-domain learning modules and combines Transformer blocks with the unfolding framework to achieve higher accuracy at lower computational complexity.
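
The frequency-domain learning idea in the FFN can be sketched generically (the function and filter below are illustrative, not the paper's exact DFT/DCT module): transform the spatial feature map to the Fourier domain, reweight each frequency with a learnable filter, and transform back, giving the FFN a global receptive field at FFT cost.

```python
import numpy as np

def frequency_ffn(x, w):
    """Frequency-domain token mixing: FFT the spatial dimensions,
    apply a learnable per-frequency filter w, and invert the FFT.
    Generic sketch of frequency-domain learning, not the paper's module."""
    X = np.fft.rfft2(x, axes=(0, 1))             # (H, W//2+1, C) complex spectrum
    X = X * w                                    # element-wise learnable filter
    return np.fft.irfft2(X, s=x.shape[:2], axes=(0, 1))

H, W, C = 8, 8, 4
rng = np.random.default_rng(1)
x = rng.standard_normal((H, W, C))
w = np.ones((H, W // 2 + 1, C))                  # identity filter: output equals input
y = frequency_ffn(x, w)
print(np.allclose(y, x))  # True
```

With an all-ones filter the round trip is the identity; in a trained network w would be a learned (complex-valued) parameter that selectively amplifies or suppresses frequency bands, which is one way to reduce the frequency gap between reconstructed and ground-truth images.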