Salient Object Detection in Low-Light RGB-T Scene via Spatial-Frequency Cues Mining


Salient Object Detection in Low-Light RGB-T Scenarios by Mining Spatial-Frequency Cues

Salient Object Detection (SOD) is a central task in computer vision: identifying the most visually attractive regions or objects in an image. Although SOD models have made steady progress under normal lighting over the past decades, they still face severe challenges under low-light conditions. In low light, the shortage of photons causes a loss of image detail, which severely degrades SOD performance. The problem is especially prominent in practical applications such as intelligent surveillance and autonomous driving.

In recent years, RGB-T systems, which pair visible-light and thermal infrared images, have attracted increasing attention because thermal infrared imaging is largely unaffected by low-light conditions. By fusing visible-light and thermal infrared cues, researchers have developed SOD models that alleviate detection problems in low light to some extent. However, most existing models focus only on fusing spatial features and neglect the information carried by frequency differences. To address this issue, a research team proposed a novel SOD model, SFMNet, which improves SOD performance under low-light conditions by mining spatial-frequency cues.

Source and Author Information

The paper is co-authored by Huihui Yue, Jichang Guo, Xiangjun Yin, Yi Zhang, and Sida Zheng from the School of Electrical and Information Engineering at Tianjin University. They are active in computer vision, pattern recognition, and deep learning. The paper is to be published in the journal Neural Networks in 2024. It was received on April 27, 2023, revised on January 26, 2024, and accepted on May 21, 2024.

Research Background and Problem

Existing RGB-T SOD models perform poorly under low-light conditions because they rely on fusing spatial features and do not fully exploit information in the frequency domain. Research indicates that frequency-domain features retain useful information about object distribution. To address these shortcomings, the research team proposed a new model that improves SOD performance by mining spatial-frequency cues.

Research Process

Spatial-Frequency Feature Exploration Module (SFFE)

To acquire spatial and frequency cues together, the researchers designed the SFFE module. It separates spatial and frequency features from the RGB and thermal infrared images and adaptively selects high-frequency and low-frequency cues. It does so with two strategies: frequency decoupling and adaptive dynamic feature selection.

  1. Frequency Decoupling

    • The research team first transformed the feature maps into the frequency domain with the Discrete Cosine Transform (DCT), then used threshold functions to separate the high-frequency and low-frequency information.
  2. Adaptive Dynamic Feature Selection

    • The module adaptively selects the most useful high-frequency and low-frequency information. A Channel-Spatial Attention (CSA) mechanism enhances the auxiliary features, and the final frequency-domain features are generated through progressive fusion.
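
The frequency decoupling step can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the hard mask on the index sum u + v, and the `cutoff` parameter are assumptions chosen for illustration, since the article does not specify the threshold functions used in SFMNet.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis; row k holds the k-th cosine basis vector."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    basis = np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    basis[0, :] *= np.sqrt(1.0 / n)
    basis[1:, :] *= np.sqrt(2.0 / n)
    return basis


def decouple_frequencies(feature: np.ndarray, cutoff: int):
    """Split a 2-D map into low- and high-frequency parts (hypothetical helper).

    Assumption: a coefficient at index (u, v) counts as low frequency
    when u + v < cutoff; everything else counts as high frequency.
    """
    h, w = feature.shape
    c_h, c_w = dct_matrix(h), dct_matrix(w)
    coeffs = c_h @ feature @ c_w.T              # forward 2-D DCT
    u = np.arange(h)[:, None]
    v = np.arange(w)[None, :]
    low_mask = (u + v) < cutoff                 # hard threshold in the DCT domain
    low = c_h.T @ (coeffs * low_mask) @ c_w     # inverse DCT of the low band
    high = c_h.T @ (coeffs * ~low_mask) @ c_w   # inverse DCT of the high band
    return low, high
```

Because the two masks are complementary and the DCT is linear, `low + high` reproduces the input map, so nothing is lost before the selection step decides how much of each band to keep.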

Spatial-Frequency Feature Interaction Module (SFFI)

The SFFI module aims to fuse spatial-frequency information from RGB and thermal infrared images. By integrating cross-modal and cross-domain information, it progressively generates accurate saliency predictions.

  1. Hybrid Modality Double Phase

    • Multimodal inputs from the spatial and frequency domains are fused through multi-scale, multi-group fusion. Multimodal features of the same scale are fused across all channels using convolutional kernels, and the final result is obtained through adaptive fusion.
  2. Multi-Domain Fusion Phase

    • Each level of feature fusion involves integrating both spatial and frequency domain information to fully capture multi-domain information. The final output is generated by fusing the features from the previous level along with foreground and background features.
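
The article does not describe how the adaptive fusion is computed, so the following is only a sketch of one common pattern and not SFMNet's design. It assumes two feature maps of shape (C, H, W), scores each channel by global average pooling, and turns the scores into per-channel softmax weights. The function name `adaptive_fuse` is hypothetical.

```python
import numpy as np


def adaptive_fuse(rgb: np.ndarray, thermal: np.ndarray) -> np.ndarray:
    """Fuse two (C, H, W) feature maps with per-channel softmax weights.

    Assumption: the weighting scheme is illustrative only; the source
    does not state how SFMNet computes its fusion weights.
    """
    stacked = np.stack([rgb, thermal])                    # (2, C, H, W)
    scores = stacked.mean(axis=(2, 3))                    # global average pooling -> (2, C)
    scores = scores - scores.max(axis=0, keepdims=True)   # numerical stability
    exp_scores = np.exp(scores)
    weights = exp_scores / exp_scores.sum(axis=0, keepdims=True)
    # Weights sum to 1 per channel, so the result is a convex combination.
    return (weights[:, :, None, None] * stacked).sum(axis=0)
```

In a trained network the weights would come from learned layers rather than raw pooled activations. The sketch only shows the shape of the computation: one weight per modality and channel, normalized so the fused map stays within the range of its inputs.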

Experimental Results

To validate the new model, the research team constructed the first SOD dataset for low-light RGB-T scenarios and conducted extensive experiments. The results show that SFMNet significantly outperforms existing models in detection accuracy under low-light conditions. Specifically, across different datasets, SFMNet achieved the best results on multiple evaluation metrics, such as the maximum Fβ score and mean absolute error.

  1. Quantitative Evaluation

    • Compared against 13 state-of-the-art SOD methods, SFMNet demonstrated superior performance across five metrics: PR curve, maximum Fβ score, E-measure, structural similarity, and mean absolute error.
  2. Qualitative Evaluation

    • Under various complex backgrounds, diverse object sizes, and cluttered edges in low-light environments, SFMNet showed stronger target detection capabilities, with more accurate and complete saliency predictions.
  3. Complexity Analysis

    • SFMNet has a moderate number of parameters and compares favorably in computational complexity, demonstrating high computational efficiency.
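
Two of the metrics named in the quantitative evaluation, mean absolute error and the maximum Fβ score, are simple enough to sketch directly. The version below follows their usual definitions in the SOD literature. The value β² = 0.3 and the 256-step threshold sweep are common conventions assumed here, not settings confirmed by the article, and the function names are illustrative.

```python
import numpy as np


def mean_absolute_error(pred: np.ndarray, gt: np.ndarray) -> float:
    """Average per-pixel absolute difference; pred and gt lie in [0, 1]."""
    return float(np.mean(np.abs(pred - gt)))


def max_f_beta(pred: np.ndarray, gt: np.ndarray, beta_sq: float = 0.3,
               num_thresholds: int = 256) -> float:
    """Largest F-beta score over a sweep of binarization thresholds.

    Assumption: beta_sq = 0.3 and a uniform sweep over [0, 1].
    """
    gt_mask = gt > 0.5
    best = 0.0
    for t in np.linspace(0.0, 1.0, num_thresholds):
        binary = pred >= t
        tp = np.logical_and(binary, gt_mask).sum()
        if tp == 0:
            continue  # precision and recall are both zero at this threshold
        precision = tp / binary.sum()
        recall = tp / gt_mask.sum()
        f = (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)
        best = max(best, float(f))
    return best
```

Lower is better for mean absolute error and higher is better for the maximum Fβ score, so a prediction identical to the ground truth scores 0 on the first and 1 on the second.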

Contributions and Highlights

  1. Innovative Model

    • Proposed a new RGB-T SOD model, SFMNet, which achieves high-accuracy target detection under low-light conditions by mining spatial-frequency cues.
  2. New Module Design

    • Designed SFFE and SFFI modules for mining spatial-frequency features and integrating cross-domain information, respectively.
  3. New Dataset

    • Constructed the first low-light RGB-T SOD dataset, providing a benchmark for related research.

Conclusion

This research offers new approaches and methods for enhancing salient object detection under low-light conditions by introducing frequency cues and adaptive dynamic feature selection. SFMNet not only holds significant value in scientific research but also provides effective support for industrial applications in intelligent surveillance and disaster prevention. Future research could further optimize the model’s performance in extremely complex scenarios, enhancing its robustness and practicality.

This study provides new perspectives and breakthroughs in the field of salient object detection. It is anticipated that more research and applications will benefit from these findings in the future.