Heuristic Underwater Perceptual Enhancement with Semantic Collaborative Learning

Academic Background and Problem Statement

Underwater images have significant application value in fields such as marine exploration, underwater robotics, and marine life identification. However, due to the absorption and scattering of light in water, underwater images often suffer from low contrast and color distortion, which severely impacts the accuracy of subsequent perception tasks (e.g., object detection, semantic segmentation). Existing underwater image enhancement methods primarily focus on improving visual quality while neglecting the practical utility of enhanced images in downstream tasks. Finding a balance between visual quality improvement and practical application has therefore become a significant challenge in current research.

To address this issue, this paper proposes a heuristic invertible network for underwater perceptual enhancement (HUPE). This method not only enhances the visual quality of underwater images but also extracts task-oriented semantic features through a semantic collaborative learning module, thereby better serving subsequent perception tasks.

Paper Source and Author Information

This paper is co-authored by Zengxi Zhang, Zhiying Jiang, Long Ma, Jinyuan Liu, Xin Fan, and Risheng Liu, affiliated with the School of Software Engineering at Dalian University of Technology, the College of Information Science and Technology at Dalian Maritime University, and the Pazhou Laboratory (Huangpu). The paper was accepted on November 26, 2024, and published in the International Journal of Computer Vision.

Research Process and Methodology

1. Heuristic Invertible Network (HIN)

One of the core innovations of this paper is the introduction of a heuristic invertible network, which achieves information-preserving enhancement by constructing a bidirectional mapping between underwater images and their clear counterparts. Specifically, the forward mapping transforms underwater images into enhanced images, while the reverse mapping imposes constraints that suppress artifacts and prevent information loss. Additionally, the network incorporates heuristic prior information (e.g., depth and gradient information) to improve its adaptability to complex underwater environments.
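The invertibility idea can be illustrated with a minimal additive coupling step, a standard building block of invertible networks. This is a hedged toy sketch, not the paper's actual architecture: the forward pass splits the input in half and shifts one half by a function of the other, and the inverse recovers the input exactly, so no information is lost in either direction.

```python
# Toy sketch of an invertible (additive) coupling step.
# Illustrates bidirectional, information-preserving mapping; NOT the paper's exact layers.

def shift_fn(h):
    # Any function of one half may be used; it need not be invertible itself.
    return [2.0 * v + 1.0 for v in h]

def forward(x):
    # Split the input into two halves and shift the second half
    # by a function of the (untouched) first half.
    half = len(x) // 2
    x1, x2 = x[:half], x[half:]
    y2 = [a + b for a, b in zip(x2, shift_fn(x1))]
    return x1 + y2

def inverse(y):
    # Undo the shift using the untouched half; recovers x exactly.
    half = len(y) // 2
    y1, y2 = y[:half], y[half:]
    x2 = [a - b for a, b in zip(y2, shift_fn(y1))]
    return y1 + x2

x = [0.5, -1.0, 2.0, 3.5]
assert inverse(forward(x)) == x  # exact reconstruction
```

Because the first half passes through unchanged, the inverse can recompute the same shift and subtract it, which is what makes the mapping lossless.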

1.1 Hybrid Invertible Block (HIB)

The hybrid invertible block is the core component of the heuristic invertible network, responsible for embedding heuristic prior information during the enhancement process. Each HIB consists of multiple operations, including ActNorm, 1×1 invertible convolution, heuristic prior injector, frequency-aware affine coupling layer, and feature expansion/compression operations. Through these operations, the network can simultaneously characterize the intrinsic relationship between underwater images and their clear counterparts in both spatial and frequency domains.
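The composition principle behind such a block can be sketched as follows: each operation supplies a forward and an inverse, and the block's inverse applies the per-operation inverses in reverse order. The operations below are hypothetical toy stand-ins for ActNorm, the 1×1 invertible convolution, and the coupling layer, not the paper's implementations.

```python
# Sketch: an invertible block as a pipeline of invertible operations.
# ScaleShift and Swap are toy stand-ins for the real HIB components.

class ScaleShift:
    """ActNorm-style per-element affine: y = s * x + b (invertible when s != 0)."""
    def __init__(self, s, b):
        self.s, self.b = s, b
    def forward(self, x):
        return [self.s * v + self.b for v in x]
    def inverse(self, y):
        return [(v - self.b) / self.s for v in y]

class Swap:
    """Swap the two halves of the vector (a permutation; its own inverse)."""
    def forward(self, x):
        half = len(x) // 2
        return x[half:] + x[:half]
    def inverse(self, y):
        return self.forward(y)

class InvertibleBlock:
    def __init__(self, ops):
        self.ops = ops
    def forward(self, x):
        for op in self.ops:
            x = op.forward(x)
        return x
    def inverse(self, y):
        for op in reversed(self.ops):  # inverses applied in reverse order
            y = op.inverse(y)
        return y

block = InvertibleBlock([ScaleShift(2.0, 0.5), Swap(), ScaleShift(0.5, -1.0)])
x = [1.0, 2.0, 3.0, 4.0]
assert block.inverse(block.forward(x)) == x
```

Since every stage is individually invertible, the whole block is invertible, which is what lets the network stack many HIBs while still guaranteeing a lossless reverse path.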

1.2 Frequency-Aware Affine Coupling

To enhance the transformation capability of the network, this paper proposes a frequency-aware affine coupling layer. This layer converts the input image from the spatial domain to the frequency domain using Fourier transform, separately processing phase and amplitude information to better capture the semantic and style features of the image.
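The frequency-domain mechanic can be sketched with a plain 1-D discrete Fourier transform (the paper operates on 2-D images; this toy version only illustrates the amplitude/phase decomposition). Keeping both components and recombining them recovers the signal exactly, which is why each can be processed separately without losing information.

```python
import cmath

def dft(x):
    # Naive discrete Fourier transform of a real-valued 1-D signal.
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(X):
    # Inverse DFT; the imaginary parts vanish for a real input signal.
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]

signal = [1.0, 3.0, -2.0, 0.5]
spectrum = dft(signal)

# Separate amplitude (style-related) and phase (structure-related) components.
amplitude = [abs(c) for c in spectrum]
phase = [cmath.phase(c) for c in spectrum]

# Recombine amplitude and phase into complex coefficients and invert.
recombined = [a * cmath.exp(1j * p) for a, p in zip(amplitude, phase)]
restored = idft(recombined)
assert all(abs(r - s) < 1e-9 for r, s in zip(restored, signal))
```

A frequency-aware coupling layer follows the same pattern: transform, modify amplitude and phase with learned functions, then transform back into the spatial domain.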

2. Semantic Collaborative Learning Module (SCL)

To bridge the feature gap between visual enhancement tasks and downstream tasks, this paper introduces a semantic collaborative learning module. This module embeds a meta-feature generator and a feature transition block between the enhancement network and the downstream task network, achieving feature-level collaborative learning. In this way, the enhancement network not only generates visually pleasing images but also extracts high-level semantic information from the images.

2.1 Meta-Feature Generator (MFG)

The meta-feature generator generates meta-features from task-aware features and enhanced features, guiding the enhancement network to extract more semantic information.

2.2 Feature Transition Block (FTB)

The feature transition block generates feature bridges that map meta-features back into the enhancement feature space, further refining the output of the enhancement network.
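Under the (assumed) reading that the meta-feature generator fuses the task-aware and enhancement feature streams, and the feature transition block bridges the fused result back to the enhancement branch, the collaboration can be sketched with toy vector features. The function names, the fusion rule, and the weight `alpha` are all hypothetical placeholders.

```python
# Toy sketch of feature-level collaborative learning (hypothetical fusion rules).

def meta_feature_generator(task_feat, enh_feat, alpha=0.5):
    # Fuse task-aware and enhancement features into a shared meta-feature.
    return [alpha * t + (1 - alpha) * e for t, e in zip(task_feat, enh_feat)]

def feature_transition_block(meta_feat, enh_feat):
    # Bridge: pull the enhancement features toward the meta-features,
    # so the enhancement branch also carries semantic information.
    return [0.5 * (m + e) for m, e in zip(meta_feat, enh_feat)]

task_feat = [1.0, 0.0, 2.0]   # features from the downstream task network
enh_feat = [0.0, 2.0, 2.0]    # features from the enhancement network
meta = meta_feature_generator(task_feat, enh_feat)
bridged = feature_transition_block(meta, enh_feat)
# The bridged features lie between the two streams, sharing information.
```

In the actual method, both components are learned networks; the point of the sketch is only the data flow: two streams in, one meta-feature out, then a bridge back to the enhancer.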

3. Loss Functions

This paper employs multiple loss functions during training, including guide loss (Lg), enhancement loss (Le), and task loss (Lt). The guide loss measures the guiding effect of meta-features on the enhancement network, while the enhancement loss ensures the similarity between enhanced images and reference images through contrastive learning, frequency loss, and bilateral constraints. The task loss is used to optimize the performance of specific perception tasks (e.g., object detection and semantic segmentation).
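Multi-term objectives like this are typically combined as a weighted sum. The sketch below shows that structure only; the weights and the L1-style stand-in terms are placeholders, not the paper's exact formulation (which uses contrastive, frequency, and bilateral terms inside Le).

```python
# Sketch: weighted combination of guide, enhancement, and task losses.
# Weights w_g, w_e, w_t and the L1 stand-ins are hypothetical placeholders.

def l1(a, b):
    # Mean absolute error between two equally sized feature/pixel lists.
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def total_loss(meta_feat, enh_feat, enhanced, reference, task_pred, task_label,
               w_g=1.0, w_e=1.0, w_t=0.5):
    loss_guide = l1(meta_feat, enh_feat)    # Lg: meta-features guide the enhancer
    loss_enhance = l1(enhanced, reference)  # Le: stand-in for the full enhancement loss
    loss_task = l1(task_pred, task_label)   # Lt: stand-in for the perception-task loss
    return w_g * loss_guide + w_e * loss_enhance + w_t * loss_task

loss = total_loss([1.0, 2.0], [1.5, 2.0],   # meta vs. enhancement features
                  [0.2, 0.4], [0.0, 0.4],   # enhanced vs. reference pixels
                  [1.0], [0.0])             # task prediction vs. label
```

The weights let training trade off visual fidelity against downstream-task performance, which is exactly the balance the paper targets.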

Experimental Results and Analysis

1. Underwater Image Enhancement Performance

This paper conducts extensive experiments on multiple public datasets (e.g., UIEBD, UCCS, U45, and EUVP) to validate the effectiveness of the HUPE method. The experimental results show that HUPE outperforms state-of-the-art methods in terms of both visual quality and quantitative metrics (e.g., PSNR, SSIM, UCIQE, UIQM, and CEIQ). Particularly in color correction and contrast restoration, HUPE performs exceptionally well, effectively reducing color distortion and artifacts in underwater images.

2. Downstream Perception Task Performance

To verify the applicability of HUPE in downstream perception tasks, this paper conducts experiments on object detection and semantic segmentation. The results show that the enhanced images generated by HUPE markedly improve both detection accuracy and segmentation precision. In complex underwater environments in particular, HUPE extracts semantic information effectively, thereby better serving downstream tasks.

Conclusion and Significance

This paper proposes a heuristic invertible network for underwater perceptual enhancement (HUPE), achieving the dual goals of visual quality improvement and task-oriented semantic feature extraction through information-preserving invertible transformation and a semantic collaborative learning module. Experimental results demonstrate that HUPE not only outperforms existing methods in visual enhancement but also significantly improves the performance of subsequent perception tasks. The proposed method provides new insights for the field of underwater image processing, with significant scientific value and application prospects.

Research Highlights

  1. Information-Preserving Invertible Network: By constructing a bidirectional mapping between underwater images and their clear counterparts, HUPE can preserve key information during enhancement, reducing artifacts and information loss.
  2. Heuristic Prior Information: By embedding depth and gradient information, HUPE can better adapt to complex underwater environments, enhancing the robustness of the network.
  3. Semantic Collaborative Learning Module: Through feature-level collaborative learning, HUPE not only generates visually pleasing images but also extracts task-oriented semantic information, thereby better serving downstream perception tasks.
  4. Extensive Experimental Validation: HUPE has been extensively validated on multiple public datasets, demonstrating its superiority in both visual enhancement and downstream perception tasks.