Boosting Few-Shot Semantic Segmentation with Prior-Driven Edge Feature Enhancement Network


Semantic segmentation is a core computer vision technology that assigns a semantic category label to every pixel in an image. However, traditional semantic segmentation methods rely on large amounts of labeled data for training, limiting their applicability when annotated samples are scarce. In medical imaging and autonomous driving, for example, models must often segment specific categories accurately from only limited data. Against this backdrop, few-shot semantic segmentation (FSS) has emerged as a promising technology that aims to perform high-quality segmentation with only a few labeled samples.

Prior-Driven Edge Feature Enhancement Network

However, compared with general semantic segmentation models, FSS still struggles to predict target boundaries accurately. When the number of samples is extremely small, the features the model extracts from query images often lack the detail needed to attend to the boundary regions of the target. To address this challenge, this paper proposes a Prior-Driven Edge Feature Enhancement Network (PDEFE), which exploits prior information about object boundaries to enhance query features and thereby improve segmentation accuracy.

This article, authored by Jingkai Ma, Shuang Bai, and Wenchao Pan of Beijing Jiaotong University, was published in the January 2025 issue of IEEE Transactions on Artificial Intelligence. By introducing a novel method for the boundary prediction problem in FSS, the work has drawn wide attention in academia.


Research Background and Challenges

Driven by the rapid development of deep learning, semantic segmentation has made significant progress in recent years, producing classic models such as Fully Convolutional Networks (FCNs), DeepLab, and UNet. However, these methods rely heavily on large-scale labeled datasets for training, which limits their usability in data-scarce scenarios. Few-shot semantic segmentation was introduced to address this limitation.

Current mainstream FSS methods are primarily based on the meta-learning paradigm and fall into two directions: (1) prototype-based methods, which generate class prototypes from support images and match them against query image features (sketched below); and (2) spatial correlation-based methods, which exploit spatial relationships between support and query features. However, both approaches struggle to extract sufficient boundary detail in few-shot scenarios, leading to suboptimal boundary segmentation accuracy.
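
To make the prototype-based direction concrete, here is a minimal sketch of its core operation, masked average pooling followed by dense cosine matching, in the spirit of methods such as PANet (the function and its interface are illustrative, not this paper's implementation):

```python
import torch
import torch.nn.functional as F

def prototype_match(feat_s, mask_s, feat_q):
    """Masked average pooling + cosine matching, the core of prototype-based FSS.

    feat_s, feat_q: (B, C, H, W) backbone features of support/query images.
    mask_s:         (B, 1, H0, W0) binary support mask.
    """
    # Downsample the annotated mask to the feature resolution
    mask = F.interpolate(mask_s, size=feat_s.shape[-2:],
                         mode="bilinear", align_corners=False)
    # Class prototype: average of support features inside the mask
    proto = (feat_s * mask).sum(dim=(2, 3)) / (mask.sum(dim=(2, 3)) + 1e-6)  # (B, C)
    # Dense matching: cosine similarity of every query location to the prototype
    sim = F.cosine_similarity(feat_q, proto[..., None, None].expand_as(feat_q), dim=1)
    return sim  # (B, H, W) similarity map, thresholded or decoded downstream
```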

To address this issue, prior studies have attempted to incorporate edge information to improve segmentation accuracy. For example, MCEENet introduced an edge-assisted network to enhance query features, but it extracts all edges from the query image, including background edges, which can interfere with the segmentation result. In contrast, the PDEFE proposed in this paper suppresses the interference of background edges while providing more accurate target-related edge information.


Methodology and Workflow

1. Overview of the Framework

The PDEFE model consists of two core modules:

  - Edge Feature Enhancement Module (EFEM): Enhances the boundary regions of query features by utilizing target edge information.
  - Edge Prior Mask Generator (EPMG): Generates edge prior masks from image gradient information to guide the model toward detailed target edges.

The entire process is integrated into a classic meta-learning framework (e.g., PFENet). Mid- and high-level features of the support and query images are extracted by a backbone network (e.g., ResNet); with the EFEM and EPMG modules incorporated, a decoder then produces the final, highly accurate segmentation.
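
As a rough sketch of how these components compose (the module interfaces and the PFENet-style decoder signature below are assumptions for illustration, not the authors' exact implementation):

```python
import torch.nn as nn

class PDEFE(nn.Module):
    """Illustrative skeleton of the PDEFE forward pass."""
    def __init__(self, backbone, efem, epmg, decoder):
        super().__init__()
        self.backbone = backbone  # e.g., a ResNet feature extractor
        self.efem = efem          # Edge Feature Enhancement Module
        self.epmg = epmg          # Edge Prior Mask Generator
        self.decoder = decoder    # meta-learning decoder (e.g., PFENet-style)

    def forward(self, support_img, support_mask, query_img, query_edges):
        feat_s = self.backbone(support_img)  # mid/high-level support features
        feat_q = self.backbone(query_img)    # mid/high-level query features
        epm = self.epmg(support_img, support_mask, query_img)  # edge prior mask
        feat_q = self.efem(feat_q, query_edges)  # edge-enhanced query features
        return self.decoder(feat_s, support_mask, feat_q, epm)
```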


2. Edge Feature Enhancement Module (EFEM)

The primary goal of the EFEM is to enhance the query feature details near target boundaries by leveraging edge information. Specifically:

  1. Edge Information Extraction: A pretrained Holistically-Nested Edge Detection (HED) model is used to extract binary edge masks from query images.
  2. Foreground Edge Filtering: As edge detection may include background clutter, EFEM utilizes a classification head (sharing parameters with the decoder) to generate coarse segmentation results, which are then used to filter out irrelevant background edges.
  3. Multi-Scale Fusion: An Atrous Spatial Pyramid Pooling (ASPP) module is introduced to extract rich object edge information from multi-scale query features.
  4. Edge Enhancement: The extracted edge information is fused with the original query features using convolution operations, resulting in enhanced query features with more expressive boundary details.

This module specifically addresses the challenge of insufficient boundary detail extraction caused by few-shot constraints.
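
A minimal PyTorch sketch of this pipeline follows (channel sizes, dilation rates, and the 0.5 foreground threshold are assumptions; the paper shares the classification head's parameters with the decoder, which is simplified here to a standalone 1x1 convolution):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ASPP(nn.Module):
    """Minimal atrous spatial pyramid pooling (rates assumed)."""
    def __init__(self, channels, rates=(1, 6, 12, 18)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, 3, padding=r, dilation=r) for r in rates
        )
        self.project = nn.Conv2d(channels * len(rates), channels, 1)

    def forward(self, x):
        return self.project(torch.cat([b(x) for b in self.branches], dim=1))

class EFEM(nn.Module):
    """Sketch of the Edge Feature Enhancement Module."""
    def __init__(self, channels=256):
        super().__init__()
        self.cls_head = nn.Conv2d(channels, 2, 1)  # coarse fg/bg classification head
        self.aspp = ASPP(channels)
        self.fuse = nn.Sequential(
            nn.Conv2d(channels + 1, channels, 3, padding=1),
            nn.ReLU(inplace=True),
        )

    def forward(self, query_feat, hed_edges):
        # 1) Coarse foreground probability from the classification head
        coarse = self.cls_head(query_feat).softmax(dim=1)[:, 1:2]  # (B,1,h,w)
        # 2) Keep only HED edges that fall inside the coarse foreground
        edges = F.interpolate(hed_edges, size=query_feat.shape[-2:],
                              mode="bilinear", align_corners=False)
        fg_edges = edges * (coarse > 0.5).float()
        # 3) Multi-scale context over the query features
        ctx = self.aspp(query_feat)
        # 4) Fuse the filtered edge map back into the features
        return self.fuse(torch.cat([ctx, fg_edges], dim=1))
```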


3. Edge Prior Mask Generator (EPMG)

Since high-level features are semantically rich but lack boundary details, the EPMG generates edge prior masks (EPMs) based on image gradient information, providing additional edge details to guide the model in better segmenting targets. The workflow includes:

  1. Gradient Information Extraction: Gradients in the x and y directions are computed with the Sobel operator for both the support and query images. The support image gradients are masked with the annotated support mask to eliminate background interference.
  2. Edge Similarity Calculation: An Edge Similarity Calculator (ESC) computes pixel-wise edge similarities based on the gradient information of the support and query images, generating a relevance mask for query image edges.
  3. Mask Normalization: The computed relevance mask is normalized to form the target’s Edge Prior Mask (EPM), which, combined with other features, helps guide the model in segmenting target areas more accurately.

The innovation of this module lies in leveraging gradient information to capture target edges, compensating for the limitations of traditional methods in capturing details.
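
A simplified sketch of this gradient-based workflow (the similarity function below is a toy stand-in for the paper's ESC, and the min-max normalization is an assumption):

```python
import torch
import torch.nn.functional as F

# Sobel kernels for x/y gradients; transposing the x kernel yields the y kernel
SOBEL_X = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]]).view(1, 1, 3, 3)
SOBEL_Y = SOBEL_X.transpose(2, 3)

def sobel_gradients(gray):
    """Per-pixel (gx, gy) gradients of a grayscale batch of shape (B,1,H,W)."""
    gx = F.conv2d(gray, SOBEL_X, padding=1)
    gy = F.conv2d(gray, SOBEL_Y, padding=1)
    return torch.cat([gx, gy], dim=1)  # (B,2,H,W)

def edge_prior_mask(support_gray, support_mask, query_gray):
    """Toy edge similarity: compare query gradients with the masked support's
    average gradient statistics, then normalize to [0, 1] as the EPM."""
    g_s = sobel_gradients(support_gray) * support_mask  # drop background gradients
    g_q = sobel_gradients(query_gray)
    # Reference: mean absolute gradient of the target region, per direction
    ref = g_s.abs().flatten(2).mean(dim=2).view(-1, 2, 1, 1)
    # Pixel-wise similarity between query gradients and the support reference
    sim = F.cosine_similarity(g_q.abs(), ref.expand_as(g_q), dim=1, eps=1e-6)
    # Min-max normalize to [0, 1] to form the Edge Prior Mask
    lo, hi = sim.amin(dim=(1, 2), keepdim=True), sim.amax(dim=(1, 2), keepdim=True)
    return ((sim - lo) / (hi - lo + 1e-6)).unsqueeze(1)  # (B,1,H,W)
```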


Experiments and Results

1. Datasets and Evaluation Metrics

This study evaluates the proposed model on two standard few-shot segmentation benchmarks: PASCAL-5i and COCO-20i. The evaluation metrics include:

  - Mean Intersection over Union (mIoU): The foreground IoU between predicted and ground-truth masks, averaged over the test classes.
  - Foreground-Background IoU (FB-IoU): Treats all target classes as a single foreground class and averages foreground and background IoU, complementing mIoU under foreground-background imbalance.
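
Per test episode, both metrics reduce to binary mask comparisons; a minimal sketch (the benchmarks' per-fold averaging protocol is omitted):

```python
import numpy as np

def binary_iou(pred, gt):
    """IoU between two boolean masks of the same shape."""
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 1.0  # both masks empty: count as a perfect match
    return np.logical_and(pred, gt).sum() / union

def mean_iou(per_class_ious):
    """mIoU: foreground IoU averaged over the test classes of a fold."""
    return float(np.mean(per_class_ious))

def fb_iou(pred_fg, gt_fg):
    """FB-IoU: mean of foreground IoU and background IoU, class-agnostic."""
    return 0.5 * (binary_iou(pred_fg, gt_fg) + binary_iou(~pred_fg, ~gt_fg))
```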

2. Experimental Results

PASCAL-5i Dataset

Under both one-shot and five-shot settings, PDEFE achieves significant improvements over mainstream methods (e.g., MCEENet and CFENet). For instance, with ResNet-50 as the backbone, PDEFE achieves an mIoU of 68.9%, outperforming MCEENet by 5.4%.

COCO-20i Dataset

Compared with models such as DBMNet and RIFENet, PDEFE demonstrates superior performance across various conditions. In the five-shot setting, it achieves a peak mIoU of 55.9%, showcasing strong generalization capabilities.


3. Ablation Study and Method Validation

To validate the contributions of EFEM and EPMG, ablation experiments were conducted. The results show:

  - Incorporating EFEM significantly improves the model's boundary segmentation accuracy.
  - Adding EPMG further enhances the boundary details in query features, yielding precise target segmentation.


Research Significance

This study not only surpasses existing edge-assistance methods in methodological innovation but also demonstrates substantial application potential:

  1. Scientific Value: Provides a clear technical roadmap for leveraging edge information in FSS.
  2. Practical Value: Offers guidance for tasks requiring precise boundary segmentation, such as medical imaging and autonomous driving.


Conclusion

By introducing EFEM and EPMG modules, PDEFE provides a novel solution for FSS, excelling particularly in enhancing target boundary details. This work not only advances FSS technology but also offers valuable insights for other fields such as salient object detection and edge detection. Future directions include leveraging stronger pretrained models (e.g., SAM) to enhance edge extraction or exploring automated mechanisms for optimal edge selection.