Pulling Target to Source: A New Perspective on Domain Adaptive Semantic Segmentation

Background and Significance

Semantic segmentation plays a crucial role in computer vision, but its performance typically depends on large amounts of labeled data, which is costly to obtain, especially in complex scenarios. To reduce annotation demands, many studies turn to synthetic datasets. Because of the domain gap, however, models trained on synthetic data struggle to generalize to real-world scenes. Unsupervised domain adaptation (UDA) methods address this by transferring knowledge from a labeled source domain to an unlabeled target domain.

Traditional UDA methods fall mainly into two categories: adversarial training and self-training. Adversarial training reduces domain discrepancy through distribution alignment, while self-training uses pseudo-labels as direct supervision on the target domain. Both have limitations, such as noisy pseudo-labels and the difficulty of ensuring that cross-domain features are well separated by category.

This paper proposes a new approach: by “pulling target features toward source features,” it exploits source-domain data to build a category-discriminative feature space and thereby indirectly improves the target domain’s feature representations. Based on this idea, the authors propose T2S-DA (Pulling Target to Source for Domain Adaptation), a more general and effective solution for domain adaptive semantic segmentation.

Study Origin

This paper was published in the International Journal of Computer Vision, authored by researchers from the Institute of Automation, Chinese Academy of Sciences, Hong Kong Institute of Science & Innovation, and SenseTime Research. The initial draft was received on December 28, 2023, and the final version was accepted on October 22, 2024. Authors include Haochen Wang, Yujun Shen, Jingjing Fei, and others.

Methodology and Research Process

Framework and Innovations

Overview of T2S-DA

The core idea of T2S-DA is to use the source domain as an anchor and explicitly pull target-domain features toward source-domain features, rather than supervising the target domain directly. Key components of the framework include:

  1. Pseudo-Target Image Generation: An image translation engine (e.g., Fourier Domain Adaptation, FDA) converts source-domain images into pseudo-target images with the target domain’s style while retaining the source annotations, so that cross-domain features can be matched accurately.
  2. Dynamic Reweighting Strategy: Class imbalance in semantic segmentation datasets is handled by dynamically adjusting class weights in the loss function, prioritizing poorly performing classes.
  3. Contrastive Learning Objective: Feature similarity is computed between the source and pseudo-target domains, and the model is optimized with either a mean squared error (MSE) or an InfoNCE contrastive loss (see the sketch after this list).
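As a rough illustration of the third component, the sketch below pulls each pseudo-target feature toward its matched source feature with an InfoNCE-style loss. The function name, temperature value, and the choice to keep the source branch fixed are illustrative assumptions, not details taken from the paper’s released code.

```python
import torch
import torch.nn.functional as F

def info_nce_pull(target_feats, source_feats, temperature=0.1):
    """Pull pseudo-target features toward their matched source features.

    target_feats: (N, D) features from the pseudo-target view (queries).
    source_feats: (N, D) category-matched source features; row i is the
                  positive key for query i, all other rows act as negatives.
    """
    q = F.normalize(target_feats, dim=1)
    k = F.normalize(source_feats, dim=1)

    # Cosine similarity between every query and every key, scaled by temperature.
    logits = q @ k.t() / temperature              # (N, N)

    # Positive pairs sit on the diagonal.
    labels = torch.arange(q.size(0), device=q.device)

    # Only the target branch receives gradients here, so target features are
    # pulled toward the (fixed) source anchors.
    return F.cross_entropy(logits, labels)


# Toy usage with random features: 8 matched pairs of 256-d features.
tgt = torch.randn(8, 256, requires_grad=True)
src = torch.randn(8, 256)           # treated as a fixed anchor (no grad)
loss = info_nce_pull(tgt, src)
loss.backward()
```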

Dynamic Reweighting Strategy for Class Imbalance

Semantic segmentation often suffers from highly imbalanced class distributions. For instance, classes like “sky” and “road” occupy a large proportion of pixels, while classes like “pole” or “sign” are scarce. T2S-DA proposes a dynamic reweighting strategy based on class confidence, focusing more optimization efforts on underperforming classes to improve the model’s overall generalization performance.
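A minimal sketch of such confidence-based reweighting follows, assuming each class weight grows as the model’s mean confidence on that class drops; the exact weighting formula, temperature, and function name are assumptions for illustration rather than the paper’s definition.

```python
import torch

def dynamic_class_weights(probs, labels, num_classes, temperature=0.5):
    """Derive per-class loss weights from current prediction confidence.

    probs:  (B, C, H, W) softmax outputs of the segmentation head.
    labels: (B, H, W) ground-truth or pseudo labels.
    Classes the model predicts with low confidence receive larger weights.
    """
    conf = torch.ones(num_classes, device=probs.device)
    for c in range(num_classes):
        mask = labels == c
        if mask.any():
            # Mean probability assigned to class c on pixels labeled c.
            conf[c] = probs[:, c][mask].mean()

    # Lower confidence -> larger weight; the temperature controls sharpness.
    weights = torch.exp((1.0 - conf) / temperature)
    return weights * num_classes / weights.sum()   # keep the mean weight near 1


# Toy usage: weights for a 19-class Cityscapes-style label space.
probs = torch.softmax(torch.randn(2, 19, 64, 64), dim=1)
labels = torch.randint(0, 19, (2, 64, 64))
w = dynamic_class_weights(probs, labels, num_classes=19)
```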

Datasets and Experimental Design

The effectiveness of T2S-DA was validated on several benchmarks:

  • GTA5 → Cityscapes: Transferring from synthetic urban scenes to real-world urban scenes.
  • SYNTHIA → Cityscapes: Transferring from synthetic virtual city images to real data.

Two network architectures were used for comparison: the convolution-based DeepLab-V2 (with a ResNet-101 encoder) and the Transformer-based DAFormer (with a MiT-B5 encoder).

Data Processing and Training Details

  • Image Preprocessing: Source-domain images were resized and randomly cropped, and pseudo-target images were generated with Fourier Domain Adaptation (see the sketch below).
  • Optimization and Training Parameters: The AdamW optimizer was used with different learning rates and weight-decay settings, linear learning-rate warmup, and dynamically updated model weights to improve performance.
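For reference, the sketch below shows an FDA-style restyling step: the low-frequency amplitude spectrum of a source image is replaced with that of a target image while the source phase (and hence the scene content and its labels) is kept. The beta value and function name are assumptions, and the paper’s actual pipeline may differ in detail.

```python
import torch

def fda_source_to_target(src_img, trg_img, beta=0.01):
    """Restyle a source image with a target image's low-frequency amplitude.

    src_img, trg_img: (C, H, W) float tensors of the same spatial size.
    beta: fraction of the (centred) spectrum, per side, whose amplitude
          is copied from the target image.
    """
    src_fft = torch.fft.fft2(src_img)
    trg_fft = torch.fft.fft2(trg_img)
    src_amp, src_pha = src_fft.abs(), src_fft.angle()
    trg_amp = trg_fft.abs()

    # Shift low frequencies to the centre and swap a small square window.
    src_amp = torch.fft.fftshift(src_amp, dim=(-2, -1))
    trg_amp = torch.fft.fftshift(trg_amp, dim=(-2, -1))
    _, h, w = src_img.shape
    b = max(1, int(min(h, w) * beta))
    cy, cx = h // 2, w // 2
    src_amp[:, cy - b:cy + b, cx - b:cx + b] = \
        trg_amp[:, cy - b:cy + b, cx - b:cx + b]
    src_amp = torch.fft.ifftshift(src_amp, dim=(-2, -1))

    # Recombine the swapped amplitude with the original source phase, so the
    # image content stays aligned with the source annotations.
    mixed = torch.polar(src_amp, src_pha)
    return torch.fft.ifft2(mixed).real


# Toy usage: restyle a random "source" crop with a "target" crop's style.
pseudo_target = fda_source_to_target(torch.rand(3, 512, 512), torch.rand(3, 512, 512))
```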

Experimental Results and Analysis

Performance on Domain Adaptation Tasks

T2S-DA significantly outperformed existing state-of-the-art methods on both GTA5 → Cityscapes and SYNTHIA → Cityscapes benchmarks:

  • On the GTA5 → Cityscapes task, T2S-DA achieved 75.1% mIoU, surpassing the current SOTA method HRDA by +1.3%.
  • On the SYNTHIA → Cityscapes task, T2S-DA improved mIoU by +2.5% (16 classes) and +2.1% (13 classes).

Further analysis revealed that T2S-DA excelled particularly in long-tail categories (e.g., “train” and “sign”), which benefited directly from its dynamic reweighting strategy.

Performance on Domain Generalization Tasks

In domain generalization tasks, where target domain images are inaccessible during training, T2S-DA also demonstrated outstanding performance, further verifying its domain-invariant properties. Compared to other methods (e.g., ISW and SHADE), T2S-DA significantly improved mIoU on the Cityscapes dataset.

Ablation Studies and Feature Visualization

  1. Contrastive Learning Direction: Experiments showed that “pulling target to source” outperformed “pulling source to target,” as source domain features are inherently more category-discriminative.
  2. Feature Distribution Analysis: Using t-SNE visualization, T2S-DA was shown to construct more strongly separated feature spaces for the target domain categories.
  3. Dynamic Reweighting and Sampling Strategies: Both strategies significantly improved performance, especially for underperforming classes.

Implications and Future Directions

The T2S-DA method not only enhances performance in domain adaptation semantic segmentation tasks but also exhibits strong domain generalization capabilities, providing valuable insights for future research.

Potential future directions include:

  1. Optimization of Pseudo-Target Generation Models: Improving the realism and semantic alignment of pseudo-target images, possibly using GANs or diffusion models.
  2. Cross-Task Transfer: Applying T2S-DA to other tasks (e.g., object detection or instance segmentation) to validate its generalizability.
  3. Dynamic Optimization Strategies: Developing more refined dynamic adjustment mechanisms to balance model performance across classes better.

T2S-DA offers a novel perspective for domain adaptation and generalization research. Its significant performance improvements and broad applicability are expected to have a profound impact on the field of computer vision.