Exploring Homogeneous and Heterogeneous Consistent Label Associations for Unsupervised Visible-Infrared Person Re-Identification

Background Introduction

Visible-Infrared Person Re-Identification (VI-ReID) is an important research direction in computer vision, aiming to retrieve images of the same pedestrian across visible and infrared modalities. The task has broad application prospects in intelligent surveillance, especially under low-light or nighttime conditions where infrared images provide complementary information. However, most existing VI-ReID methods rely on annotated data, which is time-consuming and labor-intensive to obtain. Unsupervised VI-ReID methods, which require no identity annotations, have therefore become an important line of research.

Existing unsupervised VI-ReID methods mainly focus on establishing cross-modal pseudo-label associations to bridge the modality gap. However, these methods often ignore the homogeneous and heterogeneous consistency between the feature space and the pseudo-label space, resulting in coarse pseudo-label associations. To address this issue, this paper proposes a Modality-Unified Label Transfer (MULT) module, which simultaneously considers homogeneous and heterogeneous fine-grained instance-level structures to generate high-quality cross-modal pseudo-label associations.

Paper Source

This paper is co-authored by Lingfeng He, De Cheng, Nannan Wang, and Xinbo Gao of Xidian University and Chongqing University of Posts and Telecommunications. It was submitted on April 25, 2024, accepted on November 29, 2024, and published in the International Journal of Computer Vision.

Research Process and Experimental Design

1. Modality-Unified Label Transfer (MULT) Module

The core idea of the MULT module is to model the homogeneous and heterogeneous affinities between instances, quantify the inconsistency between the pseudo-label space and the feature space, and generate high-quality cross-modal pseudo-labels by minimizing this inconsistency. Specifically, the MULT module achieves this through the following steps (a code sketch follows the list):

  1. Affinity Modeling: MULT first models homogeneous and heterogeneous affinities through instance-pair relationships in the feature space. Homogeneous affinities are calculated using Jaccard similarity, while heterogeneous affinities are modeled through the Optimal Transport (OT) problem.

  2. Inconsistency Formulation: Based on the affinity matrices, MULT defines homogeneous and heterogeneous inconsistency terms. These inconsistency terms are used to measure the discrepancy between the pseudo-label space and the feature space.

  3. Label Transfer: MULT alternately updates pseudo-labels to gradually minimize the inconsistency terms. In each iteration, the pseudo-label information of an instance interacts with its intra-modal and cross-modal counterparts, resulting in more accurate pseudo-labels.
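To make these steps concrete, here is a minimal NumPy sketch of the pipeline. It is not the authors' implementation: the neighborhood size `k`, the Sinkhorn regularization `epsilon`, the mixing weight `alpha`, the number of transfer steps, and the exact propagation rule are all illustrative assumptions.

```python
# Minimal sketch of the three MULT steps (illustrative, not the paper's code).
import numpy as np

def knn_sets(feats, k=10):
    """Index sets of the k nearest neighbors under cosine similarity."""
    f = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    sim = f @ f.T
    return [set(np.argsort(-row)[:k]) for row in sim]

def jaccard_affinity(feats, k=10):
    """Step 1a: homogeneous affinity via Jaccard similarity of k-NN sets."""
    nbrs = knn_sets(feats, k)
    n = len(nbrs)
    A = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            A[i, j] = len(nbrs[i] & nbrs[j]) / len(nbrs[i] | nbrs[j])
    return A / A.sum(axis=1, keepdims=True)  # row-stochastic

def sinkhorn_transport(fa, fb, epsilon=0.05, n_iters=100):
    """Step 1b: heterogeneous affinity as an entropic OT plan (Sinkhorn)."""
    fa = fa / np.linalg.norm(fa, axis=1, keepdims=True)
    fb = fb / np.linalg.norm(fb, axis=1, keepdims=True)
    cost = 1.0 - fa @ fb.T                      # cosine distance
    K = np.exp(-cost / epsilon)
    a = np.ones(len(fa)) / len(fa)              # uniform marginals
    b = np.ones(len(fb)) / len(fb)
    u = np.ones_like(a)
    for _ in range(n_iters):
        v = b / (K.T @ u)
        u = a / (K @ v)
    T = u[:, None] * K * v[None, :]
    return T / T.sum(axis=1, keepdims=True)     # row-normalize

def label_transfer(fv, fi, Yv, Yi, alpha=0.5, steps=5):
    """Steps 2-3: alternately propagate labels through both affinities,
    shrinking the inconsistency between feature and pseudo-label spaces."""
    Av, Ai = jaccard_affinity(fv), jaccard_affinity(fi)
    Tvi = sinkhorn_transport(fv, fi)
    Tiv = sinkhorn_transport(fi, fv)
    for _ in range(steps):
        Yv_new = alpha * Av @ Yv + (1 - alpha) * Tvi @ Yi
        Yi_new = alpha * Ai @ Yi + (1 - alpha) * Tiv @ Yv
        Yv, Yi = Yv_new, Yi_new
    return Yv, Yi  # soft, cross-modally aligned pseudo-labels
```

Here `Yv` and `Yi` stand for one-hot intra-modal clustering labels mapped into a shared label space; since every matrix is row-stochastic, the returned soft labels stay normalized while mixing intra-modal structure (via the Jaccard affinities) with cross-modal alignment (via the transport plans).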

2. Online Cross-Memory Label Refinement (OCLR) Module

To further reduce the negative impact of noisy pseudo-labels, this paper proposes an Online Cross-Memory Label Refinement (OCLR) module. OCLR enforces self-consistency among the predictions an instance receives from multiple memory prototypes, refining pseudo-labels online through contrastive learning while further narrowing the modality gap. One plausible instantiation is sketched below.
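The paper's exact loss is not reproduced here; the snippet below shows one way cross-memory self-consistency can be expressed, assuming both memories store class prototypes indexed by the same unified pseudo-label space. The shapes, temperature `tau`, and the KL-divergence form are assumptions.

```python
# One plausible form of cross-memory self-consistency (not the paper's loss).
import torch
import torch.nn.functional as F

def cross_memory_consistency(feat, mem_a, mem_b, tau=0.05):
    """feat: (B, d) instance features.
    mem_a, mem_b: (C, d) class prototypes from two memory banks
    indexed by the same unified pseudo-label space."""
    feat = F.normalize(feat, dim=1)
    p_a = F.softmax(feat @ F.normalize(mem_a, dim=1).t() / tau, dim=1)
    log_p_b = F.log_softmax(feat @ F.normalize(mem_b, dim=1).t() / tau, dim=1)
    # Encourage the two memories' predictions on the same instance to agree
    # (one direction shown; a symmetric term swaps mem_a and mem_b).
    return F.kl_div(log_p_b, p_a, reduction="batchmean")
```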

3. Alternative Modality-Invariant Representation Learning (AMIRL) Framework

To fully utilize the pseudo-labels generated by MULT, this paper proposes an Alternative Modality-Invariant Representation Learning (AMIRL) framework. AMIRL conducts contrastive learning using both intra-modal and cross-modal memory banks. Additionally, AMIRL introduces an auxiliary memory bank to learn the structural information of cross-modal pseudo-labels, further optimizing feature representations.
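As a reference point, here is a minimal sketch of the cluster-level memory-bank contrastive learning that such frameworks typically build on. AMIRL's concrete memories, alternation schedule, momentum, and temperature are not taken from the paper; everything below is an illustrative assumption.

```python
# Generic prototype-memory contrastive learning (illustrative sketch).
import torch
import torch.nn.functional as F

class PrototypeMemory:
    def __init__(self, num_classes, dim, momentum=0.2, tau=0.05):
        self.protos = F.normalize(torch.randn(num_classes, dim), dim=1)
        self.momentum, self.tau = momentum, tau

    def contrastive_loss(self, feat, labels):
        """Cluster-level InfoNCE: pull each feature toward its pseudo-label
        prototype and away from all other prototypes (labels: long tensor)."""
        feat = F.normalize(feat, dim=1)
        logits = feat @ self.protos.t() / self.tau
        return F.cross_entropy(logits, labels)

    @torch.no_grad()
    def update(self, feat, labels):
        """Momentum update of the prototypes hit by this batch."""
        feat = F.normalize(feat, dim=1)
        for f, y in zip(feat, labels):
            self.protos[y] = F.normalize(
                self.momentum * self.protos[y] + (1 - self.momentum) * f, dim=0)
```

Under an alternative training scheme, intra-modal memories of this kind would be queried with intra-modal pseudo-labels while the cross-modal and auxiliary memories are queried with the MULT-aligned labels, alternating between the two so the network learns modality-invariant representations.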

Experimental Results and Conclusions

Experiments were conducted on two public visible-infrared datasets, SYSU-MM01 and RegDB. The results show that the proposed method significantly outperforms state-of-the-art methods in unsupervised VI-ReID tasks. Specifically, the proposed method achieves Rank-1 accuracy and mAP of 64.77% and 59.23% on the SYSU-MM01 dataset, and 89.95% and 82.09% on the RegDB dataset, respectively.

Main Contributions

  1. MULT Module: The proposed MULT module generates homogeneous and heterogeneous consistent cross-modal pseudo-labels through instance-level contextual structures. The generated pseudo-labels not only maintain cross-modal alignment but also contain rich intra-modal information.

  2. OCLR Module: The designed OCLR module learns cross-memory self-consistency online, effectively reducing the negative impact of noisy labels and further narrowing the modality gap.

  3. AMIRL Framework: The proposed AMIRL framework fully exploits the pseudo-labels generated by MULT, further optimizing feature representations through contrastive learning.

Research Highlights

  1. High-Quality Cross-Modal Pseudo-Label Associations: The pseudo-labels generated by the MULT module are of high quality and can effectively guide the network to learn cross-modal feature representations.

  2. Reducing the Impact of Noisy Labels: The OCLR module optimizes pseudo-labels online, effectively reducing the negative impact of noisy labels on model training.

  3. Modality-Invariant Feature Learning: The AMIRL framework leverages cross-modal pseudo-labels through an alternative training scheme, further improving model performance.

Future Work

Future research will explore more robust cross-modal label association methods based on the proposed MULT module to further enhance the performance of unsupervised VI-ReID tasks.

This research not only proposes a new unsupervised VI-ReID method but also offers new ideas and approaches for future cross-modal learning tasks.