Balancing Feature Alignment and Uniformity for Few-Shot Classification

Solving Few-Shot Classification Problems with Balanced Feature Alignment and Uniformity

Background and Motivation

The goal of Few-Shot Learning (FSL) is to correctly recognize new samples given only a few labeled examples from novel classes. Existing few-shot learning methods mainly learn transferable knowledge from base classes by maximizing the mutual information between feature representations and their corresponding labels. However, this approach may lead to a “supervision collapse” problem due to biases in the base classes. This paper addresses the issue by preserving the intrinsic structure of the data while learning a model that generalizes to novel classes. Following the principle of information maximization, the study balances the capture of class-specific information against cross-class general features by maximizing the mutual information (MI) both between samples and their feature representations and between feature representations and their class labels.
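Stated compactly, this balance can be written as a joint information-maximization objective. The formula below is a hedged reconstruction from the description above, not the paper’s exact formulation; in particular, the trade-off weight λ is an assumption.

```latex
\max_{\theta} \; I(X;\, Z_{\theta}) \;+\; \lambda \, I(Z_{\theta};\, Y)
```

Here X is an input sample, Z_θ its feature representation under encoder parameters θ, and Y its class label; the first term preserves the intrinsic structure of the data, the second captures class-specific information, and λ balances the two.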

Paper Source

This paper was authored by Yunlong Yu, Dingyi Zhang, Zhong Ji (IEEE Senior Member), Xi Li (IEEE Senior Member), Jungong Han (IEEE Senior Member), and Zhongfei Zhang (IEEE Fellow). It was published in IEEE Transactions on Image Processing in August 2023.

Research Workflow

The research workflow includes the following steps:

  1. Data and Sample Selection: Extensive experiments were conducted on multiple few-shot classification benchmark datasets, including miniImageNet and CIFAR-FS, with the 5-way 1-shot and 5-way 5-shot settings used for testing.

  2. Method Overview:

    • A unified framework perturbs the feature embedding space with two low-bias MI estimators: the first maximizes the MI between pairs of samples within the same class, and the second maximizes the MI between a sample and its augmented view.
    • The objective functions at each stage are given detailed mathematical formulations that combine inter-class knowledge distillation with diversity expansion of the feature representations.
  3. Experimental Methods:

    • During training, four losses were combined: a cross-entropy loss, a feature alignment loss, a mutual knowledge distillation loss, and a self-supervised loss (a sketch of how they might be combined appears after this list).
    • ResNet12 and ResNet18 were used as feature extractors to evaluate the effectiveness of the method.
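The following PyTorch sketch shows one plausible way the four training losses could be combined, with InfoNCE-style lower bounds standing in for the paper’s two MI estimators. The loss weights (`lambda_*`), temperatures, and function names are assumptions for illustration, not the paper’s exact formulation.

```python
# A minimal sketch of the combined training objective described above.
# Weights, temperatures, and names are illustrative assumptions.
import torch
import torch.nn.functional as F

def info_nce(anchors, positives, temperature=0.1):
    """InfoNCE-style lower bound on MI between two batches of paired features.

    anchors, positives: (B, D) tensors where row i of each forms a positive pair.
    """
    a = F.normalize(anchors, dim=1)
    p = F.normalize(positives, dim=1)
    logits = a @ p.t() / temperature          # (B, B) similarity matrix
    targets = torch.arange(a.size(0), device=a.device)
    return F.cross_entropy(logits, targets)   # diagonal entries are positives

def total_loss(logits, labels, feats, same_class_feats, aug_feats,
               peer_logits, lambda_align=1.0, lambda_ssl=1.0, lambda_kd=1.0,
               kd_temperature=4.0):
    """Combine the four losses used during training (illustrative weighting).

    logits:           (B, C) classifier outputs for the original images
    labels:           (B,)   ground-truth class indices
    feats:            (B, D) embeddings of the original images
    same_class_feats: (B, D) embeddings of other samples from the same classes
    aug_feats:        (B, D) embeddings of augmented views of the same images
    peer_logits:      (B, C) outputs of a peer branch for mutual distillation
    """
    ce = F.cross_entropy(logits, labels)
    # MI estimator 1: align pairs of samples within the same class.
    align = info_nce(feats, same_class_feats)
    # MI estimator 2: align each sample with its augmented view.
    ssl = info_nce(feats, aug_feats)
    # Knowledge distillation against a peer branch (no pre-trained teacher).
    kd = F.kl_div(F.log_softmax(logits / kd_temperature, dim=1),
                  F.softmax(peer_logits.detach() / kd_temperature, dim=1),
                  reduction="batchmean") * kd_temperature ** 2
    return ce + lambda_align * align + lambda_ssl * ssl + lambda_kd * kd
```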

Research Results

Significant results were achieved in the following aspects:

  1. Model Performance:

    • On the 5-way 1-shot task, the proposed model achieved 69.53% accuracy on miniImageNet and 77.06% on CIFAR-FS, results very close to, or better than, the current best methods.
  2. Results Interpretation and Logical Relationships:

    • By maximizing different types of mutual information, the study demonstrates the importance of balancing preservation of the data’s intrinsic structure against capture of class-specific information, which is central to addressing the “supervision collapse” issue.
    • Experimental results show that combining knowledge distillation with feature perturbation effectively enhances the generalization ability of the model.

Conclusion and Research Significance

The proposed method strikes a good balance between feature alignment and uniformity, effectively addressing the “supervision collapse” problem of traditional FSL methods. The results indicate that the method is not only theoretically novel and effective but also substantially improves model performance on few-shot tasks in practice.
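The terms “alignment” and “uniformity” in the title echo the contrastive-representation analysis of Wang and Isola (2020). The sketch below gives their two canonical losses for intuition about the trade-off being balanced; whether the paper uses these exact forms is an assumption.

```python
# Canonical alignment and uniformity losses (Wang & Isola, 2020), shown
# for intuition; the paper may measure and balance these differently.
# Both assume the feature vectors are L2-normalized onto the unit sphere.
import torch

def alignment_loss(x, y, alpha=2):
    """Mean distance between positive-pair features; lower = better aligned."""
    return (x - y).norm(p=2, dim=1).pow(alpha).mean()

def uniformity_loss(x, t=2):
    """Log of the mean Gaussian potential; lower = more uniformly spread."""
    return torch.pdist(x, p=2).pow(2).mul(-t).exp().mean().log()
```

Minimizing only the alignment term can collapse same-class features onto a point, which is the “supervision collapse” failure mode; the uniformity term counteracts this by spreading features over the hypersphere.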

  1. Scientific Value:

    • Proposes a method that combines information theory with few-shot learning, enhancing model performance by maximizing mutual information.
    • Provides a new approach to solving the “supervision collapse” problem, aiding the research and development of future FSL methods.
  2. Application Value:

    • The proposed method has good transferability and practical application value, enabling efficient application in computer vision and other fields requiring few-shot learning.

Research Highlights

  1. By perturbing the feature embedding with two low-bias estimators, this work achieved mutual-information-based inter-class knowledge distillation and feature alignment of augmented views for the first time.
  2. Proposed a simple and efficient framework that requires no pre-trained teacher model, simplifying the training process and improving computational efficiency (a sketch of such teacher-free mutual distillation follows this list).
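Because no pre-trained teacher is needed, the distillation is presumably mutual: two branches trained from scratch distill into each other, in the style of deep mutual learning. The sketch below illustrates that general pattern; the function and parameter names are hypothetical, and the paper’s exact scheme may differ.

```python
# A minimal sketch of teacher-free mutual distillation: two peer models
# trained from scratch regularize each other with softened predictions.
# Illustrative of deep mutual learning in general, not the paper's exact scheme.
import torch
import torch.nn.functional as F

def mutual_kd_step(model_a, model_b, opt_a, opt_b, images, labels, T=4.0):
    """One training step in which each branch distills from its peer."""
    logits_a, logits_b = model_a(images), model_b(images)

    def branch_loss(own, peer):
        ce = F.cross_entropy(own, labels)
        # KL between softened distributions; the peer is detached so each
        # branch only updates its own parameters.
        kd = F.kl_div(F.log_softmax(own / T, dim=1),
                      F.softmax(peer.detach() / T, dim=1),
                      reduction="batchmean") * T * T
        return ce + kd

    loss_a = branch_loss(logits_a, logits_b)
    loss_b = branch_loss(logits_b, logits_a)
    opt_a.zero_grad(); loss_a.backward(); opt_a.step()
    opt_b.zero_grad(); loss_b.backward(); opt_b.step()
    return loss_a.item(), loss_b.item()
```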

Other Valuable Information

The study also conducted multiple data augmentation experiments, showing that rotation augmentation effectively enhances the generalization ability of the model (a sketch of such a rotation-based auxiliary task follows). Further qualitative analysis indicated that perturbing the feature embedding space and introducing self-supervision significantly alleviate the “supervision collapse” problem, giving the model better robustness and generalization when facing novel-class samples.
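A common way to use rotation as self-supervision is RotNet-style rotation prediction: every image is rotated by 0°, 90°, 180°, and 270°, and an auxiliary head predicts which rotation was applied. Whether the paper uses exactly this task is an assumption; the sketch below (with a hypothetical `encoder` and `rot_head`) shows the general mechanism.

```python
# A minimal sketch of rotation-based self-supervision (RotNet-style).
# The encoder and rotation head are hypothetical stand-ins.
import torch
import torch.nn.functional as F

def rotate_batch(images):
    """Return the NCHW batch rotated by 0/90/180/270 degrees plus rotation labels."""
    rotated = torch.cat([torch.rot90(images, k, dims=(2, 3)) for k in range(4)])
    labels = torch.arange(4).repeat_interleave(images.size(0))
    return rotated, labels

def rotation_loss(encoder, rot_head, images):
    """Auxiliary self-supervised loss: classify which rotation was applied."""
    rotated, rot_labels = rotate_batch(images)
    feats = encoder(rotated)                                  # (4B, D)
    return F.cross_entropy(rot_head(feats), rot_labels.to(feats.device))
```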