Explaining the Better Generalization of Label Distribution Learning for Classification

Understanding Why Label Distribution Learning Exhibits Better Generalization in Classification

Background Introduction

In artificial intelligence and machine learning, classification has long been a central research focus. With the continuing development of single-label learning (SLL) and multi-label learning (MLL), effectively handling the complex relationships among labels has become an important challenge. Traditional single-label models focus only on the most relevant label and discard the ambiguity and correlation information among labels, a limitation that hinders many complex real-world tasks.

To address this issue, Label Distribution Learning (LDL) was proposed. Unlike SLL and MLL, LDL assigns each instance a label distribution in which every label is associated with a real-valued description degree indicating how well it describes the instance, thereby depicting the instance-label relationship more comprehensively. By leveraging the rich supervision information in label distributions, LDL effectively addresses label ambiguity and is particularly suited to applications such as age estimation, emotion recognition, head-pose estimation, noisy label learning, and skin disease severity classification.
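Concretely, in the standard LDL notation (which this summary does not spell out), each instance $x$ is annotated with description degrees over all labels rather than with a single label:

\[ D_x = \{ d_x^{y_1}, d_x^{y_2}, \dots, d_x^{y_c} \}, \qquad d_x^{y_j} \ge 0, \qquad \sum_{j=1}^{c} d_x^{y_j} = 1, \]

where $d_x^{y_j}$ measures how well label $y_j$ describes $x$; SLL corresponds to the degenerate case in which a single degree equals 1 and all others are 0.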

Although LDL has demonstrated remarkable advantages in these applications, the reasons for its superior generalization over SLL had not been explored in depth. To fill this gap, Jing Wang and Xin Geng published a study titled “Explaining the Better Generalization of Label Distribution Learning for Classification” in Science China Information Sciences (May 2025), systematically investigating this question.


Paper Source

The study was authored by Jing Wang and Xin Geng, both affiliated with the School of Computer Science and Engineering at Southeast University and with the Key Laboratory of New Generation Artificial Intelligence Technology and Its Interdisciplinary Applications. The paper was submitted on April 22, 2023, revised on September 10, 2023, accepted on September 21, 2023, and published online on January 17, 2025, a process spanning nearly two years.


Research Process

1. Research Objectives and Innovations

The paper aims to address why LDL demonstrates better generalization in classification compared to SLL, with three core research objectives:

  1. Analyze the fundamental reasons behind the superior generalization of LDL over SLL.
  2. Propose a supporting theory—Label Distribution Margin Theory.
  3. Design a novel LDL approach, LDL-LDML, whose core is a Label Distribution Margin Loss (LDML) derived from the theory.

2. Methodology and Research Workflow

The study is structured into three main components: theoretical derivation, algorithm development, and experimental validation.

Theoretical Foundation: Label Distribution Margin Theory

The authors investigated the intrinsic relationship between label distributions and sub-optimal labels by introducing the concept of the “label distribution margin” and established the following theoretical results:

  • Definition of Label Distribution Margin:
    The label distribution margin is defined as the difference between the description degrees of the k-th optimal label and the (k+1)-th optimal label in a label distribution. It quantifies the condition under which a model can still correctly predict the k-th optimal label (a hedged formalization is sketched after this list).

  • Proposal of the Label Distribution Margin Theory (Theorem 2):
    The theorem gives a sufficient condition under which an LDL model can still correctly predict a sub-optimal label even when it misses the optimal label.

  • Generalization Improvement Theory for LDL (Theorem 3):
    The theorem further shows that, because LDL exploits the information in the whole label distribution rather than only the top label, its prediction error is never larger than that of SLL.
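A hedged sketch of the margin definition (the notation below is a reconstruction, not a quotation from the paper): let $y_{(1)}, y_{(2)}, \dots, y_{(c)}$ denote the labels of an instance $x$ sorted by decreasing description degree. The margin at rank $k$ is the gap

\[ \rho_k(x) = d_x^{y_{(k)}} - d_x^{y_{(k+1)}}, \]

and Theorem 2 can then be read as follows: as long as the model's approximation error on the label distribution stays within a bound governed by this gap, the predicted degree of $y_{(k)}$ still exceeds those of all lower-ranked labels, so the model falls back to a correct sub-optimal prediction even when it misses the top label.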


Algorithm Design: LDL-LDML Method

To validate the theoretical findings, the authors proposed a novel LDL algorithm: LDL-LDML. Its core lies in introducing a Label Distribution Margin Loss (LDML). The optimization objective consists of two parts:

  • Cross-Entropy Loss (CE):
    Ensures the capability to learn the optimal label.

  • Label Distribution Margin Loss (LDML):
    Balances the distribution information between the optimal and sub-optimal labels, enabling the model to make reliable predictions based on sub-optimal labels when the optimal label is missed.

The final optimization objective is defined as
\[ \ell = \sum_{i=1}^{N} -\ln p(y_{1,x_i}) + \lambda \sum_{i=1}^{N} \ell_{\mathrm{LDML}}(p, x_i), \]
where $y_{1,x_i}$ denotes the optimal label of instance $x_i$, $p$ is the model's predicted label distribution, and $\lambda$ is a hyperparameter that balances the two loss components.
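A minimal sketch of how such an objective could be implemented, assuming a softmax model; the exact form of $\ell_{\mathrm{LDML}}$ is not given in this summary, so the hinge-style margin penalty on consecutively ranked labels below is an illustrative assumption rather than the paper's formulation.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)          # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def ldl_ldml_objective(logits, label_dist, lam=1.0):
    """Illustrative CE + margin-style loss (not the paper's exact LDML).

    logits     : (N, C) raw model outputs
    label_dist : (N, C) ground-truth label distributions (rows sum to 1)
    lam        : trade-off hyperparameter (lambda in the objective above)
    """
    p = softmax(logits)                            # predicted distributions
    n = np.arange(len(p))

    # Cross-entropy on the optimal (highest-degree) label of each instance.
    top = label_dist.argmax(axis=1)
    ce = -np.log(p[n, top] + 1e-12)

    # Assumed margin term: encourage the predicted degrees to preserve the
    # gaps between consecutively ranked labels of the true distribution.
    order = np.argsort(-label_dist, axis=1)        # labels sorted by true degree
    p_sorted = np.take_along_axis(p, order, axis=1)
    d_sorted = np.take_along_axis(label_dist, order, axis=1)
    true_gap = d_sorted[:, :-1] - d_sorted[:, 1:]  # label distribution margins
    pred_gap = p_sorted[:, :-1] - p_sorted[:, 1:]
    ldml = np.maximum(0.0, true_gap - pred_gap).sum(axis=1)

    return (ce + lam * ldml).mean()

# Toy usage with random data.
rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 5))
dist = rng.random((4, 5))
dist /= dist.sum(axis=1, keepdims=True)
print(ldl_ldml_objective(logits, dist, lam=0.5))
```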


Experimental Validation: Dataset Construction and Comparative Benchmarks

The study selected 16 label distribution datasets for the experiments, including gene expression datasets (e.g., Alpha, CDC, and HEAT), image scene datasets (e.g., Scene), emotion recognition datasets (e.g., SBU-3DFE and SJAFFE), and facial beauty/aesthetic prediction datasets (e.g., SCUT-FBP and FBP5500).

The experiment was designed with the following comparison schemes:

  1. Comparison of LDL and SLL algorithms:
    Benchmarked standard LDL algorithms such as AA-KNN and SA-BFGS against SLL baselines such as KNN and LR to compare generalization performance (the classification-error protocol used for such a comparison is sketched after this list).

  2. Comparison with existing LDL algorithms:
    Compared against state-of-the-art methods such as LDL-SCL, LDL-LDM, and RWLM-LDL.

  3. Ablation Study:
    Removed the LDML loss term (retaining only CE) to validate its effectiveness.
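As a concrete illustration of how the classification comparison works, the sketch below follows the standard protocol of taking the label with the highest predicted description degree as the classification output and comparing it with the top label of the ground-truth distribution; it is a generic illustration, not code from the paper.

```python
import numpy as np

def top_label_error(pred_dist, true_dist):
    """0/1 classification error: fraction of instances whose highest-degree
    predicted label differs from the highest-degree ground-truth label."""
    pred_top = np.asarray(pred_dist).argmax(axis=1)
    true_top = np.asarray(true_dist).argmax(axis=1)
    return float((pred_top != true_top).mean())

# Toy example: 3 instances, 4 labels; the second prediction misses the top label.
pred = np.array([[0.1, 0.6, 0.2, 0.1],
                 [0.4, 0.3, 0.2, 0.1],
                 [0.2, 0.2, 0.5, 0.1]])
true = np.array([[0.1, 0.5, 0.3, 0.1],
                 [0.2, 0.5, 0.2, 0.1],
                 [0.1, 0.2, 0.6, 0.1]])
print(top_label_error(pred, true))  # 1 of 3 mismatches -> ~0.333
```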


Research Findings

Through systematic experiments, the authors arrived at the following key conclusions:

1. Theoretical Insights Into Why LDL Surpasses SLL

  • Advantage of Rich Supervision Information:
    In LDL, every label carries supervision information, so even when the model misses the optimal label it can still predict a sub-optimal label, which significantly improves generalization.

  • Empirical Support for the Theory:
    The experimental validation shows that LDL methods outperform their SLL counterparts on most datasets; for example, SA-BFGS outperformed LR on 75% of the 16 datasets.


2. Effectiveness of the LDL-LDML Algorithm

The experiments show that LDL-LDML achieved leading performance across all 16 datasets, effectively reducing the error-probability loss and significantly outperforming existing methods on specific tasks. For example, on the SCUT-FBP dataset, LDL-LDML achieved an error rate of 54.05%, lower than AA-KNN's 55.10%.


3. Independent Validation of LDML’s Effectiveness

The ablation study highlighted the critical role of LDML in improving generalization. Compared with a variant trained with only the cross-entropy loss (CE), LDL-LDML showed statistically significant advantages on almost all datasets, demonstrating that the loss term focusing on sub-optimal labels is essential to the method's efficacy.


Research Significance and Academic Value

1. Scientific Significance

  • Filling Theoretical Gaps:
    This study is the first to theoretically explain why LDL exhibits better generalization, addressing a critical missing component in the label distribution learning framework.

  • Introduction of New Theory:
    The label distribution margin theory provides a novel theoretical tool for subsequent LDL studies, helping to analyze the complexity of multi-label distributions.


2. Application Prospects

  • Diverse Applications:
    LDL-LDML shows potential in solving label ambiguity problems in practical tasks such as emotion recognition and skin disease grading.

  • Inspiration for New Model Design:
    The conceptual framework of LDML can be extended to complex tasks, promoting further research and development in multi-label classification.


This paper, with its rigorous logic and innovative theory, represents a significant contribution to the field of LDL. It not only addresses critical academic challenges but also provides new technical pathways for the engineering applications of multi-label learning.