A Multi-Scale Feature Fusion Network Focusing on Small Objects in UAV-View
Background Introduction
With the rapid development of unmanned aerial vehicle (UAV) technology, low-altitude remote sensing images captured by UAVs have been widely used in tasks such as disaster management and search and rescue. However, small object detection in UAV images remains a challenging problem. Because small objects occupy only a few pixels in the image and are irregularly distributed, existing object detection algorithms often perform poorly in these scenarios. In particular, although some existing detectors have introduced multi-scale feature fusion modules to improve detection accuracy, these traditional methods often overlook the weight relationship between the target and the background, diminishing the importance of small objects in deep feature maps. Additionally, the widely used Intersection over Union (IoU) metric and its variants are particularly sensitive to the positional errors of small objects, which significantly degrades the label assignment of anchor-based detectors.
To address these issues, this paper proposes a novel detector named AFF-YOLO, which is based on the network architecture of YOLOv8 and specifically designed to enhance the detection capability of small objects in UAV images. Specifically, the paper introduces three key modules: the Attention Feature Fusion Module (AFFM), the Small Object Feature Layer (SOFL), and the Triangular Centroid-based IoU Loss (TriC-IoU Loss). These modules collectively improve the accuracy and robustness of small object detection.
Source of the Paper
This paper is co-authored by Jiantao Li, Chenbin Yu, Wenhui Wei, and others, who are affiliated with the Suzhou Institute of Nano-Tech and Nano-Bionics of the Chinese Academy of Sciences, the University of California San Diego, Duke Kunshan University, and other institutions. The paper was published on March 13, 2025, in the journal Cognitive Computation, under the title “A Multi-Scale Feature Fusion Network Focusing on Small Objects in UAV-View”.
Research Process and Results
1. Research Process
a) Attention Feature Fusion Module (AFFM)
The AFFM module aims to enhance the effectiveness of multi-scale feature fusion by introducing an attention mechanism. Specifically, the AFFM module first projects feature maps from different scales to a common channel dimension via convolutional layers. It then computes weights with an attention module and generates the final feature map through weighted fusion. This process not only enhances the feature representation of small objects but also reduces the interference of background information.
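The summary does not specify the exact attention formulation inside AFFM, so the following is a minimal NumPy sketch of the general pattern it describes: project each scale's map to a shared channel width (the role of the 1x1 convolutions), score each scale per pixel (here, simply the channel mean, an assumption for illustration), normalize the scores with a softmax, and take the weighted sum.

```python
import numpy as np

def softmax(x, axis=0):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_feature_fusion(features, proj_weights):
    """Fuse same-resolution feature maps with per-pixel attention weights.

    features:     list of (C_i, H, W) arrays from different scales
                  (assumed already resized to a common H x W).
    proj_weights: list of (C, C_i) matrices standing in for the 1x1
                  convolutions that align channel dimensions.
    Returns the fused (C, H, W) feature map.
    """
    # Project every input to the shared channel dimension C (1x1 conv).
    projected = [np.einsum('oc,chw->ohw', w, f)
                 for w, f in zip(proj_weights, features)]
    stacked = np.stack(projected)              # (N, C, H, W)
    # Illustrative attention score per scale and pixel: channel mean.
    scores = stacked.mean(axis=1)              # (N, H, W)
    weights = softmax(scores, axis=0)          # sums to 1 across scales
    # Weighted fusion of the channel-aligned feature maps.
    return (stacked * weights[:, None]).sum(axis=0)
```

Because the weights sum to one across scales at every spatial position, the fusion can suppress scales where a pixel is dominated by background, which matches the module's stated goal.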
b) Small Object Feature Layer (SOFL)
The SOFL module further enhances the semantic and geometric information of small objects by introducing an additional feature extraction layer. It improves small object detection by fusing feature maps from the shallow and deep layers of the network. Experiments show that the SOFL module significantly improves detection accuracy, especially for small objects.
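The core operation described here, combining a high-resolution shallow map (fine geometry) with a low-resolution deep map (strong semantics), can be sketched as follows. The upsampling method and the use of channel concatenation are assumptions for illustration; the paper's actual layer may differ.

```python
import numpy as np

def upsample_nn(x, factor):
    """Nearest-neighbour upsampling of a (C, H, W) feature map."""
    return x.repeat(factor, axis=1).repeat(factor, axis=2)

def fuse_shallow_deep(shallow, deep):
    """Fuse a shallow feature map with an upsampled deep feature map.

    shallow: (C1, H, W)     - fine geometric detail
    deep:    (C2, H/2, W/2) - coarse semantic features
    Returns a (C1 + C2, H, W) map via channel-wise concatenation.
    """
    deep_up = upsample_nn(deep, 2)
    assert deep_up.shape[1:] == shallow.shape[1:], "spatial sizes must match"
    return np.concatenate([shallow, deep_up], axis=0)
```

A detection head attached to this higher-resolution fused map can then see both the precise locations of small objects and the context needed to classify them.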
c) Triangular Centroid-based IoU Loss (TriC-IoU Loss)
The TriC-IoU Loss improves the traditional IoU loss function by introducing the triangular centroid distance as a penalty term. Specifically, the TriC-IoU Loss considers not only the overlap between the predicted box and the target box but also the triangular centroid distance and the right-angle side ratio, thereby better reflecting the positional and shape information of small objects. Experiments show that the TriC-IoU Loss significantly improves accuracy on small object detection tasks.
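The exact triangular-centroid formula is not reproduced in this summary, so as an illustration of the general pattern, overlap term plus a normalized centroid-distance penalty, here is the classic DIoU-style loss in NumPy. TriC-IoU replaces the plain center distance with a triangular centroid distance and adds a right-angle side ratio; this sketch shows only the shared skeleton.

```python
def iou_with_center_penalty(box_p, box_t):
    """IoU loss with a normalised centre-distance penalty (DIoU-style).

    Boxes are (x1, y1, x2, y2). Shown only to illustrate how a
    centroid-distance term augments IoU; the paper's TriC-IoU uses a
    triangular-centroid distance and a right-angle side ratio instead.
    """
    # Intersection and union areas.
    ix1, iy1 = max(box_p[0], box_t[0]), max(box_p[1], box_t[1])
    ix2, iy2 = min(box_p[2], box_t[2]), min(box_p[3], box_t[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_p = (box_p[2] - box_p[0]) * (box_p[3] - box_p[1])
    area_t = (box_t[2] - box_t[0]) * (box_t[3] - box_t[1])
    iou = inter / (area_p + area_t - inter)
    # Squared distance between box centres.
    cpx, cpy = (box_p[0] + box_p[2]) / 2, (box_p[1] + box_p[3]) / 2
    ctx, cty = (box_t[0] + box_t[2]) / 2, (box_t[1] + box_t[3]) / 2
    d2 = (cpx - ctx) ** 2 + (cpy - cty) ** 2
    # Squared diagonal of the smallest enclosing box (normaliser).
    ex1, ey1 = min(box_p[0], box_t[0]), min(box_p[1], box_t[1])
    ex2, ey2 = max(box_p[2], box_t[2]), max(box_p[3], box_t[3])
    c2 = (ex2 - ex1) ** 2 + (ey2 - ey1) ** 2
    return 1.0 - iou + d2 / c2
```

The key property motivating such penalties for small objects is that the distance term still produces a useful gradient when the boxes barely overlap, whereas plain IoU loss saturates.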
2. Main Results
Experiments were conducted on two UAV image datasets: VisDrone2019 and UAVDT. The results show that the proposed AFF-YOLO achieved a 52.5% mAP50 on the VisDrone2019 dataset, which is 30.6% higher than existing YOLO-based detectors. Additionally, on the UAVDT dataset, AFF-YOLO also performed well, achieving a 34.2% mAP50, significantly outperforming other algorithms.
3. Conclusions and Value
By introducing the AFFM, SOFL, and TriC-IoU Loss modules, this paper significantly improves the accuracy and robustness of small object detection in UAV images. These modules not only enhance the feature representation of small objects but also optimize the bounding box regression loss function, enabling the model to perform exceptionally well in handling small objects. The research findings of this paper have broad application prospects in fields such as UAV image analysis, disaster management, and search and rescue.
Research Highlights
- Attention Feature Fusion Module (AFFM): By introducing an attention mechanism, it enhances the feature representation of small objects and reduces the interference of background information.
- Small Object Feature Layer (SOFL): By fusing shallow and deep features, it improves the detection capability of small objects.
- Triangular Centroid-based IoU Loss (TriC-IoU Loss): By introducing the triangular centroid distance and the right-angle side ratio, it improves the traditional IoU loss function, significantly enhancing the accuracy of small object detection.
Other Valuable Information
This paper also conducted ablation experiments to evaluate the impact of each module on detection accuracy. The results show that the AFFM and SOFL modules each contribute significant gains, with the TriC-IoU Loss providing further improvement on small object detection tasks. Additionally, the paper compares TriC-IoU with other commonly used IoU loss functions, further validating its superiority.
Summary
This paper proposes AFF-YOLO, a YOLOv8-based detector that combines the AFFM, SOFL, and TriC-IoU Loss modules to significantly improve the accuracy and robustness of small object detection in UAV images. Together, these modules strengthen the feature representation of small objects and optimize the bounding box regression loss, with broad application prospects in UAV image analysis, disaster management, and search and rescue.