Field-of-View IoU for Object Detection in 360° Images

360° cameras have seen widespread use in recent years in fields such as virtual reality, autonomous driving, and security monitoring. With the growth of 360° image data, demand for 360° image recognition tasks, especially object detection, keeps rising. Because traditional methods handle 360° images poorly, researchers Miao Cao, Satoshi Ikehata, and Kiyoharu Aizawa have proposed two complementary techniques to improve object detection in 360° images: Field-of-View Intersection over Union (FOV-IoU) and a 360-degree data augmentation scheme (360augmentation).

Background and Research Motivation

Most modern object detection networks are designed for perspective images. When applied to 360° images in equirectangular projection (ERP) format, their detection performance degrades significantly because of projection distortion. Traditional workarounds either project the 360° content onto multiple perspective images or run a perspective detector directly on the ERP image. The former struggles with objects that straddle projection boundaries and incurs high computational cost; the latter suffers from severe ERP distortion and from IoU values that are computed incorrectly on the distorted image plane. Detection models based on spherical convolution (SphConv) have been proposed as an alternative, but experiments show they remain hard to integrate with state-of-the-art perspective detectors and still perform poorly.

Another key issue is the incorrect computation of IoU on 360° images. Conventional rectangular boxes in 2D image coordinates cannot tightly enclose objects on the sphere, especially at high latitudes. The Field-of-View Bounding Box (FOV-BB) has therefore been gradually adopted, but computing the exact spherical intersection area of two FOV-BBs is complex and costly. To address these issues, this study proposes the FOV-IoU calculation method and the 360augmentation data augmentation technique. Multiple experiments on the 360-indoor dataset validate their effectiveness and superiority.

Authors and Publication Source

This paper is a collaborative work by Miao Cao, Satoshi Ikehata, and Kiyoharu Aizawa of The University of Tokyo and the National Institute of Informatics in Japan. The research was published in IEEE Transactions on Image Processing (August 2023).

Research Workflow

1. Introduction of FOV-IoU

The study first introduces the concept of the Field-of-View Bounding Box (FOV-BB) and its use on equirectangular images. The traditional IoU calculation performs poorly on 360° images, notably in high-latitude regions, so FOV-IoU adopts a new calculation that more accurately approximates the IoU between two FOV-BBs.
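For concreteness, a FOV-BB is commonly parameterized by its center direction and angular extents, as in the BFoV annotations of the 360-indoor dataset. The following is a minimal sketch of that representation with an angular-extent approximation of the box area; the class and field names are our own, not the paper's:

```python
from dataclasses import dataclass

@dataclass
class FoVBox:
    """A field-of-view bounding box on the sphere, all angles in radians."""
    theta: float  # center longitude in [-pi, pi)
    phi: float    # center latitude in [-pi/2, pi/2]
    alpha: float  # horizontal field of view
    beta: float   # vertical field of view

def fov_area(box: FoVBox) -> float:
    # Approximate the box area by the product of its angular extents,
    # sidestepping exact (and expensive) spherical-polygon integration.
    return box.alpha * box.beta
```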

Specifically, the study proposes the Field-of-View distance, a latitude-aware measure of the separation between two box centers built on the great-circle distance (the shortest distance between two points on the sphere). From this distance it approximates the intersection area of two FOV-BBs and hence an accurate IoU value. Compared with the traditional spherical IoU (Sph-IoU), FOV-IoU handles object detection in high-latitude areas far more effectively, improving both accuracy and computational efficiency.
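The sketch below illustrates this computation under the (θ, φ, α, β) box format above. It follows the description given here (the longitudinal gap shrinks with the cosine of the mean latitude, approximating great-circle behavior) but is not the authors' official implementation:

```python
import math

def fov_iou(b1, b2):
    """Approximate FoV-IoU between two boxes (theta, phi, alpha, beta),
    all angles in radians."""
    t1, p1, a1, h1 = b1
    t2, p2, a2, h2 = b2

    # FoV distance between the centers: the longitude gap (wrapped to
    # [-pi, pi)) is scaled by the cosine of the mean latitude.
    d_t = ((t2 - t1 + math.pi) % (2 * math.pi) - math.pi) * math.cos((p1 + p2) / 2)
    d_p = p2 - p1

    # Intersection as an axis-aligned overlap of the angular extents,
    # capped so it never exceeds either box.
    iw = min(max(0.0, (a1 + a2) / 2 - abs(d_t)), a1, a2)
    ih = min(max(0.0, (h1 + h2) / 2 - abs(d_p)), h1, h2)
    inter = iw * ih

    union = a1 * h1 + a2 * h2 - inter
    return inter / union if union > 0 else 0.0
```

At the equator the cosine term is 1 and this reduces to ordinary axis-aligned IoU; toward the poles it keeps the measured separation consistent with distances on the sphere.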

2. 360augmentation Data Augmentation Technique

Because of the particular geometry of 360° images, traditional planar transformations such as rotation and translation are unsuitable. The study therefore proposes the 360augmentation technique, which combines vertical rotation and horizontal translation strategies to increase the diversity of the training data while preserving the spherical coordinate mapping of the ERP images.

Specifically, 360augmentation mimics a VR user turning their head to look in different directions: it randomly selects horizontal and vertical angles and transforms the image and its bounding boxes accordingly. This keeps the training data faithful to the geometry of 360° images and improves detection accuracy in high-latitude regions, as the sketch below illustrates.
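The horizontal component is cheap: rotating the sphere about its vertical axis is exactly a circular shift of ERP pixel columns, with a matching longitude shift for each box. The vertical component, by contrast, requires a full spherical rotation with pixel resampling and is omitted here. A minimal sketch of the horizontal step, assuming boxes given as (θ, φ, α, β) with angles in degrees:

```python
import numpy as np

def horizontal_translate(erp: np.ndarray, boxes, shift_deg: float):
    """Circularly shift an ERP image by shift_deg of longitude and move
    the box centers with it; the pixel content itself is unchanged."""
    h, w = erp.shape[:2]
    shift_px = int(round(shift_deg / 360.0 * w))
    shifted = np.roll(erp, shift_px, axis=1)

    new_boxes = []
    for theta, phi, alpha, beta in boxes:
        # Wrap the shifted longitude back into [-180, 180).
        theta = (theta + shift_deg + 180.0) % 360.0 - 180.0
        new_boxes.append((theta, phi, alpha, beta))
    return shifted, new_boxes
```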

Major Experimental Results and Analysis

1. Field-of-View IoU vs. Spherical IoU

The study verifies the accuracy and efficiency of FOV-IoU through multiple experiments. The results show that FOV-IoU is more accurate across all latitudes while matching or exceeding Sph-IoU in computational efficiency. In addition, detectors that use FOV-IoU filter redundant predictions more reliably during the non-maximum suppression (NMS) stage, improving the quality of the final outputs.
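NMS itself is agnostic to how overlap is measured, so swapping in a spherical IoU is enough to make the suppression latitude-aware. A generic sketch (the helper name and threshold are our own):

```python
def nms(boxes, scores, iou_fn, iou_thresh=0.5):
    """Greedy NMS with a pluggable overlap measure. Passing a spherical
    measure such as fov_iou (sketched above) avoids the over- and
    under-suppression that planar IoU exhibits on ERP images."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Drop every remaining box that overlaps the kept one too much.
        order = [i for i in order if iou_fn(boxes[best], boxes[i]) < iou_thresh]
    return keep
```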

2. FOV-GIoU Loss Function

The study integrates FOV-IoU into the Generalized IoU (GIoU) loss, proposing the FOV-GIoU loss for training object detectors. Experiments show that models trained with FOV-GIoU loss achieve significantly better detection accuracy in high-latitude regions than those trained with the traditional Sph-GIoU loss.
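GIoU penalizes the empty portion of the smallest enclosing box, which provides a training gradient even when predicted and ground-truth boxes do not overlap. The sketch below shows one way this extends to FOV-BBs, again under our (θ, φ, α, β)-in-radians assumption rather than the authors' exact formulation:

```python
import math

def fov_giou_loss(b1, b2):
    """1 - GIoU computed with the FoV-IoU approximation; the enclosing
    box is axis-aligned in angular coordinates."""
    t1, p1, a1, h1 = b1
    t2, p2, a2, h2 = b2
    d_t = abs(((t2 - t1 + math.pi) % (2 * math.pi) - math.pi)
              * math.cos((p1 + p2) / 2))
    d_p = abs(p2 - p1)

    # Intersection and union, as in the FoV-IoU sketch.
    iw = min(max(0.0, (a1 + a2) / 2 - d_t), a1, a2)
    ih = min(max(0.0, (h1 + h2) / 2 - d_p), h1, h2)
    inter = iw * ih
    union = a1 * h1 + a2 * h2 - inter

    # Smallest enclosing FoV-BB, with box 1 centered at the origin.
    enc_w = max(a1 / 2, d_t + a2 / 2) - min(-a1 / 2, d_t - a2 / 2)
    enc_h = max(h1 / 2, d_p + h2 / 2) - min(-h1 / 2, d_p - h2 / 2)
    enc = enc_w * enc_h

    iou = inter / union if union > 0 else 0.0
    giou = iou - (enc - union) / enc if enc > 0 else iou
    return 1.0 - giou  # minimized during training
```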

3. Effectiveness of 360augmentation

The study combines 360augmentation with the FOV-GIoU loss to train several state-of-the-art detectors, including Faster R-CNN and YOLOv3. The experiments demonstrate that 360augmentation meaningfully increases training-data diversity and improves detection accuracy, especially in high-latitude regions.

4. Comparison with Other 360° Object Detection Methods

The study also compares the proposed method with detection architectures designed specifically for 360° images, such as S2CNN and SphereNet. The results show that perspective detectors equipped with FOV-IoU and 360augmentation significantly outperform these methods in overall accuracy, with especially strong detection in high-latitude regions.

Conclusion and Research Value

The FOV-IoU calculation method and the 360augmentation data augmentation technique proposed in this study offer a new, practical approach to object detection in 360° images. They significantly improve detection accuracy and computational efficiency, and they are general enough to integrate easily with existing perspective object detectors. Although limitations remain for severely distorted objects in high-latitude regions, as pioneering work on the challenges of 360° image detection, the methods have considerable scientific value and promising application prospects.