Dynamic Attention Vision-Language Transformer Network for Person Re-Identification

Research Report: In recent years, multimodal person re-identification (ReID) has gained increasing attention in the field of computer vision. Person ReID aims to identify specific individuals across different camera views, serving as a critical technology in security ...

Day2Dark: Pseudo-Supervised Activity Recognition Beyond Silent Daylight

Research Highlights: Low-Light Activity Recognition Based on Pseudo-Supervision and Adaptive Audio-Visual Fusion. Academic Context: This paper investigates the challenges of recognizing activities under low-light conditions. While existing activity recognition technologies perform well in well-lit environments, they often fail when dealing with low-l...

EfficientDeRain+: Learning Uncertainty-Aware Filtering via RainMix Augmentation for High-Efficiency Deraining

EfficientDeRain+: A High-Efficiency Image Deraining Method Enhanced by RainMix Augmentation. Background: Rain significantly affects the quality of images and videos captured by computer vision systems, with raindrops and streaks impairing clarity and degrading performance in tasks like pedestrian detection, object tracking, and semantic segmentation....

Adaptive Middle Modality Alignment Learning for Visible-Infrared Person Re-Identification

Research on Adaptive Middle-Modality Alignment Learning for Visible-Infrared Cross-Modality Learning. Background and Problem Statement: Driven by the needs of intelligent surveillance systems, visible-infrared person re-identification (VIReID) has gradually become a prominent research topic. This task aims to achieve around-the-clock person recogniti...

A Weakly Supervised Collaborative Procedure Alignment Framework for Procedural Video Analysis

Achieving Procedure-Aware Instructional Video Correlation Learning Under Weak Supervision: Summary and Evaluation. In recent years, instructional videos have garnered significant attention due to their goal-driven characteristics and intrinsic connections to human learning processes. Compared to general videos, instructional videos contain multiple ...

One-Shot Generative Domain Adaptation in 3D GANs

In recent years, Generative Adversarial Networks (GANs) have achieved remarkable progress in the field of image generation. While traditional 2D generative models exhibit impressive performance across various tasks, extending this technology to 3D domains (3D-aware image generation) remains challengi...

Reliable Evaluation of Attribution Maps in CNNs: A Perturbation-Based Approach

Deep Learning Explainability Research: A Perturbation-Based Evaluation Method for Attribution Maps. Background and Motivation: With the remarkable success of deep learning models across various tasks, there is growing attention on the interpretability and transparency of these models. However, while these models excel in accuracy, their decision-maki...

Cross-Scale Co-Occurrence Local Binary Pattern for Image Classification

Research on Cross-Scale Co-Occurrence Local Binary Pattern (CS-COLBP) for Image Classification. Image classification is a key area in computer vision, with feature extraction being its core research focus. The Local Binary Pattern (LBP), due to its efficiency and descriptive power, has been widely used in tasks such as texture classification and fac...

Warping the Residuals for Image Editing with StyleGAN

A New Method for GAN Inversion and Image Editing: Warping the Residuals for Image Editing with StyleGAN. Background and Research Problem: Generative Adversarial Networks (GANs) have made remarkable progress in the field of image generation, enabling the synthesis and editing of high-quality images. StyleGAN models, known for their semantically interpretabl...

Transformer for Object Re-Identification: A Survey

Background and Significance: Object re-identification (Re-ID) is an essential task in computer vision aimed at identifying specific objects across different times and scenes. Driven by deep learning, particularly convolutional neural networks (CNNs), this field has made significant strides. However, the emergence of vision transformers has opened ne...