Image Synthesis under Limited Data: A Survey and Taxonomy

Image Synthesis Under Limited Data: A Survey. Research Background and Problem Statement: In recent years, deep generative models have achieved unprecedented progress in intelligent creation tasks, especially in areas such as image and video generation, and audio synthesis. However, the success of these models relies heavily on large amounts of traini...

Self-Supervised Shutter Unrolling with Events

Event Camera-Based Self-Supervised Shutter Unrolling Method. Research Background and Problem Statement: In the field of computer vision, recovering undistorted global shutter (GS) videos from rolling shutter (RS) images has been a highly challenging problem. RS cameras, due to their row-by-row exposure mechanism, are prone to spatial distortions (e.g...

Dual-Space Video Person Re-Identification

Dual-Space Video Person Re-Identification. Research Background Introduction: Person Re-Identification (ReID) technology aims to identify specific individuals through images or video sequences captured by different cameras. In recent years, with the rapid development of deep learning technology, ReID has shown great application potential in areas such...

TryOn-Adapter: Efficient Fine-Grained Clothing Identity Adaptation for High-Fidelity Virtual Try-On

TryOn-Adapter: Efficient Fine-Grained Clothing Identity Adaptation for High-Fidelity Virtual Try-On. Research Background and Problem: Virtual try-on technology has gained widespread attention in recent years. Its core objective is to seamlessly fit a given garment onto a specific person while avoiding distortion of the garment’s patterns and textur...

Contrastive Decoupled Representation Learning and Regularization for Speech-Preserving Facial Expression Manipulation

Contrastive Decoupled Representation Learning in Speech-Preserving Facial Expression Manipulation. Background Introduction: In recent years, with the rapid development of virtual reality, film and television production, and human-computer interaction technologies, facial expression manipulation has become a research hotspot in the fields of computer ...

Sample-Cohesive Pose-Aware Contrastive Facial Representation Learning

Enhancing Pose Awareness in Self-Supervised Facial Representation Learning. Research Background and Problem Statement: In the field of computer vision, facial representation learning is a crucial research task. By analyzing facial images, we can extract information such as identity, emotions, and poses, thereby supporting downstream tasks like facial...

A Mutual Supervision Framework for Referring Expression Segmentation and Generation

A Mutual Supervision Framework for Referring Expression Segmentation and Generation. Research Background and Problem Statement: In recent years, vision-language interaction technology has made remarkable progress in the field of artificial intelligence. Among these advancements, referring expression segmentation (RES) and referring expression generat...

Global and Local Maximum Concept Matching for Zero-Shot Out-of-Distribution Detection

GL-MCM: Global and Local Maximum Concept Matching for Zero-Shot Out-of-Distribution Detection. Research Background and Problem Statement: In real-world applications, machine learning models often face changes in data distribution, such as the emergence of new categories. Samples that fall outside the training distribution are called out-of-distribution (OOD) data, and identifying them is known as OOD detection. To ensure the...

Lidar-guided Geometric Pretraining for Vision-centric 3D Object Detection

Lidar-Guided Geometric Pretraining Enhances Performance of Vision-Centric 3D Object Detection. Background Introduction: In recent years, multi-camera 3D object detection has garnered significant attention in the field of autonomous driving. However, vision-based methods still face challenges in precisely extracting geometric information from RGB imag...