One-shot Generative Domain Adaptation in 3D GANs

In recent years, Generative Adversarial Networks (GANs) have made remarkable progress in image generation. While 2D generative models perform impressively across a wide range of tasks, extending them to 3D-aware image generation remains challenging, because the model must generate 2D images and learn the underlying 3D structure at the same time. This report reviews the study “One-shot Generative Domain Adaptation in 3D GANs”, published in the International Journal of Computer Vision by Ziqiang Li, Yi Wu, Chaoyue Wang, et al., from institutions including Nanjing University of Information Science and Technology, the University of Sydney, and the University of Science and Technology of China.


Background and Problem Statement

3D-aware image generation requires extensive training data to ensure stable performance and to reduce the risk of overfitting. In many practical scenarios, however, acquiring sufficient training data is infeasible: images in specific styles, such as sketches or ukiyo-e art, are scarce, which makes large-scale training difficult. Developing techniques that can adapt a 3D generator to a new domain from minimal data, as little as a single reference image, is therefore crucial.

The authors introduce a novel task, One-shot 3D Generative Domain Adaptation (GDA), which transfers a pre-trained 3D generator to a new domain using only a single reference image. The adapted generator must satisfy four properties: high fidelity, diversity, cross-domain consistency, and multi-view consistency. To meet these requirements, the authors propose a method called 3D-Adapter, which achieves significant advances in one-shot 3D GDA.


Study Context and Method Overview

The study, published in 2024, builds on the popular 3D-aware generative network EG3D. By combining selective weight fine-tuning, advanced loss functions, and a progressive training strategy, the authors adapt pre-trained models to new domains. The code is publicly available on GitHub.


Detailed Methodology

1. Workflow

The proposed 3D-Adapter method comprises three core components:

  1. Selective Weight Fine-tuning
    Through detailed ablation studies, the authors identify which components of the pre-trained generator are critical for adaptation. They find that fine-tuning the entire model leads to significant performance degradation; instead, selectively tuning specific modules, namely the tri-plane decoder (Tri-D) and the style-based super-resolution module (G2), improves stability and eases training (a code sketch of this selective tuning follows the list).

  2. Advanced Loss Functions
    To achieve high fidelity, diversity, cross-domain consistency, and multi-view consistency, the study introduces four loss functions (sketches of the DDR and TDL terms follow the list):

    • Domain Direction Regularization (DDR): Utilizes the pre-trained CLIP model to ensure the generator learns target domain features while maintaining diversity.
    • Target Distribution Learning (TDL): Optimizes a relaxed earth mover’s distance (REMD) to capture domain-specific characteristics from the reference image.
    • Image-level Source Structure Maintenance (ISSM): Preserves domain-independent attributes (e.g., pose, identity) between the adapted and source images.
    • Feature-level Source Structure Maintenance (FSSM): Maintains consistency in 3D feature space.
  3. Progressive Fine-tuning Strategy
    To address the overfitting or underfitting that can arise when the model is tuned directly, the authors propose a two-step progressive training strategy (see the training-loop sketch after the list):

    • Step 1: Fine-tune the tri-plane decoder with DDR, TDL, and structure maintenance losses.
    • Step 2: Fine-tune the super-resolution module for further refinement.
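
To make the first two loss terms concrete, the sketch below shows a generic CLIP-based directional loss in the spirit of DDR and a relaxed earth mover’s distance in the spirit of TDL. It is a minimal illustration under stated assumptions, not the authors’ implementation: the CLIP backbone, the image preprocessing, and the exact formulations may differ from the paper.

```python
# Minimal sketches of a CLIP directional loss (in the spirit of DDR) and a
# relaxed earth mover's distance (in the spirit of TDL); hyperparameters and
# preprocessing are illustrative assumptions, not the paper's settings.
import torch
import torch.nn.functional as F
import clip  # OpenAI CLIP: pip install git+https://github.com/openai/CLIP.git

device = "cuda" if torch.cuda.is_available() else "cpu"
clip_model, _ = clip.load("ViT-B/32", device=device)
clip_model = clip_model.float().eval()

def clip_embed(images):
    # Assumes images are RGB tensors already in CLIP's expected normalization;
    # resize to 224x224, embed, and L2-normalize.
    images = F.interpolate(images, size=(224, 224), mode="bilinear", align_corners=False)
    return F.normalize(clip_model.encode_image(images), dim=-1)

def directional_loss(src_imgs, adapted_imgs, src_anchor, ref_img):
    # DDR-style term: align the per-sample direction (adapted - source) with the
    # global domain direction (reference image - source anchor) in CLIP space.
    target_dir = F.normalize(clip_embed(ref_img) - clip_embed(src_anchor), dim=-1)
    sample_dir = F.normalize(clip_embed(adapted_imgs) - clip_embed(src_imgs), dim=-1)
    return (1.0 - (sample_dir * target_dir).sum(dim=-1)).mean()

def remd_loss(feats_a, feats_b):
    # TDL-style term: relaxed earth mover's distance between two sets of feature
    # vectors (one row per feature), with cosine distance as the transport cost.
    a = F.normalize(feats_a, dim=-1)
    b = F.normalize(feats_b, dim=-1)
    cost = 1.0 - a @ b.t()
    return torch.max(cost.min(dim=1).values.mean(), cost.min(dim=0).values.mean())
```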

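Building on the loss sketches above, the following is a hedged sketch of how the selective, two-step fine-tuning schedule could be wired together. The module attribute names (`generator.tri_plane_decoder`, `generator.superres`), the helpers and reference tensors (`sample_latents_and_cameras`, `extract_feats`, `src_anchor`, `ref_img`, `ref_feats`), the step counts, and the loss weights are hypothetical placeholders rather than the paper’s actual identifiers.

```python
# Hedged sketch of the two-step progressive schedule with selective weight
# fine-tuning. All module names, helpers, step counts, and loss weights are
# hypothetical placeholders; `generator` is assumed to be a pre-trained
# EG3D-style model returning (image, feature) pairs.
import copy
import torch

def freeze_all_but(model, submodules):
    # Freeze every parameter, then re-enable gradients only for the chosen modules.
    for p in model.parameters():
        p.requires_grad_(False)
    for m in submodules:
        for p in m.parameters():
            p.requires_grad_(True)

source_G = copy.deepcopy(generator).eval()   # frozen copy of the source-domain generator
for p in source_G.parameters():
    p.requires_grad_(False)

def run_stage(modules_to_tune, num_steps, lr=2e-3):
    freeze_all_but(generator, modules_to_tune)
    params = [p for p in generator.parameters() if p.requires_grad]
    opt = torch.optim.Adam(params, lr=lr, betas=(0.0, 0.99))
    for _ in range(num_steps):
        z, cam = sample_latents_and_cameras(batch_size=4)      # assumed helper
        with torch.no_grad():
            src_imgs, src_feats = source_G(z, cam)
        adapted_imgs, adapted_feats = generator(z, cam)
        loss = (directional_loss(src_imgs, adapted_imgs, src_anchor, ref_img)   # DDR-style
                + 0.5 * remd_loss(extract_feats(adapted_imgs), ref_feats)       # TDL-style
                + 1.0 * (adapted_imgs - src_imgs).abs().mean()                  # image-level structure
                + 1.0 * (adapted_feats - src_feats).abs().mean())               # feature-level structure
        opt.zero_grad()
        loss.backward()
        opt.step()

# Step 1: adapt the tri-plane decoder; Step 2: refine the super-resolution module.
run_stage([generator.tri_plane_decoder], num_steps=300)
run_stage([generator.superres], num_steps=150)
```
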
2. Experiments and Data Analysis

Datasets

The study utilizes multiple target domain datasets, including sketches, ukiyo-e, and cartoons, with the FFHQ dataset serving as the source domain.

Quantitative and Qualitative Analysis

  • Quantitative Metrics: FID and KID are used to evaluate synthesis quality, while identity similarity (ID) and depth difference measure cross-domain and geometric consistency (a sketch of these two consistency metrics appears below).
  • Qualitative Results: Comparative experiments show that 3D-Adapter outperforms existing methods (e.g., DiFa, DoRM) in fidelity, diversity, and consistency: it learns target-domain textures while preserving source-domain geometry and identity.
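
As a concrete illustration of how the two consistency metrics might be computed, the sketch below pairs a face-embedding backbone with a simple depth comparison. The choice of facenet_pytorch’s InceptionResnetV1 and the depth normalization are assumptions; the paper’s exact evaluation protocol may differ.

```python
# Hedged sketch of the consistency metrics: identity similarity via a face
# embedding network and a depth-difference score between source and adapted
# renders. The embedding backbone and the normalization are assumptions.
import torch
import torch.nn.functional as F
from facenet_pytorch import InceptionResnetV1  # assumed face-embedding backbone

face_embedder = InceptionResnetV1(pretrained="vggface2").eval()

def identity_similarity(src_imgs, adapted_imgs):
    # Mean cosine similarity between face embeddings of paired source/adapted images
    # (inputs assumed to be aligned face crops in the backbone's expected range).
    with torch.no_grad():
        e_src = F.normalize(face_embedder(F.interpolate(src_imgs, size=160)), dim=-1)
        e_adp = F.normalize(face_embedder(F.interpolate(adapted_imgs, size=160)), dim=-1)
    return (e_src * e_adp).sum(dim=-1).mean()

def depth_difference(src_depth, adapted_depth):
    # Mean squared error between per-image standardized depth maps, so the score
    # reflects geometric discrepancy rather than absolute depth scale.
    def standardize(d):
        d = d.flatten(1)
        return (d - d.mean(dim=1, keepdim=True)) / (d.std(dim=1, keepdim=True) + 1e-8)
    return F.mse_loss(standardize(src_depth), standardize(adapted_depth))
```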

User Study

Participants compared reference, source, and generated images based on quality, style similarity, and attribute consistency. Results showed a strong preference for 3D-Adapter across all criteria.


Findings and Implications

Key Contributions:

  1. First Exploration of One-shot 3D GDA: This study fills a research gap by introducing the one-shot 3D GDA task.
  2. Progressive Training Strategy: The proposed approach mitigates overfitting and underfitting issues during adaptation.
  3. Advanced Loss Functions: These enable the adapted generator to achieve the desired adaptation properties with just one reference image.
  4. Superior Performance: Quantitative and qualitative results demonstrate significant improvements over existing methods.

Broader Impacts: 3D-Adapter opens new avenues for cross-domain adaptation in 3D generation. Its ability to adapt with minimal data provides a valuable tool for fields such as virtual reality, film production, and digital human modeling. Future work includes improving cross-domain consistency and exploring multi-domain integration.


Highlights of the Study

  1. Innovative Methodology: Introduces a progressive training strategy tailored to 3D generation.
  2. Efficiency: Achieves domain adaptation using only a single reference image.
  3. Practical Application: Applies to high-fidelity, diverse 3D target-domain image generation.
  4. Extensibility: Supports zero-shot domain adaptation and latent-space editing.

This research not only provides a solid foundation for few-shot 3D generation but also offers practical solutions for real-world applications.