Brain-Inspired Low-Power Generative Model: A Review on Spiking Diffusion Models
Background Overview
In recent years, the artificial intelligence field has seen a surge of cutting-edge technologies, with deep generative models (DGMs) demonstrating exceptional capabilities in producing images, text, and other types of data. However, these generative models typically rely on artificial neural networks (ANNs) as their backbone, and their heavy demands on compute and memory translate into substantial energy consumption, hindering large-scale deployment. Compared to the human brain, which operates efficiently at roughly 20 watts, the energy efficiency of ANNs falls far behind. This disparity has sparked interest in exploring neural architectures with higher energy efficiency.
Unlike ANNs, spiking neural networks (SNNs), inspired by the way neurons in the human brain operate, process information in an event-driven manner. SNNs offer high energy efficiency, low latency, and strong biological plausibility. At their core, SNNs represent information as binary spikes (0 or 1), which lets spike-driven accumulate (AC) operations replace the energy-hungry multiply-accumulate (MAC) operations of conventional networks, significantly reducing computational cost. However, this reduction often comes at the expense of representational power, particularly in generative tasks.
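To make the AC-versus-MAC point concrete, here is a minimal NumPy sketch of a leaky integrate-and-fire (LIF) neuron layer. The time constant, threshold, and hard-reset rule are illustrative assumptions rather than details from the paper; the point is that with binary inputs, the synaptic current reduces to summing the weights of active synapses, with no multiplications.

```python
# Minimal LIF layer: binary spikes in, binary spikes out. Because inputs are
# 0/1, the weighted sum degenerates into pure additions (AC, not MAC).
import numpy as np

def lif_step(v, spikes_in, weights, tau=2.0, v_threshold=1.0):
    """One time step for a layer of LIF neurons.

    v         : membrane potentials, shape (n_out,)
    spikes_in : binary presynaptic spikes, shape (n_in,)
    weights   : synaptic weights, shape (n_out, n_in)
    """
    # Sum the weights of active synapses only -- additions, no multiplies.
    active = spikes_in > 0
    current = weights[:, active].sum(axis=1)
    v = v + (current - v) / tau                     # leaky integration
    spikes_out = (v >= v_threshold).astype(float)   # fire on threshold crossing
    v = v * (1.0 - spikes_out)                      # hard reset after firing
    return v, spikes_out

rng = np.random.default_rng(0)
v = np.zeros(4)
weights = rng.normal(size=(4, 8))
spikes_in = (rng.random(8) < 0.3).astype(float)     # sparse binary input
v, out = lif_step(v, spikes_in, weights)
```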
Seeking to balance high-quality data generation with energy efficiency, researchers from The Hong Kong University of Science and Technology (Guangzhou) and other institutions propose an innovative solution: Spiking Diffusion Models (SDMs). The study aims to tackle both the high energy consumption of existing generative models and the limited generative quality of SNNs. Titled Spiking Diffusion Models, the paper was published in IEEE Transactions on Artificial Intelligence (Vol. 6, No. 1, January 2025) and represents a collaboration among scientists from The Hong Kong University of Science and Technology (Guangzhou), Renmin University of China, and North Carolina State University.
Research Process and Innovative Methods
The primary goal of this study is to combine diffusion models with SNNs to achieve high-quality, low-power generative tasks. Below, we provide a detailed overview of the research process.
1. Framework Design and Core Mechanism Innovations
The researchers designed a universal spiking diffusion model architecture compatible with various diffusion solvers (e.g., DDPM, DDIM, Analytic-DPM) and introduced the following two key mechanisms:
Temporal-Wise Spiking Mechanism (TSM):
In traditional SNNs, the input at every time step is processed with the same fixed synaptic weights, which deviates from how real neural systems behave. Inspired by the dynamic characteristics of biological neurons, TSM lets the neuron's membrane-potential update vary across time steps, capturing richer temporal dynamics and significantly improving the quality of generated images.
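The sketch below illustrates the idea under a simple assumption: one learnable scalar per time step modulating the membrane update. The exact parameterization in the paper may differ, and the constants are placeholders.

```python
# Hedged sketch of a temporal-wise spiking neuron: a learnable per-time-step
# parameter p[t] modulates the input current, so the neuron's dynamics are no
# longer identical at every step.
import numpy as np

class TSMNeuron:
    def __init__(self, n_steps, tau=2.0, v_threshold=1.0):
        self.p = np.ones(n_steps)    # learnable temporal parameters, one per step
        self.tau = tau
        self.v_threshold = v_threshold

    def forward(self, currents):
        """currents: array of shape (n_steps, n_neurons)."""
        v = np.zeros(currents.shape[1])
        spikes = []
        for t, c in enumerate(currents):
            # Temporal-wise modulation: the same input contributes differently
            # depending on the time step at which it arrives.
            v = v + (self.p[t] * c - v) / self.tau
            s = (v >= self.v_threshold).astype(float)
            v = v * (1.0 - s)        # hard reset after firing
            spikes.append(s)
        return np.stack(spikes)

out = TSMNeuron(n_steps=4).forward(np.random.randn(4, 16))
```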
Threshold Guidance (TG):
For the first time, the authors propose a threshold-adjustment method that requires no additional training. By tuning the spiking neurons' firing threshold during inference alone, lowering it so neurons fire more readily (excitatory guidance) or raising it to suppress firing (inhibitory guidance), the quality of generated images improves significantly as measured by the Fréchet Inception Distance (FID).
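Because TG only rescales firing thresholds at inference time, it can be dropped into any solver the framework supports (DDPM, DDIM, Analytic-DPM). Below is a minimal PyTorch sketch assuming a spiking UNet whose neuron modules expose a v_threshold attribute and a plain DDPM update with 1-D tensors alphas and alpha_bars; the attribute name and the guidance_scale value are illustrative assumptions.

```python
# Hedged sketch of Threshold Guidance inside a DDPM sampling loop: thresholds
# are scaled once before sampling; no weights change, no retraining.
import torch

@torch.no_grad()
def sample_with_tg(model, alphas, alpha_bars, shape, guidance_scale=0.95):
    for m in model.modules():
        if hasattr(m, "v_threshold"):
            m.v_threshold = m.v_threshold * guidance_scale  # <1 excitatory, >1 inhibitory

    x = torch.randn(shape)
    for t in reversed(range(len(alphas))):
        eps = model(x, t)                        # spiking UNet predicts the noise
        a, ab = alphas[t], alpha_bars[t]
        mean = (x - (1 - a) / torch.sqrt(1 - ab) * eps) / torch.sqrt(a)
        noise = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
        x = mean + torch.sqrt(1 - a) * noise     # sigma_t^2 = beta_t variance choice
    return x
```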
2. Experimental Design and Algorithm Optimization
The experiments involve two main stages:
Stage One: Training with Standard Prespike Residual Blocks
The researchers developed a Prespike residual learning method to address the information-overflow problem of conventional SNN residual structures. Unlike residual designs in ANNs, the Prespike structure places the spiking activation before each convolution, so the shortcut adds floating-point feature maps rather than binary spikes, avoiding biologically implausible overflow (two spikes summing to a value greater than 1). A sketch follows.
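Here is a hedged PyTorch sketch of the structure. The stateless one-step LIF, layer sizes, and normalization choices are illustrative assumptions; the key property is that the residual addition combines floats, not spikes.

```python
# Prespike residual block sketch: spike -> conv -> norm, twice; the shortcut
# then adds floating-point tensors, so no invalid "spike sums" above 1 arise.
import torch
import torch.nn as nn

class LIF(nn.Module):
    """Stateless single-step LIF stand-in (membrane dynamics omitted)."""
    def __init__(self, v_threshold=1.0):
        super().__init__()
        self.v_threshold = v_threshold

    def forward(self, x):
        return (x >= self.v_threshold).float()   # binary spikes

class PrespikeResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.spike1, self.spike2 = LIF(), LIF()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.bn1, self.bn2 = nn.BatchNorm2d(channels), nn.BatchNorm2d(channels)

    def forward(self, x):
        out = self.bn1(self.conv1(self.spike1(x)))   # spiking happens pre-conv
        out = self.bn2(self.conv2(self.spike2(out)))
        return x + out                               # float + float shortcut
```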
Stage Two: Fine-Tuning with the TSM Mechanism
Building on the pre-trained model, the Prespike residual blocks were replaced with TSM blocks, and the temporal parameters were optimized to capture richer dynamic features. Notably, this fine-tuning stage required only a small number of iterations to achieve significant improvements.
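A hedged sketch of the swap-and-fine-tune step, reusing the PrespikeResidualBlock class from the sketch above; the TSMResidualBlock wrapper and the scalar-per-step modulation are hypothetical scaffolding, not the paper's exact module.

```python
# Stage-two scaffolding sketch: wrap each pretrained Prespike block with
# learnable per-time-step parameters, then fine-tune only those parameters.
import torch
import torch.nn as nn

class TSMResidualBlock(nn.Module):
    def __init__(self, prespike_block, n_steps=4):
        super().__init__()
        self.block = prespike_block                  # pretrained weights kept
        self.p = nn.Parameter(torch.ones(n_steps))   # new temporal parameters

    def forward(self, x, t):
        return self.p[t] * self.block(x)             # time-dependent modulation

def to_tsm(model, n_steps=4):
    """Recursively replace Prespike blocks with TSM-wrapped versions."""
    for name, child in model.named_children():
        if isinstance(child, PrespikeResidualBlock):
            setattr(model, name, TSMResidualBlock(child, n_steps))
        else:
            to_tsm(child, n_steps)

# Fine-tuning then touches only the handful of new parameters:
# optimizer = torch.optim.Adam((m.p for m in model.modules()
#                               if isinstance(m, TSMResidualBlock)), lr=1e-4)
```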
Research Results and Analysis
Experimental Data and Results
The study evaluated SDMs on multiple benchmark datasets, including MNIST, Fashion-MNIST, CIFAR-10, CelebA, and LSUN Bedroom. Comparing SDMs against traditional ANN baselines and other SNN-based generative models yielded the following key results:
- On the CIFAR-10 dataset, SDMs achieved an FID of 19.73 using only four time steps, approaching the ANN-based DDPM’s score of 19.04. When the number of time steps increased to eight, the FID further improved to 15.45, surpassing some ANN models.
- On the Fashion-MNIST dataset, SDMs consumed only about 30% of the energy required by their ANN counterparts while achieving generative quality up to 11× better than comparable SNN-based models.
- The TSM module improved FID scores by an average of 18.4% at a negligible cost in parameters (0.0002M) and without significant additional energy consumption compared to existing SNN methods.
Method Comparisons and Extensibility
The researchers also compared direct training against ANN-to-SNN conversion for generation. While ANN-to-SNN conversion has proven effective for classification, it performed less effectively on generative tasks; with additional fine-tuning (FT), however, the converted model's FID improved from 51.18 to 29.53.
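For context, the standard conversion recipe replaces each ReLU with an integrate-and-fire neuron and balances thresholds against activations observed on calibration data, so that firing rates over T steps approximate the ReLU outputs. The sketch below shows such threshold calibration; it is the generic technique, not necessarily the paper's exact procedure, and the helper name is hypothetical.

```python
# Generic ANN-to-SNN threshold balancing: record each ReLU's maximum activation
# on a few calibration batches; these maxima become the IF firing thresholds.
import torch
import torch.nn as nn

@torch.no_grad()
def calibrate_thresholds(ann, loader, n_batches=10):
    maxima, hooks = {}, []
    for name, m in ann.named_modules():
        if isinstance(m, nn.ReLU):
            maxima[name] = 0.0
            hooks.append(m.register_forward_hook(
                lambda mod, inp, out, n=name:
                    maxima.__setitem__(n, max(maxima[n], out.max().item()))))
    for i, (x, _) in enumerate(loader):
        if i >= n_batches:
            break
        ann(x)                    # hooks record activation maxima as data flows
    for h in hooks:
        h.remove()
    return maxima                 # per-layer thresholds for the converted SNN
```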
Conclusion and Significance
Scientific Value of the Research
The introduction of Spiking Diffusion Models marks a breakthrough in using SNNs for generative tasks, achieving performance on par with traditional ANN models under low-energy conditions. This work not only provides innovative insights into algorithm design but also highlights the vast potential of SNNs in the generative domain.
Application Prospects
The high efficiency of SDMs lays a foundation for image generation and inference on low-power devices, such as neuromorphic hardware. Furthermore, their potential applications extend to text generation and audio generation. By integrating SDMs with large language models (e.g., GPT), more complex tasks like text-to-image generation could also be realized in the future.
Research Highlights
- The first implementation of the Temporal-Wise Spiking Mechanism (TSM) in SNN-based generative tasks.
- The proposal of a training-free Threshold Guidance (TG) strategy that significantly improves generative quality without requiring additional training.
- Comprehensive empirical analysis shows that SDMs consume only 37.5% of the energy required by traditional ANNs while achieving performance that can surpass some ANN models (see the back-of-the-envelope accounting sketched below).
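As a rough illustration of where such energy figures come from: SNN papers typically estimate energy by counting operations and multiplying by per-operation costs measured for 45 nm CMOS (about 4.6 pJ per 32-bit floating-point MAC and 0.9 pJ per accumulate, following Horowitz, ISSCC 2014). The operation count, time steps, and firing rate below are made-up placeholders, not the paper's measurements.

```python
# Back-of-the-envelope energy accounting commonly used in the SNN literature:
#   ANN energy ~ #MACs * E_MAC
#   SNN energy ~ T * firing_rate * #ops * E_AC   (spike-driven accumulates)
E_MAC = 4.6e-12    # joules per 32-bit MAC (45 nm CMOS estimate)
E_AC = 0.9e-12     # joules per 32-bit accumulate

macs = 1e9         # hypothetical MAC count of the UNet per forward pass
T = 4              # spiking time steps
firing_rate = 0.2  # hypothetical average fraction of neurons spiking per step

ann_energy = macs * E_MAC
snn_energy = T * firing_rate * macs * E_AC

print(f"ANN: {ann_energy * 1e3:.2f} mJ, SNN: {snn_energy * 1e3:.2f} mJ "
      f"({snn_energy / ann_energy:.1%} of the ANN)")
```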
Outlook
Despite these achievements, the current model still has limitations, such as evaluation at relatively few time steps and limited adaptability to higher resolutions. Future research should target high-resolution datasets (e.g., ImageNet) and explore multimodal generative tasks to achieve stronger generalizability and practicality. With these advances, SDMs are poised to play a crucial role in sustainable computing and energy-efficient AI applications.